ImageMetaTag.db

This module contains a set of functions to create, write to, read and maintain an sqlite3 database of image files and their associated metadata.

In normal usage, it is used primarily by ImageMetaTag.savefig() to build the database as figures are saved. Once the database has been built, the metadata can be loaded with ImageMetaTag.db.read().

(C) Crown copyright Met Office. All rights reserved. Released under BSD 3-Clause License. See LICENSE for more details.

Commonly used functions

For most use cases, the following functions provide the required functionality to use the database:

ImageMetaTag.db.write_img_to_dbfile(db_file, img_filename, img_info, add_strict=False, attempt_replace=False, timeout=6)

Writes image metadata to a database.

Arguments:

  • db_file - the database file to write to. If it does not exist, it will be created.

  • img_filename - the filename of the image to which the metadata applies. Usually this is either the absolute path or, often more usefully, the path relative to the location of the database file.

  • img_info - a dictionary containing any number of {tag_name: value} pairs to be stored.

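Options:

  • add_strict - passed to ImageMetaTag.db.write_img_to_open_db()

  • attempt_replace - passed to ImageMetaTag.db.write_img_to_open_db()

  • timeout - the timeout used when opening the database file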

This is commonly called by ImageMetaTag.savefig().

ImageMetaTag.db.read(db_file, required_tags=None, tag_strings=None, db_timeout=6, db_attempts=20, n_samples=None)

Reads in the database written by write_img_to_dbfile().

Options:
  • required_tags - a list of image tags to return, and to fail if not all are present

  • tag_strings - an input list that will be populated with the unique values of the image tags.

  • n_samples - if provided, only the given number of entries will be loaded from the database, at random. Must be an integer or None (default None)

Returns:
  • a list of filenames (payloads for the ImageMetaTag.ImageDict class)

  • a dictionary, keyed by filename, containing a dictionary of the image metadata as {tagname: value}

If tag_strings is not supplied, the returned dictionary will contain a large number of duplicated strings, which can be an inefficient use of memory with large databases. If tag_strings is supplied, it will be populated with a unique list of the strings used as tag values, and the dictionary will only contain references to this list. This can reduce memory usage considerably, both for the dictionary itself and for any ImageMetaTag.ImageDict produced from it.

Will return None, None if there is a problem.

In older versions this function was named read_img_info_from_dbfile; that name still works.
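The memory saving from tag_strings comes from storing each distinct string once. Below is a minimal, standalone sketch of that idea, assuming (this is not stated above) that the per-image dictionaries hold integer indices into tag_strings; intern_tags is a hypothetical helper, not part of ImageMetaTag.

```python
def intern_tags(rows, tag_strings):
    """Hypothetical helper: replace tag values with indices into tag_strings.

    rows maps filename -> {tag name: value}; tag_strings is extended in place
    with each distinct value the first time it is seen.
    """
    index_of = {s: i for i, s in enumerate(tag_strings)}  # fast lookup of known strings
    out = {}
    for fname, info in rows.items():
        out[fname] = {}
        for tag, value in info.items():
            if value not in index_of:
                index_of[value] = len(tag_strings)
                tag_strings.append(value)
            # Store a small integer reference instead of another copy of the string.
            out[fname][tag] = index_of[value]
    return out
```

A value such as a model name shared by thousands of images is then stored once in tag_strings, and each image refers to it by position.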

ImageMetaTag.db.del_plots_from_dbfile(db_file, filenames, do_vacuum=True, allow_retries=True, db_timeout=6, db_attempts=20, skip_warning=False)

Deletes a list of files from a database file created by ImageMetaTag.db.

Options:

  • do_vacuum - if True, the database will be restructured/cleaned after the delete

  • allow_retries - if True, retries will be allowed if the database is locked. If False, no retries are made, but sleep commands are used to try to avoid the need for them when doing a large number of deletes.

  • db_timeout - override the default database timeout, if doing retries

  • db_attempts - override the default number of attempts, if doing retries

  • skip_warning - do not warn if a filename requested for deletion does not exist in the database
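A minimal sketch of deleting entries with retries and an optional vacuum, using Python's standard sqlite3 module. It is not ImageMetaTag's implementation: the function name, the img_db table and its fname column are assumptions for illustration, and the retry test simply looks for sqlite3's "database is locked" error message.

```python
import sqlite3
import time


def delete_with_retries(db_file, filenames, do_vacuum=True, db_timeout=6, db_attempts=20):
    """Hypothetical sketch: delete rows by filename, retrying while the database is locked.

    Assumes an illustrative table img_db with a fname column.
    """
    for attempt in range(db_attempts):
        try:
            dbcn = sqlite3.connect(db_file, timeout=db_timeout)
            try:
                with dbcn:  # all deletes in one transaction, committed on success
                    dbcn.executemany("DELETE FROM img_db WHERE fname = ?",
                                     [(fname,) for fname in filenames])
                if do_vacuum:
                    dbcn.execute("VACUUM")  # rebuild the file, reclaiming freed space
            finally:
                dbcn.close()
            return
        except sqlite3.OperationalError as err:
            # Retry only on lock contention, and give up after the last attempt.
            if "locked" not in str(err) or attempt == db_attempts - 1:
                raise
            time.sleep(0.1 * (attempt + 1))  # back off a little more each time
```

Filenames that are not in the table are silently skipped by the DELETE; a warning like the one controlled by skip_warning would need an extra lookup.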

ImageMetaTag.db.select_dbfile_by_tags(db_file, select_tags)

Selects from a database file the entries that match a dict of field names/acceptable values.

Returns the output, processed by ImageMetaTag.db.process_select_star_from()
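One way a dict of field names and acceptable values could be turned into a parameterised SQL query is sketched below. It assumes each value is a single string or a list of acceptable strings (any value may match within a field, and all fields must match), and uses an illustrative img_db table; select_by_tags_sketch is hypothetical and ImageMetaTag may build its query differently.

```python
import sqlite3


def select_by_tags_sketch(dbcr, select_tags):
    """Hypothetical sketch: return rows whose fields match any of the listed values."""
    clauses, params = [], []
    for field, values in select_tags.items():
        if isinstance(values, str):
            values = [values]  # allow a single acceptable value
        marks = ", ".join("?" for _ in values)
        # Column names cannot be bound as parameters, so they are quoted instead
        # (this assumes field names contain no double quotes).
        clauses.append(f'"{field}" IN ({marks})')
        params.extend(values)
    where = " AND ".join(clauses) if clauses else "1"  # no tags: select everything
    dbcr.execute(f"SELECT * FROM img_db WHERE {where}", params)
    return dbcr.fetchall()
```

Binding the values as parameters, rather than formatting them into the SQL string, avoids quoting problems with tag values that contain apostrophes.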

ImageMetaTag.db.merge_db_files(main_db_file, add_db_file, delete_add_db=False, delete_added_entries=False, attempt_replace=False, add_strict=False, db_timeout=6, db_attempts=20)

Merges two ImageMetaTag database files, with the contents of add_db_file added to the main_db_file. The databases should have the same tags within them for the merge to work.

Options:

  • add_strict - passed to ImageMetaTag.db.write_img_to_open_db()

  • attempt_replace - passed to ImageMetaTag.db.write_img_to_open_db()

  • delete_add_db - if True, the added file will be deleted afterwards

  • delete_added_entries - if delete_add_db is False, this will keep the add_db_file but remove the entries from it which were added to the main_db_file. This is useful if parallel processes are writing to the databases. Ignored if delete_add_db is True.
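A merge like this can be expressed in SQLite by attaching one database file to a connection on the other. The sketch below assumes both files contain an img_db table with identical columns in the same order, keyed by a fname column; the table layout and merge_sketch are hypothetical, and only attempt_replace and delete_added_entries are mimicked.

```python
import sqlite3


def merge_sketch(main_db_file, add_db_file, attempt_replace=False, delete_added_entries=False):
    """Hypothetical sketch: copy img_db rows from add_db_file into main_db_file."""
    dbcn = sqlite3.connect(main_db_file)
    try:
        # ATTACH must run outside a transaction; it makes the second file visible as "addb".
        dbcn.execute("ATTACH DATABASE ? AS addb", (add_db_file,))
        verb = "INSERT OR REPLACE" if attempt_replace else "INSERT OR IGNORE"
        with dbcn:  # copy and (optionally) delete in one transaction
            # SELECT * relies on both tables having the same column order.
            dbcn.execute(f"{verb} INTO main.img_db SELECT * FROM addb.img_db")
            if delete_added_entries:
                # Remove from the added file every entry whose filename is now in the main file.
                dbcn.execute("DELETE FROM addb.img_db "
                             "WHERE fname IN (SELECT fname FROM main.img_db)")
        dbcn.execute("DETACH DATABASE addb")
    finally:
        dbcn.close()
```

Doing the copy and the delete in one transaction means a parallel writer never sees an entry missing from both files, which is the situation delete_added_entries is meant for.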

Functions for opening/creating db files

ImageMetaTag.db.open_or_create_db_file(db_file, img_info, restart_db=False, timeout=6)

Opens a database file and sets up initial tables, then returns the connection and cursor.

Arguments:
  • db_file - the database file to open.

  • img_info - a dictionary of image metadata to be saved to the database.

Options:
  • restart_db - when True, this deletes the current db file, if it already exists, and starts again.

Returns an open database connection (dbcn) and cursor (dbcr)
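A minimal sketch of opening or creating a database file with a table built from the keys of img_info. The layout (a fname primary key plus one TEXT column per tag, in a table called img_db) and the function name are assumptions for illustration, not ImageMetaTag's actual schema.

```python
import os
import sqlite3


def open_or_create_sketch(db_file, img_info, restart_db=False, timeout=6):
    """Hypothetical sketch: open (or create) db_file and ensure an img_db table exists."""
    if restart_db and os.path.exists(db_file):
        os.remove(db_file)  # start again from an empty database
    # timeout is how many seconds sqlite3 waits for a lock held by another connection.
    dbcn = sqlite3.connect(db_file, timeout=timeout)
    dbcr = dbcn.cursor()
    # One TEXT column per tag name; an empty img_info gives a table with only fname.
    cols = "".join(f', "{tag}" TEXT' for tag in img_info)
    dbcr.execute(f"CREATE TABLE IF NOT EXISTS img_db (fname TEXT PRIMARY KEY{cols})")
    dbcn.commit()
    return dbcn, dbcr
```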

ImageMetaTag.db.open_db_file(db_file, timeout=6)

Just opens an existing db_file, using timeouts but no retries.

Returns an open database connection (dbcn) and cursor (dbcr)

ImageMetaTag.db.read_db_file_to_mem(db_file, timeout=6)

Opens a pre-existing database file into a copy held in memory. This can be accessed much faster when doing extensive work (a lot of select operations, for instance).

There is a time cost in doing this; it takes a few seconds to read in a large database, so it is only worth doing when doing a lot of operations.

Tests of selects on a large-ish database (250k rows) suggested it was worth doing for more than 100 selects.

Returns an open database connection (dbcn) and cursor (dbcr)
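Python's standard sqlite3 module (3.7 and later) provides Connection.backup, which is one way to make an in-memory copy of a database file. Whether read_db_file_to_mem works this way is not stated here; the sketch and its name are hypothetical.

```python
import sqlite3


def read_to_mem_sketch(db_file, timeout=6):
    """Hypothetical sketch: copy an on-disk database into memory; returns (dbcn, dbcr)."""
    disk = sqlite3.connect(db_file, timeout=timeout)
    mem = sqlite3.connect(":memory:")
    try:
        disk.backup(mem)  # copies every page of the main database into mem
    finally:
        disk.close()  # the file is no longer needed once the copy exists
    return mem, mem.cursor()
```

Later selects on the returned cursor run against the in-memory copy; any changes made through it are not written back to db_file.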

Functions for working with open databases

ImageMetaTag.db.write_img_to_open_db(dbcr, filename, img_info, add_strict=False, attempt_replace=False)

Does the work for write_img_to_dbfile(), adding an image to the open database cursor (dbcr).

  • add_strict: if True, then it will raise a ValueError if you try to include fields that aren’t defined in the table. If False, then adding a new metadata tag to the database will cause it to be rewritten with the new item as a new column, using ImageMetaTag.db.recrete_table_new_cols(). All pre-existing images will have the new tag set to ‘None’. It is best to avoid using this functionality as it can be slow for large databases. Instead, all images should ideally have all expected metadata tags included from the start, set to ‘None’ where they are not used.

  • attempt_replace: if True, then it will attempt to replace a database entry if the image is already present. Otherwise it will ignore it.
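The add_strict and attempt_replace behaviour is sketched below for an open cursor, using an illustrative img_db table (fname plus one TEXT column per tag). As a simplification, new columns are added with ALTER TABLE ADD COLUMN rather than by rebuilding the table as ImageMetaTag.db.recrete_table_new_cols() does; write_to_open_db_sketch is hypothetical.

```python
import sqlite3


def write_to_open_db_sketch(dbcr, filename, img_info, add_strict=False, attempt_replace=False):
    """Hypothetical sketch: write one image's tags through an open cursor."""
    dbcr.execute("PRAGMA table_info(img_db)")
    current_cols = [row[1] for row in dbcr.fetchall()]  # field 1 is the column name
    new_cols = [tag for tag in img_info if tag not in current_cols]
    if new_cols:
        if add_strict:
            raise ValueError(f"tags not defined in the database table: {new_cols}")
        for tag in new_cols:
            # Simplification: existing rows get the default 'None' for the new tag.
            dbcr.execute(f"ALTER TABLE img_db ADD COLUMN \"{tag}\" TEXT DEFAULT 'None'")
    # REPLACE overwrites an existing entry for this filename; IGNORE keeps the old one.
    verb = "INSERT OR REPLACE" if attempt_replace else "INSERT OR IGNORE"
    names = ", ".join(["fname"] + [f'"{t}"' for t in img_info])
    marks = ", ".join("?" for _ in range(len(img_info) + 1))
    dbcr.execute(f"{verb} INTO img_db ({names}) VALUES ({marks})",
                 [filename] + [str(v) for v in img_info.values()])
```

Note that INSERT OR REPLACE swaps the whole row, so any tag not supplied in img_info falls back to its column default.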

ImageMetaTag.db.read_img_info_from_dbcursor(dbcr, required_tags=None, tag_strings=None, n_samples=None)

Reads from an open database cursor (dbcr) for ImageMetaTag.db.read() and other routines.

Options:
  • required_tags - a list of image tags to return, and to fail if not all are present

  • tag_strings - an input list that will be populated with the unique values of the image tags

  • n_samples - if provided, only the given number of entries will be loaded from the database, at random. Must be an integer or None (default None)
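Random sampling of n_samples entries can be done in SQL with ORDER BY RANDOM() and LIMIT, as in this sketch; the img_db table and read_sample_sketch are hypothetical, and ImageMetaTag may sample differently.

```python
import sqlite3


def read_sample_sketch(dbcr, n_samples=None):
    """Hypothetical sketch: read all rows, or n_samples rows chosen at random."""
    if n_samples is None:
        dbcr.execute("SELECT * FROM img_db")
    else:
        if not isinstance(n_samples, int):
            raise TypeError("n_samples must be an integer or None")
        # ORDER BY RANDOM() shuffles the rows, then LIMIT keeps the first n_samples.
        dbcr.execute("SELECT * FROM img_db ORDER BY RANDOM() LIMIT ?", (n_samples,))
    return dbcr.fetchall()
```

ORDER BY RANDOM() still reads and sorts every row, so this saves memory on the Python side rather than time in the query itself.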

ImageMetaTag.db.select_dbcr_by_tags(dbcr, select_tags)

Selects from an open database cursor (dbcr) the entries that match a dict of field names & acceptable values.

Returns the output, processed by ImageMetaTag.db.process_select_star_from()

ImageMetaTag.db.recrete_table_new_cols(dbcr, current_cols, new_cols)

For a given database cursor (dbcr), this creates a new version of the ImageMetaTag database table with additional columns.

This is a major change to a database, and takes place (deliberately) without any commit statements; otherwise, other connections/processes would see an intermediate, incorrect database.

Because of this, this process is slow and should be avoided if at all possible.
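SQLite's usual pattern for adding columns by rebuilding a table is create, copy, drop and rename inside one transaction. A sketch of that pattern follows, leaving the commit to the caller as described above; the img_db table, the TEXT column types and the function name are assumptions, and the real recrete_table_new_cols() may differ.

```python
import sqlite3


def recreate_with_new_cols_sketch(dbcr, current_cols, new_cols):
    """Hypothetical sketch: rebuild img_db with extra TEXT columns defaulting to 'None'.

    Column constraints (such as a primary key) are not carried over here.
    """
    old = ", ".join(f'"{c}"' for c in current_cols)
    defs = [f'"{c}" TEXT' for c in current_cols]
    defs += [f"\"{c}\" TEXT DEFAULT 'None'" for c in new_cols]
    # With the module's default transaction handling, sqlite3 only opens a transaction
    # implicitly before INSERT, UPDATE, DELETE or REPLACE, so open one explicitly to
    # keep the CREATE TABLE from being autocommitted.
    if not dbcr.connection.in_transaction:
        dbcr.execute("BEGIN")
    dbcr.execute(f"CREATE TABLE img_db_new ({', '.join(defs)})")
    dbcr.execute(f"INSERT INTO img_db_new ({old}) SELECT {old} FROM img_db")
    dbcr.execute("DROP TABLE img_db")
    dbcr.execute("ALTER TABLE img_db_new RENAME TO img_db")
    # No commit here: the caller commits once, so other connections never
    # see the intermediate tables.
```

Copying every row is what makes this slow on large databases, which is why adding tags later is best avoided.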

Internal functions

ImageMetaTag.db.db_name_to_info_key(in_str)

Inverse of info_key_to_db_name

ImageMetaTag.db.info_key_to_db_name(in_str)

Consistently converts a key name from the img_info dict into a database column name.
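A minimal sketch of such a pair of conversions. It assumes, hypothetically, that spaces (awkward in SQL column names) are replaced with a double underscore; the real conversion may differ. With this scheme the round trip only holds for keys that contain no double underscores.

```python
def info_key_to_db_name_sketch(in_str):
    """Hypothetical: make an img_info key usable as a database column name."""
    return in_str.replace(" ", "__")


def db_name_to_info_key_sketch(in_str):
    """Hypothetical inverse of info_key_to_db_name_sketch."""
    return str(in_str).replace("__", " ")
```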

ImageMetaTag.db.process_select_star_from(db_contents, dbcr, required_tags=None, tag_strings=None)

Converts the output from a select * from … command into a standard output format. Requires a database cursor (dbcr) to identify the field names.
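The field names come from the cursor's description attribute, which the Python DB-API (PEP 249) defines as one sequence per column with the column name first. The sketch below uses it to build a list of filenames and a {filename: {tag: value}} dictionary, assuming the first column holds the filename and mirroring the return values of ImageMetaTag.db.read(); process_select_star_sketch, and raising ValueError for missing required_tags, are assumptions.

```python
def process_select_star_sketch(db_contents, dbcr, required_tags=None):
    """Hypothetical sketch: turn SELECT * rows into (filenames, {filename: {tag: value}})."""
    # dbcr.description has one entry per column of the last SELECT; element 0 is its name.
    field_names = [col[0] for col in dbcr.description]
    tags = field_names[1:]  # assumes column 0 holds the filename
    if required_tags is not None:
        missing = [t for t in required_tags if t not in tags]
        if missing:
            raise ValueError(f"required tags not in database: {missing}")
        tags = list(required_tags)  # return only the requested tags
    filenames, img_info = [], {}
    for row in db_contents:
        record = dict(zip(field_names, row))
        fname = record[field_names[0]]
        filenames.append(fname)
        img_info[fname] = {t: record[t] for t in tags}
    return filenames, img_info
```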

Options:
  • required_tags - a list of image tags to return, and to fail if not all are present

  • tag_strings - an input list that will be populated with the unique values of the image tags

Returns:

Utility functions

The following functions may be very useful for specific occasions, but are not intended for regular use:

ImageMetaTag.db.scan_dir_for_db(basedir, db_file, img_tag_req=None, add_strict=False, subdir_excl_list=None, known_file_tags=None, verbose=False, no_file_ext=False, return_timings=False, restart_db=False)

A useful utility that scans a directory on disk for images that can go into a database. This should only be used to build a database from a directory of tagged images that did not previously use a database, or where the database file has been deleted but the images have not.

For optimal performance, build the database as the plots are created (or do not delete the database by accident).

Arguments:
  • basedir - the directory to start scanning.

  • db_file - the database file to save the image metadata to. If this file already exists, the scan will fail unless restart_db is True.

Options:
  • img_tag_req - a list of tag names that are to be applied/created. See add_strict for behaviour when tags are not present in an image.

  • add_strict - when True, images without all of the img_tag_req tags are ignored. When False, images will be used if they have at least one item in img_tag_req; images with none of the metadata items are assumed to be from a different source.

    Images that are used, with missing tags, will set those tags to ‘None’.

  • subdir_excl_list - a list of subdirectories that don’t need to be scanned. [‘thumbnail’] for instance, will prevent the image thumbnails being included.

  • no_file_ext - if True, exclude the file extension from the filenames saved to the database.

  • known_file_tags - if supplied, a dictionary keyed by filename, structured as {filename: {tag name: value}}, for images whose metadata is already known, so it does not need to be read from the files themselves (which is slow). This can be useful if you have an old backup of a database file that needs updating.

  • restart_db - if True, the db_file will be restarted from an empty database.

  • verbose - verbose output.
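Only the directory-scanning part of this is sketched below, with the standard library's os.walk; reading each image's metadata (the slow part) is not shown. The img_exts parameter, the matching of subdir_excl_list against directory names at any depth, and find_images_sketch itself are assumptions for illustration.

```python
import os


def find_images_sketch(basedir, subdir_excl_list=None, no_file_ext=False,
                       img_exts=(".png", ".jpg", ".jpeg")):
    """Hypothetical sketch: list image paths under basedir, relative to basedir."""
    excl = set(subdir_excl_list or [])
    found = []
    for root, dirs, files in os.walk(basedir):
        # Pruning dirs in place stops os.walk descending into excluded subdirectories.
        dirs[:] = [d for d in dirs if d not in excl]
        for name in files:
            stem, ext = os.path.splitext(name)
            if ext.lower() not in img_exts:
                continue  # not an image file
            entry = stem if no_file_ext else name
            found.append(os.path.relpath(os.path.join(root, entry), basedir))
    return sorted(found)
```

Storing paths relative to basedir keeps the database usable if the whole directory tree is moved.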