climind.data_manager package

The scripts and files in the data manager module are used to create and process the metadata entities that underpin the dashboards.

There are basic metadata classes (BaseMetadata, CollectionMetadata and DatasetMetadata) which contain metadata and allow for simple tasks like checking whether the metadata they contain matches what’s in a dictionary.

Then there are classes (DataSet, DataCollection, DataArchive) that use the metadata classes to define data sets, collections of related data sets and data archives. These classes allow you to select subsets of data, download data sets and read them in.

The metadata themselves are stored in json files in the climind.metadata_files directory.

Submodules

climind.data_manager.metadata module

These metadata classes contain all the information about the datasets that are manipulated by the package. The BaseMetadata class contains much of the functionality, with CollectionMetadata and DatasetMetadata inheriting that functionality and differing chiefly in the schemas used to validate their contents. The CombinedMetadata class comprises a CollectionMetadata object and a DatasetMetadata object.

class climind.data_manager.metadata.BaseMetadata(metadata: dict)

Bases: object

Simple class to store metadata and find matches. Metadata items can be set and recovered using a dictionary-like syntax:

metadata_object['key'] = value

value = metadata_object['key']

Testing whether a key is present is also dict-like:

key in metadata_object

Create a BaseMetadata object from a dictionary containing the metadata in key-value pairs.

Parameters

metadata (dict) – Dictionary containing the metadata

metadata

Contains the metadata information in key value pairs

Type

dict
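
The dictionary-like access described for BaseMetadata can be pictured with a minimal stand-in class. This is a sketch only, not the climind implementation: the class name SketchMetadata and the example keys and values are hypothetical.

```python
class SketchMetadata:
    """Hypothetical stand-in illustrating dict-like access; not climind code."""

    def __init__(self, metadata: dict):
        # Keep the key-value pairs in a plain dictionary attribute
        self.metadata = metadata

    def __getitem__(self, key):
        return self.metadata[key]

    def __setitem__(self, key, value):
        self.metadata[key] = value

    def __contains__(self, key):
        return key in self.metadata


meta = SketchMetadata({"name": "HadCRUT5", "variable": "tas"})
meta["time_resolution"] = "monthly"   # set an item
name = meta["name"]                   # recover an item
has_variable = "variable" in meta     # membership test
```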

fill_string(string_to_replace: str, replacement: str)

Replace string_to_replace with the replacement value in all elements of the metadata.

Parameters
  • string_to_replace (str) – string to be replaced in metadata elements

  • replacement (str) – replacement string

Return type

None
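
A placeholder replacement of this kind might look like the sketch below. It assumes that metadata values are either strings or lists of strings and leaves anything else untouched; the real method may treat other value types differently. The function name fill_string_sketch and the example URL are hypothetical.

```python
def fill_string_sketch(metadata: dict, string_to_replace: str, replacement: str) -> None:
    """Replace a substring in every string (or list-of-strings) value, in place."""
    for key, value in metadata.items():
        if isinstance(value, str):
            metadata[key] = value.replace(string_to_replace, replacement)
        elif isinstance(value, list):
            # Assumption: list values hold strings; other items are left as they are
            metadata[key] = [
                item.replace(string_to_replace, replacement) if isinstance(item, str) else item
                for item in value
            ]


example = {
    "url": ["https://example.org/data_YYYY.csv"],
    "filename": "data_YYYY.csv",
    "version": 5,
}
fill_string_sketch(example, "YYYY", "2023")
```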

match_metadata(metadata_to_match: dict) bool

Check whether the metadata match the contents of the dictionary metadata_to_match. Only definite non-matches are rejected: if a key is not found in the dictionary, this is not counted as a non-match.

Parameters

metadata_to_match (dict) – Key-value or key-list pairs for match

class climind.data_manager.metadata.CollectionMetadata(metadata: dict)

Bases: BaseMetadata

Class to store collection-level metadata, containing information that refers to all data sets in the collection.

Create CollectionMetadata from a dictionary containing metadata. Metadata are validated using the metadata_schema.json file.

Parameters

metadata (dict) – Dictionary containing metadata in key value pairs.

class climind.data_manager.metadata.CombinedMetadata(dataset: DatasetMetadata, collection: CollectionMetadata)

Bases: object

CombinedMetadata combines DatasetMetadata and CollectionMetadata in one single object so that both sets of metadata elements are available in one container.

creation_message() None

Add a creation message to the dataset history and populate the wildcards in the metadata, such as AAAA (last modified/download time), YYYY (year), VVVV (version number).

Return type

None

match_metadata(metadata_to_match: dict) bool

Test to see if metadata matches metadata to match. Returns True unless there is a mismatch between the required metadata_to_match and the metadata.

Parameters

metadata_to_match (dict) – Dictionary of metadata terms to match

Returns

Return True unless an element in metadata_to_match conflicts with an entry in the metadata

Return type

bool

write_metadata(filename: Path) None

Write out the metadata in json format to a file specified by filename

Parameters

filename (Path) – Path of the file to be created

Return type

None
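
Writing metadata to a json file can be sketched with the standard library alone. The function name write_metadata_sketch, the indentation setting and the example dictionary are assumptions for illustration; the layout of the file that climind actually writes is not specified here.

```python
import json
import tempfile
from pathlib import Path


def write_metadata_sketch(metadata: dict, filename: Path) -> None:
    """Write a metadata dictionary to `filename` as json."""
    with open(filename, "w", encoding="utf-8") as out_file:
        json.dump(metadata, out_file, indent=4)


# Round trip through a temporary directory to show the file is valid json
with tempfile.TemporaryDirectory() as tmp_dir:
    target = Path(tmp_dir) / "metadata.json"
    write_metadata_sketch({"name": "example", "version": "1.0"}, target)
    round_trip = json.loads(target.read_text(encoding="utf-8"))
```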

class climind.data_manager.metadata.DatasetMetadata(metadata: dict)

Bases: BaseMetadata

Class to store dataset-level metadata, containing information that refers specifically to a single data set.

Create DatasetMetadata from a dictionary containing metadata. Metadata are validated using the dataset_schema.json file.

Parameters

metadata (dict) – Dictionary containing metadata in key value pairs.

creation_message() None

Add creation message to the history.

Return type

None

climind.data_manager.metadata.list_match(list_to_match: list, attribute: str) bool

If attribute matches any item in list_to_match return True, otherwise False

Parameters
  • list_to_match (list) – List of metadata to match

  • attribute (str) – attribute to check against

Returns

Set to True if attribute matches element in list_to_match, False otherwise

Return type

bool
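
The behaviour described for list_match can be approximated as follows. The sketch assumes that a "match" means plain equality, which may not be how the real function compares values; the name list_match_sketch is hypothetical.

```python
def list_match_sketch(list_to_match: list, attribute: str) -> bool:
    """Return True if `attribute` equals any item in `list_to_match`."""
    # Assumption: a "match" is plain equality; the real function may differ
    return any(item == attribute for item in list_to_match)


monthly_ok = list_match_sketch(["monthly", "annual"], "monthly")
daily_ok = list_match_sketch(["monthly", "annual"], "daily")
```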

climind.data_manager.processing module

The classes and functions in this script describe groupings of metadata. The basic building block is a DataSet, which specifies the file (or files) containing the data for a single data set. DataSet objects are grouped into DataCollection objects, which gather together all the individual data sets that are derived from a single product. For example, HadCRUT5 is a product and so it has a corresponding DataCollection made up of several DataSet objects. Finally, a DataArchive contains one or more DataCollection objects. All DataSet objects in a DataCollection describe the same variable. However, DataCollection objects in a DataArchive need not describe the same variable.
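
The three-level grouping can be pictured with a few stand-in dataclasses. These are hypothetical illustrations of the containment relationships only, not the climind classes, and the variable label "tas" is an invented example value.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SketchDataSet:
    name: str
    variable: str


@dataclass
class SketchDataCollection:
    name: str
    # All data sets in one collection derive from the same product
    datasets: List[SketchDataSet] = field(default_factory=list)


@dataclass
class SketchDataArchive:
    # Collections keyed by name; they need not share a variable
    collections: Dict[str, SketchDataCollection] = field(default_factory=dict)


hadcrut = SketchDataCollection("HadCRUT5")
hadcrut.datasets.append(SketchDataSet("HadCRUT5 monthly", "tas"))
hadcrut.datasets.append(SketchDataSet("HadCRUT5 annual", "tas"))

archive = SketchDataArchive()
archive.collections[hadcrut.name] = hadcrut

variables_in_collection = {ds.variable for ds in hadcrut.datasets}
```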

class climind.data_manager.processing.DataArchive

Bases: object

A set of DataCollection objects. A DataArchive is the starting point for the analysis. Particular DataSet objects are selected from the DataArchive before plotting or summarising the data.

Create a DataArchive object, initially empty.

collections

A dictionary containing the DataCollection objects in the archive

Type

dict

add_collection(data_collection: DataCollection) None

Add a DataCollection to the archive

Parameters

data_collection (DataCollection) – DataCollection to be added to the DataArchive

Return type

None

download(out_dir: Path) None

Download all files in the DataArchive.

Parameters

out_dir (Path) – Directory to which the files should be downloaded

Return type

None

static from_directory(path_to_dir: Path)

Create a DataArchive from a directory of metadata. The directory should contain a set of json files each of which contains a set of metadata describing a DataCollection

Parameters

path_to_dir (Path) – Path to the directory containing the metadata files that will be used to populate the DataArchive

Returns

DataArchive containing all DataCollection objects described in the metadata files

Return type

DataArchive
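
Populating an archive from a directory of json files might follow the pattern sketched here. The sketch assumes the files are found with a "*.json" glob and keyed by file stem, neither of which is confirmed by this documentation; load_collection_metadata is a hypothetical helper that returns plain dictionaries rather than DataCollection objects.

```python
import json
import tempfile
from pathlib import Path


def load_collection_metadata(path_to_dir: Path) -> dict:
    """Read every *.json file in a directory into a dict keyed by file stem."""
    collections = {}
    for json_file in sorted(path_to_dir.glob("*.json")):
        with open(json_file, "r", encoding="utf-8") as in_file:
            collections[json_file.stem] = json.load(in_file)
    return collections


# Build a throwaway directory holding two metadata files and one unrelated file
with tempfile.TemporaryDirectory() as tmp_dir:
    metadata_dir = Path(tmp_dir)
    (metadata_dir / "first.json").write_text(json.dumps({"name": "first"}), encoding="utf-8")
    (metadata_dir / "second.json").write_text(json.dumps({"name": "second"}), encoding="utf-8")
    (metadata_dir / "notes.txt").write_text("ignored", encoding="utf-8")
    loaded = load_collection_metadata(metadata_dir)
```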

read_datasets(out_dir: Path, **kwargs) list

Read all the datasets in the DataArchive.

Parameters

out_dir (Path) – Path of directory containing the data

Returns

List of datasets specified by metadata in the archive.

Return type

list

select(metadata_to_match: dict)

Select datasets from the DataArchive that meet the metadata requirements specified in the metadata_to_match dictionary.

Parameters

metadata_to_match (dict) – Metadata to be matched. For each requirement, there should be a key-value pair

Returns

Returns DataArchive containing only data that match the metadata_to_match

Return type

DataArchive
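
Selection by metadata can be sketched with plain dictionaries standing in for the archive. The sketch handles single-value requirements only and assumes that collections left without any matching data sets are dropped from the result; both the function select_sketch and the example metadata are hypothetical.

```python
def select_sketch(collections: dict, metadata_to_match: dict) -> dict:
    """Keep only datasets that do not conflict with any requested key-value pair."""
    selected = {}
    for collection_name, datasets in collections.items():
        kept = [
            ds for ds in datasets
            # A key that is absent from the dataset is not treated as a conflict
            if all(key not in ds or ds[key] == value
                   for key, value in metadata_to_match.items())
        ]
        # Assumption: collections with no matching datasets are dropped
        if kept:
            selected[collection_name] = kept
    return selected


collections = {
    "HadCRUT5": [
        {"variable": "tas", "time_resolution": "monthly"},
        {"variable": "tas", "time_resolution": "annual"},
    ],
    "NSIDC": [{"variable": "sea_ice_extent", "time_resolution": "monthly"}],
}
monthly_tas = select_sketch(collections, {"variable": "tas", "time_resolution": "monthly"})
```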

class climind.data_manager.processing.DataCollection(metadata: dict)

Bases: object

A grouping of DataSet objects derived from a single product or source, e.g. HadCRUT5. This could include, for example, monthly and annual time series along with the gridded data.

Create DataCollection from a metadata dictionary.

Parameters

metadata (dict) – Dictionary containing the metadata that describe the collection

global_attributes

Metadata containing the attributes that apply to all DataSets in the DataCollection

Type

CollectionMetadata

datasets

List containing all the DataSet objects in this collection

Type

List[DataSet]

add_dataset(ds: DataSet) None

Add DataSet object to DataCollection

Parameters

ds (DataSet) – DataSet to be added

Return type

None

download(data_dir: Path) None

Download all the data sets described by DataSet objects in the DataCollection.

Parameters

data_dir (Path) – Location to which the datasets should be downloaded

Return type

None

static from_file(filename: Path)

Given a file path create the DataCollection from metadata in that file

Parameters

filename (Path) – Filename of the metadata file in json format

Returns

DataCollection containing all the DataSet objects specified by the metadata file

Return type

DataCollection

get_collection_dir(data_dir: Path) Path

Get the Path to the directory where the data for this DataCollection are stored. If the directory does not exist, then create it.

Parameters

data_dir (Path) – Path to the general data directory for managed data in the project

Returns

Path to the directory for this DataCollection.

Return type

Path
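
The get-or-create behaviour can be sketched with pathlib. It is an assumption of this sketch that the collection directory is a subdirectory of data_dir named after the collection; the function name and the collection_name argument are hypothetical.

```python
import tempfile
from pathlib import Path


def get_collection_dir_sketch(data_dir: Path, collection_name: str) -> Path:
    """Return the directory for a collection, creating it if it is missing."""
    collection_dir = data_dir / collection_name
    # exist_ok=True makes repeated calls safe once the directory exists
    collection_dir.mkdir(parents=True, exist_ok=True)
    return collection_dir


with tempfile.TemporaryDirectory() as tmp_dir:
    first_call = get_collection_dir_sketch(Path(tmp_dir), "HadCRUT5")
    second_call = get_collection_dir_sketch(Path(tmp_dir), "HadCRUT5")
    created = first_call.is_dir()
```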

match_metadata(metadata_to_match: dict)

Given a dictionary of metadata keys and required values for each key, return a DataCollection which contains only data sets matching the specified metadata

Parameters

metadata_to_match (dict) – Dictionary containing key:value pairs that specify the data sets required in the output DataCollection

Returns

Return DataCollection that matches the metadata_to_match

Return type

DataCollection

read_datasets(out_dir: Path, **kwargs) list

Read all the datasets described by DataSet objects in the DataCollection

Parameters

out_dir (Path) – Directory in which the datasets are found

Returns

Return list of all data sets described in the DataCollection.

Return type

list

to_file(filename: Path) None

Write the DataCollection metadata to file in json format.

Parameters

filename (Path) – Path to the file to be written

Return type

None

class climind.data_manager.processing.DataSet(metadata: DatasetMetadata, global_metadata: CollectionMetadata)

Bases: object

A DataSet contains metadata for a single dataset (one that might be split across multiple files). For example, NSIDC monthly sea ice extent data is a single data set provided in 12 files, one for each month. In contrast, HadCRUT5 monthly global mean temperature is a single file. Both of these would be described by a DataSet. A DataSet can also be used to read in the actual data.

Create a DataSet from DatasetMetadata and CollectionMetadata.

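Parameters
  • metadata (DatasetMetadata) – Dataset-level metadata for this data set

  • global_metadata (CollectionMetadata) – Collection-level metadata shared with the rest of the collection

name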

Name of the data set

Type

str

metadata

Dictionary of attributes

Type

dict

global_metadata

Dictionary of global attributes inherited from collection

Type

dict

download(out_dir: Path) None

Download the data set using its “fetcher” function. Fetcher functions are contained in the fetchers package.

Parameters

out_dir (Path) – Directory to which the data set will be downloaded

Return type

None

match_metadata(metadata_to_match: dict) bool

Check if there is a mismatch between attributes of DataSet and the contents of a dictionary, metadata_to_match. Only items that are in the attributes are checked.

Parameters

metadata_to_match (dict) – Dictionary of key-value or key-list pairs to match. If a key-list is provided then each element of the list is checked and a mismatch only occurs if all of the items in the list cause a mismatch.

Returns

Return True unless there is a mismatch in which case return False

Return type

bool
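
The matching rules described above (only keys present in the attributes are checked, and a key-list mismatches only if every item in it mismatches) can be written out as a short sketch. It compares values by plain equality, which is an assumption, and match_metadata_sketch is a hypothetical free function rather than the DataSet method.

```python
def match_metadata_sketch(attributes: dict, metadata_to_match: dict) -> bool:
    """Return True unless a requested item definitely conflicts with the attributes."""
    for key, wanted in metadata_to_match.items():
        if key not in attributes:
            # Only items that are in the attributes are checked
            continue
        if isinstance(wanted, list):
            # A key-list mismatches only if every item in the list mismatches
            if all(item != attributes[key] for item in wanted):
                return False
        elif wanted != attributes[key]:
            return False
    return True


attributes = {"variable": "tas", "time_resolution": "monthly"}
single_value = match_metadata_sketch(attributes, {"variable": "tas"})
any_of_list = match_metadata_sketch(attributes, {"time_resolution": ["monthly", "annual"]})
conflict = match_metadata_sketch(attributes, {"variable": "ohc"})
unknown_key = match_metadata_sketch(attributes, {"type": "gridded"})
```

In this sketch an unknown key such as "type" does not cause a rejection, which mirrors the statement that only definite non-matches are rejected.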

read_dataset(out_dir: Path, **kwargs)

Read in the dataset and output an object of the appropriate type.

Parameters

out_dir (Path) – Directory in which the data are to be found (dictated by the Collection)

Return type

Object of the appropriate type

climind.data_manager.processing.get_function(module_path: str, script_name: str, function_name: str) Callable

For a particular module and script in that module, return the function with a specified name as a callable object.

Parameters
  • module_path (str) – The path to the module written using dot separation between directories

  • script_name (str) – The name of the script

  • function_name (str) – The name of the function in the script to be returned

Returns

Returns the function with the specified function name from the script with the specified script name in the specified module path

Return type

Callable
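
Looking up a function by dotted module path, script name and function name can be done with importlib, as in the sketch below. Whether climind uses importlib in this way is an assumption; the example resolves a standard-library function so that it runs without climind installed, and get_function_sketch is a hypothetical name.

```python
import importlib
from typing import Callable


def get_function_sketch(module_path: str, script_name: str, function_name: str) -> Callable:
    """Import `module_path.script_name` and return the named function from it."""
    module = importlib.import_module(f"{module_path}.{script_name}")
    return getattr(module, function_name)


# Resolve os.path.basename from its three name components
basename = get_function_sketch("os", "path", "basename")
result = basename("data/hadcrut5.csv")
```

A lookup like this lets a metadata file name its fetcher or reader as a string, with the callable resolved only when it is needed.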