climind.data_manager package
Submodules
climind.data_manager.metadata module
These metadata classes contain all the information about the datasets that are manipulated
by the package. The BaseMetadata class contains much of the functionality, with
CollectionMetadata and DatasetMetadata inheriting that functionality and
differing chiefly in the schemas used to validate their contents. The CombinedMetadata
class comprises a CollectionMetadata object and a DatasetMetadata object.
- class climind.data_manager.metadata.BaseMetadata(metadata: dict)[source]
Bases: object
Simple class to store metadata and find matches. Metadata items can be set and retrieved using a dictionary-like syntax:
metadata_object['key'] = value
value = metadata_object['key']
Testing whether a key is present is also dict-like:
key in metadata_object
Create a BaseMetadata object from a dictionary containing the metadata in key-value pairs.
- Parameters:
metadata (dict) – Dictionary containing the metadata
- metadata
Contains the metadata information in key-value pairs
- Type:
dict
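The dictionary-like behaviour described above can be illustrated with a minimal stand-in class. This is a sketch, not the climind implementation: the class name DictLikeMetadata and the example keys are hypothetical, and it assumes that item access simply delegates to the underlying metadata dictionary.

```python
class DictLikeMetadata:
    """Hypothetical stand-in showing dict-like access; not the climind class."""

    def __init__(self, metadata: dict):
        # Assumption: the metadata are held in a plain dictionary attribute.
        self.metadata = metadata

    def __getitem__(self, key):
        # Supports: value = metadata_object['key']
        return self.metadata[key]

    def __setitem__(self, key, value):
        # Supports: metadata_object['key'] = value
        self.metadata[key] = value

    def __contains__(self, key):
        # Supports: key in metadata_object
        return key in self.metadata


metadata_object = DictLikeMetadata({"name": "example"})
metadata_object["variable"] = "tas"          # set an item
value = metadata_object["variable"]          # recover it again
has_name = "name" in metadata_object         # key that is present
has_units = "units" in metadata_object       # key that is absent
```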
- fill_string(string_to_replace: str, replacement: str)[source]
Replace string_to_replace with the replacement value in all elements of the metadata. This is used to replace placeholder substrings like “YYYY” with the year, or “MMMM” with the month, or “VVVV” with a version number.
- Parameters:
string_to_replace (str) – string to be replaced in metadata elements
replacement (str) – replacement string
- Return type:
None
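As a rough illustration of placeholder filling, the sketch below replaces a substring in every string value of a flat dictionary, including strings held in lists. It is not the climind code: the function name fill_string_sketch, the example keys and the example.org URL are hypothetical, and how the real method treats nested or non-string values is an assumption.

```python
def fill_string_sketch(metadata: dict, string_to_replace: str, replacement: str) -> None:
    """Replace a placeholder in every string or list-of-strings value, in place."""
    for key, value in metadata.items():
        if isinstance(value, str):
            metadata[key] = value.replace(string_to_replace, replacement)
        elif isinstance(value, list):
            # Assumption: lists may mix strings with other types; only strings change.
            metadata[key] = [
                item.replace(string_to_replace, replacement) if isinstance(item, str) else item
                for item in value
            ]
        # Other value types (numbers, booleans, ...) are left untouched.


example = {
    "url": ["https://example.org/data_YYYY.csv"],  # hypothetical URL
    "filename": "data_YYYY_vVVVV.csv",             # hypothetical file name
    "count": 3,
}
fill_string_sketch(example, "YYYY", "2023")
fill_string_sketch(example, "VVVV", "1.0")
```

Like the documented method, the sketch returns None and modifies the metadata in place.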
- match_metadata(metadata_to_match: dict) bool[source]
Check whether the metadata match the contents of the dictionary metadata_to_match. Only definite non-matches are rejected. If a key in metadata_to_match is not found in the metadata, this is not counted as a non-match.
- Parameters:
metadata_to_match (dict) – Key-value or key-list pairs for match
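The matching rule can be sketched as follows. This is an illustration of the behaviour described in this module's documentation rather than the library's own code: the function names carry a _sketch suffix to mark them as hypothetical, the example keys are invented, and the treatment of list values follows the key-list description (a list matches if any of its items matches).

```python
def list_match_sketch(list_to_match: list, attribute: str) -> bool:
    """Return True if the attribute equals any item in list_to_match."""
    return attribute in list_to_match


def match_metadata_sketch(metadata: dict, metadata_to_match: dict) -> bool:
    """Return False only when a requested key is present and definitely differs."""
    for key, wanted in metadata_to_match.items():
        if key not in metadata:
            # A missing key is not a definite non-match, so skip it.
            continue
        if isinstance(wanted, list):
            if not list_match_sketch(wanted, metadata[key]):
                return False
        elif metadata[key] != wanted:
            return False
    return True


# Hypothetical metadata keys and values, for illustration only.
metadata = {"variable": "tas", "time_resolution": "monthly"}

exact = match_metadata_sketch(metadata, {"variable": "tas"})
conflict = match_metadata_sketch(metadata, {"variable": "ohc"})
any_of = match_metadata_sketch(metadata, {"variable": ["ohc", "tas"]})
missing_key = match_metadata_sketch(metadata, {"origin": "obs"})
```

Here exact, any_of and missing_key evaluate to True, while conflict is False because the key is present with a different value.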
- class climind.data_manager.metadata.CollectionMetadata(metadata: dict)[source]
Bases: BaseMetadata
Class to store collection-level metadata, containing information that refers to all data sets in the collection.
Create CollectionMetadata from a dictionary containing metadata. Metadata are validated using the metadata_schema.json file.
- Parameters:
metadata (dict) – Dictionary containing metadata in key-value pairs.
- class climind.data_manager.metadata.CombinedMetadata(dataset: DatasetMetadata, collection: CollectionMetadata)[source]
Bases: object
CombinedMetadata combines DatasetMetadata and CollectionMetadata in a single object so that both sets of metadata elements are available in one container.
- creation_message() None[source]
Add a creation message to the dataset history and populate the wildcards in the metadata, such as AAAA (last modified/download time), YYYY (year), VVVV (version number).
- Return type:
None
- match_metadata(metadata_to_match: dict) bool[source]
Test whether the metadata match metadata_to_match. Returns True unless there is a mismatch between the required metadata_to_match and the metadata.
- Parameters:
metadata_to_match (dict) – Dictionary of metadata terms to match
- Returns:
Return True unless an element in metadata_to_match conflicts with an entry in the metadata
- Return type:
bool
- class climind.data_manager.metadata.DatasetMetadata(metadata: dict)[source]
Bases: BaseMetadata
Class to store dataset-level metadata, containing information that refers specifically to a single data set.
Create DatasetMetadata from a dictionary containing metadata. Metadata are validated using the dataset_schema.json file.
- Parameters:
metadata (dict) – Dictionary containing metadata in key-value pairs.
- climind.data_manager.metadata.list_match(list_to_match: list, attribute: str) bool[source]
If attribute matches any item in list_to_match return True, otherwise False
- Parameters:
list_to_match (list) – List of metadata to match
attribute (str) – attribute to check against
- Returns:
Set to True if attribute matches element in list_to_match, False otherwise
- Return type:
bool
climind.data_manager.processing module
The classes and functions in this script describe groupings of metadata. The basic building
block is a DataSet, which specifies a file (or files) which contains the data for a single
data set. DataSet objects are grouped into DataCollection objects, which gather
together all the individual data sets which are derived from a single product. For example,
HadCRUT5 is a product and so it has a corresponding DataCollection made up of several DataSet
objects. Finally, a DataArchive contains one or more DataCollection objects. All
DataSet objects in a DataCollection will be the same variable. However, DataCollection
objects in a DataArchive need not be the same variable.
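The three-level grouping described above can be pictured with a toy model. The classes below are hypothetical stand-ins built from dataclasses, not the climind classes; the variable names ("tas", "sea_ice_extent"), the collection name ExampleSeaIce and the choice to key collections by name are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDataSet:
    """Hypothetical stand-in for a DataSet: one data set of one variable."""
    name: str
    variable: str


@dataclass
class ToyDataCollection:
    """Hypothetical stand-in for a DataCollection: data sets from one product."""
    name: str
    datasets: list = field(default_factory=list)

    def variables(self) -> set:
        # All data sets in a collection are expected to share one variable.
        return {ds.variable for ds in self.datasets}


@dataclass
class ToyDataArchive:
    """Hypothetical stand-in for a DataArchive: one or more collections."""
    collections: dict = field(default_factory=dict)

    def add_collection(self, collection: ToyDataCollection) -> None:
        # Assumption: collections are stored in a dict keyed by their name.
        self.collections[collection.name] = collection


temperature = ToyDataCollection(
    "HadCRUT5",
    [ToyDataSet("monthly", "tas"), ToyDataSet("annual", "tas")],
)
sea_ice = ToyDataCollection("ExampleSeaIce", [ToyDataSet("monthly", "sea_ice_extent")])

archive = ToyDataArchive()
archive.add_collection(temperature)
archive.add_collection(sea_ice)

# Within a collection there is one variable; across the archive there may be several.
archive_variables = set().union(*(c.variables() for c in archive.collections.values()))
```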
- class climind.data_manager.processing.DataArchive[source]
Bases: object
A set of DataCollection objects. A DataArchive is the starting point for the analysis. Particular DataSet objects are selected from the DataArchive before plotting or summarising the data.
Create a DataArchive object, initially empty.
- collections
A dictionary containing the DataCollection objects in the archive
- Type:
dict
- add_collection(data_collection: DataCollection) None[source]
Add a DataCollection to the archive
- Parameters:
data_collection (DataCollection) – DataCollection to be added to the DataArchive
- Return type:
None
- download(out_dir: Path) None[source]
Download all files in the DataArchive.
- Parameters:
out_dir (Path) – Directory to which the files should be downloaded
- Return type:
None
- static from_directory(path_to_dir: List[Path] | Path)[source]
Create a DataArchive from a directory of metadata. The directory should contain a set of json files, each of which contains a set of metadata describing a DataCollection.
- Parameters:
path_to_dir (Path or List[Path]) – Path to the directory containing the metadata files that will be used to populate the DataArchive, or a list of such Paths.
- Returns:
DataArchive containing all DataCollection objects described in the metadata files
- Return type:
DataArchive
- read_datasets(out_dir: Path, **kwargs) list[source]
Read all the datasets in the DataArchive.
- Parameters:
out_dir (Path) – Path of directory containing the data
- Returns:
List of datasets specified by metadata in the archive.
- Return type:
list
- select(metadata_to_match: dict)[source]
Select datasets from the DataArchive that meet the metadata requirements specified in the metadata_to_match dictionary.
- Parameters:
metadata_to_match (dict) – Metadata to be matched. For each requirement, there should be a key-value pair
- Returns:
DataArchive containing only data that match the metadata_to_match
- Return type:
DataArchive
- class climind.data_manager.processing.DataCollection(metadata: dict)[source]
Bases: object
A grouping of DataSet objects derived from a single product or source, e.g. HadCRUT5. This could include, for example, monthly and annual time series along with the gridded data.
Create DataCollection from a metadata dictionary.
- Parameters:
metadata (dict)
- global_attributes
Metadata containing the attributes that apply to all DataSets in the DataCollection
- Type:
- add_dataset(ds: DataSet) None[source]
Add DataSet object to DataCollection
- Parameters:
ds (DataSet) – DataSet to be added
- Return type:
None
- download(data_dir: Path) None[source]
Download all the data sets described by DataSet objects in the DataCollection.
- Parameters:
data_dir (Path) – Location to which the datasets should be downloaded
- Return type:
None
- static from_file(filename: Path)[source]
Given a file path, create the DataCollection from metadata in that file
- Parameters:
filename (Path) – Filename of the metadata file in json format
- Returns:
DataCollection containing all the DataSet objects specified by the metadata file
- Return type:
DataCollection
- get_collection_dir(data_dir: Path) Path[source]
Get the Path to the directory where the data for this DataCollection are stored. If the directory does not exist, then create it.
- Parameters:
data_dir (Path) – Path to the general data directory for managed data in the project
- Returns:
Path to the directory for this DataCollection.
- Return type:
Path
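The get-or-create behaviour can be sketched with pathlib. This is a minimal illustration, not the climind implementation: the function name get_collection_dir_sketch is hypothetical, and naming the subdirectory after the collection is an assumption.

```python
import tempfile
from pathlib import Path


def get_collection_dir_sketch(data_dir: Path, collection_name: str) -> Path:
    """Return the collection's directory under data_dir, creating it if needed."""
    # Assumption: the directory is named after the collection.
    collection_dir = data_dir / collection_name
    # exist_ok=True makes repeated calls safe; parents=True creates data_dir too.
    collection_dir.mkdir(parents=True, exist_ok=True)
    return collection_dir


with tempfile.TemporaryDirectory() as tmp:
    first = get_collection_dir_sketch(Path(tmp), "HadCRUT5")
    created = first.is_dir()                      # directory now exists
    second = get_collection_dir_sketch(Path(tmp), "HadCRUT5")
    same_path = first == second                   # second call reuses it
```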
- match_metadata(metadata_to_match: dict)[source]
Given a dictionary of metadata keys and required values for each key, return a DataCollection which contains only data sets matching the specified metadata
- Parameters:
metadata_to_match (dict) – Dictionary containing key:value pairs that specify the data sets required in the output DataCollection
- Returns:
DataCollection that matches the metadata_to_match
- Return type:
DataCollection
- read_datasets(out_dir: Path | List[Path], **kwargs) list[source]
Read all the datasets described by DataSet objects in the DataCollection
- Parameters:
out_dir (Path) – Directory in which the datasets are found
- Returns:
List of all data sets described in the DataCollection.
- Return type:
list
- to_file(filename: Path) None[source]
Write the DataCollection metadata to file in json format.
- Parameters:
filename (Path) – Path to the file to be written
- Return type:
None
- class climind.data_manager.processing.DataSet(metadata: DatasetMetadata, global_metadata: CollectionMetadata)[source]
Bases: object
A DataSet contains metadata for a single dataset (one that might be split across multiple files). For example, NSIDC monthly sea ice extent data is a single data set provided in 12 files, one for each month. In contrast, HadCRUT5 monthly global mean temperature is a single file. Both of these would be described by a DataSet. They can be used to read in the actual data.
Create a DataSet from DatasetMetadata and CollectionMetadata.
- Parameters:
metadata (DatasetMetadata) – DatasetMetadata containing the dataset metadata.
global_metadata (CollectionMetadata) – CollectionMetadata containing the global metadata
- name
Name of the data set
- Type:
str
- metadata
Dictionary of attributes
- Type:
dict
- global_metadata
Dictionary of global attributes inherited from collection
- Type:
dict
- download(out_dir: Path) None[source]
Download the data set using its “fetcher” function. Fetcher functions are contained in the fetchers package.
- Parameters:
out_dir (Path) – Directory to which the data set will be downloaded
- Return type:
None
- match_metadata(metadata_to_match: dict) bool[source]
Check if there is a mismatch between attributes of DataSet and the contents of a dictionary, metadata_to_match. Only items that are in the attributes are checked.
- Parameters:
metadata_to_match (dict) – Dictionary of key-value or key-list pairs to match. If a key-list is provided then each element of the list is checked and a mismatch only occurs if all of the items in the list cause a mismatch.
- Returns:
Return True unless there is a mismatch in which case return False
- Return type:
bool
- climind.data_manager.processing.get_function(module_path: str, script_name: str, function_name: str) Callable[source]
For a particular module and script in that module, return the function with a specified name as a callable object.
- Parameters:
module_path (str) – The path to the module written using dot separation between directories
script_name (str) – The name of the script
function_name (str) – The name of the function in the script to be returned
- Returns:
Returns the function with the specified function name from the script with the specified script name in the specified module path
- Return type:
Callable
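One common way to look up a function by name is with importlib, as in the sketch below. This is an assumption about how such a lookup could work, not the climind implementation: the name get_function_sketch is hypothetical, and the example resolves a standard-library function (os.path.basename) rather than a climind fetcher.

```python
import importlib
from typing import Callable


def get_function_sketch(module_path: str, script_name: str, function_name: str) -> Callable:
    """Import '<module_path>.<script_name>' and return the named attribute."""
    module = importlib.import_module(f"{module_path}.{script_name}")
    # getattr raises AttributeError if the function does not exist in the script.
    return getattr(module, function_name)


# Standard-library example: the 'path' submodule of 'os', function 'basename'.
basename = get_function_sketch("os", "path", "basename")
result = basename("/tmp/data/file.csv")
```

Resolving functions by name in this way lets metadata files refer to functions using plain strings.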
Module contents
The scripts and files in the data manager module are used to create and process the metadata entities that underpin the dashboards.
There are basic metadata classes (BaseMetadata, CollectionMetadata
and DatasetMetadata)
which contain metadata and allow for simple tasks like checking whether the metadata
they contain matches what’s in a dictionary.
Then there are classes (DataSet, DataCollection, DataArchive)
that use the metadata classes
to define data sets, collections of related data sets and data archives. These classes
allow you to select subsets of data, download data sets and read them in.
The metadata themselves are stored in json files in the climind.metadata_files directory.