climind.data_manager package
The scripts and files in the data manager module are used to create and process the metadata entities that underpin the dashboards.
There are basic metadata classes (BaseMetadata, CollectionMetadata and DatasetMetadata) which contain metadata and allow for simple tasks like checking whether the metadata they contain matches what’s in a dictionary.
Then there are classes (DataSet, DataCollection, DataArchive) that use the metadata classes to define data sets, collections of related data sets and data archives. These classes allow you to select subsets of data, download data sets and read them in.
The metadata themselves are stored in JSON files in the climind.metadata_files directory.
Submodules
climind.data_manager.metadata module
These metadata classes contain all the information about the datasets that are manipulated by the packages. The BaseMetadata class contains much of the functionality, with CollectionMetadata and DatasetMetadata inheriting that functionality and differing chiefly in the schemas used to validate their contents. The CombinedMetadata class comprises a CollectionMetadata object and a DatasetMetadata object.
- class climind.data_manager.metadata.BaseMetadata(metadata: dict)
Bases: object
Simple class to store metadata and find matches. Metadata items can be set and recovered using a dictionary-like syntax:
metadata_object['key'] = value
value = metadata_object['key']
Testing whether a key is present is also dict-like:
key in metadata_object
Create a BaseMetadata object from a dictionary containing the metadata in key-value pairs.
- Parameters
metadata (dict) – Dictionary containing the metadata
- metadata
Contains the metadata information in key value pairs
- Type
dict
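The dictionary-like access described above can be illustrated with a minimal sketch. SketchMetadata is a hypothetical stand-in written for this example, not the BaseMetadata implementation, and the metadata keys used are assumptions chosen for illustration.

```python
class SketchMetadata:
    """Hypothetical stand-in for a dict-backed metadata container."""

    def __init__(self, metadata: dict):
        # Keep the key-value pairs in a plain dictionary attribute
        self.metadata = metadata

    def __getitem__(self, key):
        # Supports: value = sketch['key']
        return self.metadata[key]

    def __setitem__(self, key, value):
        # Supports: sketch['key'] = value
        self.metadata[key] = value

    def __contains__(self, key):
        # Supports: 'key' in sketch
        return key in self.metadata


# The keys below are illustrative assumptions, not a documented schema
sketch = SketchMetadata({"name": "HadCRUT5", "variable": "tas"})
sketch["time_resolution"] = "monthly"
```

Delegating the three special methods to the wrapped dictionary is one simple way to get this behaviour; the real class may do more, such as validating against a schema.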
- fill_string(string_to_replace: str, replacement: str)
Replace string_to_replace with the replacement value in all elements of the metadata.
- Parameters
string_to_replace (str) – string to be replaced in metadata elements
replacement (str) – replacement string
- Return type
None
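As a rough sketch of the kind of in-place replacement fill_string describes, the function below substitutes a placeholder in every string value and in every string held in a list value. The name fill_string_sketch, the handling of lists, and the example keys are assumptions made for this illustration; the actual method may treat other element types differently.

```python
def fill_string_sketch(metadata: dict, string_to_replace: str, replacement: str) -> None:
    """Replace string_to_replace in all string elements of metadata, in place."""
    for key, value in metadata.items():
        if isinstance(value, str):
            metadata[key] = value.replace(string_to_replace, replacement)
        elif isinstance(value, list):
            # Replace inside string items, leave non-string items untouched
            metadata[key] = [
                item.replace(string_to_replace, replacement) if isinstance(item, str) else item
                for item in value
            ]


# Hypothetical metadata using a YYYY placeholder
example_metadata = {
    "url": ["https://example.org/data_YYYY.csv"],
    "citation": "Accessed YYYY",
    "version": 5,
}
fill_string_sketch(example_metadata, "YYYY", "2023")
```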
- match_metadata(metadata_to_match: dict) bool
Check whether the metadata match the contents of the dictionary metadata_to_match. Only definite non-matches are rejected: a key in metadata_to_match that is not found in the metadata is not counted as a non-match.
- Parameters
metadata_to_match (dict) – Key-value or key-list pairs for match
- class climind.data_manager.metadata.CollectionMetadata(metadata: dict)
Bases: BaseMetadata
Class to store collection-level metadata, containing information that refers to all data sets in the collection.
Create CollectionMetadata from a dictionary containing metadata. Metadata are validated using the metadata_schema.json file.
- Parameters
metadata (dict) – Dictionary containing metadata in key-value pairs.
- class climind.data_manager.metadata.CombinedMetadata(dataset: DatasetMetadata, collection: CollectionMetadata)
Bases: object
CombinedMetadata combines DatasetMetadata and CollectionMetadata in one single object so that both sets of metadata elements are available in one container.
- creation_message() None
Add a creation message to the dataset history and populate the wildcards in the metadata, such as AAAA (last modified/download time), YYYY (year), VVVV (version number).
- Return type
None
- match_metadata(metadata_to_match: dict) bool
Test whether the metadata match metadata_to_match. Returns True unless there is a mismatch between the required metadata_to_match and the metadata.
- Parameters
metadata_to_match (dict) – Dictionary of metadata terms to match
- Returns
Return True unless an element in metadata_to_match conflicts with an entry in the metadata
- Return type
bool
- write_metadata(filename: Path) None
Write out the metadata in JSON format to the file specified by filename.
- Parameters
filename (Path) – Path of the file to be created
- Return type
None
- class climind.data_manager.metadata.DatasetMetadata(metadata: dict)
Bases: BaseMetadata
Class to store dataset-level metadata, containing information that refers specifically to a single data set.
Create DatasetMetadata from a dictionary containing metadata. Metadata are validated using the dataset_schema.json file.
- Parameters
metadata (dict) – Dictionary containing metadata in key-value pairs.
- creation_message() None
Add creation message to the history.
- Return type
None
- climind.data_manager.metadata.list_match(list_to_match: list, attribute: str) bool
If attribute matches any item in list_to_match return True, otherwise False
- Parameters
list_to_match (list) – List of metadata to match
attribute (str) – attribute to check against
- Returns
Set to True if attribute matches element in list_to_match, False otherwise
- Return type
bool
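The matching rules documented for list_match and the match_metadata methods can be sketched as follows. Both functions are hypothetical re-implementations written from the descriptions in this page, not the package's code: a key absent from the metadata is skipped, a plain value must be equal, and a list matches if the attribute equals any of its items.

```python
def list_match_sketch(list_to_match: list, attribute: str) -> bool:
    """Return True if attribute equals any item in list_to_match."""
    return attribute in list_to_match


def match_metadata_sketch(metadata: dict, metadata_to_match: dict) -> bool:
    """Return True unless an entry in metadata_to_match definitely conflicts."""
    for key, wanted in metadata_to_match.items():
        if key not in metadata:
            # Missing key: not a definite non-match, so skip it
            continue
        if isinstance(wanted, list):
            if not list_match_sketch(wanted, metadata[key]):
                return False
        elif metadata[key] != wanted:
            return False
    return True


# Hypothetical metadata entries used only for this illustration
sample = {"variable": "tas", "type": "timeseries"}
```

Under these rules, match_metadata_sketch(sample, {"variable": ["tas", "ohc"]}) accepts the sample, while {"variable": "ohc"} rejects it.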
climind.data_manager.processing module
The classes and functions in this script describe groupings of metadata. The basic building block is a DataSet, which specifies the file (or files) containing the data for a single data set. DataSet objects are grouped into DataCollection objects, which gather together all the individual data sets that are derived from a single product. For example, HadCRUT5 is a product and so it has a corresponding DataCollection made up of several DataSet objects. Finally, a DataArchive contains one or more DataCollection objects. All DataSet objects in a DataCollection describe the same variable. However, DataCollection objects in a DataArchive need not describe the same variable.
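The nesting of data sets, collections and archives can be pictured with a small sketch. The Sketch* dataclasses, the select method shown here, the variable names, and the choice to drop collections left with no matching data sets are all assumptions for illustration; they are not the classes documented below.

```python
from dataclasses import dataclass, field


@dataclass
class SketchDataSet:
    name: str
    metadata: dict


@dataclass
class SketchDataCollection:
    name: str
    datasets: list = field(default_factory=list)


@dataclass
class SketchDataArchive:
    collections: dict = field(default_factory=dict)

    def add_collection(self, collection: SketchDataCollection) -> None:
        self.collections[collection.name] = collection

    def select(self, metadata_to_match: dict) -> "SketchDataArchive":
        """Return a new archive holding only the data sets that match."""
        selected = SketchDataArchive()
        for collection in self.collections.values():
            # A key missing from the data set metadata is not treated as a mismatch
            kept = [
                ds for ds in collection.datasets
                if all(ds.metadata.get(k, v) == v for k, v in metadata_to_match.items())
            ]
            if kept:
                selected.add_collection(SketchDataCollection(collection.name, kept))
        return selected


hadcrut = SketchDataCollection("HadCRUT5", [
    SketchDataSet("monthly_tas", {"variable": "tas", "time_resolution": "monthly"}),
    SketchDataSet("annual_tas", {"variable": "tas", "time_resolution": "annual"}),
])
nsidc = SketchDataCollection("NSIDC", [
    SketchDataSet("monthly_sie", {"variable": "sea_ice_extent", "time_resolution": "monthly"}),
])

archive = SketchDataArchive()
archive.add_collection(hadcrut)
archive.add_collection(nsidc)

monthly = archive.select({"time_resolution": "monthly"})
tas_only = archive.select({"variable": "tas"})
```

Here each collection holds a single variable, while the archive mixes variables, mirroring the rule stated above.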
- class climind.data_manager.processing.DataArchive
Bases: object
A set of DataCollection objects. A DataArchive is the starting point for the analysis. Particular DataSet objects are selected from the DataArchive before plotting or summarising the data.
Create a DataArchive object, initially empty.
- collections
A dictionary containing the DataCollection objects in the archive
- Type
dict
- add_collection(data_collection: DataCollection) None
Add a DataCollection to the archive.
- Parameters
data_collection (DataCollection) – DataCollection to be added to the DataArchive
- Return type
None
- download(out_dir: Path) None
Download all files in the DataArchive.
- Parameters
out_dir (Path) – Directory to which the files should be downloaded
- Return type
None
- static from_directory(path_to_dir: Path)
Create a DataArchive from a directory of metadata. The directory should contain a set of JSON files, each of which contains a set of metadata describing a DataCollection.
- Parameters
path_to_dir (Path) – Path to the directory containing the metadata files that will be used to populate the DataArchive
- Returns
DataArchive containing all DataCollection objects described in the metadata files
- Return type
DataArchive
- read_datasets(out_dir: Path, **kwargs) list
Read all the datasets in the DataArchive.
- Parameters
out_dir (Path) – Path of directory containing the data
- Returns
List of datasets specified by metadata in the archive.
- Return type
list
- select(metadata_to_match: dict)
Select datasets from the DataArchive that meet the metadata requirements specified in the metadata_to_match dictionary.
- Parameters
metadata_to_match (dict) – Metadata to be matched. For each requirement, there should be a key-value pair
- Returns
DataArchive containing only data that match metadata_to_match
- Return type
DataArchive
- class climind.data_manager.processing.DataCollection(metadata: dict)
Bases: object
A grouping of DataSet objects derived from a single product or source, e.g. HadCRUT5. This could include, for example, monthly and annual time series along with the gridded data.
Create DataCollection from a metadata dictionary.
- Parameters
metadata (dict) –
- global_attributes
Metadata containing the attributes that apply to all DataSets in the DataCollection
- Type
- add_dataset(ds: DataSet) None
Add a DataSet object to the DataCollection.
- Parameters
ds (DataSet) – DataSet to be added
- Return type
None
- download(data_dir: Path) None
Download all the data sets described by DataSet objects in the DataCollection.
- Parameters
data_dir (Path) – Location to which the datasets should be downloaded
- Return type
None
- static from_file(filename: Path)
Given a file path, create the DataCollection from metadata in that file.
- Parameters
filename (Path) – Filename of the metadata file in JSON format
- Returns
DataCollection containing all the DataSet objects specified by the metadata file
- Return type
DataCollection
- get_collection_dir(data_dir: Path) Path
Get the Path to the directory where the data for this DataCollection are stored. If the directory does not exist, then create it.
- Parameters
data_dir (Path) – Path to the general data directory for managed data in the project
- Returns
Path to the directory for this DataCollection.
- Return type
Path
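A "return the directory, creating it if needed" helper of the kind get_collection_dir describes might look like the sketch below. The function name, the extra collection_name argument, and the assumption that the directory is named after the collection are hypothetical; only the create-if-missing behaviour comes from the description above.

```python
import tempfile
from pathlib import Path


def get_collection_dir_sketch(data_dir: Path, collection_name: str) -> Path:
    """Return data_dir/collection_name, creating the directory if it is missing."""
    collection_dir = data_dir / collection_name
    # exist_ok=True makes repeated calls safe; parents=True creates data_dir too
    collection_dir.mkdir(parents=True, exist_ok=True)
    return collection_dir


with tempfile.TemporaryDirectory() as tmp:
    first = get_collection_dir_sketch(Path(tmp), "HadCRUT5")
    second = get_collection_dir_sketch(Path(tmp), "HadCRUT5")
    created = first.is_dir()
    same_path = first == second
```

Calling the helper twice returns the same path and does not raise, which is the property a download routine would rely on.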
- match_metadata(metadata_to_match: dict)
Given a dictionary of metadata keys and required values for each key, return a DataCollection which contains only data sets matching the specified metadata.
- Parameters
metadata_to_match (dict) – Dictionary containing key:value pairs that specify the data sets required in the output DataCollection
- Returns
DataCollection that matches the metadata_to_match
- Return type
DataCollection
- read_datasets(out_dir: Path, **kwargs) list
Read all the datasets described by DataSet objects in the DataCollection.
- Parameters
out_dir (Path) – Directory in which the datasets are found
- Returns
List of all data sets described in the DataCollection.
- Return type
list
- to_file(filename: Path) None
Write the DataCollection metadata to file in JSON format.
- Parameters
filename (Path) – Path to the file to be written
- Return type
None
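Writing collection metadata to a JSON file, reading it back, and loading every metadata file in a directory can be sketched with the standard library alone. The function names and the JSON layout used here (a "name" key and a "datasets" list) are assumptions for illustration and are not taken from the package's schema files.

```python
import json
import tempfile
from pathlib import Path


def write_collection_sketch(metadata: dict, filename: Path) -> None:
    """Write a metadata dictionary to filename as JSON."""
    with open(filename, "w", encoding="utf-8") as handle:
        json.dump(metadata, handle, indent=4)


def read_collection_sketch(filename: Path) -> dict:
    """Read a metadata dictionary back from a JSON file."""
    with open(filename, "r", encoding="utf-8") as handle:
        return json.load(handle)


def read_directory_sketch(path_to_dir: Path) -> dict:
    """Load every *.json file in a directory, keyed by its 'name' entry."""
    collections = {}
    for json_file in sorted(path_to_dir.glob("*.json")):
        metadata = read_collection_sketch(json_file)
        collections[metadata["name"]] = metadata
    return collections


# Hypothetical layout: collection-level keys plus a list of data set entries
example = {
    "name": "HadCRUT5",
    "variable": "tas",
    "datasets": [{"time_resolution": "monthly"}, {"time_resolution": "annual"}],
}

with tempfile.TemporaryDirectory() as tmp:
    write_collection_sketch(example, Path(tmp) / "hadcrut5.json")
    round_trip = read_collection_sketch(Path(tmp) / "hadcrut5.json")
    loaded = read_directory_sketch(Path(tmp))
```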
- class climind.data_manager.processing.DataSet(metadata: DatasetMetadata, global_metadata: CollectionMetadata)
Bases: object
A DataSet contains metadata for a single dataset (one that might be split across multiple files). For example, NSIDC monthly sea ice extent data is a single data set provided in 12 files, one for each month. In contrast, HadCRUT5 monthly global mean temperature is a single file. Both of these would be described by a DataSet. A DataSet can be used to read in the actual data.
Create a DataSet from DatasetMetadata and CollectionMetadata.
- Parameters
metadata (DatasetMetadata) – DatasetMetadata containing the dataset metadata.
global_metadata (CollectionMetadata) – CollectionMetadata containing the global metadata
- name
Name of the data set
- Type
str
- metadata
Dictionary of attributes
- Type
dict
- global_metadata
Dictionary of global attributes inherited from collection
- Type
dict
- download(out_dir: Path) None
Download the data set using its “fetcher” function. Fetcher functions are contained in the fetchers package.
- Parameters
out_dir (Path) – Directory to which the data set will be downloaded
- Return type
None
- match_metadata(metadata_to_match: dict) bool
Check if there is a mismatch between attributes of DataSet and the contents of a dictionary, metadata_to_match. Only items that are in the attributes are checked.
- Parameters
metadata_to_match (dict) – Dictionary of key-value or key-list pairs to match. If a key-list is provided then each element of the list is checked and a mismatch only occurs if all of the items in the list cause a mismatch.
- Returns
Return True unless there is a mismatch in which case return False
- Return type
bool
- read_dataset(out_dir: Path, **kwargs)
Read in the dataset and output an object of the appropriate type.
- Parameters
out_dir (Path) – Directory in which the data are to be found (dictated by the Collection)
- Return type
Object of the appropriate type
- climind.data_manager.processing.get_function(module_path: str, script_name: str, function_name: str) Callable
For a particular module and script in that module, return the function with a specified name as a callable object.
- Parameters
module_path (str) – The path to the module written using dot separation between directories
script_name (str) – The name of the script
function_name (str) – The name of the function in the script to be returned
- Returns
Returns the function with the specified function name from the script with the specified script name in the specified module path
- Return type
Callable
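Looking up a function by dotted module path, script name and function name is commonly done with importlib. The sketch below shows one way this could work; get_function_sketch is a hypothetical re-implementation, and a standard-library function stands in for a fetcher or reader because the package's own script names are not listed here.

```python
import importlib
from typing import Callable


def get_function_sketch(module_path: str, script_name: str, function_name: str) -> Callable:
    """Import module_path.script_name and return the named attribute from it."""
    module = importlib.import_module(f"{module_path}.{script_name}")
    return getattr(module, function_name)


# Stand-in lookup: package "urllib", script "parse", function "quote"
quote = get_function_sketch("urllib", "parse", "quote")
encoded = quote("sea ice")
```

Resolving functions from strings in this way would let metadata files name the routine used to fetch or read a data set without the calling code importing every script up front.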