climind.data_types package
There are two main data types implemented in this package: timeseries and grids. In each of those two cases, the data set consists of a data-carrying part and a metadata part. For timeseries, the data-carrying part is a pandas dataframe and for a grid, it’s an xarray dataset.
Submodules
climind.data_types.grid module
- class climind.data_types.grid.GridAnnual(input_data, metadata: CombinedMetadata)
Bases:
object
A GridAnnual combines an xarray Dataset with a CombinedMetadata to bring together data and metadata in one object. It represents annual averages of data.
Create an annual gridded data set from an xarray Dataset and a CombinedMetadata object.
- Parameters
input_data (xa.Dataset) – xarray dataset
metadata (CombinedMetadata) – CombinedMetadata object
- get_end_year() int
Get the last year in the dataset
- Returns
Last year in the data set
- Return type
int
- get_start_year() int
Get the first year in the dataset
- Returns
First year in the dataset
- Return type
int
- get_year_range(start_year: int, end_year: int)
Select a range of consecutive years from the data set.
- Parameters
start_year (int) – start year
end_year (int) – end year
- Returns
Returns a GridAnnual containing only data within the specified year range.
- Return type
GridAnnual
- rank()
Return a data set where the values are the ranks of each grid cell value.
- Returns
Returns a GridAnnual containing the values as ranks from highest (1) to lowest.
- Return type
GridAnnual
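As a rough illustration of ranking from highest (1) to lowest, the sketch below ranks the values of a single grid cell held in a plain Python list. It is not the package's implementation: the name rank_descending is hypothetical, and the treatment of ties and of missing values (None here) is an assumption.

```python
from typing import List, Optional


def rank_descending(values: List[Optional[float]]) -> List[Optional[int]]:
    """Rank values so that the largest gets rank 1 (hypothetical helper).

    Missing values (None) stay None and tied values share the smaller rank.
    Both choices are assumptions made for this sketch.
    """
    present = sorted((v for v in values if v is not None), reverse=True)
    # list.index() returns the first match, so tied values share a rank
    return [None if v is None else present.index(v) + 1 for v in values]


ranks = rank_descending([0.3, 1.2, None, 0.7, 1.2])
```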
- running_average(n_year: int)
Calculate an n_year running average of the data in the dataset
- Parameters
n_year (int) – Number of years for which the running average is calculated
- Returns
Annual gridded dataset which contains the running averages
- Return type
GridAnnual
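A minimal sketch of an n_year running average on a single annual series, assuming a trailing window (each value averages the current year and the n_year - 1 years before it). The window alignment used by the package is not stated here, so treat that choice, and the helper name running_average, as assumptions.

```python
from typing import List, Optional


def running_average(values: List[float], n_year: int) -> List[Optional[float]]:
    """Trailing n_year running mean of an annual series (hypothetical helper).

    Entry i averages values[i - n_year + 1 : i + 1]. The first n_year - 1
    entries are None because their window is incomplete.
    """
    if n_year < 1:
        raise ValueError("n_year must be at least 1")
    out: List[Optional[float]] = []
    for i in range(len(values)):
        if i + 1 < n_year:
            out.append(None)  # not enough years yet to fill the window
        else:
            window = values[i - n_year + 1 : i + 1]
            out.append(sum(window) / n_year)
    return out


smoothed = running_average([1.0, 2.0, 3.0, 4.0], n_year=2)
```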
- select_year_range(start_year: int, end_year: int)
Select a particular range of consecutive years from the data set and throw away the rest.
- Parameters
start_year (int) – First year of selection
end_year (int) – Final year of selection
- Returns
Returns a GridAnnual containing only data within the specified year range.
- Return type
GridAnnual
- update_history(message: str) None
Update the history metadata
- Parameters
message (str) – Message to be added to history
- Return type
None
- write_grid(filename: Path, metadata_filename: Optional[Path] = None, name: Optional[str] = None) None
Write the grid to file.
- Parameters
filename (Path) – Filename to write grid to
metadata_filename (Path) – Filename to write metadata to
name (str) – Optional name to give the data set being written. Note that names should be unique in any data archive.
- Return type
None
- class climind.data_types.grid.GridMonthly(input_data: xarray.Dataset, metadata: CombinedMetadata)
Bases:
object
A GridMonthly combines an xarray Dataset with a CombinedMetadata to bring together data and metadata in one object. It represents monthly averages of data on a regular grid.
Create a GridMonthly object from an xarray Dataset and a CombinedMetadata object.
- Parameters
input_data (xa.Dataset) – xarray dataset
metadata (CombinedMetadata) – CombinedMetadata object
- calculate_regional_average(regions, region_number, land_only=True) TimeSeriesMonthly
Calculate a regional average from the grid. The region is specified by a geopandas GeoDataFrame and the index (region_number) of the chosen shape. By default, the output is masked to land areas only; this can be switched off by setting land_only to False.
- Parameters
regions (GeoDataFrame) – geopandas GeoDataFrame specifying the region to be averaged over
region_number (int) – the index of the particular region in the GeoDataFrame
land_only (bool) – By default, output is masked to land areas only. To calculate a full area average, set land_only to False
- Returns
Returns time series of area averages.
- Return type
ts.TimeSeriesMonthly
- calculate_regional_average_missing(regions, region_number, threshold=0.3, land_only=True) TimeSeriesMonthly
Calculate a regional average from the grid, returning NaN where too little of the region is covered by data. The region is specified by a geopandas GeoDataFrame and the index (region_number) of the chosen shape. By default, the output is masked to land areas only; this can be switched off by setting land_only to False.
- Parameters
regions (GeoDataFrame) – geopandas GeoDataFrame specifying the region to be averaged over
region_number (int) – the index of the particular region in the GeoDataFrame
threshold (float) – If the area covered by data in the region drops below this threshold then NaN is returned.
land_only (bool) – By default, output is masked to land areas only. To calculate a full area average, set land_only to False
- Returns
Returns time series of area averages.
- Return type
ts.TimeSeriesMonthly
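The coverage threshold can be sketched for a single time step as follows. This assumes threshold is a fraction of the region's total area and that each cell contributes in proportion to an area weight; the helper name and its list inputs are hypothetical stand-ins for the masked grid, not the package's API.

```python
import math
from typing import List, Optional


def area_average_with_threshold(
    values: List[Optional[float]],
    weights: List[float],
    threshold: float = 0.3,
) -> float:
    """Weighted mean of the cells in a region, or NaN if coverage is too low.

    values holds one entry per grid cell in the region (None = missing) and
    weights holds the matching cell areas.
    """
    total_area = sum(weights)
    covered = [(v, w) for v, w in zip(values, weights) if v is not None]
    covered_area = sum(w for _, w in covered)
    # Treat threshold as the minimum covered fraction (an assumption)
    if total_area == 0 or covered_area / total_area < threshold:
        return math.nan
    return sum(v * w for v, w in covered) / covered_area


half_covered = area_average_with_threshold([1.0, 3.0, None, None], [1.0, 1.0, 1.0, 1.0])
quarter_covered = area_average_with_threshold([1.0, None, None, None], [1.0, 1.0, 1.0, 1.0])
```

Here half_covered is an ordinary mean because half of the region has data, while quarter_covered falls below the default threshold of 0.3 and is NaN.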
- calculate_time_mean(cumulative=False)
Calculate the time mean of the map
- Returns
Returns a GridMonthly containing the time mean of the data.
- Return type
GridMonthly
- get_last_month() datetime
Get the date of the last month in the dataset
- Returns
Date of the last month in the dataset
- Return type
datetime
- make_annual()
Calculate an annual average from a monthly grid by taking the arithmetic mean of available monthly anomalies.
- Returns
Return annual average of the grid
- Return type
GridAnnual
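A minimal sketch of averaging the available months in each year, shown for one grid cell. Missing months are skipped rather than counted as zero. The helper name annual_means and the (year, month, value) tuple layout are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def annual_means(monthly: List[Tuple[int, int, Optional[float]]]) -> Dict[int, float]:
    """Average the available months in each year (hypothetical helper).

    monthly is a list of (year, month, value) tuples; None marks a missing month.
    """
    by_year: Dict[int, List[float]] = defaultdict(list)
    for year, _month, value in monthly:
        if value is not None:
            by_year[year].append(value)
    # Years with no available months are absent from the result
    return {year: sum(vals) / len(vals) for year, vals in sorted(by_year.items())}


means = annual_means([(2000, 1, 1.0), (2000, 2, None), (2000, 3, 3.0), (2001, 1, 0.5)])
```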
- rebaseline(first_year: int, final_year: int) xarray.Dataset
Change the baseline of the data to the period between first_year and final_year by subtracting the average of the available data between those two years (inclusive).
- Parameters
first_year (int) – First year of climatology period
final_year (int) – Final year of climatology period
- Returns
Changes the dataset in place, but also returns the dataset if needed
- Return type
xa.Dataset
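The baseline change amounts to subtracting a period mean. The sketch below shows this for one value per year at a single grid cell; whether the package computes the baseline separately for each calendar month is not stated here, so this simplified form (and the standalone function itself) is an assumption.

```python
from typing import Dict


def rebaseline(series: Dict[int, float], first_year: int, final_year: int) -> Dict[int, float]:
    """Subtract the mean over first_year..final_year (inclusive) from every value."""
    baseline_values = [v for y, v in series.items() if first_year <= y <= final_year]
    if not baseline_values:
        raise ValueError("no data in the baseline period")
    baseline = sum(baseline_values) / len(baseline_values)
    # Returns a new dict; the package method instead modifies its data in place
    return {y: v - baseline for y, v in series.items()}


anomalies = rebaseline({1990: 1.0, 1991: 3.0, 1992: 5.0}, first_year=1990, final_year=1991)
```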
- select_period(start_year: int, start_month: int, end_year: int, end_month: int)
Select a period from the grid specified by start year and month and end year and month, inclusive.
- Parameters
start_year (int) – Year of start date
start_month (int) – Month of start date
end_year (int) – Year of end date
end_month (int) – Month of end date
- Returns
Returns a GridMonthly containing only data within the specified date range.
- Return type
GridMonthly
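An inclusive year-and-month selection can be expressed by comparing (year, month) pairs. The sketch below works on a list of (year, month, value) rows rather than an xarray Dataset; that layout and the standalone function are assumptions for illustration.

```python
from typing import List, Tuple


def select_period(
    rows: List[Tuple[int, int, float]],
    start_year: int,
    start_month: int,
    end_year: int,
    end_month: int,
) -> List[Tuple[int, int, float]]:
    """Keep (year, month, value) rows between the two dates, inclusive."""
    start = (start_year, start_month)
    end = (end_year, end_month)
    # Tuples compare element by element, so (year, month) pairs order by date
    return [row for row in rows if start <= (row[0], row[1]) <= end]


rows = [(1999, 12, 0.1), (2000, 1, 0.2), (2000, 6, 0.3), (2000, 7, 0.4)]
selected = select_period(rows, 2000, 1, 2000, 6)
```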
- select_year_and_month(year: int, month: int)
Select a particular month from the data set and throw away the rest.
- Parameters
year (int) – Year of selection
month (int) – Month of selection
- Returns
Returns a GridMonthly containing only data for the specified year and month.
- Return type
GridMonthly
- update_history(message: str) None
Update the history metadata with a message.
- Parameters
message (str) – Message to be added to history
- Return type
None
- climind.data_types.grid.get_1d_transfer(zero_point_original: float, grid_space_original: float, zero_point_target: float, grid_space_target: float, index_in_original: int) tuple
Find the overlapping grid spacings for a new grid based on an index in the old grid
- Parameters
zero_point_original (float) – longitude or latitude of the zero-indexed grid cell
grid_space_original (float) – grid spacing in degrees
zero_point_target (float) – longitude or latitude of the zero-indexed grid cell in the target grid
grid_space_target (float) – grid spacing in degrees of the target grid
index_in_original (int) – index of the gridcell in the original grid
- Returns
Returns the longitude of the first grid cell in the new grid, the number of steps, and the first and last indices on the new grid.
- Return type
tuple
- climind.data_types.grid.get_start_and_end_year(all_datasets: List[GridAnnual]) Tuple[int, int]
Given a list of GridAnnual datasets, find the earliest start year and the latest end year.
- Parameters
all_datasets (List[GridAnnual]) – List of datasets for which we want to find the first and last year
- Return type
Tuple[int, int]
- climind.data_types.grid.make_standard_grid(out_grid: numpy.ndarray, start_date: datetime, freq: str, number_of_times: int) xarray.Dataset
Make the standard 5x5 grid from a numpy array, start date, temporal frequency and number of time steps.
- Parameters
out_grid (np.ndarray) – Numpy array containing the data. Shape should be (number_of_times, 36, 72)
start_date (datetime) – Date of the first time step
freq (str) – Temporal frequency
number_of_times (int) – Number of time steps, should match the first dimension of the out_grid
- Returns
xarray Dataset containing the data in out_grid with the specified temporal frequency and number of time steps
- Return type
xa.Dataset
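For orientation, a 5x5 degree global grid has 36 latitude bands and 72 longitude bands, which is where the (number_of_times, 36, 72) shape comes from. The sketch below builds cell-centre coordinates for such a grid; starting at -90 and -180 and using cell centres are assumptions, and the package may order or reference its axes differently.

```python
from typing import List, Tuple


def standard_grid_centres(spacing: float = 5.0) -> Tuple[List[float], List[float]]:
    """Cell-centre latitudes and longitudes for a regular global grid.

    Hypothetical helper: the origin and the use of centres are assumptions.
    """
    n_lat = int(180 / spacing)
    n_lon = int(360 / spacing)
    latitudes = [-90.0 + spacing * (i + 0.5) for i in range(n_lat)]
    longitudes = [-180.0 + spacing * (j + 0.5) for j in range(n_lon)]
    return latitudes, longitudes


lats, lons = standard_grid_centres()
```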
- climind.data_types.grid.make_xarray(target_grid, times, latitudes, longitudes, variable: str = 'tas_mean') xarray.Dataset
Make a xarray Dataset for a regular lat-lon grid from a numpy grid (ntime, nlat, nlon), and arrays of time (ntime), latitude (nlat) and longitude (nlon).
- Parameters
target_grid (np.ndarray) – numpy array of shape (ntime, nlat, nlon)
times (np.ndarray) – Array of times, shape (ntime)
latitudes (np.ndarray) – Array of latitudes, shape (nlat)
longitudes (np.ndarray) – Array of longitudes, shape (nlon)
variable (str) – Variable name
- Returns
Dataset built from the input components
- Return type
xa.Dataset
- climind.data_types.grid.median_of_datasets(all_datasets: List[GridAnnual]) GridAnnual
Calculate the median of a list of GridAnnual data sets.
- Parameters
all_datasets (List[GridAnnual]) – List of GridAnnual datasets from which the medians will be calculated.
- Return type
GridAnnual
- climind.data_types.grid.process_datasets(all_datasets: List[GridAnnual], grid_type: str) GridAnnual
Calculate the median or half-range (depending on the selected grid_type) of a list of GridAnnual data sets. Medians are calculated on a grid cell by grid cell basis based on all available data in the list of data sets.
- Parameters
all_datasets (List[GridAnnual]) – list of GridAnnual data sets
grid_type (str) – Either ‘median’ or ‘range’
- Returns
Data set containing the median (or half-range) values from all the data sets supplied
- Return type
GridAnnual
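The cell-by-cell calculation can be sketched with nested lists standing in for the grids. Missing cells (None) are ignored, and the half-range is taken to be (max - min) / 2, which is an assumption about the definition. The name combine_grids is hypothetical.

```python
from statistics import median
from typing import List, Optional

Grid = List[List[Optional[float]]]


def combine_grids(grids: List[Grid], grid_type: str) -> Grid:
    """Cell-by-cell median or half-range over a list of equally shaped grids.

    A cell that is missing in every input grid stays None in the output.
    """
    if grid_type not in ("median", "range"):
        raise ValueError("grid_type must be 'median' or 'range'")
    n_rows, n_cols = len(grids[0]), len(grids[0][0])
    out: Grid = [[None] * n_cols for _ in range(n_rows)]
    for i in range(n_rows):
        for j in range(n_cols):
            cell = [g[i][j] for g in grids if g[i][j] is not None]
            if not cell:
                continue  # no data set has a value here
            if grid_type == "median":
                out[i][j] = median(cell)
            else:
                out[i][j] = (max(cell) - min(cell)) / 2
    return out


grids = [[[1.0, None]], [[3.0, None]], [[2.0, 4.0]]]
median_grid = combine_grids(grids, "median")
half_range_grid = combine_grids(grids, "range")
```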
- climind.data_types.grid.range_of_datasets(all_datasets: List[GridAnnual]) GridAnnual
Calculate the half-range of a list of GridAnnual data sets.
- Parameters
all_datasets (List[GridAnnual]) – List of GridAnnual datasets from which the ranges will be calculated.
- Return type
GridAnnual
- climind.data_types.grid.rank_array(in_array: numpy.ndarray) int
Rank array
- Parameters
in_array (np.ndarray) – Array to be ranked
- Return type
int
- climind.data_types.grid.simple_regrid(ingrid: numpy.ndarray, lon0: float, lat0: float, dx: float, target_dy: float) numpy.ndarray
Perform a simple regridding, using a simple average of grid cells from the original grid that fall within the target grid cell.
- Parameters
ingrid (np.ndarray) – Starting grid which we want to regrid
lon0 (float) – Longitude of zero-indexed grid cell in longitudinal direction
lat0 (float) – Latitude of zero-indexed grid cell in latitudinal direction
dx (float) – Grid spacing in degrees
target_dy (float) – Target grid spacing
- Returns
Returns regridded array.
- Return type
np.ndarray
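A simple-average regridding can be illustrated with block averaging. The sketch assumes the target spacing is an exact integer multiple of the original spacing and that both grids share an origin, which may not hold for the function documented above; block_average and its factor argument are hypothetical.

```python
from typing import List


def block_average(grid: List[List[float]], factor: int) -> List[List[float]]:
    """Average non-overlapping factor x factor blocks of a 2D grid."""
    n_rows, n_cols = len(grid), len(grid[0])
    if n_rows % factor or n_cols % factor:
        raise ValueError("grid shape must be divisible by factor")
    out = []
    for i in range(0, n_rows, factor):
        row = []
        for j in range(0, n_cols, factor):
            # Collect every original cell that falls inside this target cell
            block = [grid[i + di][j + dj] for di in range(factor) for dj in range(factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


coarse = block_average([[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]], factor=2)
```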
climind.data_types.timeseries module
- class climind.data_types.timeseries.AveragesCollection(all_datasets)
Bases:
object
A simple class to perform specific tasks on lists of TimeSeriesAnnual.
- best_estimate()
- count()
- lower_range()
- range()
- upper_range()
- class climind.data_types.timeseries.TimeSeries(metadata: Optional[CombinedMetadata] = None)
Bases:
ABC
A base class for representing time series data sets. Note that this class should not generally be used directly; only its subclasses TimeSeriesMonthly, TimeSeriesAnnual and TimeSeriesIrregular should be used. This class contains functionality shared by these classes but does not work on its own.
- add_offset(**kwargs)
- get_first_and_last_year() Tuple[int, int]
Get the first and last year in the series
- Returns
first and last year
- Return type
Tuple[int, int]
- abstract get_string_date_range() str
Create a string which specifies the date range covered by the time series
- Return type
str
- manually_set_baseline(**kwargs)
- select_year_range(**kwargs)
- update_history(message: str) None
Update the history metadata
- Parameters
message (str) – Message to be added to history
- Return type
None
- write_generic_csv(filename: Path, metadata_filename: Path, monthly: bool, uncertainty: bool, irregular: bool, columns_to_write: List[str]) None
Write the dataset out into csv format
- Parameters
filename (Path) – Path of the csv file to which the data will be written.
metadata_filename (Path) – Path of the json file to which the metadata will be written.
monthly (bool) – Set to True for monthly data
uncertainty (bool) – Set to True to print uncertainties
irregular (bool) – Set to True for irregular data
columns_to_write (List[str]) – List of the columns from the dataframe to be written to the data file
- Return type
None
- class climind.data_types.timeseries.TimeSeriesAnnual(years: list, data: list, metadata=None, uncertainty: Optional[list] = None)
Bases:
TimeSeries
A TimeSeriesAnnual combines a pandas DataFrame with a CombinedMetadata to bring together data and metadata in one object. It represents annual averages of data.
Create a TimeSeriesAnnual object from its components.
- Parameters
years (list) – List of years
data (list) – List of data values
metadata (CombinedMetadata) – Dictionary containing the metadata
- df
Pandas dataframe containing the time and data information
- Type
pd.DataFrame
- metadata
Dictionary containing the metadata. The only guaranteed entry is ‘history’
- Type
dict
- add_year(year: int, value: float, uncertainty: Optional[float] = None) None
Add a year of data.
- Parameters
year (int) – the year to be added
value (float) – the data value to be added
uncertainty – the uncertainty of the data value to be added (optional)
- Return type
None
- generate_dates(time_units: str) List[datetime]
Given a string specifying the required time units (something like days since 1800-01-01 00:00:00.0), generate a list of times from the time series corresponding to those units.
- Parameters
time_units (str) – String specifying the units to use for generating the times e.g. “days since 1800-01-01 00:00:00.0”
- Returns
List of dates
- Return type
List[datetime]
- get_rank_from_year(**kwargs)
- get_string_date_range() str
Create a string which specifies the date range covered by the TimeSeriesAnnual in the format YYYY-YYYY.
- Returns
String that specifies the date range covered
- Return type
str
- get_uncertainty_from_year(**kwargs)
- get_value_from_year(**kwargs)
- get_year_axis() List[float]
Return a year axis with dates represented as decimal years.
- Returns
List of dates as decimal years.
- Return type
List[float]
- get_year_from_rank(**kwargs)
- static make_from_df(df: pandas.DataFrame, metadata: CombinedMetadata)
Create a TimeSeriesAnnual from a pandas data frame.
- Parameters
df (pd.DataFrame) – Pandas dataframe containing columns ‘year’ and ‘data’
metadata (dict) – Dictionary containing the metadata
- Returns
TimeSeriesAnnual created from the elements in the dataframe and metadata.
- Return type
TimeSeriesAnnual
- rebaseline(**kwargs)
- running_mean(**kwargs)
- running_stdev(**kwargs)
- select_decade(**kwargs)
- write_csv(filename, metadata_filename=None)
Write the timeseries to a csv file with the specified filename. The format used for writing is given by the BADC CSV format. This has a lot of upfront metadata before the data section. An option for writing a metadata file is also provided.
- Parameters
filename (Path) – Path of the filename to write the data to
metadata_filename (Path) – Path of the filename to write the metadata to
- Return type
None
- write_simple_csv(filename)
- class climind.data_types.timeseries.TimeSeriesIrregular(years: List[int], months: List[int], days: List[int], data: List[float], metadata: Optional[CombinedMetadata] = None, uncertainty: Optional[List[float]] = None)
Bases:
TimeSeries
A TimeSeriesIrregular combines a pandas DataFrame with a CombinedMetadata to bring together data and metadata in one object. It represents non-monthly, non-annual averages of data such as weekly or 5-day averages.
Create a TimeSeriesIrregular object.
- Parameters
years (List[int]) – List of integers specifying the year of each data point
months (List[int]) – List of integers specifying the month of each data point
days (List[int]) – List of integers specifying the day of each data point
data (List[float]) – List of floats with the data values
metadata (CombinedMetadata) – CombinedMetadata object holding the metadata for the dataset
uncertainty (List[float]) – List of floats with the uncertainty values for each data point
- generate_dates(time_units: str) List[int]
Given a string specifying the required time units (something like days since 1800-01-01 00:00:00.0), generate a list of times from the time series corresponding to those units.
- Parameters
time_units (str) – String specifying the units to use for generating the times e.g. “days since 1800-01-01 00:00:00.0”
- Return type
List[int]
- get_start_and_end_dates() Tuple[datetime, datetime]
Get the first and last dates in the dataset
- Return type
Tuple[datetime, datetime]
- get_string_date_range() str
Create a string which specifies the date range covered by the TimeSeriesIrregular in the format YYYY.MM.DD-YYYY.MM.DD.
- Returns
String that specifies the date range covered
- Return type
str
- get_year_axis() List[float]
Return a year axis in which all dates are represented as decimal years. January 1st 1984 is 1984.00.
- Returns
List of dates represented as decimal years.
- Return type
List[float]
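One way to compute a decimal year that satisfies "January 1st 1984 is 1984.00" is shown below. The fraction used here, (day of year - 1) divided by the number of days in that year, is an assumption; the package may use a different convention. The helper name decimal_year is hypothetical.

```python
from datetime import date


def decimal_year(year: int, month: int, day: int) -> float:
    """Convert a calendar date to a decimal year, with 1 January = year.00."""
    start = date(year, 1, 1)
    days_in_year = (date(year + 1, 1, 1) - start).days  # 365 or 366
    elapsed = (date(year, month, day) - start).days
    return year + elapsed / days_in_year


new_year = decimal_year(1984, 1, 1)
mid_year = decimal_year(1984, 7, 2)
```

Because 1984 is a leap year with 366 days, 2 July falls exactly halfway through it under this convention.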
- make_monthly(**kwargs)
- rebaseline(**kwargs)
- write_csv(filename: Path, metadata_filename: Optional[Path] = None) None
Write the timeseries to a csv file with the specified filename. The format used for writing is given by the BADC CSV format. This has a lot of upfront metadata before the data section. An option for writing a metadata file is also provided.
- Parameters
filename (Path) – Path of the filename to write the data to
metadata_filename (Path) – Path of the filename to write the metadata to
- Return type
None
- class climind.data_types.timeseries.TimeSeriesMonthly(years: List[int], months: List[int], data: List[float], metadata: Optional[CombinedMetadata] = None, uncertainty: Optional[List[float]] = None)
Bases:
TimeSeries
A TimeSeriesMonthly combines a pandas DataFrame with a CombinedMetadata to bring together data and metadata in one object. It represents monthly averages of data.
Create a TimeSeriesMonthly object.
- Parameters
years (List[int]) – List of years
months (List[int]) – List of months
data (List[float]) – List of data values
metadata (CombinedMetadata) – CombinedMetadata object containing the metadata
uncertainty (Optional[List[float]]) –
- df
Pandas dataframe used to contain the time and data information.
- Type
pd.DataFrame
- metadata
Dictionary containing metadata. The only guaranteed entry is “history”
- Type
dict
- generate_dates(time_units: str) List[int]
Given a string specifying the required time units (something like days since 1800-01-01 00:00:00.0), generate a list of times from the time series corresponding to those units.
- Parameters
time_units (str) – String specifying the units to use for generating the times e.g. “days since 1800-01-01 00:00:00.0”
- Return type
List[int]
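To show how a units string such as "days since 1800-01-01 00:00:00.0" maps onto integer times, the sketch below parses the reference date and counts whole days to each month. Only "days since YYYY-MM-DD ..." strings are handled, and each month is placed on its first day; both are assumptions, as is the helper name days_since.

```python
from datetime import datetime
from typing import List


def days_since(time_units: str, years: List[int], months: List[int]) -> List[int]:
    """Convert (year, month) pairs to whole days since the reference date."""
    parts = time_units.split()
    if len(parts) < 3 or parts[0] != "days" or parts[1] != "since":
        raise ValueError(f"unsupported time units: {time_units!r}")
    # Only the date part of the reference is used; the time of day is ignored
    reference = datetime.strptime(parts[2], "%Y-%m-%d")
    return [(datetime(y, m, 1) - reference).days for y, m in zip(years, months)]


times = days_since("days since 1800-01-01 00:00:00.0", [1800, 1800, 1801], [1, 2, 1])
```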
- get_rank_from_year_and_month(**kwargs)
- get_start_and_end_dates() Tuple[datetime, datetime]
Get the first and last dates in the dataset
- Returns
Start and end dates.
- Return type
Tuple[datetime, datetime]
- get_string_date_range() str
Create a string which specifies the date range covered by the TimeSeriesMonthly in the format YYYY.MM-YYYY.MM.
- Returns
String that specifies the date range covered
- Return type
str
- get_uncertainty(year: int, month: int) Optional[float]
Get the current uncertainty for a particular year and month
- Parameters
year (int) – Year for which the uncertainty is required.
month (int) – Month for which the uncertainty is required.
- Returns
Value for the specified year and month or None if it does not exist
- Return type
Optional[float]
- get_value(year: int, month: int) Optional[float]
Get the current value for a particular year and month
- Parameters
year (int) – Year for which the value is required.
month (int) – Month for which the value is required.
- Returns
Value for the specified year and month or None if it does not exist
- Return type
Optional[float]
- get_year_axis() List[float]
Return a year axis as decimal year. 1st January 1984 is 1984.00.
- Returns
List of dates expressed as a decimal year.
- Return type
List[float]
- make_annual(**kwargs)
- make_annual_by_selecting_month(**kwargs)
- static make_from_df(df: pandas.DataFrame, metadata: CombinedMetadata)
Create a TimeSeriesMonthly from a pandas data frame.
- Parameters
df (pd.DataFrame) – Pandas dataframe containing columns ‘year’, ‘month’ and ‘data’ (optionally ‘uncertainty’)
metadata (dict) – Dictionary containing the metadata
- Returns
TimeSeriesMonthly built from input components.
- Return type
TimeSeriesMonthly
- rebaseline(**kwargs)
- write_csv(filename: Path, metadata_filename: Optional[Path] = None) None
Write the TimeSeriesMonthly to a csv file with the specified filename. The format used for writing is given by the BADC CSV format. This has a lot of upfront metadata before the data section. An option for writing a metadata file is also provided.
- Parameters
filename (Path) – Path of the filename to write the data to
metadata_filename (Path) – Path of the filename to write the metadata to
- Return type
None
- zero_on_month(**kwargs)
- climind.data_types.timeseries.create_common_dataframe(dataframes: List[pandas.DataFrame], monthly: bool = False, annual: bool = False, irregular: bool = False) pandas.DataFrame
Given a list of dataframes make a single dataframe which has rows corresponding to all time steps in the input dataframes
- Parameters
dataframes (List[pd.DataFrame]) – List of dataframes which are to be used as the basis for the common data frame
monthly (bool) – Set to true for monthly data
annual (bool) – Set to true for annual data
irregular (bool) – Set to true for daily/irregular data
- Returns
Pandas dataframe with one row for each row in the input dataframes
- Return type
pd.DataFrame
- climind.data_types.timeseries.equalise_datasets(all_datasets: List[Union[TimeSeriesAnnual, TimeSeriesMonthly, TimeSeriesIrregular]]) pandas.DataFrame
Given a list of datasets, combine their data columns into a single data frame.
- Parameters
all_datasets (List[Union[TimeSeriesAnnual, TimeSeriesMonthly, TimeSeriesIrregular]]) – List of time series datasets whose data is to be combined in a single data frame. The data column from each data set will be combined into a single data frame, with each data column becoming a column identified by the “name” of the data set from its metadata.
- Returns
Pandas dataframe containing the data columns from all the input datasets.
- Return type
pd.DataFrame
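The idea of aligning several series on a common time axis, with one column per data set name, can be sketched without pandas. Here each data set is a dict keyed by (year, month), and gaps are filled with None. The function name equalise and this data layout are assumptions for illustration only.

```python
from typing import Dict, List, Tuple

TimeKey = Tuple[int, int]  # (year, month)


def equalise(datasets: Dict[str, Dict[TimeKey, float]]) -> Dict[str, List]:
    """Align several named series on the union of their time steps.

    Returns a column-oriented table: a "time" column plus one column per
    data set name, with None where a data set has no value.
    """
    common = sorted({key for series in datasets.values() for key in series})
    table: Dict[str, List] = {"time": common}
    for name, series in datasets.items():
        table[name] = [series.get(key) for key in common]
    return table


table = equalise({
    "a": {(2000, 1): 1.0},
    "b": {(2000, 1): 2.0, (2000, 2): 3.0},
})
```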
- climind.data_types.timeseries.get_list_of_unique_variables(all_datasets: List[TimeSeriesAnnual]) List[str]
Given a list of TimeSeriesAnnual, get a list of the unique variable names represented in that list.
- Parameters
all_datasets (List[TimeSeriesAnnual]) –
- Returns
List of the unique variable names.
- Return type
List[str]
- climind.data_types.timeseries.get_start_and_end_year(all_datasets: List[TimeSeriesAnnual]) Tuple[Optional[int], Optional[int]]
Given a list of TimeSeriesAnnual, extract the first year in any of the data sets and the last year in any of the data sets.
- Parameters
all_datasets (List[TimeSeriesAnnual]) – List of datasets from which to extract the earliest first year and latest final year.
- Returns
Return the first and last years in the list of data sets
- Return type
Tuple[Optional[int], Optional[int]]
- climind.data_types.timeseries.log_activity(in_function: Callable) Callable
Decorator function to log name of function run and with which arguments. This aims to provide some traceability in the output.
- Parameters
in_function (Callable) – The function to be decorated
- Return type
Callable
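A logging decorator of this general kind might look like the sketch below. It is not the package's code: the name log_call, the log message format and the use of the standard logging module are all assumptions. The example function add_offset is likewise just a stand-in.

```python
import functools
import logging
from typing import Callable


def log_call(in_function: Callable) -> Callable:
    """Log the wrapped function's name and arguments before calling it."""

    @functools.wraps(in_function)  # keep the original name and docstring
    def wrapper(*args, **kwargs):
        logging.info(
            "Running %s with args=%r kwargs=%r", in_function.__name__, args, kwargs
        )
        return in_function(*args, **kwargs)

    return wrapper


@log_call
def add_offset(value: float, offset: float = 0.0) -> float:
    """Hypothetical function used only to demonstrate the decorator."""
    return value + offset


result = add_offset(1.0, offset=0.5)
```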
- climind.data_types.timeseries.make_combined_series(all_datasets: List[TimeSeriesAnnual]) TimeSeriesAnnual
Combine a list of datasets into a single TimeSeriesAnnual by taking the arithmetic mean of all available datasets for each year. Merges the metadata for all the input time series.
- Parameters
all_datasets (List[TimeSeriesAnnual]) – List of datasets to be combined
- Returns
TimeSeriesAnnual which is the mean of all available datasets in each year.
- Return type
TimeSeriesAnnual
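The per-year averaging over whichever data sets are available can be sketched with dicts keyed by year. Metadata merging is not shown, and the name combine_by_mean is hypothetical.

```python
from typing import Dict, List


def combine_by_mean(all_series: List[Dict[int, float]]) -> Dict[int, float]:
    """Average, for each year, whichever input series have a value that year."""
    years = sorted({year for series in all_series for year in series})
    combined = {}
    for year in years:
        available = [series[year] for series in all_series if year in series]
        combined[year] = sum(available) / len(available)
    return combined


combined = combine_by_mean([{2000: 1.0, 2001: 2.0}, {2001: 4.0}])
```

In this example only one series covers 2000, so its value is used unchanged, while 2001 is the mean of both.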
- climind.data_types.timeseries.superset_dataset_list(all_datasets: List[TimeSeriesAnnual], variables: List[str]) List[List[TimeSeriesAnnual]]
Given a list of variables, create a list where each entry is a list of all TimeSeriesAnnual objects corresponding to the variable in that index position.
- Parameters
all_datasets (List[TimeSeriesAnnual]) – List of datasets
variables (List[str]) – List of variable names
- Returns
List of lists of TimeSeriesAnnual.
- Return type
List[List[TimeSeriesAnnual]]
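The grouping can be sketched as a nested list comprehension in which entry i of the result holds every data set whose variable matches variables[i]. Data sets are represented here as (variable, name) tuples, which is an assumption made to keep the example self-contained; the variable and data set names are invented.

```python
from typing import List, Tuple

Dataset = Tuple[str, str]  # (variable, name), a stand-in for TimeSeriesAnnual


def group_by_variable(all_datasets: List[Dataset], variables: List[str]) -> List[List[Dataset]]:
    """Return one list of data sets per variable, in the order of variables."""
    return [[ds for ds in all_datasets if ds[0] == variable] for variable in variables]


datasets = [("tas", "set_a"), ("ohc", "set_b"), ("tas", "set_c")]
grouped = group_by_variable(datasets, ["tas", "ohc", "sea_level"])
```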
- climind.data_types.timeseries.write_dataset_summary_file(all_datasets, csv_filename)
- climind.data_types.timeseries.write_dataset_summary_file_with_metadata(all_datasets: List[Union[TimeSeriesAnnual, TimeSeriesMonthly, TimeSeriesIrregular]], csv_filename: Union[str, Path]) None
Given a list of time series data sets, write them out in a single BADC CSV format csv file with complete metadata.
- Parameters
all_datasets (List[Union[TimeSeriesAnnual, TimeSeriesMonthly, TimeSeriesIrregular]]) – A list of time series which are going to be equalised
csv_filename (str or Path) – The name of the file to which the summary will be written.
- Return type
None