climind.data_types package
Submodules
climind.data_types.grid module
- class climind.data_types.grid.GridAnnual(input_data, metadata: CombinedMetadata)[source]
Bases:
object
A GridAnnual combines an xarray Dataset with a CombinedMetadata to bring together data and metadata in one object. It represents annual averages of data.
Create an annual gridded data set from an xarray Dataset and a CombinedMetadata object.
- Parameters:
input_data (xa.Dataset) – xarray dataset
metadata (CombinedMetadata) – CombinedMetadata object
- get_end_year() int[source]
Get the last year in the dataset
- Returns:
Last year in the data set
- Return type:
int
- get_start_year() int[source]
Get the first year in the dataset
- Returns:
First year in the dataset
- Return type:
int
- get_year_range(start_year: int, end_year: int)[source]
Select a range of consecutive years from the data set.
- Parameters:
start_year (int) – start year
end_year (int) – end year
- Returns:
Returns a GridAnnual containing only data within the specified year range.
- Return type:
GridAnnual
- rank()[source]
Return a data set where the values are the ranks of each grid cell value.
- Returns:
Returns a GridAnnual containing the values as ranks from highest (1) to lowest.
- Return type:
GridAnnual
- running_average(n_year: int)[source]
Calculate an n_year running average of the data in the dataset
- Parameters:
n_year (int) – Number of years for which the running average is calculated
- Returns:
Annual gridded dataset which contains the running averages
- Return type:
- select_year_range(start_year: int, end_year: int)[source]
Select a particular range of consecutive years from the data set and throw away the rest.
- Parameters:
start_year (int) – First year of selection
end_year (int) – Final year of selection
- Returns:
Returns a GridAnnual containing only data within the specified year range.
- Return type:
GridAnnual
- update_history(message: str) None[source]
Update the history metadata
- Parameters:
message (str) – Message to be added to history
- Return type:
None
- write_grid(filename: Path, metadata_filename: Path = None, name: str = None) None[source]
Write the grid to file.
- Parameters:
filename (Path) – Filename to write grid to
metadata_filename (Path) – Filename to write metadata to
name (str) – Optional name to give the data set being written. Note that names should be unique in any data archive.
- Return type:
None
- class climind.data_types.grid.GridMonthly(input_data: xarray.Dataset, metadata: CombinedMetadata)[source]
Bases:
object
A GridMonthly combines an xarray Dataset with a CombinedMetadata to bring together data and metadata in one object. It represents monthly averages of data on a regular grid.
Create a GridMonthly object from an xarray Dataset and a CombinedMetadata object.
- Parameters:
input_data (xa.Dataset) – xarray dataset
metadata (CombinedMetadata) – CombinedMetadata object
- calculate_regional_average(regions, region_number, land_only=True) TimeSeriesMonthly[source]
Calculate a regional average from the grid. The region is specified by a geopandas Geodataframe and the index (region_number) of the chosen shape. By default, the output is masked to land areas only; this can be switched off by setting land_only to False.
- Parameters:
regions (Geodataframe) – geopandas Geodataframe specifying the region to be averaged over
region_number (int) – the index of the particular region in the Geodataframe
land_only (bool) – By default, output is masked to land areas only. To calculate a full area average, set land_only to False
- Returns:
Returns time series of area averages.
- Return type:
ts.TimeSeriesMonthly
- calculate_regional_average_missing(regions, region_number, threshold=0.3, land_only=True, ocean_only=False) TimeSeriesMonthly[source]
Calculate a regional average from the grid. The region is specified by a geopandas Geodataframe and the index (region_number) of the chosen shape. By default, the output is masked to land areas only; this can be switched off by setting land_only to False.
- Parameters:
regions (Geodataframe) – geopandas Geodataframe specifying the region to be averaged over
region_number (int) – the index of the particular region in the Geodataframe
threshold (float) – If the area covered by data in the region drops below this threshold then NaN is returned.
land_only (bool) – By default, output is masked to land areas only. To calculate a full area average, set land_only to False
- Returns:
Returns time series of area averages.
- Return type:
ts.TimeSeriesMonthly
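The role of threshold can be sketched in plain Python. This is only an illustration under stated assumptions, not the climind implementation: the helper name masked_regional_average, the flat lists of cell values and area weights, and the use of NaN for missing cells are all hypothetical.

```python
import math


def masked_regional_average(values, weights, threshold=0.3):
    """Area-weighted mean of one time step (hypothetical helper).

    values  -- cell values in the region, NaN where data are missing
    weights -- area weight of each cell (assumed to be supplied by the caller)
    Returns NaN when the covered fraction of the area is below threshold.
    """
    total_area = sum(weights)
    covered_area = sum(w for v, w in zip(values, weights) if not math.isnan(v))
    if total_area == 0 or covered_area / total_area < threshold:
        return math.nan
    weighted_sum = sum(v * w for v, w in zip(values, weights) if not math.isnan(v))
    return weighted_sum / covered_area


well_covered = masked_regional_average([1.0, math.nan, 3.0], [1.0, 1.0, 2.0])
poorly_covered = masked_regional_average([1.0, math.nan, math.nan, math.nan], [1.0] * 4)
```

In the first call three quarters of the area has data, so an average is returned; in the second only a quarter is covered, which is below the default threshold of 0.3, so the result is NaN.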
- calculate_time_mean(cumulative=False)[source]
Calculate the time mean of the map
- Returns:
Returns a GridMonthly containing the time mean of the data.
- Return type:
GridMonthly
- get_end_year() int[source]
Get the last year in the dataset
- Returns:
Last year in the dataset
- Return type:
int
- get_last_month() datetime[source]
Get the date of the last month in the dataset
- Returns:
Date of the last month in the dataset
- Return type:
datetime
- get_start_year() int[source]
Get the first year in the dataset
- Returns:
First year in the dataset
- Return type:
int
- make_annual()[source]
Calculate an annual average from a monthly grid by taking the arithmetic mean of available monthly anomalies.
- Returns:
Return annual average of the grid
- Return type:
- rebaseline(first_year: int, final_year: int) xarray.Dataset[source]
Change the baseline of the data to the period between first_year and final_year by subtracting the average of the available data between those two years (inclusive).
- Parameters:
first_year (int) – First year of climatology period
final_year (int) – Final year of climatology period
- Returns:
Changes the dataset in place, but also returns the dataset if needed
- Return type:
xa.Dataset
- select_period(start_year: int, start_month: int, end_year: int, end_month: int)[source]
Select a period from the grid specified by start year and month and end year and month, inclusive.
- Parameters:
start_year (int) – Year of start date
start_month (int) – Month of start date
end_year (int) – Year of end date
end_month (int) – Month of end date
- Returns:
Returns a GridMonthly containing only data within the specified date range.
- Return type:
GridMonthly
- select_year_and_month(year: int, month: int)[source]
Select a particular month from the data set and throw away the rest.
- Parameters:
year (int) – Year of selection
month (int) – Month of selection
- Returns:
Returns a GridMonthly containing only data for the specified year and month.
- Return type:
GridMonthly
- climind.data_types.grid.get_1d_transfer(zero_point_original: float, grid_space_original: float, zero_point_target: float, grid_space_target: float, index_in_original: int) tuple[source]
Find the overlapping grid spacings for a new grid based on an index in the old grid
- Parameters:
zero_point_original (float) – longitude or latitude of the zero-indexed grid cell
grid_space_original (float) – grid spacing in degrees
zero_point_target (float) – longitude or latitude of the zero-indexed grid cell in the target grid
grid_space_target (float) – grid spacing in degrees of the target grid
index_in_original (int) – index of the gridcell in the original grid
- Returns:
Returns the longitude of the first grid cell in the new grid, the number of steps, and the first and last indices on the new grid.
- Return type:
tuple
- climind.data_types.grid.get_start_and_end_year(all_datasets: List[GridAnnual]) Tuple[int, int][source]
Given a list of GridAnnual datasets, find the earliest start year and the latest end year.
- Parameters:
all_datasets (List[GridAnnual]) – List of datasets for which we want to find the first and last year
- Return type:
Tuple[int, int]
- climind.data_types.grid.make_standard_grid(out_grid: numpy.ndarray, start_date: datetime, freq: str, number_of_times: int) xarray.Dataset[source]
Make the standard 5x5 grid from a numpy array, start date, temporal frequency and number of time steps.
- Parameters:
out_grid (np.ndarray) – Numpy array containing the data. Shape should be (number_of_times, 36, 72)
start_date (datetime) – Date of the first time step
freq (str) – Temporal frequency
number_of_times (int) – Number of time steps, should match the first dimension of the out_grid
- Returns:
xarray Dataset containing the data in out_grid with the specified temporal frequency and number of time steps
- Return type:
xa.Dataset
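The shape (number_of_times, 36, 72) follows from dividing 180 degrees of latitude and 360 degrees of longitude into 5-degree cells. A small sketch of the corresponding cell centres is given below. The south-to-north and west-to-east ordering and the variable names are assumptions for illustration only; the source does not state how make_standard_grid orders its axes.

```python
GRID_SPACING = 5.0  # degrees, from the "standard 5x5 grid" in the description

n_lat = int(180 / GRID_SPACING)  # 36 latitude bands
n_lon = int(360 / GRID_SPACING)  # 72 longitude bands

# Cell centres, assuming the first cell starts at 90S / 180W (an assumption)
latitudes = [-90.0 + GRID_SPACING / 2 + GRID_SPACING * i for i in range(n_lat)]
longitudes = [-180.0 + GRID_SPACING / 2 + GRID_SPACING * j for j in range(n_lon)]
```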
- climind.data_types.grid.make_xarray(target_grid, times, latitudes, longitudes, variable: str = 'tas_mean') xarray.Dataset[source]
Make a xarray Dataset for a regular lat-lon grid from a numpy grid (ntime, nlat, nlon), and arrays of time (ntime), latitude (nlat) and longitude (nlon).
- Parameters:
target_grid (np.ndarray) – numpy array of shape (ntime, nlat, nlon)
times (np.ndarray) – Array of times, shape (ntime)
latitudes (np.ndarray) – Array of latitudes, shape (nlat)
longitudes (np.ndarray) – Array of longitudes, shape (nlon)
variable (str) – Variable name
- Returns:
Dataset built from the input components
- Return type:
xa.Dataset
- climind.data_types.grid.median_of_datasets(all_datasets: List[GridAnnual]) GridAnnual[source]
Calculate the median of a list of GridAnnual data sets.
- Parameters:
all_datasets (List[GridAnnual]) – List of GridAnnual datasets from which the medians will be calculated.
- Return type:
GridAnnual
- climind.data_types.grid.process_datasets(all_datasets: List[GridAnnual], grid_type: str) GridAnnual[source]
Calculate the median or range (depending on selected type) of a list of GridAnnual data sets. Medians are calculated on a grid cell by grid cell basis based on all available data in the list of data sets.
- Parameters:
all_datasets (List[GridAnnual]) – list of GridAnnual data sets
grid_type (str) – Either ‘median’ or ‘range’
- Returns:
Data set containing the median (or half-range) values from all the data sets supplied
- Return type:
- climind.data_types.grid.range_of_datasets(all_datasets: List[GridAnnual]) GridAnnual[source]
Calculate the half-range of a list of GridAnnual data sets.
- Parameters:
all_datasets (List[GridAnnual]) – List of GridAnnual datasets from which the ranges will be calculated.
- Return type:
GridAnnual
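The cell-by-cell median and half-range described for median_of_datasets, range_of_datasets and process_datasets can be sketched without xarray. The function below is a hypothetical stand-in that works on nested lists for a single time step, uses None for missing cells, and takes the half-range to be (max - min) / 2, which is an assumption about the definition.

```python
from statistics import median


def combine_cells(grids, grid_type):
    """Combine equally shaped 2-D grids cell by cell (hypothetical helper).

    grids     -- list of grids, each a list of rows; None marks a missing cell
    grid_type -- 'median' or 'range' (half-range, assumed to be (max - min) / 2)
    """
    n_rows, n_cols = len(grids[0]), len(grids[0][0])
    combined = []
    for i in range(n_rows):
        row = []
        for j in range(n_cols):
            available = [g[i][j] for g in grids if g[i][j] is not None]
            if not available:
                row.append(None)  # no data set has a value in this cell
            elif grid_type == "median":
                row.append(median(available))
            elif grid_type == "range":
                row.append((max(available) - min(available)) / 2)
            else:
                raise ValueError("grid_type must be 'median' or 'range'")
        combined.append(row)
    return combined


example_grids = [[[1.0, 2.0]], [[3.0, None]], [[8.0, 6.0]]]
cell_medians = combine_cells(example_grids, "median")
cell_half_ranges = combine_cells(example_grids, "range")
```

The second cell is missing in one grid, so its median and half-range are based on the two values that are available.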
- climind.data_types.grid.rank_array(in_array: numpy.ndarray) int[source]
Rank array
- Parameters:
in_array (np.ndarray) – Array to be ranked
- Return type:
int
- climind.data_types.grid.simple_regrid(ingrid: numpy.ndarray, lon0: float, lat0: float, dx: float, target_dy: float) numpy.ndarray[source]
Perform a simple regridding, using a simple average of grid cells from the original grid that fall within the target grid cell.
- Parameters:
ingrid (np.ndarray) – Starting grid which we want to regrid
lon0 (float) – Longitude of zero-indexed grid cell in longitudinal direction
lat0 (float) – Latitude of zero-indexed grid cell in latitudinal direction
dx (float) – Grid spacing in degrees
target_dy (float) – Target grid spacing
- Returns:
Returns regridded array.
- Return type:
np.ndarray
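A simple average of the original cells that fall inside each target cell can be sketched as a block average. This is not the simple_regrid code: the helper name block_average, the nested-list input, and the assumption that the target spacing is an exact integer multiple (factor) of the original spacing are all hypothetical simplifications.

```python
def block_average(grid, factor):
    """Average factor x factor blocks of a 2-D grid (hypothetical helper).

    Assumes both grid dimensions are exact multiples of factor.
    """
    n_rows, n_cols = len(grid), len(grid[0])
    coarse = []
    for r in range(0, n_rows, factor):
        coarse_row = []
        for c in range(0, n_cols, factor):
            # All fine cells that fall inside this target cell
            cells = [grid[i][j]
                     for i in range(r, r + factor)
                     for j in range(c, c + factor)]
            coarse_row.append(sum(cells) / len(cells))
        coarse.append(coarse_row)
    return coarse


fine_grid = [[1.0, 2.0, 3.0, 4.0],
             [5.0, 6.0, 7.0, 8.0]]
coarse_grid = block_average(fine_grid, 2)
```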
climind.data_types.timeseries module
- class climind.data_types.timeseries.AveragesCollection(all_datasets)[source]
Bases:
object
A simple class to perform specific tasks on lists of TimeSeriesAnnual
- class climind.data_types.timeseries.TimeSeries(metadata: CombinedMetadata = None)[source]
Bases:
ABC
A base class for representing time series data sets. Note that this class should not generally be used directly; only its subclasses TimeSeriesMonthly, TimeSeriesAnnual and TimeSeriesIrregular should be used. This class contains functionality shared by these classes but does not work on its own.
- add_offset(offset: float) None[source]
Add an offset to the data set.
- Parameters:
offset (float) – offset to be added to all values in the data set.
- Return type:
None
- get_first_and_last_year() Tuple[int, int][source]
Get the first and last year in the series
- Returns:
first and last year
- Return type:
Tuple[int, int]
- abstractmethod get_string_date_range() str[source]
Create a string which specifies the date range covered by the time series
- Return type:
str
- manually_set_baseline(baseline_start_year: int, baseline_end_year: int) None[source]
Manually set baseline. This changes the baseline in the metadata, but does not change the data themselves.
- Parameters:
baseline_start_year (int) – Start of baseline period
baseline_end_year (int) – End of baseline period
- Return type:
None
- select_year_range(start_year: int, end_year: int)[source]
Select consecutive years in the specified range and throw away the rest.
- Parameters:
start_year (int) – First year in the selected range
end_year (int) – Final year in the selected range
- Returns:
Return time series which only contains years in the specified range
- Return type:
- update_history(message: str) None[source]
Update the history metadata
- Parameters:
message (str) – Message to be added to history
- Return type:
None
- write_generic_csv(filename: Path, metadata_filename: Path, monthly: bool, uncertainty: bool, irregular: bool, columns_to_write: List[str]) None[source]
Write the dataset out into csv format
- Parameters:
filename (Path) – Path of the csv file to which the data will be written.
metadata_filename (Path) – Path of the json file to which the metadata will be written.
monthly (bool) – Set to True for monthly data
uncertainty (bool) – Set to True to print uncertainties
irregular (bool) – Set to True for irregular data
columns_to_write (List[str]) – List of the columns from the dataframe to be written to the data file
- Return type:
None
- class climind.data_types.timeseries.TimeSeriesAnnual(years: list, data: list, metadata=None, uncertainty: list | None = None)[source]
Bases:
TimeSeries
A TimeSeriesAnnual combines a pandas Dataframe with a CombinedMetadata to bring together data and metadata in one object. It represents annual averages of data.
Create TimeSeriesAnnual object from its components.
- Parameters:
years (list) – List of years
data (list) – List of data values
metadata (CombinedMetadata) – Dictionary containing the metadata
- df
Pandas dataframe containing the time and data information
- Type:
pd.DataFrame
- metadata
Dictionary containing the metadata. The only guaranteed entry is ‘history’
- Type:
dict
- add_year(year: int, value: float, uncertainty: float = None) None[source]
Add a year of data.
- Parameters:
year (int) – the year to be added
value (float) – the data value to be added
uncertainty – the uncertainty of the data value to be added (optional)
- Return type:
None
- generate_dates(time_units: str) List[datetime][source]
Given a string specifying the required time units (something like days since 1800-01-01 00:00:00.0), generate a list of times from the time series corresponding to those units.
- Parameters:
time_units (str) – String specifying the units to use for generating the times e.g. “days since 1800-01-01 00:00:00.0”
- Returns:
List of dates
- Return type:
List[datetime]
- get_rank_from_year(year: int) int | None[source]
Given a year, extract the rank of the data for that year. Ties are given the same rank, which is the lowest rank of the group.
- Parameters:
year (int) – Year for which we want the rank
- Returns:
Rank of specified year or None if year is not available.
- Return type:
Optional[int]
- get_string_date_range() str[source]
Create a string which specifies the date range covered by the TimeSeriesAnnual in the format YYYY-YYYY
- Returns:
String that specifies the date range covered
- Return type:
str
- get_uncertainty_from_year(year: int) float | None[source]
Get the uncertainty for a specified year.
- Parameters:
year (int) – Year for which a value is desired
- Returns:
Uncertainty for the year, or None if year is not in the data set
- Return type:
Optional[float]
- get_value_from_year(year: int) float | None[source]
Get the data value for a specified year.
- Parameters:
year (int) – Year for which a value is desired
- Returns:
Value for the year, or None if year is not in the data set
- Return type:
Optional[float]
- get_year_axis() List[float][source]
Return a year axis with dates represented as decimal years.
- Returns:
List of dates as decimal years.
- Return type:
List[float]
- get_year_from_rank(rank: int) List[int][source]
Given a particular rank, extract a list of years which match that rank. Returns a list because years can (theoretically) be tied with each other. Rank 1 corresponds to the highest value in the dataset.
- Parameters:
rank (int) – Rank for which we want the year which has that rank
- Returns:
List of years that have the specified rank
- Return type:
List[int]
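The ranking convention used by get_rank_from_year and get_year_from_rank (rank 1 is the highest value, ties share a rank) can be illustrated with plain lists. The helpers below are hypothetical, and they read "the lowest rank of the group" as the smallest rank number in the tied group, which is an assumption.

```python
def rank_from_year(years, values, year):
    """Rank of the value for year, 1 = highest; None if year is absent."""
    if year not in years:
        return None
    target = values[years.index(year)]
    # One more than the number of strictly larger values, so tied
    # values all receive the same (smallest) rank number.
    return 1 + sum(1 for v in values if v > target)


def years_from_rank(years, values, rank):
    """All years holding the given rank; more than one if there are ties."""
    return [y for y in years if rank_from_year(years, values, y) == rank]


example_years = [2000, 2001, 2002, 2003]
example_values = [0.5, 0.9, 0.9, 0.1]
tied_years = years_from_rank(example_years, example_values, 1)
```

With this convention the two tied years share rank 1, no year has rank 2, and the next year is ranked 3.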
- lowess(number_of_points: int = 10)[source]
Lowess smooth the series
- Parameters:
number_of_points (int) – Number of points to use in the lowess smoother
- static make_from_df(df: pandas.DataFrame, metadata: CombinedMetadata)[source]
Create a TimeSeriesAnnual from a pandas data frame.
- Parameters:
df (pd.DataFrame) – Pandas dataframe containing columns ‘year’ and ‘data’
metadata (dict) – Dictionary containing the metadata
- Returns:
TimeSeriesAnnual created from the elements in the dataframe and metadata.
- Return type:
TimeSeriesAnnual
- rebaseline(baseline_start_year: int, baseline_end_year: int) None[source]
Shift the TimeSeriesAnnual to a new baseline, specified by start and end years (inclusive).
- Parameters:
baseline_start_year (int) – First year of the climatology period
baseline_end_year (int) – Last year of the climatology period
- Returns:
Action occurs in place.
- Return type:
None
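Rebaselining an annual series amounts to subtracting the mean over the climatology period from every value. A minimal sketch on plain lists follows. The function name is hypothetical, and unlike the method above it returns a new list rather than acting in place.

```python
def rebaseline_annual(years, values, baseline_start_year, baseline_end_year):
    """Subtract the mean over the baseline period, inclusive (hypothetical helper)."""
    baseline = [v for y, v in zip(years, values)
                if baseline_start_year <= y <= baseline_end_year]
    climatology = sum(baseline) / len(baseline)  # assumes the period has data
    return [v - climatology for v in values]


anomalies = rebaseline_annual([1999, 2000, 2001, 2002], [0.0, 1.0, 2.0, 3.0], 2000, 2002)
```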
- running_lowess(number_of_points: int = 10)[source]
Lowess smooth time point t by running a lowess smoother from t=0 to t=t. For a regular lowess smoother see method lowess.
- Parameters:
number_of_points (int) – Number of points to use in the lowess smoother
- running_mean(run_length: int, centred: bool = False)[source]
Calculate running mean of the data for a specified run length
- Parameters:
run_length (int) – length of the run
centred (bool) – Set to True to centre the times associated to the data points, otherwise the time used will be the last time in the n-year run.
- Returns:
TimeSeriesAnnual containing running averages of length run_length. Where there are too few years to calculate a running average, np.nan appears in the data column of the data frame
- Return type:
TimeSeriesAnnual
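The behaviour described for running_mean can be sketched on plain lists. This is an illustration under assumptions, not the library code: the function name is hypothetical, the window is taken to be the run_length years ending at each year, and with centred=True the time is taken as the mean of the years in the window.

```python
import math


def running_mean_sketch(years, values, run_length, centred=False):
    """Trailing running mean; NaN where the window is incomplete."""
    out_years, out_values = [], []
    for i in range(len(values)):
        start = i - run_length + 1
        if start < 0:
            # Too few years so far to fill the window
            out_years.append(years[i])
            out_values.append(math.nan)
            continue
        window_values = values[start:i + 1]
        window_years = years[start:i + 1]
        out_values.append(sum(window_values) / run_length)
        # Assumed meaning of centred: label the window by its middle time
        out_years.append(sum(window_years) / run_length if centred else years[i])
    return out_years, out_values


example_years = [2000, 2001, 2002, 2003, 2004]
example_values = [1.0, 2.0, 3.0, 4.0, 5.0]
end_years, end_means = running_mean_sketch(example_years, example_values, 3)
mid_years, mid_means = running_mean_sketch(example_years, example_values, 3, centred=True)
```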
- running_stdev(run_length: int, centred: bool = False)[source]
Calculate running standard deviation of the data for a specified run length
- Parameters:
run_length (int) – length of the run
centred (bool) – Set to True to centre the times associated to the data points, otherwise the time used will be the last time in the n-year run.
- Returns:
TimeSeriesAnnual containing running standard deviation of length run_length. Where there are too few years to calculate a running standard deviation, np.nan appears in the data column of the data frame
- Return type:
TimeSeriesAnnual
- running_trend(run_length: int)[source]
Calculate a smoothed series by fitting a straight line to the past run_length years of data (for example, 30) and taking the final point of the fitted line as the data value
- Parameters:
run_length (int) – Number of years for which the trend should be calculated
- Returns:
TimeSeriesAnnual containing the end point of trends of length run_length. Where there are too few years to calculate a trend, np.nan appears in the data column of the data frame
- Return type:
TimeSeriesAnnual
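Taking the end point of a trailing straight-line fit can be sketched with an ordinary least-squares fit written out by hand. The function name and the use of plain lists are hypothetical, ordinary least squares is an assumption about the fitting method, and run_length is assumed to be at least 2.

```python
import math


def running_trend_sketch(years, values, run_length):
    """End point of a least-squares line fitted to each trailing window."""
    smoothed = []
    for i in range(len(values)):
        start = i - run_length + 1
        if start < 0:
            smoothed.append(math.nan)  # too few years for a trend
            continue
        x = years[start:i + 1]
        y = values[start:i + 1]
        x_mean = sum(x) / run_length
        y_mean = sum(y) / run_length
        sxx = sum((a - x_mean) ** 2 for a in x)
        sxy = sum((a - x_mean) * (b - y_mean) for a, b in zip(x, y))
        slope = sxy / sxx
        # Evaluate the fitted line at the last year in the window
        smoothed.append(y_mean + slope * (x[-1] - x_mean))
    return smoothed


trend_end_points = running_trend_sketch([0, 1, 2], [0.0, 0.0, 3.0], 3)
```

For the three points above the fitted line has slope 1.5 and passes through the mean point (1, 1), so its value at the final year is 2.5 rather than the raw value of 3.0.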
- select_decade(end_year: int = 0)[source]
Select every tenth year from the TimeSeriesAnnual. The last digit of the years can be selected using the end_year keyword argument. The default is to select all years ending in 0, e.g. 1850, 1860, 1870… 2020.
- Parameters:
end_year (int) – Last digit of the years to be selected. e.g. set to 0 to pick 1850, 1860… 2010, 2020 etc.
- Returns:
TimeSeriesAnnual containing every tenth year
- Return type:
TimeSeriesAnnual
- write_csv(filename, metadata_filename=None)[source]
Write the timeseries to a csv file with the specified filename. The format used for writing is given by the BADC CSV format. This has a lot of upfront metadata before the data section. An option for writing a metadata file is also provided.
- Parameters:
filename (Path) – Path of the filename to write the data to
metadata_filename (Path) – Path of the filename to write the metadata to
- Return type:
None
- class climind.data_types.timeseries.TimeSeriesIrregular(years: List[int], months: List[int], days: List[int], data: List[float], metadata: CombinedMetadata = None, uncertainty: List[float] | None = None)[source]
Bases:
TimeSeries
A TimeSeriesIrregular combines a pandas Dataframe with a CombinedMetadata to bring together data and metadata in one object. It represents non-monthly, non-annual averages of data such as weekly, or 5-day averages.
Create TimeSeriesIrregular object.
- Parameters:
years (List[int]) – List of integers specifying the year of each data point
months (List[int]) – List of integers specifying the month of each data point
days (List[int]) – List of integers specifying the day of each data point
data (List[float]) – List of floats with the data values
metadata (CombinedMetadata) – CombinedMetadata object holding the metadata for the dataset
uncertainty (List[float]) – List of floats with the uncertainty values for each data point
- fill_daily() None[source]
Ensure that a daily time series has data for every day between the start and end years.
- Return type:
None
- generate_dates(time_units: str) List[int][source]
Given a string specifying the required time units (something like days since 1800-01-01 00:00:00.0), generate a list of times from the time series corresponding to those units.
- Parameters:
time_units (str) – String specifying the units to use for generating the times e.g. “days since 1800-01-01 00:00:00.0”
- Return type:
List[int]
- get_start_and_end_dates() Tuple[datetime, datetime][source]
Get the first and last dates in the dataset
- Return type:
Tuple[datetime, datetime]
- get_string_date_range() str[source]
Create a string which specifies the date range covered by the TimeSeriesIrregular in the format YYYY.MM.DD-YYYY.MM.DD
- Returns:
String that specifies the date range covered
- Return type:
str
- get_year_axis() List[float][source]
Return a year axis in which all dates are represented as decimal years. January 1st 1984 is 1984.00.
- Returns:
List of dates represented as decimal years.
- Return type:
List[float]
- lowess(number_of_points: int = 60)[source]
Lowess smooth the series
- Parameters:
number_of_points (int) – Number of points to use in the lowess smoother
- make_monthly()[source]
Calculate a TimeSeriesMonthly from the TimeSeriesIrregular. The monthly average is calculated from the mean of values within the month.
- Returns:
Return a TimeSeriesMonthly containing the monthly averages.
- Return type:
TimeSeriesMonthly
- rebaseline(baseline_start_year, baseline_end_year) None[source]
Shift the time series to a new baseline, specified by start and end years (inclusive). Each day is rebaselined separately, allowing for changes in seasonality. If years are incomplete, this might give a different result to the annual and monthly versions.
- Parameters:
baseline_start_year (int) – The first year of the climatology period
baseline_end_year (int) – The last year of the climatology period
- Returns:
Action occurs in place
- Return type:
None
- write_csv(filename: Path, metadata_filename: Path = None) None[source]
Write the timeseries to a csv file with the specified filename. The format used for writing is given by the BADC CSV format. This has a lot of upfront metadata before the data section. An option for writing a metadata file is also provided.
- Parameters:
filename (Path) – Path of the filename to write the data to
metadata_filename (Path) – Path of the filename to write the metadata to
- Return type:
None
- class climind.data_types.timeseries.TimeSeriesMonthly(years: List[int], months: List[int], data: List[float], metadata: CombinedMetadata = None, uncertainty: List[float] | None = None)[source]
Bases:
TimeSeries
A TimeSeriesMonthly combines a pandas Dataframe with a CombinedMetadata to bring together data and metadata in one object. It represents monthly averages of data.
Create TimeSeriesMonthly object.
- Parameters:
years (List[int]) – List of years
months (List[int]) – List of months
data (List[float]) – List of data values
metadata (CombinedMetadata) – CombinedMetadata object containing the metadata
uncertainty (Optional[List[float]])
- df
Pandas dataframe used to contain the time and data information.
- Type:
pd.DataFrame
- metadata
Dictionary containing metadata. The only guaranteed entry is “history”
- Type:
dict
- generate_dates(time_units: str) List[int][source]
Given a string specifying the required time units (something like days since 1800-01-01 00:00:00.0), generate a list of times from the time series corresponding to those units.
- Parameters:
time_units (str) – String specifying the units to use for generating the times e.g. “days since 1800-01-01 00:00:00.0”
- Return type:
List[int]
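A time_units string such as "days since 1800-01-01 00:00:00.0" can be turned into integer offsets with the standard library. The sketch below is hypothetical and makes two assumptions that the text above does not confirm: only the "days since" form is handled, and each month is stamped with its first day.

```python
from datetime import datetime


def month_offsets(time_units, years, months):
    """Days between the reference date and the first of each month."""
    unit, reference = time_units.split(" since ")
    if unit != "days":
        raise ValueError("this sketch only handles 'days since ...' units")
    origin = datetime.strptime(reference, "%Y-%m-%d %H:%M:%S.%f")
    return [(datetime(y, m, 1) - origin).days for y, m in zip(years, months)]


offsets = month_offsets("days since 1800-01-01 00:00:00.0", [1800, 1800, 1801], [1, 2, 1])
```

January 1800 has 31 days and 1800 is not a leap year, so the three offsets are 0, 31 and 365.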
- get_rank_from_year_and_month(year: int, month: int, versus_all_months=False) int | None[source]
Given a year and month, extract the rank of the data for that month. Ties are given the same rank, which is the lowest rank of the group. Default behaviour is to rank the month against the same month in all other years. Setting versus_all_months to True ranks the month against all other months in all other years.
- Parameters:
year (int) – Year of year-month pair for which we want the rank
month (int) – Month of year-month pair for which we want the rank
versus_all_months (bool) – If set then the ranking is done for the monthly value relative to all other months.
- Returns:
Returns the rank of the specified year-month pair as compared to the same month in all other years. If “versus_all_months” is set then returns rank of the anomaly for a particular year and month ranked against all other years and months.
- Return type:
int
- get_start_and_end_dates() Tuple[datetime, datetime][source]
Get the first and last dates in the dataset
- Returns:
Start and end dates.
- Return type:
Tuple[datetime, datetime]
- get_string_date_range() str[source]
Create a string which specifies the date range covered by the TimeSeriesMonthly in the format YYYY.MM-YYYY.MM
- Returns:
String that specifies the date range covered
- Return type:
str
- get_uncertainty(year: int, month: int) float | None[source]
Get the current uncertainty for a particular year and month
- Parameters:
year (int) – Year for which the uncertainty is required.
month (int) – Month for which the uncertainty is required.
- Returns:
Uncertainty for the specified year and month or None if it does not exist
- Return type:
Optional[float]
- get_value(year: int, month: int) float | None[source]
Get the current value for a particular year and month
- Parameters:
year (int) – Year for which the value is required.
month (int) – Month for which the value is required.
- Returns:
Value for the specified year and month or None if it does not exist
- Return type:
Optional[float]
- get_year_axis() List[float][source]
Return a year axis as decimal year. 1st January 1984 is 1984.00.
- Returns:
List of dates expressed as a decimal year.
- Return type:
List[float]
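One simple way to build a decimal-year axis for monthly data that satisfies "1st January 1984 is 1984.00" is to add twelfths of a year. This is an assumed convention for illustration; the method may place months differently within the year.

```python
def decimal_year(year, month):
    """Decimal year at the start of a month (assumed convention)."""
    return year + (month - 1) / 12


year_axis = [decimal_year(1984, m) for m in (1, 4, 7)]
```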
- lowess(number_of_points: int = 60)[source]
Lowess smooth the series
- Parameters:
number_of_points (int) – Number of points to use in the lowess smoother
- make_annual(cumulative: bool = False)[source]
Calculate a TimeSeriesAnnual from the TimeSeriesMonthly. The annual average is calculated from the mean of available monthly values.
- Parameters:
cumulative (bool) – Set to true to sum rather than average the monthly values to get the annual value.
- Returns:
Return a TimeSeriesAnnual object containing the annual averages.
- Return type:
TimeSeriesAnnual
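The difference between averaging and cumulative=True can be shown on plain lists. The helper is hypothetical and treats NaN as "not available", which is an assumption about how missing months are represented.

```python
import math


def make_annual_sketch(years, values, cumulative=False):
    """Annual mean (or sum) of the available monthly values per year."""
    by_year = {}
    for year, value in zip(years, values):
        if not math.isnan(value):  # skip months without data
            by_year.setdefault(year, []).append(value)
    return {
        year: sum(vals) if cumulative else sum(vals) / len(vals)
        for year, vals in by_year.items()
    }


monthly_years = [2000, 2000, 2000, 2001, 2001]
monthly_values = [1.0, 2.0, math.nan, 4.0, 6.0]
annual_means = make_annual_sketch(monthly_years, monthly_values)
annual_totals = make_annual_sketch(monthly_years, monthly_values, cumulative=True)
```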
- make_annual_by_selecting_month(month: int)[source]
Calculate a TimeSeriesAnnual from the TimeSeriesMonthly. The annual value is taken from one of the monthly values specified by the user.
- Returns:
Return a TimeSeriesAnnual object containing only the selected month from each year.
- Return type:
TimeSeriesAnnual
- static make_from_df(df: pandas.DataFrame, metadata: CombinedMetadata)[source]
Create a TimeSeriesMonthly from a pandas data frame.
- Parameters:
df (pd.DataFrame) – Pandas dataframe containing columns ‘year’ ‘month’ and ‘data’ (optionally ‘uncertainty’)
metadata (dict) – Dictionary containing the metadata
- Returns:
TimeSeriesMonthly built from input components.
- Return type:
TimeSeriesMonthly
- rebaseline(baseline_start_year, baseline_end_year) None[source]
Shift the time series to a new baseline, specified by start and end years (inclusive). Each month is rebaselined separately, allowing for changes in seasonality. If years are incomplete, this might give a different result to the annual version.
- Parameters:
baseline_start_year (int) – The first year of the climatology period
baseline_end_year (int) – The last year of the climatology period
- Returns:
Action occurs in place
- Return type:
None
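Rebaselining each month separately means computing one climatological mean per calendar month and subtracting it from the values for that month. A sketch on plain lists follows. The function name is hypothetical, it returns a new list instead of acting in place, and it assumes every calendar month in the series has at least one value inside the baseline period.

```python
from collections import defaultdict


def rebaseline_monthly(years, months, values, baseline_start_year, baseline_end_year):
    """Subtract a per-calendar-month baseline mean (hypothetical helper)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for y, m, v in zip(years, months, values):
        if baseline_start_year <= y <= baseline_end_year:
            sums[m] += v
            counts[m] += 1
    # One climatological value for each calendar month
    climatology = {m: sums[m] / counts[m] for m in sums}
    return [v - climatology[m] for m, v in zip(months, values)]


monthly_anomalies = rebaseline_monthly(
    [2000, 2000, 2001, 2001, 2002, 2002],
    [1, 2, 1, 2, 1, 2],
    [1.0, 10.0, 3.0, 20.0, 5.0, 30.0],
    2000,
    2001,
)
```

Here the January climatology is 2.0 and the February climatology is 15.0, so the seasonal cycle is removed month by month.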
- running_mean(run_length: int, centred: bool = False)[source]
Calculate running mean of the data for a specified run length
- Parameters:
run_length (int) – length of the run
centred (bool) – Set to True to centre the times associated to the data points, otherwise the time used will be the last time in the n-year run.
- Returns:
TimeSeriesMonthly containing running averages of length run_length. Where there are too few years to calculate a running average, np.nan appears in the data column of the data frame
- Return type:
TimeSeriesMonthly
- write_csv(filename: Path, metadata_filename: Path = None) None[source]
Write the TimeSeriesMonthly to a csv file with the specified filename. The format used for writing is given by the BADC CSV format. This has a lot of upfront metadata before the data section. An option for writing a metadata file is also provided.
- Parameters:
filename (Path) – Path of the filename to write the data to
metadata_filename (Path) – Path of the filename to write the metadata to
- Return type:
None
- zero_on_month(year: int, month: int) None[source]
Zero the data set on the value for a single month in a single year by subtracting the value for that month from all values in the dataset.
- Parameters:
year (int) – Year of the month on which the data will be zeroed.
month (int) – Month of the month on which the data will be zeroed.
- Return type:
None
- climind.data_types.timeseries.create_common_dataframe(dataframes: List[pandas.DataFrame], monthly: bool = False, annual: bool = False, irregular: bool = False) pandas.DataFrame[source]
Given a list of dataframes make a single dataframe which has rows corresponding to all time steps in the input dataframes
- Parameters:
dataframes (List[pd.DataFrame]) – List of dataframes which are to be used as the basis for the common data frame
monthly (bool) – Set to true for monthly data
annual (bool) – Set to true for annual data
irregular (bool) – Set to true for daily/irregular data
- Returns:
Pandas dataframe with one row for each row in the input dataframes
- Return type:
pd.DataFrame
- climind.data_types.timeseries.equalise_datasets(all_datasets: List[TimeSeriesAnnual | TimeSeriesMonthly | TimeSeriesIrregular], uncertainty: bool = False) pandas.DataFrame[source]
Given a list of datasets, combine their data columns into a single data frame.
- Parameters:
all_datasets (List[Union[TimeSeriesAnnual, TimeSeriesMonthly, TimeSeriesIrregular]]) – List of time series datasets whose data is to be combined in a single data frame. The data column from each data set will be combined into a single data frame, with each data column becoming a column identified by the “name” of the data set from its metadata.
- Returns:
Pandas dataframe containing the data columns from all the input datasets.
- Return type:
pd.DataFrame
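The idea of placing several series on a common set of time steps, with one column per data set name, can be sketched with dictionaries instead of pandas. Everything here is hypothetical: the function name, the year-keyed dictionaries, and the use of None where a data set has no value for a time step.

```python
def equalise_sketch(series_by_name):
    """Align year-keyed series on the union of their years.

    series_by_name -- {dataset name: {year: value}}
    Returns {year: {dataset name: value or None}} with years in ascending order.
    """
    all_years = sorted({year for series in series_by_name.values() for year in series})
    return {
        year: {name: series.get(year) for name, series in series_by_name.items()}
        for year in all_years
    }


common = equalise_sketch({
    "dataset_a": {2000: 0.1, 2001: 0.2},
    "dataset_b": {2001: 0.3, 2002: 0.4},
})
```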
- climind.data_types.timeseries.get_list_of_unique_variables(all_datasets: List[TimeSeriesAnnual]) List[str][source]
Given a list of TimeSeriesAnnual, get a list of the unique variable names represented in that list.
- Parameters:
all_datasets (List[TimeSeriesAnnual])
- Returns:
List of the unique variable names.
- Return type:
List[str]
- climind.data_types.timeseries.get_start_and_end_year(all_datasets: List[TimeSeriesAnnual]) Tuple[int | None, int | None][source]
Given a list of TimeSeriesAnnual, extract the first year in any of the data sets and the last year in any of the data sets.
- Parameters:
all_datasets (List[TimeSeriesAnnual]) – List of datasets from which to extract the earliest first year and latest final year.
- Returns:
Return the first and last years in the list of data sets
- Return type:
Tuple[Optional[int], Optional[int]]
- climind.data_types.timeseries.log_activity(in_function: Callable) Callable[source]
Decorator function to log name of function run and with which arguments. This aims to provide some traceability in the output.
- Parameters:
in_function (Callable) – The function to be decorated
- Return type:
Callable
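A decorator of this kind is usually written with functools.wraps so that the decorated function keeps its name and docstring. The sketch below shows the general pattern only. The name log_activity_sketch, the logger name and the message format are hypothetical and are not taken from the climind source.

```python
import functools
import logging

logger = logging.getLogger("activity_sketch")  # hypothetical logger name


def log_activity_sketch(in_function):
    """Log the name and arguments of each call, then run the function."""

    @functools.wraps(in_function)
    def wrapper(*args, **kwargs):
        logger.info("Running %s with args=%r kwargs=%r",
                    in_function.__name__, args, kwargs)
        return in_function(*args, **kwargs)

    return wrapper


@log_activity_sketch
def add_offset_example(value, offset=0.0):
    """Toy function used only to demonstrate the decorator."""
    return value + offset


result = add_offset_example(1.0, offset=0.5)
```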
- climind.data_types.timeseries.make_combined_series(all_datasets: List[TimeSeriesAnnual], augmented_uncertainty=True) TimeSeriesAnnual[source]
Combine a list of datasets into a single TimeSeriesAnnual by taking the arithmetic mean of all available datasets for each year. Merges the metadata for all the input time series.
- Parameters:
all_datasets (List[TimeSeriesAnnual]) – List of datasets to be combined
augmented_uncertainty (bool) – Set to True if you want to add an additional uncertainty from the baseline
- Returns:
TimeSeriesAnnual which is the mean of all available datasets in each year.
- Return type:
TimeSeriesAnnual
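Taking the arithmetic mean of all available datasets for each year can be sketched with year-keyed dictionaries. The helper is hypothetical and covers only the averaging step; it does not merge metadata or handle the augmented uncertainty.

```python
from statistics import mean


def combine_series_sketch(all_series):
    """Mean across data sets for each year that has at least one value.

    all_series -- list of {year: value} dictionaries
    """
    all_years = sorted({year for series in all_series for year in series})
    return {
        year: mean(series[year] for series in all_series if year in series)
        for year in all_years
    }


combined = combine_series_sketch([{2000: 1.0, 2001: 3.0}, {2001: 5.0}])
```

Only one data set covers 2000, so that year keeps its single value, while 2001 is the mean of the two available values.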
- climind.data_types.timeseries.superset_dataset_list(all_datasets: List[TimeSeriesAnnual], variables: List[str]) List[List[TimeSeriesAnnual]][source]
Given a list of variables, create a list where each entry is a list of all TimeSeriesAnnual objects corresponding to the variable in that index position.
- Parameters:
all_datasets (List[TimeSeriesAnnual]) – List of datasets
variables (List[str]) – List of variable names
- Returns:
List of lists of TimeSeriesAnnual.
- Return type:
List[List[TimeSeriesAnnual]]
- climind.data_types.timeseries.write_dataset_summary_file_with_metadata(all_datasets: List[TimeSeriesAnnual | TimeSeriesMonthly | TimeSeriesIrregular], csv_filename: str | Path) None[source]
Given a list of time series data sets, write them out in a single BADC CSV format csv file with complete metadata.
- Parameters:
all_datasets (List[Union[TimeSeriesAnnual, TimeSeriesMonthly, TimeSeriesIrregular]]) – A list of time series which are going to be equalised
csv_filename (str or Path) – The name of the file to which the summary will be written.
- Return type:
None
Module contents
There are two main data types implemented in this package: timeseries and grids. In each of those two cases, the data set consists of a data-carrying part and a metadata part. For timeseries, the data-carrying part is a pandas dataframe and for a grid, it’s an xarray dataset.