climind.data_types package

Submodules

climind.data_types.grid module

class climind.data_types.grid.GridAnnual(input_data, metadata: CombinedMetadata)[source]

Bases: object

A GridAnnual combines an xarray Dataset with a CombinedMetadata to bring together data and metadata in one object. It represents annual averages of data.

Create an annual gridded data set from an xarray Dataset and CombinedMetadata object.

Parameters:
  • input_data (xa.Dataset) – xarray dataset

  • metadata (CombinedMetadata) – CombinedMetadata object

get_end_year() int[source]

Get the last year in the dataset

Returns:

Last year in the data set

Return type:

int

get_start_year() int[source]

Get the first year in the dataset

Returns:

First year in the dataset

Return type:

int

get_year_range(start_year: int, end_year: int)[source]

Select a range of consecutive years from the data set.

Parameters:
  • start_year (int) – start year

  • end_year (int) – end year

Returns:

Returns a GridAnnual containing only data within the specified year range.

Return type:

GridAnnual

rank()[source]

Return a data set where the values are the ranks of each grid cell value.

Returns:

Return a GridAnnual containing the values as ranks from highest (1) to lowest.

Return type:

GridAnnual

running_average(n_year: int)[source]

Calculate an n_year running average of the data in the dataset

Parameters:

n_year (int) – Number of years for which the running average is calculated

Returns:

Annual gridded dataset which contains the running averages

Return type:

GridAnnual

select_year_range(start_year: int, end_year: int)[source]

Select a particular range of consecutive years from the data set and throw away the rest.

Parameters:
  • start_year (int) – First year of selection

  • end_year (int) – Final year of selection

Returns:

Returns a GridAnnual containing only data within the specified year range.

Return type:

GridAnnual

update_history(message: str) None[source]

Update the history metadata

Parameters:

message (str) – Message to be added to history

Return type:

None

write_grid(filename: Path, metadata_filename: Path = None, name: str = None) None[source]

Write the grid to file.

Parameters:
  • filename (Path) – Filename to write grid to

  • metadata_filename (Path) – Filename to write metadata to

  • name (str) – Optional name to give the data set being written. Note that names should be unique in any data archive.

Return type:

None

class climind.data_types.grid.GridMonthly(input_data: xarray.Dataset, metadata: CombinedMetadata)[source]

Bases: object

A GridMonthly combines an xarray Dataset with a CombinedMetadata to bring together data and metadata in one object. It represents monthly averages of data on a regular grid.

Create a GridMonthly object from an xarray Dataset and a CombinedMetadata object.

Parameters:
  • input_data (xa.Dataset) – xarray dataset

  • metadata (CombinedMetadata) – CombinedMetadata object

calculate_regional_average(regions, region_number, land_only=True) TimeSeriesMonthly[source]

Calculate a regional average from the grid. The region is specified by a geopandas Geodataframe and the index (region_number) of the chosen shape. By default, the output is masked to land areas only; this can be switched off by setting land_only to False.

Parameters:
  • regions (Geodataframe) – geopandas Geodataframe specifying the region to be averaged over

  • region_number (int) – the index of the particular region in the Geodataframe

  • land_only (bool) – By default output is masked to land areas only; to calculate a full area average, set land_only to False

Returns:

Returns time series of area averages.

Return type:

ts.TimeSeriesMonthly

calculate_regional_average_missing(regions, region_number, threshold=0.3, land_only=True, ocean_only=False) TimeSeriesMonthly[source]

Calculate a regional average from the grid. The region is specified by a geopandas Geodataframe and the index (region_number) of the chosen shape. Where the area covered by data in the region drops below threshold, NaN is returned. By default, the output is masked to land areas only; this can be switched off by setting land_only to False.

Parameters:
  • regions (Geodataframe) – geopandas Geodataframe specifying the region to be averaged over

  • region_number (int) – the index of the particular region in the Geodataframe

  • threshold (float) – If the area covered by data in the region drops below this threshold then NaN is returned.

  • land_only (bool) – By default output is masked to land areas only; to calculate a full area average, set land_only to False

Returns:

Returns time series of area averages.

Return type:

ts.TimeSeriesMonthly

calculate_time_mean(cumulative=False)[source]

Calculate the time mean of the map

Returns:

Returns a GridMonthly containing the time mean of the data.

Return type:

GridMonthly

get_end_year() int[source]

Get the last year in the dataset

Returns:

Last year in the dataset

Return type:

int

get_last_month() datetime[source]

Get the date of the last month in the dataset

Returns:

Date of the last month in the dataset

Return type:

datetime

get_start_year() int[source]

Get the first year in the dataset

Returns:

First year in the dataset

Return type:

int

make_annual()[source]

Calculate an annual average from a monthly grid by taking the arithmetic mean of available monthly anomalies.

Returns:

Return annual average of the grid

Return type:

GridAnnual

rebaseline(first_year: int, final_year: int) xarray.Dataset[source]

Change the baseline of the data to the period between first_year and final_year by subtracting the average of the available data between those two years (inclusive).

Parameters:
  • first_year (int) – First year of climatology period

  • final_year (int) – Final year of climatology period

Returns:

Changes the dataset in place, but also returns the dataset if needed

Return type:

xa.Dataset

select_period(start_year: int, start_month: int, end_year: int, end_month: int)[source]

Select a period from the grid specified by start year and month and end year and month, inclusive.

Parameters:
  • start_year (int) – Year of start date

  • start_month (int) – Month of start date

  • end_year (int) – Year of end date

  • end_month (int) – Month of end date

Returns:

Returns a GridMonthly containing only data within the specified date range.

Return type:

GridMonthly
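The inclusive year-and-month selection can be illustrated on plain lists. This is a minimal sketch, not the climind implementation: the standalone function and its list-based inputs are assumptions made for illustration, whereas the real method operates on the xarray Dataset held by the GridMonthly.

```python
from typing import List, Tuple


def select_period(
    years: List[int],
    months: List[int],
    data: List[float],
    start_year: int,
    start_month: int,
    end_year: int,
    end_month: int,
) -> List[Tuple[int, int, float]]:
    """Keep only the (year, month, value) triples inside the inclusive period."""
    start = (start_year, start_month)
    end = (end_year, end_month)
    # Tuples compare element by element, so (year, month) pairs sort chronologically
    return [
        (y, m, v)
        for y, m, v in zip(years, months, data)
        if start <= (y, m) <= end
    ]


years = [1999, 1999, 2000, 2000, 2000]
months = [11, 12, 1, 2, 3]
data = [0.1, 0.2, 0.3, 0.4, 0.5]

# Select December 1999 to February 2000, both ends included
selected = select_period(years, months, data, 1999, 12, 2000, 2)
```

Both the start month and the end month are kept, matching the "inclusive" wording above.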

select_year_and_month(year: int, month: int)[source]

Select a particular month from the data set and throw away the rest.

Parameters:
  • year (int) – Year of selection

  • month (int) – Month of selection

Returns:

Returns a GridMonthly containing only data for the specified year and month.

Return type:

GridMonthly

update_history(message: str) None[source]

Update the history metadata with a message.

Parameters:

message (str) – Message to be added to history

Return type:

None

climind.data_types.grid.get_1d_transfer(zero_point_original: float, grid_space_original: float, zero_point_target: float, grid_space_target: float, index_in_original: int) tuple[source]

Find the overlapping grid spacings for a new grid based on an index in the old grid

Parameters:
  • zero_point_original (float) – longitude or latitude of the zero-indexed grid cell

  • grid_space_original (float) – grid spacing in degrees

  • zero_point_target (float) – longitude or latitude of the zero-indexed grid cells in the target grid

  • grid_space_target (float) – grid spacing in degrees of the target grid

  • index_in_original (int) – index of the gridcell in the original grid

Returns:

Returns the longitude of the first grid cell in the new grid, the number of steps, and the first and last indices on the new grid.

Return type:

tuple

climind.data_types.grid.get_start_and_end_year(all_datasets: List[GridAnnual]) Tuple[int, int][source]

Given a list of GridAnnual datasets, find the earliest start year and the latest end year

Parameters:

all_datasets (List[GridAnnual]) – List of datasets for which we want to find the first and last year

Return type:

Tuple[int, int]

climind.data_types.grid.make_standard_grid(out_grid: numpy.ndarray, start_date: datetime, freq: str, number_of_times: int) xarray.Dataset[source]

Make the standard 5x5 grid from a numpy array, start date, temporal frequency and number of time steps.

Parameters:
  • out_grid (np.ndarray) – Numpy array containing the data. Shape should be (number_of_times, 36, 72)

  • start_date (datetime) – Date of the first time step

  • freq (str) – Temporal frequency

  • number_of_times (int) – Number of time steps, should match the first dimension of the out_grid

Returns:

xarray Dataset containing the data in out_grid with the specified temporal frequency and number of time steps

Return type:

xa.Dataset

climind.data_types.grid.make_xarray(target_grid, times, latitudes, longitudes, variable: str = 'tas_mean') xarray.Dataset[source]

Make a xarray Dataset for a regular lat-lon grid from a numpy grid (ntime, nlat, nlon), and arrays of time (ntime), latitude (nlat) and longitude (nlon).

Parameters:
  • target_grid (np.ndarray) – numpy array of shape (ntime, nlat, nlon)

  • times (np.ndarray) – Array of times, shape (ntime)

  • latitudes (np.ndarray) – Array of latitudes, shape (nlat)

  • longitudes (np.ndarray) – Array of longitudes, shape (nlon)

  • variable (str) – Variable name

Returns:

Dataset built from the input components

Return type:

xa.Dataset

climind.data_types.grid.median_of_datasets(all_datasets: List[GridAnnual]) GridAnnual[source]

Calculate the median of a list of GridAnnual data sets

Parameters:

all_datasets (List[GridAnnual]) – List of GridAnnual datasets from which the medians will be calculated.

Return type:

GridAnnual

climind.data_types.grid.process_datasets(all_datasets: List[GridAnnual], grid_type: str) GridAnnual[source]

Calculate the median or range (depending on selected type) of a list of GridAnnual data sets. Medians are calculated on a grid cell by grid cell basis based on all available data in the list of data sets.

Parameters:
  • all_datasets (List[GridAnnual]) – list of GridAnnual data sets

  • grid_type (str) – Either ‘median’ or ‘range’

Returns:

Data set containing the median (or half-range) values from all the data sets supplied

Return type:

GridAnnual
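The cell-by-cell median and half-range can be sketched with nested lists standing in for grids. This is an illustration under stated assumptions rather than the library code: the function name combine_cellwise is hypothetical, NaN is assumed to mark missing cells, and the half-range is assumed to be (max - min) / 2 over the available values.

```python
import math
import statistics
from typing import List

Grid = List[List[float]]


def combine_cellwise(grids: List[Grid], grid_type: str) -> Grid:
    """Combine grids of identical shape cell by cell, skipping NaN values."""
    if grid_type not in ("median", "range"):
        raise ValueError("grid_type must be 'median' or 'range'")
    n_rows, n_cols = len(grids[0]), len(grids[0][0])
    out = []
    for r in range(n_rows):
        row = []
        for c in range(n_cols):
            # Only the data sets that have a value in this cell contribute
            values = [g[r][c] for g in grids if not math.isnan(g[r][c])]
            if not values:
                row.append(math.nan)
            elif grid_type == "median":
                row.append(statistics.median(values))
            else:
                # Assumed definition of half-range
                row.append((max(values) - min(values)) / 2)
        out.append(row)
    return out


# Three 1x2 grids; the first has a missing value in the second cell
grids = [[[1.0, math.nan]], [[3.0, 2.0]], [[8.0, 4.0]]]
median_grid = combine_cellwise(grids, "median")
range_grid = combine_cellwise(grids, "range")
```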

climind.data_types.grid.range_of_datasets(all_datasets: List[GridAnnual]) GridAnnual[source]

Calculate the half-range of a list of GridAnnual data sets

Parameters:

all_datasets (List[GridAnnual]) – List of GridAnnual datasets from which the ranges will be calculated.

Return type:

GridAnnual

climind.data_types.grid.rank_array(in_array: numpy.ndarray) int[source]

Rank array

Parameters:

in_array (np.ndarray) – Array to be ranked

Return type:

int

climind.data_types.grid.simple_regrid(ingrid: numpy.ndarray, lon0: float, lat0: float, dx: float, target_dy: float) numpy.ndarray[source]

Perform a simple regridding, using a simple average of grid cells from the original grid that fall within the target grid cell.

Parameters:
  • ingrid (np.ndarray) – Starting grid which we want to regrid

  • lon0 (float) – Longitude of zero-indexed grid cell in longitudinal direction

  • lat0 (float) – Latitude of zero-indexed grid cell in latitudinal direction

  • dx (float) – Grid spacing in degrees

  • target_dy (float) – Target grid spacing

Returns:

Returns regridded array.

Return type:

np.ndarray
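The idea of averaging the original grid cells that fall within each target cell can be sketched as a block average. This is a simplified illustration, not the simple_regrid implementation: it assumes the target spacing is an integer multiple (factor) of the original spacing and that the grids are aligned, and it uses nested lists instead of numpy arrays. The name block_average is hypothetical.

```python
from typing import List

Grid = List[List[float]]


def block_average(grid: Grid, factor: int) -> Grid:
    """Average non-overlapping factor x factor blocks of a 2-D grid."""
    n_rows, n_cols = len(grid), len(grid[0])
    if n_rows % factor or n_cols % factor:
        raise ValueError("grid dimensions must be divisible by factor")
    out = []
    for r0 in range(0, n_rows, factor):
        row = []
        for c0 in range(0, n_cols, factor):
            # Collect every original cell that lies inside this target cell
            cells = [
                grid[r][c]
                for r in range(r0, r0 + factor)
                for c in range(c0, c0 + factor)
            ]
            row.append(sum(cells) / len(cells))
        out.append(row)
    return out


fine = [
    [1.0, 2.0, 3.0, 4.0],
    [5.0, 6.0, 7.0, 8.0],
]
coarse = block_average(fine, 2)
```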

climind.data_types.timeseries module

class climind.data_types.timeseries.AveragesCollection(all_datasets)[source]

Bases: object

A simple class to perform specific tasks on lists of TimeSeriesAnnual

best_estimate()[source]
count()[source]
lower_range()[source]
range()[source]
upper_range()[source]

class climind.data_types.timeseries.TimeSeries(metadata: CombinedMetadata = None)[source]

Bases: ABC

A base class for representing time series data sets. Note that this class should not generally be used and only its subclasses TimeSeriesMonthly, TimeSeriesAnnual and TimeSeriesIrregular should be used. This class contains shared functionality from these classes but does not work on its own.

add_offset(offset: float) None[source]

Add an offset to the data set.

Parameters:

offset (float) – offset to be added to all values in the data set.

Return type:

None

get_first_and_last_year() Tuple[int, int][source]

Get the first and last year in the series

Returns:

first and last year

Return type:

Tuple[int, int]

abstractmethod get_string_date_range() str[source]

Create a string which specifies the date range covered by the time series

Return type:

str

manually_set_baseline(baseline_start_year: int, baseline_end_year: int) None[source]

Manually set baseline. This changes the baseline in the metadata, but does not change the data themselves.

Parameters:
  • baseline_start_year (int) – Start of baseline period

  • baseline_end_year (int) – End of baseline period

Return type:

None

select_year_range(start_year: int, end_year: int)[source]

Select consecutive years in the specified range and throw away the rest.

Parameters:
  • start_year (int) – First year in the selected range

  • end_year (int) – Final year in the selected range

Returns:

Return time series which only contains years in the specified range

Return type:

TimeSeries

update_history(message: str) None[source]

Update the history metadata

Parameters:

message (str) – Message to be added to history

Return type:

None

write_generic_csv(filename: Path, metadata_filename: Path, monthly: bool, uncertainty: bool, irregular: bool, columns_to_write: List[str]) None[source]

Write the dataset out into csv format

Parameters:
  • filename (Path) – Path of the csv file to which the data will be written.

  • metadata_filename (Path) – Path of the json file to which the metadata will be written.

  • monthly (bool) – Set to True for monthly data

  • uncertainty (bool) – Set to True to print uncertainties

  • irregular (bool) – Set to True for irregular data

  • columns_to_write (List[str]) – List of the columns from the dataframe to be written to the data file

Return type:

None

class climind.data_types.timeseries.TimeSeriesAnnual(years: list, data: list, metadata=None, uncertainty: list | None = None)[source]

Bases: TimeSeries

A TimeSeriesAnnual combines a pandas Dataframe with a CombinedMetadata to bring together data and metadata in one object. It represents annual averages of data.

Create TimeSeriesAnnual object from its components.

Parameters:
  • years (list) – List of years

  • data (list) – List of data values

  • metadata (CombinedMetadata) – Dictionary containing the metadata

df

Pandas dataframe containing the time and data information

Type:

pd.DataFrame

metadata

Dictionary containing the metadata. The only guaranteed entry is ‘history’

Type:

dict

add_year(year: int, value: float, uncertainty: float = None) None[source]

Add a year of data.

Parameters:
  • year (int) – the year to be added

  • value (float) – the data value to be added

  • uncertainty – the uncertainty of the data value to be added (optional)

Return type:

None

generate_dates(time_units: str) List[datetime][source]

Given a string specifying the required time units (something like days since 1800-01-01 00:00:00.0), generate a list of times from the time series corresponding to those units.

Parameters:

time_units (str) – String specifying the units to use for generating the times e.g. “days since 1800-01-01 00:00:00.0”

Returns:

List of dates

Return type:

List[datetime]

get_rank_from_year(year: int) int | None[source]

Given a year, extract the rank of the data for that year. Ties are given the same rank, which is the lowest rank of the group.

Parameters:

year (int) – Year for which we want the rank

Returns:

Rank of specified year or None if year is not available.

Return type:

Optional[int]

get_string_date_range() str[source]

Create a string which specifies the date range covered by the TimeSeriesAnnual in the format YYYY-YYYY

Returns:

String that specifies the date range covered

Return type:

str

get_uncertainty_from_year(year: int) float | None[source]

Get the uncertainty for a specified year.

Parameters:

year (int) – Year for which the uncertainty is desired

Returns:

Uncertainty for the year, or None if year is not in the data set

Return type:

Optional[float]

get_value_from_year(year: int) float | None[source]

Get the data value for a specified year.

Parameters:

year (int) – Year for which a value is desired

Returns:

Value for the year, or None if year is not in the data set

Return type:

Optional[float]

get_year_axis() List[float][source]

Return a year axis with dates represented as decimal years.

Returns:

List of dates as decimal years.

Return type:

List[float]

get_year_from_rank(rank: int) List[int][source]

Given a particular rank, extract a list of years which match that rank. Returns a list because years can (theoretically) be tied with each other. Rank 1 corresponds to the highest value in the dataset.

Parameters:

rank (int) – Rank for which we want the year which has that rank

Returns:

List of years that have the specified rank

Return type:

List[int]
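The ranking convention used by get_rank_from_year and get_year_from_rank can be sketched in plain Python. This is an illustration only: the helper names are hypothetical, and "lowest rank of the group" is assumed to mean the smallest rank number, so two years tied for the highest value both get rank 1 and the next year gets rank 3.

```python
from typing import Dict, List, Optional


def rank_years(years: List[int], values: List[float]) -> Dict[int, int]:
    """Rank years from highest value (rank 1) to lowest; ties share a rank."""
    ranks = {}
    for year, value in zip(years, values):
        # Rank is one plus the number of strictly larger values
        ranks[year] = 1 + sum(1 for other in values if other > value)
    return ranks


def rank_from_year(ranks: Dict[int, int], year: int) -> Optional[int]:
    """Return the rank of a year, or None if the year is not available."""
    return ranks.get(year)


def years_from_rank(ranks: Dict[int, int], rank: int) -> List[int]:
    """Return every year with the given rank; tied years all appear."""
    return [year for year, r in ranks.items() if r == rank]


years = [2014, 2015, 2016, 2017]
values = [0.5, 0.9, 0.9, 0.7]
ranks = rank_years(years, values)
```

Under this convention a rank can be skipped after a tie, so asking for rank 2 here returns an empty list.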

lowess(number_of_points: int = 10)[source]

Lowess smooth the series

Parameters:

number_of_points (int) – Number of points to use in the lowess smoother

static make_from_df(df: pandas.DataFrame, metadata: CombinedMetadata)[source]

Create a TimeSeriesAnnual from a pandas data frame.

Parameters:
  • df (pd.DataFrame) – Pandas dataframe containing columns ‘year’ and ‘data’

  • metadata (dict) – Dictionary containing the metadata

Returns:

TimeSeriesAnnual created from the elements in the dataframe and metadata.

Return type:

TimeSeriesAnnual

rebaseline(baseline_start_year: int, baseline_end_year: int) None[source]

Shift the TimeSeriesAnnual to a new baseline, specified by start and end years (inclusive).

Parameters:
  • baseline_start_year (int) – First year of the climatology period

  • baseline_end_year (int) – Last year of the climatology period

Returns:

Action occurs in place.

Return type:

None
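Rebaselining amounts to subtracting the mean over the baseline years from every value. The following is a minimal sketch on plain lists, not the library code: the function name is hypothetical and, unlike the method above, it returns a new list rather than acting in place.

```python
from typing import List


def rebaseline_annual(
    years: List[int],
    data: List[float],
    baseline_start_year: int,
    baseline_end_year: int,
) -> List[float]:
    """Return data expressed relative to the mean of the baseline period."""
    baseline_values = [
        value
        for year, value in zip(years, data)
        if baseline_start_year <= year <= baseline_end_year  # inclusive
    ]
    if not baseline_values:
        raise ValueError("no data in the baseline period")
    climatology = sum(baseline_values) / len(baseline_values)
    return [value - climatology for value in data]


years = [1990, 1991, 1992, 1993]
data = [1.0, 2.0, 3.0, 4.0]
anomalies = rebaseline_annual(years, data, 1991, 1992)
```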

record_margins()[source]
running_lowess(number_of_points: int = 10)[source]

Lowess smooth time point t by running a lowess smoother from t=0 to t=t. For a regular lowess smoother see method lowess.

Parameters:

number_of_points (int) – Number of points to use in the lowess smoother

running_mean(run_length: int, centred: bool = False)[source]

Calculate running mean of the data for a specified run length

Parameters:
  • run_length (int) – length of the run

  • centred (bool) – Set to True to centre the times associated to the data points, otherwise the time used will be the last time in the n-year run.

Returns:

TimeSeriesAnnual containing running averages of length run_length. Where there are too few years to calculate a running average, np.nan appears in the data column of the data frame

Return type:

TimeSeriesAnnual
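The effect of the centred flag can be shown with a small pure-Python running mean. This is a sketch under assumptions, not the climind implementation: the function name is hypothetical, and for a centred window the result is assumed to be assigned to the middle time of the window.

```python
import math
from typing import List


def running_mean_sketch(
    data: List[float], run_length: int, centred: bool = False
) -> List[float]:
    """Running mean; positions without a full window are left as NaN."""
    n = len(data)
    out = [math.nan] * n
    # Trailing windows are labelled by their last point,
    # centred windows by their middle point (assumed convention)
    offset = (run_length - 1) // 2 if centred else 0
    for end in range(run_length - 1, n):
        window = data[end - run_length + 1 : end + 1]
        out[end - offset] = sum(window) / run_length
    return out


data = [1.0, 2.0, 3.0, 4.0, 5.0]
trailing = running_mean_sketch(data, 3)
centred = running_mean_sketch(data, 3, centred=True)
```

With a run length of 3, the trailing version leaves the first two positions as NaN, while the centred version leaves the first and last positions as NaN.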

running_stdev(run_length: int, centred: bool = False)[source]

Calculate running standard deviation of the data for a specified run length

Parameters:
  • run_length (int) – length of the run

  • centred (bool) – Set to True to centre the times associated to the data points, otherwise the time used will be the last time in the n-year run.

Returns:

TimeSeriesAnnual containing running standard deviation of length run_length. Where there are too few years to calculate a running standard deviation, np.nan appears in the data column of the data frame

Return type:

TimeSeriesAnnual

running_trend(run_length: int)[source]

Calculate a smoothed series by fitting a straight line to the past run_length years of data and taking the final point of the fitted line as the data value instead

Parameters:

run_length (int) – Number of years for which the trend should be calculated

Returns:

TimeSeriesAnnual containing the end point of trends of length run_length. Where there are too few years to calculate a trend, np.nan appears in the data column of the data frame

Return type:

TimeSeriesAnnual
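The end-point-of-trend smoothing can be sketched with an ordinary least-squares line fitted to each trailing window. This is an illustrative sketch that assumes an ordinary least-squares fit; the function name is hypothetical and the fitting method used by the library is not stated in this page.

```python
import math
from typing import List


def running_trend_sketch(
    years: List[int], data: List[float], run_length: int
) -> List[float]:
    """Value of a trailing least-squares line evaluated at its final year."""
    if run_length < 2:
        raise ValueError("run_length must be at least 2 to fit a line")
    out = [math.nan] * len(data)
    for end in range(run_length - 1, len(data)):
        xs = years[end - run_length + 1 : end + 1]
        ys = data[end - run_length + 1 : end + 1]
        x_mean = sum(xs) / run_length
        y_mean = sum(ys) / run_length
        sxx = sum((x - x_mean) ** 2 for x in xs)
        sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
        slope = sxy / sxx
        # The fitted line passes through (x_mean, y_mean)
        out[end] = y_mean + slope * (xs[-1] - x_mean)
    return out


years = [2000, 2001, 2002, 2003]
data = [0.0, 0.0, 3.0, 3.0]
smoothed = running_trend_sketch(years, data, 3)
```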

select_decade(end_year: int = 0)[source]

Select every tenth year from the TimeSeriesAnnual. The last digit of the years can be selected using the end_year keyword argument. The default is to select all years ending in 0, e.g. 1850, 1860, 1870… 2020.

Parameters:

end_year (int) – Last digit of the years to be selected. e.g. set to 0 to pick 1850, 1860… 2010, 2020 etc.

Returns:

TimeSeriesAnnual containing every tenth year

Return type:

TimeSeriesAnnual
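Selecting years by their last digit reduces to a modulo test. A minimal sketch on plain lists follows; the standalone function is an assumption for illustration and is not the method itself.

```python
from typing import List, Tuple


def select_decade_sketch(
    years: List[int], data: List[float], end_year: int = 0
) -> List[Tuple[int, float]]:
    """Keep the (year, value) pairs whose year ends in the digit end_year."""
    return [(y, v) for y, v in zip(years, data) if y % 10 == end_year]


years = list(range(1998, 2012))
data = [float(y - 1998) for y in years]

every_tenth = select_decade_sketch(years, data)
ending_in_five = select_decade_sketch(years, data, end_year=5)
```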

time_average(start_year, end_year) float[source]
write_csv(filename, metadata_filename=None)[source]

Write the timeseries to a csv file with the specified filename. The format used for writing is given by the BADC CSV format. This has a lot of upfront metadata before the data section. An option for writing a metadata file is also provided.

Parameters:
  • filename (Path) – Path of the filename to write the data to

  • metadata_filename (Path) – Path of the filename to write the metadata to

Return type:

None

write_simple_csv(filename)[source]

class climind.data_types.timeseries.TimeSeriesIrregular(years: List[int], months: List[int], days: List[int], data: List[float], metadata: CombinedMetadata = None, uncertainty: List[float] | None = None)[source]

Bases: TimeSeries

A TimeSeriesIrregular combines a pandas Dataframe with a CombinedMetadata to bring together data and metadata in one object. It represents non-monthly, non-annual averages of data such as weekly, or 5-day averages.

Create TimeSeriesIrregular object.

Parameters:
  • years (List[int]) – List of integers specifying the year of each data point

  • months (List[int]) – List of integers specifying the month of each data point

  • days (List[int]) – List of integers specifying the day of each data point

  • data (List[float]) – List of floats with the data values

  • metadata (CombinedMetadata) – CombinedMetadata object holding the metadata for the dataset

  • uncertainty (List[float]) – List of floats with the uncertainty values for each data point

fill_daily() None[source]

Ensure that a daily time series has data for every day between the start and end years.

Return type:

None

generate_dates(time_units: str) List[int][source]

Given a string specifying the required time units (something like days since 1800-01-01 00:00:00.0), generate a list of times from the time series corresponding to those units.

Parameters:

time_units (str) – String specifying the units to use for generating the times e.g. “days since 1800-01-01 00:00:00.0”

Return type:

List[int]

get_climatology(climatology_start_year, climatology_end_year)[source]
get_start_and_end_dates() Tuple[datetime, datetime][source]

Get the first and last dates in the dataset

Return type:

Tuple[datetime, datetime]

get_string_date_range() str[source]

Create a string which specifies the date range covered by the TimeSeriesIrregular in the format YYYY.MM.DD-YYYY.MM.DD

Returns:

String that specifies the date range covered

Return type:

str

get_year_axis() List[float][source]

Return a year axis in which all dates are represented as decimal years. January 1st 1984 is 1984.00.

Returns:

List of dates represented as decimal years.

Return type:

List[float]
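One way to express a date as a decimal year, consistent with January 1st 1984 mapping to 1984.00, is to add the elapsed fraction of the year to the year number. This is a sketch of that convention; the exact formula used by the library is an assumption, and the function name is hypothetical.

```python
from datetime import date


def decimal_year(d: date) -> float:
    """Year plus the fraction of that year elapsed at the start of the day."""
    start = date(d.year, 1, 1)
    end = date(d.year + 1, 1, 1)
    # Dividing by the true year length handles leap years
    return d.year + (d - start).days / (end - start).days


new_year = decimal_year(date(1984, 1, 1))
mid_year = decimal_year(date(1984, 7, 2))
```

In a leap year such as 1984, July 2nd falls exactly halfway through the year.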

lowess(number_of_points: int = 60)[source]

Lowess smooth the series

Parameters:

number_of_points (int) – Number of points to use in the lowess smoother

make_monthly()[source]

Calculate a TimeSeriesMonthly from the TimeSeriesIrregular. The monthly average is calculated from the mean of values within the month.

Returns:

Return a TimeSeriesMonthly containing the monthly averages.

Return type:

TimeSeriesMonthly

rebaseline(baseline_start_year, baseline_end_year) None[source]

Shift the time series to a new baseline, specified by start and end years (inclusive). Each day is rebaselined separately, allowing for changes in seasonality. If years are incomplete, this might give a different result to the annual and monthly versions.

Parameters:
  • baseline_start_year (int) – The first year of the climatology period

  • baseline_end_year (int) – The last year of the climatology period

Returns:

Action occurs in place

Return type:

None

write_csv(filename: Path, metadata_filename: Path = None) None[source]

Write the timeseries to a csv file with the specified filename. The format used for writing is given by the BADC CSV format. This has a lot of upfront metadata before the data section. An option for writing a metadata file is also provided.

Parameters:
  • filename (Path) – Path of the filename to write the data to

  • metadata_filename (Path) – Path of the filename to write the metadata to

Return type:

None

zero_on_year(baseline_year)[source]

class climind.data_types.timeseries.TimeSeriesMonthly(years: List[int], months: List[int], data: List[float], metadata: CombinedMetadata = None, uncertainty: List[float] | None = None)[source]

Bases: TimeSeries

A TimeSeriesMonthly combines a pandas Dataframe with a CombinedMetadata to bring together data and metadata in one object. It represents monthly averages of data.

Create TimeSeriesMonthly object.

Parameters:
  • years (List[int]) – List of years

  • months (List[int]) – List of months

  • data (List[float]) – List of data values

  • metadata (CombinedMetadata) – CombinedMetadata object containing the metadata

  • uncertainty (Optional[List[float]])

df

Pandas dataframe used to contain the time and data information.

Type:

pd.DataFrame

metadata

Dictionary containing metadata. The only guaranteed entry is “history”

Type:

dict

calculate_climatology(baseline_start_year, baseline_end_year)[source]
change_end_month(year, month)[source]
generate_dates(time_units: str) List[int][source]

Given a string specifying the required time units (something like days since 1800-01-01 00:00:00.0), generate a list of times from the time series corresponding to those units.

Parameters:

time_units (str) – String specifying the units to use for generating the times e.g. “days since 1800-01-01 00:00:00.0”

Return type:

List[int]

get_rank_from_year_and_month(year: int, month: int, versus_all_months=False) int | None[source]

Given a year and month, extract the rank of the data for that month. Ties are given the same rank, which is the lowest rank of the group. Default behaviour is to rank the month against the same month in all other years. Setting versus_all_months to True ranks the month against all other months in all other years.

Parameters:
  • year (int) – Year of year-month pair for which we want the rank

  • month (int) – Month of year-month pair for which we want the rank

  • versus_all_months (bool) – If set then the ranking is done for the monthly value relative to all other months.

Returns:

Returns the rank of the specified year-month pair as compared to the same month in all other years. If “versus_all_months” is set then returns rank of the anomaly for a particular year and month ranked against all other years and months.

Return type:

int

get_start_and_end_dates() Tuple[datetime, datetime][source]

Get the first and last dates in the dataset

Returns:

Start and end dates.

Return type:

Tuple[datetime, datetime]

get_string_date_range() str[source]

Create a string which specifies the date range covered by the TimeSeriesMonthly in the format YYYY.MM-YYYY.MM

Returns:

String that specifies the date range covered

Return type:

str

get_uncertainty(year: int, month: int) float | None[source]

Get the current uncertainty for a particular year and month

Parameters:
  • year (int) – Year for which the uncertainty is required.

  • month (int) – Month for which the uncertainty is required.

Returns:

Uncertainty for the specified year and month or None if it does not exist

Return type:

Optional[float]

get_value(year: int, month: int) float | None[source]

Get the current value for a particular year and month

Parameters:
  • year (int) – Year for which the value is required.

  • month (int) – Month for which the value is required.

Returns:

Value for the specified year and month or None if it does not exist

Return type:

Optional[float]

get_year_axis() List[float][source]

Return a year axis as decimal year. 1st January 1984 is 1984.00.

Returns:

List of dates expressed as a decimal year.

Return type:

List[float]

lowess(number_of_points: int = 60)[source]

Lowess smooth the series

Parameters:

number_of_points (int) – Number of points to use in the lowess smoother

make_annual(cumulative: bool = False)[source]

Calculate a TimeSeriesAnnual from the TimeSeriesMonthly. The annual average is calculated from the mean of available monthly values

Parameters:

cumulative (bool) – Set to true to sum rather than average the monthly values to get the annual value.

Returns:

Return a TimeSeriesAnnual object containing the annual averages.

Return type:

TimeSeriesAnnual
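The difference between averaging and summing the monthly values can be sketched on plain lists. This is an illustration, not the library code: the function name is hypothetical, NaN is assumed to mark an unavailable month, and years with no available months are simply omitted.

```python
import math
from collections import defaultdict
from typing import Dict, List


def make_annual_sketch(
    years: List[int], data: List[float], cumulative: bool = False
) -> Dict[int, float]:
    """Annual mean (or sum, if cumulative) of the available monthly values."""
    by_year = defaultdict(list)
    for year, value in zip(years, data):
        if not math.isnan(value):  # skip unavailable months
            by_year[year].append(value)
    result = {}
    for year, values in sorted(by_year.items()):
        total = sum(values)
        result[year] = total if cumulative else total / len(values)
    return result


# One entry per monthly value; the second value of 2001 is missing
years = [2000, 2000, 2000, 2001, 2001, 2001]
data = [1.0, 2.0, 3.0, 4.0, math.nan, 6.0]

annual_mean = make_annual_sketch(years, data)
annual_total = make_annual_sketch(years, data, cumulative=True)
```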

make_annual_by_selecting_month(month: int)[source]

Calculate a TimeSeriesAnnual from the TimeSeriesMonthly. The annual value is taken from one of the monthly values specified by the user.

Returns:

Return a TimeSeriesAnnual object containing only the selected month from each year.

Return type:

TimeSeriesAnnual

static make_from_df(df: pandas.DataFrame, metadata: CombinedMetadata)[source]

Create a TimeSeriesMonthly from a pandas data frame.

Parameters:
  • df (pd.DataFrame) – Pandas dataframe containing columns ‘year’, ‘month’ and ‘data’ (optionally ‘uncertainty’)

  • metadata (dict) – Dictionary containing the metadata

Returns:

TimeSeriesMonthly built from input components.

Return type:

TimeSeriesMonthly

rebaseline(baseline_start_year, baseline_end_year) None[source]

Shift the time series to a new baseline, specified by start and end years (inclusive). Each month is rebaselined separately, allowing for changes in seasonality. If years are incomplete, this might give a different result to the annual version.

Parameters:
  • baseline_start_year (int) – The first year of the climatology period

  • baseline_end_year (int) – The last year of the climatology period

Returns:

Action occurs in place

Return type:

None
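Rebaselining each month separately means that every calendar month gets its own climatology. A minimal sketch on plain lists follows; the function name is hypothetical, it returns a new list rather than acting in place, and months with no data in the baseline period are assumed to become NaN.

```python
import math
from collections import defaultdict
from typing import List


def rebaseline_monthly_sketch(
    years: List[int],
    months: List[int],
    data: List[float],
    baseline_start_year: int,
    baseline_end_year: int,
) -> List[float]:
    """Subtract a per-calendar-month climatology from every value."""
    baseline = defaultdict(list)
    for year, month, value in zip(years, months, data):
        if baseline_start_year <= year <= baseline_end_year:
            baseline[month].append(value)
    climatology = {m: sum(v) / len(v) for m, v in baseline.items()}
    # Each value is compared with the climatology of its own calendar month
    return [
        value - climatology.get(month, math.nan)
        for month, value in zip(months, data)
    ]


years = [2000, 2000, 2001, 2001]
months = [1, 7, 1, 7]
data = [1.0, 11.0, 3.0, 13.0]
anomalies = rebaseline_monthly_sketch(years, months, data, 2000, 2001)
```

The seasonal difference between January and July is removed because each month is compared only with its own baseline mean.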

running_mean(run_length: int, centred: bool = False)[source]

Calculate running mean of the data for a specified run length

Parameters:
  • run_length (int) – length of the run

  • centred (bool) – Set to True to centre the times associated to the data points, otherwise the time used will be the last time in the n-year run.

Returns:

TimeSeriesMonthly containing running averages of length run_length. Where there are too few years to calculate a running average, np.nan appears in the data column of the data frame

Return type:

TimeSeriesMonthly

write_csv(filename: Path, metadata_filename: Path = None) None[source]

Write the TimeSeriesMonthly to a csv file with the specified filename. The format used for writing is given by the BADC CSV format. This has a lot of upfront metadata before the data section. An option for writing a metadata file is also provided.

Parameters:
  • filename (Path) – Path of the filename to write the data to

  • metadata_filename (Path) – Path of the filename to write the metadata to

Return type:

None

zero_on_month(year: int, month: int) None[source]

Zero data set on the value for a single month in a single year by subtracting the value for that month from all values in the dataset.

Parameters:
  • year (int) – Year of the month on which the data will be zeroed.

  • month (int) – Month of the month on which the data will be zeroed.

Return type:

None
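Zeroing on a month can be sketched as a lookup followed by a subtraction. This sketch uses plain lists and a hypothetical function name, and it returns a new list instead of modifying the data in place.

```python
from typing import List


def zero_on_month_sketch(
    years: List[int],
    months: List[int],
    data: List[float],
    year: int,
    month: int,
) -> List[float]:
    """Subtract the value at (year, month) from every value in the series."""
    for y, m, value in zip(years, months, data):
        if (y, m) == (year, month):
            reference = value
            break
    else:
        raise ValueError("requested year and month not in the series")
    return [value - reference for value in data]


years = [2000, 2000, 2000]
months = [1, 2, 3]
data = [1.0, 2.5, 4.0]
zeroed = zero_on_month_sketch(years, months, data, 2000, 2)
```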

climind.data_types.timeseries.create_common_dataframe(dataframes: List[pandas.DataFrame], monthly: bool = False, annual: bool = False, irregular: bool = False) pandas.DataFrame[source]

Given a list of dataframes make a single dataframe which has rows corresponding to all time steps in the input dataframes

Parameters:
  • dataframes (List[pd.DataFrame]) – List of dataframes which are to be used as the basis for the common data frame

  • monthly (bool) – Set to true for monthly data

  • annual (bool) – Set to true for annual data

  • irregular (bool) – Set to true for daily/irregular data

Returns:

Pandas dataframe with one row for each row in the input dataframes

Return type:

pd.DataFrame

climind.data_types.timeseries.equalise_datasets(all_datasets: List[TimeSeriesAnnual | TimeSeriesMonthly | TimeSeriesIrregular], uncertainty: bool = False) pandas.DataFrame[source]

Given a list of datasets, combine their data columns into a single data frame.

Parameters:

all_datasets (List[Union[TimeSeriesAnnual, TimeSeriesMonthly, TimeSeriesIrregular]]) – List of time series datasets whose data is to be combined in a single data frame. The data column from each data set will be combined into a single data frame, with each data column becoming a column identified by the “name” of the data set from its metadata.

Returns:

Pandas dataframe containing the data columns from all the input datasets.

Return type:

pd.DataFrame

climind.data_types.timeseries.get_list_of_unique_variables(all_datasets: List[TimeSeriesAnnual]) List[str][source]

Given a list of TimeSeriesAnnual, get a list of the unique variable names represented in that list.

Parameters:

all_datasets (List[TimeSeriesAnnual])

Returns:

List of the unique variable names.

Return type:

List[str]

climind.data_types.timeseries.get_start_and_end_year(all_datasets: List[TimeSeriesAnnual]) Tuple[int | None, int | None][source]

Given a list of TimeSeriesAnnual, extract the first year in any of the data sets and the last year in any of the data sets.

Parameters:

all_datasets (List[TimeSeriesAnnual]) – List of datasets from which to extract the earliest first year and latest final year.

Returns:

Return the first and last years in the list of data sets

Return type:

Tuple[Optional[int], Optional[int]]

climind.data_types.timeseries.log_activity(in_function: Callable) Callable[source]

Decorator that logs the name of the function being run and the arguments it was called with. This aims to provide some traceability in the output.

Parameters:

in_function (Callable) – The function to be decorated

Return type:

Callable
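A logging decorator of this kind generally follows the pattern below. This is a generic sketch using the standard logging and functools modules, not the climind implementation; the name log_activity_sketch and the message format are assumptions.

```python
import functools
import logging
from typing import Callable

logger = logging.getLogger(__name__)


def log_activity_sketch(in_function: Callable) -> Callable:
    """Log the function name and its arguments each time it is called."""

    @functools.wraps(in_function)  # keep the original name and docstring
    def wrapper(*args, **kwargs):
        logger.info(
            "Running %s with args=%r kwargs=%r",
            in_function.__name__,
            args,
            kwargs,
        )
        return in_function(*args, **kwargs)

    return wrapper


@log_activity_sketch
def add(a: float, b: float) -> float:
    """Example function to be logged."""
    return a + b


result = add(2, 3)
```

Using functools.wraps keeps the decorated function's name and docstring intact.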

climind.data_types.timeseries.make_combined_series(all_datasets: List[TimeSeriesAnnual], augmented_uncertainty=True) TimeSeriesAnnual[source]

Combine a list of datasets into a single TimeSeriesAnnual by taking the arithmetic mean of all available datasets for each year. Merges the metadata for all the input time series.

Parameters:
  • all_datasets (List[TimeSeriesAnnual]) – List of datasets to be combined

  • augmented_uncertainty (bool) – Set to True if you want to add an additional uncertainty from the baseline

Returns:

TimeSeriesAnnual which is the mean of all available datasets in each year.

Return type:

TimeSeriesAnnual
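Taking the mean of all available datasets for each year can be sketched with dictionaries mapping year to value. This is an illustration only: the function name is hypothetical, and the metadata merging and the augmented uncertainty handled by make_combined_series are not shown.

```python
from collections import defaultdict
from typing import Dict, List


def combine_series_sketch(all_series: List[Dict[int, float]]) -> Dict[int, float]:
    """Mean over whichever series have a value in each year."""
    values_by_year = defaultdict(list)
    for series in all_series:
        for year, value in series.items():
            values_by_year[year].append(value)
    # A year covered by only one series keeps that series' value
    return {
        year: sum(values) / len(values)
        for year, values in sorted(values_by_year.items())
    }


series_a = {2000: 1.0, 2001: 2.0}
series_b = {2001: 4.0, 2002: 6.0}
combined = combine_series_sketch([series_a, series_b])
```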

climind.data_types.timeseries.superset_dataset_list(all_datasets: List[TimeSeriesAnnual], variables: List[str]) List[List[TimeSeriesAnnual]][source]

Given a list of variables, create a list where each entry is a list of all TimeSeriesAnnual objects corresponding to the variable in that index position.

Parameters:
  • all_datasets (List[TimeSeriesAnnual]) – List of datasets

  • variables (List[str]) – List of variable names

Returns:

List of lists of TimeSeriesAnnual.

Return type:

List[List[TimeSeriesAnnual]]
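The grouping performed here, together with get_list_of_unique_variables, can be sketched with dictionaries standing in for datasets. The "name" and "variable" keys and both function names are hypothetical stand-ins; the real functions read the variable from each TimeSeriesAnnual's metadata.

```python
from typing import Dict, List

Dataset = Dict[str, str]


def unique_variables_sketch(all_datasets: List[Dataset]) -> List[str]:
    """Unique variable names, in order of first appearance."""
    seen = []
    for dataset in all_datasets:
        if dataset["variable"] not in seen:
            seen.append(dataset["variable"])
    return seen


def superset_sketch(
    all_datasets: List[Dataset], variables: List[str]
) -> List[List[Dataset]]:
    """One list of datasets per variable, in the same order as variables."""
    return [
        [dataset for dataset in all_datasets if dataset["variable"] == variable]
        for variable in variables
    ]


datasets = [
    {"name": "A", "variable": "tas"},
    {"name": "B", "variable": "sst"},
    {"name": "C", "variable": "tas"},
]
variables = unique_variables_sketch(datasets)
grouped = superset_sketch(datasets, variables)
grouped_names = [[d["name"] for d in group] for group in grouped]
```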

climind.data_types.timeseries.write_dataset_summary_file(all_datasets, csv_filename)[source]

climind.data_types.timeseries.write_dataset_summary_file_with_metadata(all_datasets: List[TimeSeriesAnnual | TimeSeriesMonthly | TimeSeriesIrregular], csv_filename: str | Path) None[source]

Given a list of time series data sets, write them out in a single BADC CSV format csv file with complete metadata.

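Parameters:
  • all_datasets (List[Union[TimeSeriesAnnual, TimeSeriesMonthly, TimeSeriesIrregular]]) – List of time series data sets to be written

  • csv_filename (Union[str, Path]) – Path of the csv file to be written

Return type: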

None

Module contents

There are two main data types implemented in this package: timeseries and grids. In each of those two cases, the data set consists of a data-carrying part and a metadata part. For timeseries, the data-carrying part is a pandas dataframe and for a grid, it’s an xarray dataset.