climind.fetchers package

Submodules

climind.fetchers.fetcher_cds module

Fetcher which uses the Copernicus Climate Data Store to download ERA5 gridded data. The first time it is run, it will download all data. This will take a while.

climind.fetchers.fetcher_cds.fetch(url: str, outdir: Path, filename: str) → None

Fetch all data in the range 1979 to 2022.

Parameters:
  • url (str) – URL of the file

  • outdir (Path) – Path to the directory to which the data will be written.

Return type:

None

climind.fetchers.fetcher_cds.fetch_to_year(out_dir: Path, year: int, variable: str = 'tas') → None

Fetch a specified year of data and write it to out_dir. If the year is incomplete, only the months that are available are retrieved.

Parameters:
  • out_dir (Path) – directory to which the data will be written

  • year (int) – the year of data we want

  • variable (str) – Variable to be extracted: either 'tas' or 'sealevel'

Return type:

None

climind.fetchers.fetcher_cds.pick_months(year: int, now: datetime) → List[str]

For a given year, return a list of strings containing the months that are available in the CDS for ERA5. For years before the current year, return all months. For the current year, return all months up to and including last month, but include last month only if today's day of the month is after the 7th.

Parameters:
  • year (int) – Year for which we want to pick months

  • now (datetime) – Today's date, used to assess how much of the current year is available.

Returns:

List of the months to download for the specified year.

Return type:

List[str]
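The month-selection rule described for pick_months can be sketched as follows. This is an illustrative reimplementation, not the package's code: the name pick_months_sketch is hypothetical, the zero-padded month strings ("01" to "12") are an assumption about the format used in CDS requests, and returning no months for future years is also an assumption.

```python
from datetime import datetime
from typing import List


def pick_months_sketch(year: int, now: datetime) -> List[str]:
    """Illustrative version of the month-selection rule described for pick_months.

    Assumption: months are returned as zero-padded strings ("01" to "12").
    """
    if year < now.year:
        # Past years are complete: every month is available.
        last_month = 12
    elif year == now.year:
        # Last month is included only once we are past the 7th of this month.
        last_month = now.month - 1 if now.day > 7 else now.month - 2
    else:
        # Assumption: nothing is available for future years.
        last_month = 0
    return [f"{month:02d}" for month in range(1, max(last_month, 0) + 1)]
```

For example, on 10 March 2024 the sketch returns ['01', '02'] for 2024, while on 5 March 2024 it returns only ['01'] because the 7th has not yet passed.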

climind.fetchers.fetcher_cds_ensemble module

Fetcher which uses the Copernicus Climate Data Store to download ERA5 gridded data. The first time it is run, it will download all data. This will take a while.

climind.fetchers.fetcher_cds_ensemble.fetch(url: str, outdir: Path, filename: str) → None

Fetch all data in the range 1979 to 2022.

Parameters:
  • url (str) – URL of the file

  • outdir (Path) – Path to the directory to which the data will be written.

Return type:

None

climind.fetchers.fetcher_cds_ensemble.fetch_year(out_dir: Path, year: int, month: int, day: int, variable: str = 'tas') → None

Fetch a specified year of data and write it to out_dir. If the year is incomplete, only the months that are available are retrieved.

Parameters:
  • out_dir (Path) – directory to which the data will be written

  • year (int) – the year of data we want

  • variable (str) – Variable to be extracted: either 'tas' or 'sealevel'

Return type:

None

climind.fetchers.fetcher_cmems_ftp module

climind.fetchers.fetcher_cmems_ftp.fetch(url: str, out_dir: Path, _) → None

Fetch data from the CMEMS FTP system. The required credentials are:

  • username, specified by the CMEMS_USER entry in .env

  • password, specified by the CMEMS_PSWD entry in .env

Parameters:
  • url (str) – The URL of the file to be downloaded

  • out_dir (Path) – Path of the directory to which the files will be saved

Return type:

None
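The .env lookup can be sketched with the standard library alone. The helpers read_env_file and cmems_credentials are hypothetical, and the package itself may load the file differently (for example with the python-dotenv library); only the entry names CMEMS_USER and CMEMS_PSWD come from the description above.

```python
import os
from pathlib import Path
from typing import Dict, Tuple


def read_env_file(env_path: Path) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines from a .env file (hypothetical helper).

    Blank lines, comment lines starting with '#', and lines without '=' are
    ignored; surrounding quotes on values are stripped.
    """
    values: Dict[str, str] = {}
    for line in env_path.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def cmems_credentials(env_path: Path) -> Tuple[str, str]:
    """Return (username, password) from the CMEMS_USER and CMEMS_PSWD entries.

    Falls back to environment variables of the same name if the file lacks them.
    """
    env = read_env_file(env_path) if env_path.exists() else {}
    user = env.get("CMEMS_USER", os.environ.get("CMEMS_USER", ""))
    password = env.get("CMEMS_PSWD", os.environ.get("CMEMS_PSWD", ""))
    return user, password
```

The resulting pair could then be passed to ftplib.FTP.login(user, passwd) before any files are retrieved.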

climind.fetchers.fetcher_ersst module

climind.fetchers.fetcher_ersst.fetch(url: str, out_dir: Path, _) → None

Fetch the ERSST gridded dataset from NOAA. There is one file per month. Only files that are not already present in out_dir are downloaded.

Parameters:
  • url (str) – URL of the file

  • out_dir (Path) – Path of the directory to which output will be written

Return type:

None
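The skip-if-already-downloaded behaviour amounts to comparing a list of expected monthly filenames with the contents of out_dir. In this sketch the helper missing_monthly_files is hypothetical, and the default filename pattern ersst.v5.YYYYMM.nc is an assumption about how the monthly files are named.

```python
from pathlib import Path
from typing import List


def missing_monthly_files(out_dir: Path, first_year: int, last_year: int,
                          pattern: str = "ersst.v5.{year}{month:02d}.nc") -> List[str]:
    """List monthly filenames in the given year range that are not yet in out_dir.

    The default pattern is an assumption about how the monthly files are named.
    """
    wanted = [
        pattern.format(year=year, month=month)
        for year in range(first_year, last_year + 1)
        for month in range(1, 13)
    ]
    # Only files that are not already present locally need downloading.
    return [name for name in wanted if not (out_dir / name).exists()]
```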

climind.fetchers.fetcher_ftp module

climind.fetchers.fetcher_ftp.fetch(url: str, out_dir: Path, filename: str) → None

Generic fetcher for files on FTP servers.

Parameters:
  • url (str) – URL of the file

  • out_dir (Path) – Path of the directory to which the output will be written.

  • filename (str) – Name under which the downloaded file will be saved

Return type:

None
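A generic FTP download of this kind can be sketched with Python's ftplib. This is a minimal sketch, assuming anonymous login and a URL of the form ftp://host/dir/.../file; ftp_fetch_sketch is a hypothetical name, and the ftp_class parameter exists only so that a stub can stand in for a real connection in tests.

```python
from ftplib import FTP
from pathlib import Path
from urllib.parse import urlparse


def ftp_fetch_sketch(url: str, out_dir: Path, filename: str, ftp_class=FTP) -> Path:
    """Download one file over anonymous FTP and save it as out_dir / filename.

    ftp_class is injectable so the network connection can be replaced in tests.
    """
    parsed = urlparse(url)
    # Split the URL path into the remote directory and the remote filename.
    remote_dir, _, remote_name = parsed.path.rpartition("/")
    out_path = Path(out_dir) / filename
    with ftp_class(parsed.hostname) as ftp:
        ftp.login()  # anonymous login
        if remote_dir:
            ftp.cwd(remote_dir)
        with open(out_path, "wb") as handle:
            # Stream the file in binary mode straight to disk.
            ftp.retrbinary(f"RETR {remote_name}", handle.write)
    return out_path
```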

climind.fetchers.fetcher_gpcc module

climind.fetchers.fetcher_gpcc.fetch(url: str, outdir: Path, _) → None
climind.fetchers.fetcher_gpcc.fetch_year(url: str, outdir: Path, year: int)

climind.fetchers.fetcher_gpcc_quantile module

climind.fetchers.fetcher_gpcc_quantile.fetch(url: str, outdir: Path, _) → None

Fetch GPCC quantile data. The script scrapes the directory specified in the URL for a file that matches the pattern specified in the URL.

Parameters:
  • url (str) – URL of the file to be downloaded, containing wildcards for information that needs to be matched on a case-by-case basis.

  • outdir (Path) – Path of the directory to which the output will be written

Return type:

None
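The scrape-and-match step might look like the following, assuming the directory page is a plain HTML listing and that the wildcards in the URL are shell-style (* and ?). match_listing is a hypothetical helper, and the filenames used with it are invented for illustration.

```python
import fnmatch
import re
from typing import List


def match_listing(listing_html: str, pattern: str) -> List[str]:
    """Return the hrefs in a directory-listing page that match a wildcard pattern.

    Assumes shell-style wildcards ('*', '?') and a simple HTML listing.
    """
    # Collect every double-quoted href target on the page.
    hrefs = re.findall(r'href="([^"]+)"', listing_html)
    # fnmatchcase keeps matching case-sensitive on every platform.
    return sorted(h for h in hrefs if fnmatch.fnmatchcase(h, pattern))
```

With a URL such as https://example.org/gpcc/gpcc_q_*.nc (hypothetical), the page at the directory part before the last / would be downloaded and its listing passed to match_listing together with the pattern gpcc_q_*.nc.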

climind.fetchers.fetcher_gpcc_quantile.get_file(filled_url, out_path)
climind.fetchers.fetcher_gpcc_quantile.get_time_span(filled_url)

climind.fetchers.fetcher_grace module

climind.fetchers.fetcher_grace.fetch(url: str, outdir: Path, _) → None

Fetch files from the PODAAC website. Note that the base URL of the API (API_url) is https://podaac-tools.jpl.nasa.gov/drive/files.

Requires the credentials:

  • username, specified by the PODAAC_USER entry in .env

  • password, specified by the PODAAC_PSWD entry in .env

Parameters:
  • url (str) – URL for the file

  • outdir (Path) – directory to which the file will be written.

Return type:

None
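If the PODAAC drive accepts HTTP Basic authentication (an assumption; the description above only says that a username and password are needed), a request carrying the PODAAC_USER and PODAAC_PSWD credentials can be built with the standard library. authenticated_request is a hypothetical helper.

```python
import base64
from urllib.request import Request


def authenticated_request(url: str, username: str, password: str) -> Request:
    """Build a request that carries HTTP Basic credentials.

    Assumption: the server accepts HTTP Basic authentication.
    """
    # Basic auth sends base64("username:password") in the Authorization header.
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return Request(url, headers={"Authorization": f"Basic {token}"})
```

The request would then be opened with urllib.request.urlopen and the response written to outdir.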

climind.fetchers.fetcher_grace_aws module

climind.fetchers.fetcher_jaxa module

climind.fetchers.fetcher_jaxa.fetch(url: str, out_dir: Path, filename: str) → None

Generic fetcher for files on FTP servers.

Parameters:
  • url (str) – URL of the file

  • out_dir (Path) – Path of the directory to which the output will be written.

  • filename (str) – Name under which the downloaded file will be saved

Return type:

None

climind.fetchers.fetcher_jra3q_grid module

Set of scripts to download the JRA-3Q gridded data. Adapted from the scripts provided by UCAR. Data are stored by year up to a certain point and by month for near-real-time data thereafter. Credentials are needed (see the fetch function).

climind.fetchers.fetcher_jra3q_grid.download_file(filename: str, file_base: str, process: bool) → None

Download a file.

Parameters:
  • filename (str) – URL of the file to be downloaded

  • file_base (str) – Name of the output file to which the data will be written

Return type:

None

climind.fetchers.fetcher_jra3q_grid.fetch(_, out_dir: Path, _filename) → None

Get JRA-3Q files from UCAR. Requires the credentials:

  • username, specified by the UCAR_USER entry in .env

  • password, specified by the UCAR_PSWD entry in .env

Parameters:
  • _ – Unused; present only to match the common fetcher interface.

  • out_dir (Path) – Path of the directory to which the output will be written.

  • _filename (str) – Unused filename argument

Return type:

None

climind.fetchers.fetcher_jra3q_grid.get_files(filelist: List[str], web_path: str, process: bool = False, output_filelist=None) → None

For each file in a file list, check whether it already exists locally and, if it does not, attempt to download it.

Parameters:
  • filelist (List[str]) – List of files to be downloaded

  • web_path (str) – URL of the directory that contains the files.

Return type:

None

climind.fetchers.fetcher_jra3q_grid.make_file_list(first_year, final_year) → List[str]

Make a list of annual archived filenames between the two specified years.

Parameters:
  • first_year (int) – Year to start generation

  • final_year (int) – Year to end generation

Returns:

List of filenames for archived data between the specified years

Return type:

List[str]

climind.fetchers.fetcher_jra3q_grid.make_realtime_file_list(first_year: int, final_year: int) → List[str]

Make a list of monthly real-time filenames between the two specified years.

Parameters:
  • first_year (int) – Year to start generation

  • final_year (int) – Year to end generation

Returns:

List of filenames for real-time data between the specified years

Return type:

List[str]

climind.fetchers.fetcher_jra3q_grid.process_file(file_base: str) → None

Read in the monthly files and take a time mean.

Parameters:

file_base (str) – Filename of the file to be processed

Return type:

None

climind.fetchers.fetcher_jra55_grid module

Set of scripts to download the JRA-55 gridded data. Adapted from the scripts provided by UCAR. Data are stored by year up to a certain point and by month for near-real-time data thereafter. Credentials are needed (see the fetch function).

climind.fetchers.fetcher_jra55_grid.download_file(filename: str, file_base: str) → None

Download a file.

Parameters:
  • filename (str) – URL of the file to be downloaded

  • file_base (str) – Name of the output file to which the data will be written

Return type:

None

climind.fetchers.fetcher_jra55_grid.fetch(_, out_dir: Path, _filename) → None

Get JRA-55 files from UCAR. Requires the credentials:

  • username, specified by the UCAR_USER entry in .env

  • password, specified by the UCAR_PSWD entry in .env

Parameters:
  • _ – Unused; present only to match the common fetcher interface.

  • out_dir (Path) – Path of the directory to which the output will be written.

  • _filename (str) – Unused filename argument

Return type:

None

climind.fetchers.fetcher_jra55_grid.get_files(filelist: List[str], web_path: str) → None

For each file in a file list, check whether it already exists locally and, if it does not, attempt to download it.

Parameters:
  • filelist (List[str]) – List of files to be downloaded

  • web_path (str) – URL of the directory that contains the files.

Return type:

None
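The check-then-download loop that get_files describes can be sketched as below. get_missing_files is hypothetical, and the download step is passed in as a callable, download(url, destination), so the sketch makes no assumption about how the original performs the transfer or handles credentials.

```python
from pathlib import Path
from typing import Callable, List


def get_missing_files(filelist: List[str], web_path: str, out_dir: Path,
                      download: Callable[[str, Path], None]) -> List[str]:
    """Download each file in filelist that is not already present in out_dir.

    download(url, destination) is a hypothetical callable supplied by the caller.
    Returns the names of the files that were downloaded.
    """
    fetched = []
    for name in filelist:
        destination = Path(out_dir) / name
        if destination.exists():
            continue  # already on the system: skip it
        download(web_path.rstrip("/") + "/" + name, destination)
        fetched.append(name)
    return fetched
```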

climind.fetchers.fetcher_jra55_grid.make_file_list(first_year, final_year) → List[str]

Make a list of annual archived filenames between the two specified years.

Parameters:
  • first_year (int) – Year to start generation

  • final_year (int) – Year to end generation

Returns:

List of filenames for archived data between the specified years

Return type:

List[str]

climind.fetchers.fetcher_jra55_grid.make_realtime_file_list(first_year: int, final_year: int) → List[str]

Make a list of monthly real-time filenames between the two specified years.

Parameters:
  • first_year (int) – Year to start generation

  • final_year (int) – Year to end generation

Returns:

List of filenames for real-time data between the specified years

Return type:

List[str]

climind.fetchers.fetcher_no_url module

climind.fetchers.fetcher_no_url.fetch(url, out_dir, filename)

A stub that provides a fetcher for datasets that are not available online.

climind.fetchers.fetcher_noaaglobaltemp module

climind.fetchers.fetcher_noaaglobaltemp.fetch(url: str, outdir: Path, _) → None

Fetch NOAAGlobalTemp data. The script scrapes the directory specified in the URL for a file that matches the pattern specified in the URL.

Parameters:
  • url (str) – URL of the file to be downloaded, containing wildcards for information that needs to be matched on a case-by-case basis.

  • outdir (Path) – Path of the directory to which the output will be written

Return type:

None

climind.fetchers.fetcher_promice module

climind.fetchers.fetcher_promice.fetch(url: str, outdir: Path, _)

Fetch Greenland mass balance data. The script scrapes a web page to find the URLs of the latest version of the dataset (these change daily) and then downloads those files. There should be two files: a daily file and an annual file.

Parameters:
  • url (str) – URL of the directory that contains the files to be downloaded

  • outdir (Path) – Path of the directory to which the output will be written

Return type:

None

climind.fetchers.fetcher_standard_url module

climind.fetchers.fetcher_standard_url.fetch(url: str, outdir: Path, filename: str) → None

Fetcher for a standard URL that can be accessed without restrictions, credentials, or any other tomfoolery.

Parameters:
  • url (str) – URL of the file to be downloaded.

  • outdir (Path) – Path of the directory to which the output will be written

  • filename (str) – Name under which the file will be saved locally

Return type:

None
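An unrestricted download like this can be sketched with urllib from the standard library. fetch_standard_url_sketch is a hypothetical name, and creating outdir when it does not exist is an assumption rather than documented behaviour.

```python
import shutil
import urllib.request
from pathlib import Path


def fetch_standard_url_sketch(url: str, outdir: Path, filename: str) -> Path:
    """Download url without credentials and save it as outdir / filename."""
    outdir = Path(outdir)
    # Assumption: the output directory is created if it is missing.
    outdir.mkdir(parents=True, exist_ok=True)
    out_path = outdir / filename
    with urllib.request.urlopen(url) as response, open(out_path, "wb") as handle:
        # Stream the response body to disk without loading it all into memory.
        shutil.copyfileobj(response, handle)
    return out_path
```

Because urllib also understands file:// URLs, the sketch can be tried locally without network access.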

climind.fetchers.fetcher_standard_url_with_rename module

climind.fetchers.fetcher_standard_url_with_rename.fetch(url: str, outdir: Path, filename: str) → None

Fetcher for a standard URL that can be accessed without restrictions, credentials, or any other tomfoolery.

Parameters:
  • url (str) – URL of the file to be downloaded.

  • outdir (Path) – Path of the directory to which the output will be written

  • filename (str) – Name under which the file will be saved locally

Return type:

None

climind.fetchers.fetcher_url_with_backsearch module

climind.fetchers.fetcher_url_with_backsearch.fetch(url: str, out_dir: Path, _) → None

Fetch a file using a backsearch. Backsearching starts with the most recent month: that month is used to fill the year (YYYY) and month (MMMM) placeholders in the specified URL, and the fetcher then tries to download the resulting file. The search proceeds backwards month by month, covering 24 months back from today's date.

Parameters:
  • url (str) – URL of the file, containing placeholders for the year (YYYY) and month (MMMM)

  • out_dir (Path) – Path of the directory to which the output will be written

Return type:

None
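The list of URLs that the backsearch tries can be generated as follows. The sketch assumes that YYYY is replaced with the four-digit year and MMMM with the two-digit, zero-padded month, and that the current month is the first candidate; backsearch_urls is a hypothetical helper.

```python
from datetime import date
from typing import List


def backsearch_urls(url: str, today: date, n_months: int = 24) -> List[str]:
    """Candidate URLs, newest first, for n_months back from today's month.

    Assumes YYYY is filled with the four-digit year and MMMM with the
    zero-padded two-digit month.
    """
    candidates = []
    year, month = today.year, today.month
    for _ in range(n_months):
        candidates.append(url.replace("YYYY", f"{year:04d}").replace("MMMM", f"{month:02d}"))
        # Step back one month, wrapping from January to December of the previous year.
        month -= 1
        if month == 0:
            year, month = year - 1, 12
    return candidates
```

A fetcher would then try each candidate in order. For example, starting from 15 February 2024, the first three candidates for https://example.org/data_YYYYMMMM.csv end in 202402, 202401 and 202312.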

climind.fetchers.fetcher_url_with_backsearch.filename_from_url(url: str) → str

Extract just the filename from a URL.

Parameters:

url (str) – URL of a file

Returns:

The filename of the file specified by the URL

Return type:

str

climind.fetchers.fetcher_utils module

Contains a set of routines used by the fetchers to perform various standard tasks such as extracting the filename from a URL.

climind.fetchers.fetcher_utils.dir_and_filename_from_url(url: str) → Tuple[str, str]

Split a URL into the part up to, but not including, the filename, and the filename itself.

Parameters:

url (str) – URL to be split

Returns:

The directory part of the URL and the filename

Return type:

Tuple[str, str]

climind.fetchers.fetcher_utils.filename_from_url(url: str) → str

Given a URL, return the filename, or an empty string if there is no filename.

Parameters:

url (str) – URL to be parsed

Returns:

The filename part of the URL

Return type:

str
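Both URL-splitting utilities can be sketched with string partitioning. These are hypothetical reimplementations; in particular, keeping the trailing / on the directory part and assuming URLs without query strings in the directory split are assumptions.

```python
from typing import Tuple
from urllib.parse import urlparse


def filename_from_url_sketch(url: str) -> str:
    """Return the last path component of a URL, or '' if there is none."""
    # urlparse drops any query string or fragment from .path.
    return urlparse(url).path.rpartition("/")[2]


def dir_and_filename_from_url_sketch(url: str) -> Tuple[str, str]:
    """Split a URL into everything before the filename and the filename itself."""
    filename = url.rpartition("/")[2]
    # Everything up to, but not including, the filename (trailing '/' kept).
    return url[: len(url) - len(filename)], filename
```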

climind.fetchers.fetcher_utils.fill_year_month(instr, y, m)
climind.fetchers.fetcher_utils.get_ftp_host_and_directory_from_url(url: str) → Tuple[str, List[str]]

From a URL, extract the host name and the directory, with the directory broken down into a list of subdirectories.

Parameters:

url (str) – URL to extract information from

Returns:

  • str – The host name

  • list – A list of directories
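A version of this split based on urllib.parse.urlparse is sketched below. ftp_host_and_dirs is a hypothetical name, and treating the last path component as a filename (and so excluding it from the directory list) is an assumption.

```python
from typing import List, Tuple
from urllib.parse import urlparse


def ftp_host_and_dirs(url: str) -> Tuple[str, List[str]]:
    """Return the host name and the list of directories leading to the file.

    Assumption: the last path component is a filename and is excluded.
    """
    parsed = urlparse(url)
    directory = parsed.path.rpartition("/")[0]
    # Drop empty components produced by leading or doubled slashes.
    return parsed.hostname or "", [part for part in directory.split("/") if part]
```

Each entry in the returned list could then be passed to ftplib.FTP.cwd in turn.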

climind.fetchers.fetcher_utils.get_n_months_back(y: int, m: int, back: int = 12) → Tuple[int, int]

Get the year and month such that the span from that month to the specified month, inclusive, is back months long.

Parameters:
  • y (int) – Year

  • m (int) – Month

  • back (int) – Number of months to include

Return type:

Tuple[int, int]
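Under the reading that the specified month counts as one of the back months, the calculation reduces to integer arithmetic on a month index. n_months_back_sketch is a hypothetical helper, and the original function may interpret back differently.

```python
from typing import Tuple


def n_months_back_sketch(y: int, m: int, back: int = 12) -> Tuple[int, int]:
    """Return the (year, month) that starts a span of `back` months ending at (y, m).

    Interpretation: the specified month counts as one of the `back` months.
    """
    index = y * 12 + (m - 1) - (back - 1)  # 0-based month count since year 0
    return index // 12, index % 12 + 1
```

For example, (2024, 3) with back=12 gives (2023, 4): April 2023 to March 2024 inclusive is 12 months.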

climind.fetchers.fetcher_utils.time_tag_string(instr)
climind.fetchers.fetcher_utils.url_from_filename(url: str, filename: str) → str

Given a URL and a filename, replace the filename in the URL with the new filename.

Parameters:
  • url (str) – URL specifying a file, for which the filename will be changed.

  • filename (str) – New filename to be used in the output URL

Returns:

The URL with the new filename

Return type:

str
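Replacing the filename amounts to swapping the last path component of the URL. url_from_filename_sketch is hypothetical and assumes the URL contains at least one / before the filename.

```python
def url_from_filename_sketch(url: str, filename: str) -> str:
    """Replace the final path component of url with filename."""
    # Keep everything before the last '/', then append the new filename.
    directory, _, _old_name = url.rpartition("/")
    return f"{directory}/{filename}"
```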

Module contents

The fetchers package contains all the scripts needed to download the datasets. There are specific fetchers for regular URLs, FTP sites and datasets that are not available online.

In addition, some websites and datasets have peculiarities that mean a standard fetcher isn’t adequate. For example, NOAAGlobalTemp has a fixed directory, but the filename changes unpredictably because it contains the date of its creation. The fetcher for NOAAGlobalTemp therefore has to parse the web page to find the appropriate file and then download it.

Other datasets require credentials, which are stored in a .env file that is also in the fetchers directory but is not part of the package.