climind.fetchers package

The fetchers package contains all the scripts needed to download the data sets. There are specific fetchers for regular URLs, FTP sites, and data sets that are not available online.

In addition, some websites and data sets have peculiarities that mean a standard fetcher isn’t adequate. For example, NOAAGlobalTemp is stored in a fixed directory, but the filename changes unpredictably because it contains the date of its creation. The fetcher for NOAAGlobalTemp therefore has to parse the webpage to find the appropriate file and then download it.

Other data sets require credentials. These are stored in a .env file, which is also in the fetchers directory but is not part of the package.

Submodules

climind.fetchers.fetcher_cds module

Fetcher which uses the Copernicus Climate Data Store to download ERA5 gridded data. The first time it is run, it will download all data. This will take a while.

climind.fetchers.fetcher_cds.fetch(url: str, outdir: Path, _) None

Fetch all data in the range 1979 to 2022.

Parameters
  • url (str) – URL of the file

  • outdir (Path) – Path to the directory to which the data will be written.

Return type

None

climind.fetchers.fetcher_cds.fetch_year(out_dir: Path, year: int) None

Fetch a specified year of data and write it to out_dir. If the year is incomplete, retrieve only the months that are available.

Parameters
  • out_dir (Path) – directory to which the data will be written

  • year (int) – the year of data we want

Return type

None

climind.fetchers.fetcher_cds.pick_months(year: int, now: datetime) List[str]

For a given year, return a list of strings containing the months which are available in the CDS for ERA5. For years before the current year, return all months. For the current year, return all months up to last month, but include last month only if the day of the month is after the 7th.

Parameters
  • year (int) – Year for which we want to pick months

  • now (datetime) – Today's date, used to work out how much of the current year is available.

Returns

List of the months to download for the specified year.

Return type

List[str]
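The selection rule described above can be sketched as follows. This is a minimal illustration, not the package's code: the function name pick_months_sketch, the zero-padded "01" to "12" string format, and the empty list returned for future years are all assumptions.

```python
from datetime import datetime
from typing import List


def pick_months_sketch(year: int, now: datetime) -> List[str]:
    """Hypothetical re-statement of the month-selection rule."""
    if year < now.year:
        # Past years are assumed to be complete.
        return [f"{month:02d}" for month in range(1, 13)]
    if year > now.year:
        # Assumption: nothing is available for future years.
        return []
    # Current year: everything up to last month...
    last_available = now.month - 1
    # ...but drop last month unless we are past the 7th.
    if now.day <= 7:
        last_available -= 1
    return [f"{month:02d}" for month in range(1, last_available + 1)]
```

Called on 10 May of the current year, the sketch returns January to April; called on 3 May, it returns January to March.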

climind.fetchers.fetcher_cmems_ftp module

climind.fetchers.fetcher_cmems_ftp.fetch(url: str, out_dir: Path, _) None

Fetch data from the CMEMS FTP system. The required credentials are:

  • username, specified by entry in .env CMEMS_USER

  • password, specified by entry in .env CMEMS_PSWD

Parameters
  • url (str) – The URL of the file to be downloaded

  • out_dir (Path) – Path of the directory to which the files will be saved

Return type

None
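As an illustration of how such credentials might be looked up, the sketch below reads a username and password from environment-style entries named <PREFIX>_USER and <PREFIX>_PSWD. It assumes the .env entries have already been loaded into the process environment; how the package actually loads them is not documented here, and read_credentials is a hypothetical helper.

```python
import os
from typing import Mapping, Tuple


def read_credentials(prefix: str, env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Return (username, password) from <PREFIX>_USER and <PREFIX>_PSWD.

    Hypothetical helper: assumes the .env entries are already present in env.
    """
    user_key = f"{prefix}_USER"
    pswd_key = f"{prefix}_PSWD"
    # Collect every missing or empty entry so the error names all of them.
    missing = [key for key in (user_key, pswd_key) if not env.get(key)]
    if missing:
        raise KeyError(f"Missing credential entries: {', '.join(missing)}")
    return env[user_key], env[pswd_key]
```

For the CMEMS fetcher, the call would be read_credentials("CMEMS"). Failing early with the names of the missing entries is easier to diagnose than a rejected login later on.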

climind.fetchers.fetcher_ersst module

climind.fetchers.fetcher_ersst.fetch(url: str, out_dir: Path, _) None

Fetch ERSST gridded dataset from NOAA. There is one file per month. Only files that have not already been downloaded will be downloaded.

Parameters
  • url (str) – URL of the file

  • out_dir (Path) – Path of the directory to which output will be written

Return type

None
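The "only files that have not already been downloaded" behaviour can be sketched as a simple filter on the output directory. The helper name files_still_needed is hypothetical, and the sketch assumes a file counts as downloaded if a file of the same name exists in out_dir.

```python
from pathlib import Path
from typing import Iterable, List


def files_still_needed(filenames: Iterable[str], out_dir: Path) -> List[str]:
    """Return the filenames that are not yet present in out_dir.

    Assumption: existence of the file is the only check; partial or
    corrupted downloads are not detected.
    """
    return [name for name in filenames if not (out_dir / name).exists()]
```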

climind.fetchers.fetcher_ftp module

climind.fetchers.fetcher_ftp.fetch(url: str, out_dir: Path, filename: str) None

Generic fetcher for files on FTP sites.

Parameters
  • url (str) – URL of the file

  • out_dir (Path) – Path of the directory to which the output will be written.

  • filename (str) – Filename to save file to

Return type

None

climind.fetchers.fetcher_grace module

climind.fetchers.fetcher_grace.fetch(url: str, outdir: Path, _) None

Fetch files from the PODAAC website. Note that the API URL base is: API_url = "https://podaac-tools.jpl.nasa.gov/drive/files"

Requires the credentials:

  • username, specified by entry in .env PODAAC_USER

  • password, specified by entry in .env PODAAC_PSWD

Parameters
  • url (str) – URL for the file

  • outdir (Path) – directory to which the file will be written.

Return type

None

climind.fetchers.fetcher_jra55_grid module

Set of scripts to download the JRA-55 gridded data. Adapted from the scripts provided by UCAR. Data are stored by year up to a certain point and by month for near-real-time data thereafter. Credentials are needed (see the fetch function).

climind.fetchers.fetcher_jra55_grid.check_file_status(file_path, file_size) None

Writes a status bar on the download. Not used.

Parameters
  • file_path (str) – Path of the file

  • file_size (float) – Size of the file

Return type

None

climind.fetchers.fetcher_jra55_grid.download_file(filename: str, file_base: str, ret) None

Download a file.

Parameters
  • filename (str) – URL of the file to be downloaded

  • file_base (str) – Name of the output file to which the data will be written

  • ret – Authentication information

Return type

None

climind.fetchers.fetcher_jra55_grid.fetch(_, out_dir: Path, _filename) None

Get JRA-55 files from UCAR. Requires the credentials:

  • username, specified by entry in .env UCAR_USER

  • password, specified by entry in .env UCAR_PSWD

Parameters
  • _ – dummy input to match interface.

  • out_dir (Path) – Path of the directory to which the output will be written.

  • _filename (str) – Unused filename argument

Return type

None

climind.fetchers.fetcher_jra55_grid.get_files(filelist: List[str], web_path: str, ret) None

For each file in a file list, check if it already exists on the system and if it does not, attempt to download it.

Parameters
  • filelist (List[str]) – List of files to be downloaded

  • web_path (str) – URL of the directory that contains the files.

  • ret – Authentication information

Return type

None

climind.fetchers.fetcher_jra55_grid.make_file_list(first_year, final_year) List[str]

Make a list of annual archived filenames between the two specified years.

Parameters
  • first_year (int) – Year to start generation

  • final_year (int) – Year to end generation

Returns

List of filenames for archived data between the specified years

Return type

List[str]

climind.fetchers.fetcher_jra55_grid.make_realtime_file_list(first_year: int, final_year: int) List[str]

Make a list of monthly real-time filenames between the two specified years.

Parameters
  • first_year (int) – Year to start generation

  • final_year (int) – Year to end generation

Returns

List of filenames for real-time data between the specified years

Return type

List[str]

climind.fetchers.fetcher_no_url module

climind.fetchers.fetcher_no_url.fetch(url, out_dir, filename)

A stub so that there is a fetcher for datasets which don’t exist online.

climind.fetchers.fetcher_noaaglobaltemp module

climind.fetchers.fetcher_noaaglobaltemp.fetch(url: str, outdir: Path, _) None

Fetch NOAAGlobalTemp data. The script scrapes the directory specified in the URL for a file that matches the pattern specified in the URL.

Parameters
  • url (str) – URL of the file to be downloaded, containing wildcards for information that needs to be matched on a case-by-case basis.

  • outdir (Path) – Path of the directory to which the output will be written

Return type

None
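The matching step can be illustrated with the standard library alone. The sketch below takes the HTML of a directory listing as a string, extracts the link targets, and keeps those that match a wildcard pattern. It is not the package's implementation: find_matching_files is a hypothetical helper, the regular expression is a deliberately simple stand-in for real HTML parsing, and downloading the listing is left out.

```python
import re
from fnmatch import fnmatch
from typing import List


def find_matching_files(listing_html: str, pattern: str) -> List[str]:
    """Return link targets in listing_html that match a wildcard pattern.

    Hypothetical helper: assumes links appear as href="..." attributes.
    """
    hrefs = re.findall(r'href="([^"]+)"', listing_html)
    return [href for href in hrefs if fnmatch(href, pattern)]
```

With a listing that contains links to a dated data file and a readme, a pattern such as "data_*.asc" (a made-up filename scheme) would select only the data file.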

climind.fetchers.fetcher_promice module

climind.fetchers.fetcher_promice.fetch(url: str, outdir: Path, _)

Fetch Greenland mass balance data. The script scrapes a webpage in order to find the specific URLs of the latest version of the dataset (these change daily). These files are then downloaded. There should be two files: a daily file and an annual file.

Parameters
  • url (str) – URL of the directory which contains the files to be downloaded

  • outdir (Path) – Path of the directory to which the output will be written

Return type

None

climind.fetchers.fetcher_standard_url module

climind.fetchers.fetcher_standard_url.fetch(url: str, outdir: Path, filename: str) None

Fetcher for a standard URL that can be accessed without restrictions, credentials, or any other tomfoolery.

Parameters
  • url (str) – URL of the file to be downloaded.

  • outdir (Path) – Path of the directory to which the output will be written

  • filename (str) – Filename to save file as locally

Return type

None

climind.fetchers.fetcher_url_with_backsearch module

climind.fetchers.fetcher_url_with_backsearch.fetch(url: str, out_dir: Path, _) None

Fetch a file using a backsearch. Backsearching starts with the most recent month, fills the year (YYYY) and month (MMMM) placeholders in the specified URL with that month to create a filename, and then tries to download that file. The search proceeds backwards for 24 months from today’s date.

Parameters
  • url (str) – URL of the file containing placeholders for the year (YYYY) and month (MMMM)

  • out_dir (Path) – Path to which the output will be written

Return type

None
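The order in which candidate URLs are tried can be sketched as below. Several details are assumptions rather than documented behaviour: that the search begins with the current calendar month, that MMMM is replaced by a two-digit zero-padded month, and that exactly 24 candidates are generated. The helper name backsearch_candidates is hypothetical, and the download attempt itself is omitted.

```python
from datetime import date
from typing import List


def backsearch_candidates(url: str, today: date, n_months: int = 24) -> List[str]:
    """List candidate URLs, newest month first, by filling YYYY and MMMM.

    Hypothetical helper: the two-digit month format is an assumption.
    """
    year, month = today.year, today.month
    candidates = []
    for _ in range(n_months):
        filled = url.replace("YYYY", f"{year:04d}").replace("MMMM", f"{month:02d}")
        candidates.append(filled)
        # Step back one month, rolling over the year boundary.
        month -= 1
        if month == 0:
            month = 12
            year -= 1
    return candidates
```

A fetcher built on this would try each candidate in turn and stop at the first one that downloads successfully.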

climind.fetchers.fetcher_url_with_backsearch.filename_from_url(url: str) str

Extract just the filename from a URL.

Parameters

url (str) – URL of a file

Returns

The filename of the file specified by the URL

Return type

str

climind.fetchers.fetcher_utils module

Contains a set of routines used by the fetchers to perform various standard tasks such as extracting the filename from a URL.

climind.fetchers.fetcher_utils.dir_and_filename_from_url(url: str) Tuple[str, str]

Split a URL into the filename and the part of the URL up to, but not including, the filename.

Parameters

url (str) – URL to be split

Returns

The directory part of the URL and the filename

Return type

Tuple[str, str]
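A minimal sketch of this kind of split, assuming the filename is everything after the final slash and that the returned directory part carries no trailing slash (the real function may differ on that point). The name split_url_sketch is hypothetical.

```python
from typing import Tuple


def split_url_sketch(url: str) -> Tuple[str, str]:
    """Split a URL at its final slash into (directory part, filename)."""
    directory, _, filename = url.rpartition("/")
    return directory, filename
```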

climind.fetchers.fetcher_utils.filename_from_url(url: str) str

Given a URL, return the filename, or an empty string if there is no filename.

Parameters

url (str) – URL to be parsed

Returns

The filename part of the URL

Return type

str
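One way to obtain this behaviour with the standard library is sketched below. It is an illustration under the assumption that the filename is the last segment of the URL path; filename_from_url_sketch is a hypothetical name, not the package's function.

```python
import posixpath
from urllib.parse import urlparse


def filename_from_url_sketch(url: str) -> str:
    """Return the last path segment of a URL, or "" if the path ends in a slash."""
    # urlparse separates the path from any query string or fragment.
    return posixpath.basename(urlparse(url).path)
```

A URL ending in a slash, or one with no path at all, yields an empty string.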

climind.fetchers.fetcher_utils.get_ftp_host_and_directory_from_url(url: str) Tuple[str, List[str]]

From a URL, extract the host name and the directory. The directory is broken down into a list of subdirectories.

Parameters

url (str) – URL to extract information from

Returns

  • str – The host name

  • list – A list of directories
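The decomposition can be sketched with urllib.parse. The sketch assumes that the final path segment is a filename and is therefore left out of the directory list unless the path ends in a slash; the documentation does not say how the real function treats it. The name ftp_host_and_dirs_sketch is hypothetical.

```python
from typing import List, Tuple
from urllib.parse import urlparse


def ftp_host_and_dirs_sketch(url: str) -> Tuple[str, List[str]]:
    """Return (host name, list of subdirectories) for a URL."""
    parsed = urlparse(url)
    segments = [segment for segment in parsed.path.split("/") if segment]
    # Assumption: a path that does not end in "/" finishes with a filename,
    # which is not part of the directory list.
    if not parsed.path.endswith("/"):
        segments = segments[:-1]
    return parsed.hostname or "", segments
```

A list of subdirectories is convenient for FTP clients, which typically change directory one level at a time.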

climind.fetchers.fetcher_utils.url_from_filename(url: str, filename: str) str

Given a URL and a filename, replace the filename in the URL with the input filename.

Parameters
  • url (str) – URL specifying a file, for which the filename will be changed.

  • filename (str) – New filename to be used in the output URL

Returns

The URL with the new filename

Return type

str
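A minimal sketch of the replacement, assuming the existing filename is everything after the final slash of the URL. The name url_from_filename_sketch is hypothetical.

```python
def url_from_filename_sketch(url: str, filename: str) -> str:
    """Replace everything after the final slash of url with filename."""
    directory, _, _old_filename = url.rpartition("/")
    return f"{directory}/{filename}"
```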