climind.fetchers package

The fetchers package contains all the scripts needed to download the data sets. There are specific fetchers for regular URLs, ftp sites and datasets which are not available online.

In addition, some websites and data sets have peculiarities that mean a standard fetcher is not adequate. For example, NOAAGlobalTemp has a fixed directory, but the filename changes unpredictably because it contains the date of its creation. The fetcher for NOAAGlobalTemp therefore has to parse the webpage to find the appropriate file and then download it.

Other datasets require credentials. These are stored in a .env file, which is also in the fetchers directory but is not part of the package.
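To illustrate the credentials mechanism, the following is a minimal sketch of reading KEY=VALUE entries from a .env file using only the standard library. The helper name read_env_file and the parsing rules (comments starting with #, optional quotes around values) are assumptions made for illustration; the package may load the file differently.

```python
from pathlib import Path
from typing import Dict


def read_env_file(env_path: Path) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines from a .env file (hypothetical helper)."""
    entries: Dict[str, str] = {}
    for raw_line in env_path.read_text().splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip optional surrounding quotes from the value.
        entries[key.strip()] = value.strip().strip("'\"")
    return entries
```

A fetcher could then look up an entry such as CMEMS_USER with read_env_file(Path(".env"))["CMEMS_USER"].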

Submodules

climind.fetchers.fetcher_cds module

Fetcher which uses the Copernicus Climate Data Store to download ERA5 gridded data. The first time it is run, it will download all data. This will take a while.

climind.fetchers.fetcher_cds.fetch(url: str, outdir: Path, _) → None

Fetch all data in the range 1979 to 2022.

Parameters

url (str) – url of the file
outdir (Path) – Path to the directory to which the data will be written.

Return type

None

climind.fetchers.fetcher_cds.fetch_year(out_dir: Path, year: int) → None

Fetch a specified year of data and write it to out_dir. If the year is incomplete, only retrieve the months that are available.

Parameters

out_dir (Path) – directory to which the data will be written
year (int) – the year of data we want

Return type

None

climind.fetchers.fetcher_cds.pick_months(year: int, now: datetime) → List[str]

For a given year, return a list of strings containing the months which are available in the CDS for ERA5. For years before the current year, return all months. For the current year, return all months up to last month, but only include last month if the day of the month is after the 7th.

Parameters

year (int) – Year for which we want to pick months
now (datetime) – Today's date, used to assess how much of the current year is available.

Returns

List of the months to download for the specified year.

Return type

List[str]
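The month-selection rule described above can be sketched as follows. This is an illustration rather than the package's code: the function name pick_months_sketch, the zero-padded two-digit month strings, and the empty list returned for future years are all assumptions.

```python
from datetime import datetime
from typing import List


def pick_months_sketch(year: int, now: datetime) -> List[str]:
    """Return the months assumed available for `year` (hypothetical sketch)."""
    if year < now.year:
        last_month = 12  # past years are complete
    elif year == now.year:
        last_month = now.month - 1  # up to and including last month...
        if now.day <= 7:
            last_month -= 1  # ...unless it is too early in the month
    else:
        last_month = 0  # assumption: nothing is available for future years
    # Assumption: months are zero-padded two-digit strings.
    return [f"{month:02d}" for month in range(1, last_month + 1)]
```

For example, with now set to 15 March 2022, the sketch yields January and February for 2022; with now set to 5 March 2022, it yields January only.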

climind.fetchers.fetcher_cmems_ftp module

climind.fetchers.fetcher_cmems_ftp.fetch(url: str, out_dir: Path, _) → None

Fetch data from the CMEMS ftp system. Requires the credentials:

username, specified by entry in .env CMEMS_USER
password, specified by entry in .env CMEMS_PSWD

Parameters

url (str) – The URL of the file to be downloaded
out_dir (Path) – Path of the directory to which the files will be saved

Return type

None

climind.fetchers.fetcher_ersst module

climind.fetchers.fetcher_ersst.fetch(url: str, out_dir: Path, _) → None

Fetch ERSST gridded dataset from NOAA. There is one file per month. Only files that have not already been downloaded will be downloaded.

Parameters

url (str) – URL of the file
out_dir (Path) – Path of the directory to which output will be written

Return type

None

climind.fetchers.fetcher_ftp module

climind.fetchers.fetcher_ftp.fetch(url: str, out_dir: Path, filename: str) → None

Generic fetcher for ftp files

Parameters

url (str) – URL of the file
out_dir (Path) – Path of the directory to which the output will be written.
filename (str) – Filename to save file to

Return type

None

climind.fetchers.fetcher_grace module

climind.fetchers.fetcher_grace.fetch(url: str, outdir: Path, _) → None

Fetch files from the PODAAC website. Note that the API URL base is: API_url = "https://podaac-tools.jpl.nasa.gov/drive/files"

Requires the credentials:

username, specified by entry in .env PODAAC_USER
password, specified by entry in .env PODAAC_PSWD

Parameters

url (str) – URL for the file
outdir (Path) – directory to which the file will be written.

Return type

None

climind.fetchers.fetcher_jra55_grid module

Set of scripts to download the JRA-55 gridded data. Adapted from the scripts provided by UCAR. Data are stored by year up to a certain point and by month for near-real-time data thereafter. Credentials are needed (see the fetch function).

climind.fetchers.fetcher_jra55_grid.check_file_status(file_path, file_size) → None

Writes a status bar on the download. Not used.

Parameters

file_path (str) – Path of the file
file_size (float) – Size of the file

Return type

None

climind.fetchers.fetcher_jra55_grid.download_file(filename: str, file_base: str, ret) → None

Download a file.

Parameters

filename (str) – URL of the file to be downloaded
file_base (str) – Name of the output file to which the data will be written
ret – Authentication information

Return type

None

climind.fetchers.fetcher_jra55_grid.fetch(_, out_dir: Path, _filename) → None

Get JRA-55 files from UCAR. Requires the credentials:

username, specified by entry in .env UCAR_USER
password, specified by entry in .env UCAR_PSWD

Parameters

_ – dummy input to match interface.
out_dir (Path) – Path of the directory to which the output will be written.
_filename (str) – Unused filename argument

Return type

None

climind.fetchers.fetcher_jra55_grid.get_files(filelist: List[str], web_path: str, ret) → None

For each file in a file list, check if it already exists on the system and if it does not, attempt to download it.

Parameters

filelist (List[str]) – List of files to be downloaded
web_path (str) – URL of the directory that contains the files.
ret – Authentication information

Return type

None
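The skip-if-present behaviour described for get_files can be sketched as below. The function name, the out_dir argument, and the injected download callable are assumptions introduced so the example is self-contained; the real function takes authentication information (ret) instead.

```python
from pathlib import Path
from typing import Callable, List


def get_missing_files(filelist: List[str], web_path: str, out_dir: Path,
                      download: Callable[[str, Path], None]) -> List[str]:
    """Download only files not already in out_dir; return the names fetched."""
    fetched = []
    for filename in filelist:
        target = out_dir / filename
        if target.exists():
            continue  # already on disk, so skip it
        # Build the remote URL from the directory URL and the filename.
        download(f"{web_path.rstrip('/')}/{filename}", target)
        fetched.append(filename)
    return fetched
```

Passing the downloader in as a callable keeps the existence check separate from the network and authentication details.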

climind.fetchers.fetcher_jra55_grid.make_file_list(first_year, final_year) → List[str]

Make a list of annual archived filenames between the two specified years.

Parameters

first_year (int) – Year to start generation
final_year (int) – Year to end generation

Returns

List of filenames for archived data between the specified years

Return type

List[str]

climind.fetchers.fetcher_jra55_grid.make_realtime_file_list(first_year: int, final_year: int) → List[str]

Make a list of monthly real-time filenames between the two specified years.

Parameters

first_year (int) – Year to start generation
final_year (int) – Year to end generation

Returns

List of filenames for real-time data between the specified years

Return type

List[str]

climind.fetchers.fetcher_no_url module

climind.fetchers.fetcher_no_url.fetch(url, out_dir, filename)

A stub so that there is a fetcher for datasets which are not available online.

climind.fetchers.fetcher_noaaglobaltemp module

climind.fetchers.fetcher_noaaglobaltemp.fetch(url: str, outdir: Path, _) → None

Fetch NOAAGlobalTemp data. The script scrapes the directory specified in the URL for a file that matches the pattern specified in the URL.

Parameters

url (str) – URL of the file to be downloaded, containing wildcards for information that needs to be matched on a case-by-case basis.
outdir (Path) – Path of the directory to which the output will be written

Return type

None
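The matching step can be sketched as follows: collect the links from a directory listing page, then keep those that match a wildcard pattern. This is an illustration only. The names _LinkCollector and find_matching_files are hypothetical, the sketch assumes shell-style wildcards (as handled by fnmatch), and it works on HTML that has already been downloaded.

```python
from fnmatch import fnmatch
from html.parser import HTMLParser
from typing import List


class _LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> tag in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def find_matching_files(listing_html: str, pattern: str) -> List[str]:
    """Return the linked filenames that match a shell-style pattern, sorted."""
    parser = _LinkCollector()
    parser.feed(listing_html)
    return sorted(link for link in parser.links if fnmatch(link, pattern))
```

If the creation date is embedded in the filename in a sortable form, the last element of the returned list would be the most recent file; that is also an assumption about the naming scheme.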

climind.fetchers.fetcher_promice module

climind.fetchers.fetcher_promice.fetch(url: str, outdir: Path, _)

Fetch Greenland mass balance data. The script scrapes a webpage in order to find the specific URLs of the latest version of the dataset (these change daily). These files are then downloaded. There should be two files: a daily file and an annual file.

Parameters

url (str) – URL of the directory which contains the files to be downloaded
outdir (Path) – Path of the directory to which the output will be written

Return type

None

climind.fetchers.fetcher_standard_url module

climind.fetchers.fetcher_standard_url.fetch(url: str, outdir: Path, filename: str) → None

Fetcher for a standard URL that can be accessed without restrictions, credentials, or any other tomfoolery.

Parameters

url (str) – URL of the file to be downloaded.
outdir (Path) – Path of the directory to which the output will be written
filename (str) – Filename to save file as locally

Return type

None
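A fetcher with this interface can be sketched with the standard library alone. The function name fetch_sketch is hypothetical and the package may use a different HTTP client; the sketch streams the response to disk rather than reading it into memory.

```python
import shutil
from pathlib import Path
from urllib.request import urlopen


def fetch_sketch(url: str, outdir: Path, filename: str) -> None:
    """Download url and save it as outdir/filename (hypothetical sketch)."""
    out_path = outdir / filename
    # Stream the response body straight into the output file.
    with urlopen(url) as response, open(out_path, "wb") as out_file:
        shutil.copyfileobj(response, out_file)
```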

climind.fetchers.fetcher_url_with_backsearch module

climind.fetchers.fetcher_url_with_backsearch.fetch(url: str, out_dir: Path, _) → None

Fetch a file using a backsearch. Backsearching starts with the most recent month, creates a filename by filling the year (YYYY) and month (MMMM) placeholders in the specified URL with that month, and then tries to download that file. If the download fails, the search proceeds backwards, month by month, for up to 24 months from today's date.

Parameters

url (str) – URL of the file containing placeholders for the year (YYYY) and month (MMMM)
out_dir (Path) – Path to which the output will be written

Return type

None
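The candidate URLs tried by a backsearch can be generated as in the sketch below. It is illustrative only: the name backsearch_urls is hypothetical, the search is assumed to start at the month of the supplied date, and the MMMM placeholder is assumed to be replaced by a zero-padded two-digit month.

```python
from datetime import datetime
from typing import Iterator


def backsearch_urls(url: str, now: datetime, n_months: int = 24) -> Iterator[str]:
    """Yield candidate URLs, most recent month first (hypothetical sketch)."""
    year, month = now.year, now.month
    for _ in range(n_months):
        # Assumption: YYYY -> four-digit year, MMMM -> two-digit month.
        yield url.replace("YYYY", f"{year:04d}").replace("MMMM", f"{month:02d}")
        # Step back one month, rolling over into the previous year.
        month -= 1
        if month == 0:
            year, month = year - 1, 12
```

A fetcher would try each candidate in turn and stop at the first one that downloads successfully.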

climind.fetchers.fetcher_url_with_backsearch.filename_from_url(url: str) → str

Extract just the filename from a URL.

Parameters

url (str) – URL of a file

Returns

The filename of the file specified by the URL

Return type

str

climind.fetchers.fetcher_utils module

Contains a set of routines used by the fetchers to perform various standard tasks such as extracting the filename from a URL.

climind.fetchers.fetcher_utils.dir_and_filename_from_url(url: str) → Tuple[str, str]

Split a URL into the part up to, but not including, the filename, and the filename itself.

Parameters

url (str) – URL to be split

Returns

The directory name and the filename

Return type

Tuple[str, str]

climind.fetchers.fetcher_utils.filename_from_url(url: str) → str

Given a URL, return the filename, or an empty string if there is no filename.

Parameters

url (str) – URL to be parsed

Returns

The filename part of the URL

Return type

str
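The two splitting helpers above can be sketched with urllib.parse and posixpath. The function names with the _sketch suffix are hypothetical, and the sketch assumes URLs without query strings or fragments and keeps the trailing slash on the directory part; the package's own behaviour may differ on those points.

```python
import posixpath
from typing import Tuple
from urllib.parse import urlsplit


def filename_from_url_sketch(url: str) -> str:
    """Return the last path component, or '' if the path ends in '/'."""
    return posixpath.basename(urlsplit(url).path)


def dir_and_filename_from_url_sketch(url: str) -> Tuple[str, str]:
    """Split a URL into (everything before the filename, filename)."""
    filename = filename_from_url_sketch(url)
    # Assumption: no query string, so the filename is the tail of the URL.
    directory = url[: len(url) - len(filename)] if filename else url
    return directory, filename
```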

climind.fetchers.fetcher_utils.get_ftp_host_and_directory_from_url(url: str) → Tuple[str, List[str]]

From a url, extract the host name and the directory, the directory being broken down into a list of subdirectories

Parameters

url (str) – URL to extract information from

Returns

str – The host name
list – A list of directories
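As an illustration, the host and directory list can be extracted with urllib.parse. The name host_and_directories is hypothetical, and the sketch assumes that the final path component is a filename (and so is dropped) unless the URL ends with a slash.

```python
from typing import List, Tuple
from urllib.parse import urlsplit


def host_and_directories(url: str) -> Tuple[str, List[str]]:
    """Return (host name, list of subdirectories) for a URL (hypothetical sketch)."""
    parts = urlsplit(url)
    segments = [segment for segment in parts.path.split("/") if segment]
    # Assumption: the last segment is a filename unless the path ends in '/'.
    if segments and not parts.path.endswith("/"):
        segments = segments[:-1]
    return parts.hostname or "", segments
```

A list of subdirectories is convenient for ftp clients, which typically change directory one level at a time.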

climind.fetchers.fetcher_utils.url_from_filename(url: str, filename: str) → str

Given a URL and a filename, replace the filename in the URL with the input filename.

Parameters

url (str) – URL specifying a file, for which the filename will be changed.
filename (str) – New filename to be used in the output URL

Returns

Returns the URL with the new filename

Return type

str
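The filename replacement can be sketched in a couple of lines. The name url_from_filename_sketch is hypothetical, and the sketch assumes the URL contains at least one slash before the filename and has no query string.

```python
def url_from_filename_sketch(url: str, filename: str) -> str:
    """Replace the final path component of url with filename (hypothetical sketch)."""
    # Everything before the last '/' is treated as the directory part.
    directory, _, _ = url.rpartition("/")
    return f"{directory}/{filename}"
```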