climind.fetchers package
The fetchers package contains all the scripts needed to download the datasets. There are specific fetchers for regular URLs, FTP sites, and datasets that are not available online.
In addition, some websites and datasets have peculiarities that mean a standard fetcher isn’t adequate. For example, NOAAGlobalTemp is stored in a fixed directory, but its filename changes unpredictably because it contains the date of its creation. The fetcher for NOAAGlobalTemp therefore has to parse the web page to find the appropriate file and then download it.
Other datasets require credentials, which are stored in a .env file that is also in the fetchers directory but is not part of the package.
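As an illustration of how credentials in a .env file could be read, here is a minimal sketch using only the standard library. The function name read_env_file and the simple KEY=VALUE format are assumptions made for this example; the package may load the file differently (for instance with a third-party library).

```python
from pathlib import Path
from typing import Dict


def read_env_file(path: Path) -> Dict[str, str]:
    """Parse KEY=VALUE lines from a .env file (hypothetical helper)."""
    values: Dict[str, str] = {}
    for raw_line in path.read_text().splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an assignment
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values
```

A fetcher could then look up entries such as CMEMS_USER and CMEMS_PSWD in the returned dictionary.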
Submodules
climind.fetchers.fetcher_cds module
Fetcher which uses the Copernicus Climate Data Store to download ERA5 gridded data. The first time it is run, it will download all data. This will take a while.
- climind.fetchers.fetcher_cds.fetch(url: str, outdir: Path, _) None
Fetch all data in the range 1979 to 2022.
- Parameters
url (str) – url of the file
outdir (Path) – Path to the directory to which the data will be written.
- Return type
None
- climind.fetchers.fetcher_cds.fetch_year(out_dir: Path, year: int) None
Fetch a specified year of data and write it to out_dir. If the year is incomplete, only retrieve the months that are available.
- Parameters
out_dir (Path) – directory to which the data will be written
year (int) – the year of data we want
- Return type
None
- climind.fetchers.fetcher_cds.pick_months(year: int, now: datetime) List[str]
For a given year, return a list of strings containing the months which are available in the CDS for ERA5. For years before the current year, return all months. For the current year, return all months up to last month, but only include last month if the day of the month is after the 7th.
- Parameters
year (int) – Year for which we want to pick months
now (datetime) – Today’s date, used to assess how incomplete the current year is.
- Returns
List of the months to download for the specified year.
- Return type
List[str]
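The month-selection rule described for pick_months can be sketched as follows. This is an illustration, not the package's code: the two-digit month strings ("01" to "12") and the empty list returned for future years are assumptions, since the documentation does not specify either.

```python
from datetime import datetime
from typing import List


def pick_months_sketch(year: int, now: datetime) -> List[str]:
    """Return the months assumed to be available for the given year."""
    if year < now.year:
        # Past years are treated as complete
        return [f"{month:02d}" for month in range(1, 13)]
    if year > now.year:
        # Assumption: nothing is available for future years
        return []
    # Current year: last month counts only once the 7th has passed
    last_month = now.month - 1
    final_month = last_month if now.day > 7 else last_month - 1
    return [f"{month:02d}" for month in range(1, final_month + 1)]
```

For example, calling pick_months_sketch(2022, datetime(2022, 5, 20)) yields January to April, whereas the same call with datetime(2022, 5, 3) stops at March.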
climind.fetchers.fetcher_cmems_ftp module
- climind.fetchers.fetcher_cmems_ftp.fetch(url: str, out_dir: Path, _) None
Fetch data from the CMEMS ftp system. Credentials required are
username, specified by entry in .env CMEMS_USER
password, specified by entry in .env CMEMS_PSWD
- Parameters
url (str) – The URL of the file to be downloaded
out_dir (Path) – Path of the directory to which the files will be saved
- Return type
None
climind.fetchers.fetcher_ersst module
- climind.fetchers.fetcher_ersst.fetch(url: str, out_dir: Path, _) None
Fetch ERSST gridded dataset from NOAA. There is one file per month. Only files that have not already been downloaded will be downloaded.
- Parameters
url (str) – URL of the file
out_dir (Path) – Path of the directory to which output will be written
- Return type
None
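The "only download what is missing" behaviour can be sketched as below. The function fetch_missing and its download callback are hypothetical names introduced for this example; the real fetcher builds its own list of monthly files and performs the download itself.

```python
from pathlib import Path
from typing import Callable, Iterable, List


def fetch_missing(
    urls: Iterable[str],
    out_dir: Path,
    download: Callable[[str, Path], None],
) -> List[Path]:
    """Download each URL unless its file already exists in out_dir."""
    out_dir.mkdir(parents=True, exist_ok=True)
    fetched: List[Path] = []
    for url in urls:
        # Use the last component of the URL as the local filename
        target = out_dir / url.rsplit("/", 1)[-1]
        if target.exists():
            continue  # already downloaded on an earlier run
        download(url, target)
        fetched.append(target)
    return fetched
```

Passing the download step in as a callback keeps the skip logic separate from the transfer code, so it can be exercised without network access.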
climind.fetchers.fetcher_ftp module
- climind.fetchers.fetcher_ftp.fetch(url: str, out_dir: Path, filename: str) None
Generic fetcher for ftp files
- Parameters
url (str) – URL of the file
out_dir (Path) – Path of the directory to which the output will be written.
filename (str) – Filename to save file to
- Return type
None
climind.fetchers.fetcher_grace module
- climind.fetchers.fetcher_grace.fetch(url: str, outdir: Path, _) None
Fetch files from the PODAAC website. Note that the API URL base is: API_url = “https://podaac-tools.jpl.nasa.gov/drive/files”
Requires the credentials:
username, specified by entry in .env PODAAC_USER
password, specified by entry in .env PODAAC_PSWD
- Parameters
url (str) – URL for the file
outdir (Path) – directory to which the file will be written.
- Return type
None
climind.fetchers.fetcher_jra55_grid module
Set of scripts to download the JRA-55 gridded data. Adapted from the scripts provided by UCAR. Data are stored by year up to a certain point and by month for near-real-time data thereafter. Credentials are needed (see the fetch function).
- climind.fetchers.fetcher_jra55_grid.check_file_status(file_path, file_size) None
Writes a status bar on the download. Not used.
- Parameters
file_path (str) – Path of the file
file_size (float) – Size of the file
- Return type
None
- climind.fetchers.fetcher_jra55_grid.download_file(filename: str, file_base: str, ret) None
Download a file.
- Parameters
filename (str) – URL of the file to be downloaded
file_base (str) – Name of the output file to which the data will be written
ret – Authentication information
- Return type
None
- climind.fetchers.fetcher_jra55_grid.fetch(_, out_dir: Path, _filename) None
Get JRA-55 files from UCAR. Requires the credentials:
username, specified by entry in .env UCAR_USER
password, specified by entry in .env UCAR_PSWD
- Parameters
_ – dummy input to match interface.
out_dir (Path) – Path of the directory to which the output will be written.
_filename (str) – Unused filename argument
- Return type
None
- climind.fetchers.fetcher_jra55_grid.get_files(filelist: List[str], web_path: str, ret) None
For each file in a file list, check if it already exists on the system and if it does not, attempt to download it.
- Parameters
filelist (List[str]) – List of files to be downloaded
web_path (str) – URL of the directory that contains the files.
ret – Authentication information
- Return type
None
- climind.fetchers.fetcher_jra55_grid.make_file_list(first_year, final_year) List[str]
Make a list of annual archived filenames between the two specified years.
- Parameters
first_year (int) – Year to start generation
final_year (int) – Year to end generation
- Returns
List of filenames for archived data between the specified years
- Return type
List[str]
- climind.fetchers.fetcher_jra55_grid.make_realtime_file_list(first_year: int, final_year: int) List[str]
Make a list of monthly real-time filenames between the two specified years.
- Parameters
first_year (int) – Year to start generation
final_year (int) – Year to end generation
- Returns
List of filenames for real-time data between the specified years
- Return type
List[str]
climind.fetchers.fetcher_no_url module
- climind.fetchers.fetcher_no_url.fetch(url, out_dir, filename)
A stub so that there is a fetcher for datasets which don’t exist online.
climind.fetchers.fetcher_noaaglobaltemp module
- climind.fetchers.fetcher_noaaglobaltemp.fetch(url: str, outdir: Path, _) None
Fetch NOAAGlobalTemp data. The script scrapes the directory specified in the URL for a file that matches the pattern specified in the URL.
- Parameters
url (str) – URL of the file to be downloaded, containing wildcards for information that needs to be matched on a case-by-case basis.
outdir (Path) – Path of the directory to which the output will be written
- Return type
None
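One way to match a wildcard pattern against a scraped directory listing is sketched below. It assumes shell-style wildcards (`*`) and a listing made of ordinary HTML links; the helper names, and the choice to take the last match in sorted order, are assumptions for illustration rather than the fetcher's actual behaviour.

```python
from fnmatch import fnmatchcase
from html.parser import HTMLParser
from typing import List, Optional


class LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> tag in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def find_matching_file(listing_html: str, pattern: str) -> Optional[str]:
    """Return the last link (in sorted order) matching the pattern, or None."""
    collector = LinkCollector()
    collector.feed(listing_html)
    matches = sorted(link for link in collector.links if fnmatchcase(link, pattern))
    return matches[-1] if matches else None
```

The matched filename would then be joined to the directory part of the URL and downloaded like any other file.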
climind.fetchers.fetcher_promice module
- climind.fetchers.fetcher_promice.fetch(url: str, outdir: Path, _)
Fetch Greenland mass balance data. The script scrapes a webpage in order to find the specific URLs of the latest version of the dataset (these change daily). These files are then downloaded. There should be two files: a daily file and an annual file.
- Parameters
url (str) – URL of the directory which contains the files to be downloaded
outdir (Path) – Path of the directory to which the output will be written
- Return type
None
climind.fetchers.fetcher_standard_url module
- climind.fetchers.fetcher_standard_url.fetch(url: str, outdir: Path, filename: str) None
Fetcher for a standard URL that can be accessed without restrictions, credentials, or any other tomfoolery.
- Parameters
url (str) – URL of the file to be downloaded.
outdir (Path) – Path of the directory to which the output will be written
filename (str) – Filename to save file as locally
- Return type
None
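A fetcher with this interface could look roughly like the sketch below, which uses only the standard library. The name fetch_sketch, the 60-second timeout, and the creation of outdir when it is missing are assumptions; the package's own implementation may use a different HTTP client.

```python
import shutil
from pathlib import Path
from urllib.request import urlopen


def fetch_sketch(url: str, outdir: Path, filename: str) -> None:
    """Download url and save it as outdir/filename."""
    outdir.mkdir(parents=True, exist_ok=True)
    target = outdir / filename
    # Stream the response to disk rather than holding it all in memory
    with urlopen(url, timeout=60) as response, open(target, "wb") as out_file:
        shutil.copyfileobj(response, out_file)
```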
climind.fetchers.fetcher_url_with_backsearch module
- climind.fetchers.fetcher_url_with_backsearch.fetch(url: str, out_dir: Path, _) None
Fetch a file using a backsearch. The backsearch starts with the most recent month, builds a URL by filling the year (YYYY) and month (MMMM) placeholders in the specified URL for that month, and then tries to download the file. The search proceeds backwards for 24 months from today’s date.
- Parameters
url (str) – URL of the file containing placeholders for the year (YYYY) and month (MMMM)
out_dir (Path) – Path to which the output will be written
- Return type
None
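The sequence of candidate URLs that a backsearch would try can be sketched as follows. Two details are assumptions, because the documentation does not state them: that the search begins with the current calendar month, and that the MMMM placeholder is filled with a two-digit month number.

```python
from datetime import date
from typing import List


def backsearch_candidates(url: str, today: date, n_months: int = 24) -> List[str]:
    """List candidate URLs, newest month first, going back n_months."""
    candidates: List[str] = []
    year, month = today.year, today.month
    for _ in range(n_months):
        # Assumption: MMMM is replaced by a zero-padded month number
        filled = url.replace("YYYY", f"{year:04d}").replace("MMMM", f"{month:02d}")
        candidates.append(filled)
        # Step back one month, rolling over the year boundary
        month -= 1
        if month == 0:
            year, month = year - 1, 12
    return candidates
```

A fetcher could then attempt each candidate in order, for example stopping at the first one that downloads successfully.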
- climind.fetchers.fetcher_url_with_backsearch.filename_from_url(url: str) str
Extract just the filename from a URL.
- Parameters
url (str) – URL of a file
- Returns
The filename of the file specified by the URL
- Return type
str
climind.fetchers.fetcher_utils module
Contains a set of routines used by the fetchers to perform various standard tasks such as extracting the filename from a URL.
- climind.fetchers.fetcher_utils.dir_and_filename_from_url(url: str) Tuple[str, str]
Get the filename and the part of the URL up to, but not including, the filename.
- Parameters
url (str) – URL to be split into directory and filename
- Returns
The directory part of the URL and the filename
- Return type
Tuple[str, str]
- climind.fetchers.fetcher_utils.filename_from_url(url: str) str
Given a URL, return the filename, or an empty string if there is no filename.
- Parameters
url (str) – URL to be parsed
- Returns
Return the filename part of the URL
- Return type
str
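The two splitting routines above can be approximated with the standard library as shown in this sketch. The function names carry a _sketch suffix to mark them as illustrations. The sketch assumes the URL has no query string or fragment, and it keeps the trailing slash on the directory part, which may differ from the package's behaviour.

```python
import posixpath
from typing import Tuple
from urllib.parse import urlparse


def filename_from_url_sketch(url: str) -> str:
    """Return the last path component, or "" if the path has no filename."""
    return posixpath.basename(urlparse(url).path)


def dir_and_filename_sketch(url: str) -> Tuple[str, str]:
    """Split a URL into everything before the filename, and the filename."""
    filename = filename_from_url_sketch(url)
    # Assumes the URL ends with the filename (no query string or fragment)
    directory = url[: len(url) - len(filename)]
    return directory, filename
```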
- climind.fetchers.fetcher_utils.get_ftp_host_and_directory_from_url(url: str) Tuple[str, List[str]]
From a URL, extract the host name and the directory path, with the path broken down into a list of subdirectories.
- Parameters
url (str) – URL to extract information from
- Returns
str – The host name
list – A list of directories
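Splitting an FTP URL into a host and a list of directories might look like the sketch below. It assumes that the final path component is a filename and is dropped unless the path ends with a slash; the documentation does not say how the real function treats that case.

```python
from typing import List, Tuple
from urllib.parse import urlparse


def ftp_host_and_directories_sketch(url: str) -> Tuple[str, List[str]]:
    """Return the host name and the directory components of an FTP URL."""
    parsed = urlparse(url)
    parts = [part for part in parsed.path.split("/") if part]
    # Assumption: a path not ending in "/" finishes with a filename
    if parts and not parsed.path.endswith("/"):
        parts = parts[:-1]
    return parsed.netloc, parts
```

A list of subdirectories suits FTP clients that change directory one level at a time, for example with repeated calls to ftplib's FTP.cwd.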
- climind.fetchers.fetcher_utils.url_from_filename(url: str, filename: str) str
Given a URL and a filename, replace the filename in the URL with the input filename.
- Parameters
url (str) – URL specifying a file, for which the filename will be changed.
filename (str) – New filename to be used in the output URL
- Returns
Returns the URL with the new filename
- Return type
str
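Replacing the filename in a URL can be sketched with urllib.parse as below. The name replace_filename_sketch is hypothetical, and the handling of URLs whose path ends in a slash (the new filename is appended to that directory) is an assumption.

```python
import posixpath
from urllib.parse import urlparse, urlunparse


def replace_filename_sketch(url: str, filename: str) -> str:
    """Return url with its final path component replaced by filename."""
    parsed = urlparse(url)
    # Keep the directory part of the path and attach the new filename
    directory = posixpath.dirname(parsed.path)
    new_path = posixpath.join(directory, filename)
    return urlunparse(parsed._replace(path=new_path))
```

This is the kind of step a scraping fetcher needs once it has discovered the real filename in a directory listing.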