climind.fetchers package
Submodules
climind.fetchers.fetcher_cds module
Fetcher which uses the Copernicus Climate Data Store to download ERA5 gridded data. The first time it is run, it will download all data. This will take a while.
- climind.fetchers.fetcher_cds.fetch(url: str, outdir: Path, filename: str) → None
Fetch all data in the range 1979 to 2022.
- Parameters:
url (str) – url of the file
outdir (Path) – Path to the directory to which the data will be written.
- Return type:
None
- climind.fetchers.fetcher_cds.fetch_to_year(out_dir: Path, year: int, variable: str = 'tas') → None
Fetch a specified year of data and write it to out_dir. If the year is incomplete, only recover the available months.
- Parameters:
out_dir (Path) – directory to which the data will be written
year (int) – the year of data we want
variable (str) – Variable to be extracted: either 'tas' or 'sealevel'
- Return type:
None
- climind.fetchers.fetcher_cds.pick_months(year: int, now: datetime) → List[str]
For a given year, return a list of strings containing the months that are available in the CDS for ERA5. For years before the current year, return all months. For the current year, return all months before last month, and include last month only if the current day of the month is after the 7th.
- Parameters:
year (int) – Year for which we want to pick months
now (datetime) – Today's date, used to assess how incomplete the current year is.
- Returns:
List of the months to download for the specified year.
- Return type:
List[str]
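The month-selection rule described for pick_months can be sketched as follows. This is an illustrative re-implementation, not the package's code: the zero-padded month format ("01" to "12") and the empty result for future years are assumptions, since the docstring specifies neither.

```python
from datetime import datetime
from typing import List


def pick_months_sketch(year: int, now: datetime) -> List[str]:
    """Illustrative version of the month-selection rule described for pick_months."""
    # Assumption: months are written as zero-padded strings, "01" to "12".
    all_months = [f"{month:02d}" for month in range(1, 13)]
    if year < now.year:
        # Past years are complete, so every month is available.
        return all_months
    if year > now.year:
        # Assumption: nothing is available for future years.
        return []
    # Current year: "last month" is the month before now.month (0 in January,
    # when last month belongs to the previous year).
    last_month = now.month - 1
    # Last month is only included once we are past the 7th of this month.
    n_available = last_month if now.day > 7 else last_month - 1
    return all_months[:max(n_available, 0)]
```

With this sketch, a call on 10 March 2023 for 2023 gives ["01", "02"], while a call on 5 March 2023 gives only ["01"], because February is not yet treated as available.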
climind.fetchers.fetcher_cds_ensemble module
Fetcher which uses the Copernicus Climate Data Store to download ERA5 gridded data. The first time it is run, it will download all data. This will take a while.
- climind.fetchers.fetcher_cds_ensemble.fetch(url: str, outdir: Path, filename: str) → None
Fetch all data in the range 1979 to 2022.
- Parameters:
url (str) – url of the file
outdir (Path) – Path to the directory to which the data will be written.
- Return type:
None
- climind.fetchers.fetcher_cds_ensemble.fetch_year(out_dir: Path, year: int, month: int, day: int, variable: str = 'tas') → None
Fetch a specified year of data and write it to out_dir. If the year is incomplete, only recover the available months.
- Parameters:
out_dir (Path) – directory to which the data will be written
year (int) – the year of data we want
variable (str) – Variable to be extracted: either 'tas' or 'sealevel'
- Return type:
None
climind.fetchers.fetcher_cmems_ftp module
- climind.fetchers.fetcher_cmems_ftp.fetch(url: str, out_dir: Path, _) → None
Fetch data from the CMEMS FTP system. The required credentials are:
username, specified by the CMEMS_USER entry in .env
password, specified by the CMEMS_PSWD entry in .env
- Parameters:
url (str) – The URL of the file to be downloaded
out_dir (Path) – Path of the directory to which the files will be saved
- Return type:
None
climind.fetchers.fetcher_ersst module
- climind.fetchers.fetcher_ersst.fetch(url: str, out_dir: Path, _) → None
Fetch ERSST gridded dataset from NOAA. There is one file per month. Only files that have not already been downloaded will be downloaded.
- Parameters:
url (str) – URL of the file
out_dir (Path) – Path of the directory to which output will be written
- Return type:
None
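The "only download what is missing" behaviour of the ERSST fetcher can be illustrated by building the list of expected monthly filenames and keeping those not yet on disk. This is a sketch under assumptions: the filename pattern ersst.v5.YYYYMM.nc and the helper name missing_monthly_files are illustrative, not taken from the package.

```python
from pathlib import Path
from typing import List


def missing_monthly_files(out_dir: Path, first_year: int, final_year: int,
                          final_month: int) -> List[str]:
    """Return the monthly filenames in the requested range that are not yet in out_dir."""
    missing = []
    for year in range(first_year, final_year + 1):
        # The final year may be incomplete, so stop at final_month.
        last = final_month if year == final_year else 12
        for month in range(1, last + 1):
            # Assumed naming scheme: one file per month, e.g. ersst.v5.202001.nc
            name = f"ersst.v5.{year}{month:02d}.nc"
            if not (out_dir / name).exists():
                missing.append(name)
    return missing
```

Only the returned names would then need to be downloaded, so repeated runs fetch just the newest months.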
climind.fetchers.fetcher_ftp module
climind.fetchers.fetcher_gpcc module
climind.fetchers.fetcher_gpcc_quantile module
- climind.fetchers.fetcher_gpcc_quantile.fetch(url: str, outdir: Path, _) → None
Fetch GPCC quantile data. The script scrapes the directory specified in the URL for a file that matches the pattern specified in the URL.
- Parameters:
url (str) – URL of the file to be downloaded, containing wildcards for information that needs to be matched on a case-by-case basis.
outdir (Path) – Path of the directory to which the output will be written
- Return type:
None
climind.fetchers.fetcher_grace module
- climind.fetchers.fetcher_grace.fetch(url: str, outdir: Path, _) → None
Fetch files from the PODAAC website. The base URL of the API is API_url = "https://podaac-tools.jpl.nasa.gov/drive/files".
The required credentials are:
username, specified by the PODAAC_USER entry in .env
password, specified by the PODAAC_PSWD entry in .env
- Parameters:
url (str) – URL for the file
outdir (Path) – directory to which the file will be written.
- Return type:
None
climind.fetchers.fetcher_grace_aws module
climind.fetchers.fetcher_jaxa module
climind.fetchers.fetcher_jra3q_grid module
Set of scripts to download the JRA-55 gridded data. Adapted from the scripts provided by UCAR. Data are stored by year up to a certain point and by month for near-real-time data thereafter. Credentials are needed (see the fetch function).
- climind.fetchers.fetcher_jra3q_grid.download_file(filename: str, file_base: str, process: bool) → None
Download a file.
- Parameters:
filename (str) – URL of the file to be downloaded
file_base (str) – Name of the output file to which the data will be written
- Return type:
None
- climind.fetchers.fetcher_jra3q_grid.fetch(_, out_dir: Path, _filename) → None
Get JRA-55 files from UCAR. The required credentials are:
username, specified by the UCAR_USER entry in .env
password, specified by the UCAR_PSWD entry in .env
- Parameters:
_ – dummy input to match interface.
out_dir (Path) – Path of the directory to which the output will be written.
_filename (str) – Unused filename argument
- Return type:
None
- climind.fetchers.fetcher_jra3q_grid.get_files(filelist: List[str], web_path: str, process: bool = False, output_filelist=None) → None
For each file in a file list, check if it already exists on the system and if it does not, attempt to download it.
- Parameters:
filelist (List[str]) – List of files to be downloaded
web_path (str) – URL of the directory that contains the files.
- Return type:
None
- climind.fetchers.fetcher_jra3q_grid.make_file_list(first_year, final_year) → List[str]
Make a list of annual archived filenames between the two specified years.
- Parameters:
first_year (int) – Year to start generation
final_year (int) – Year to end generation
- Returns:
List of filenames for archived data between the specified years
- Return type:
List[str]
- climind.fetchers.fetcher_jra3q_grid.make_realtime_file_list(first_year: int, final_year: int) → List[str]
Make a list of monthly real-time filenames between the two specified years.
- Parameters:
first_year (int) – Year to start generation
final_year (int) – Year to end generation
- Returns:
List of filenames for real-time data between the specified years
- Return type:
List[str]
climind.fetchers.fetcher_jra55_grid module
Set of scripts to download the JRA-55 gridded data. Adapted from the scripts provided by UCAR. Data are stored by year up to a certain point and by month for near-real-time data thereafter. Credentials are needed (see the fetch function).
- climind.fetchers.fetcher_jra55_grid.download_file(filename: str, file_base: str) → None
Download a file.
- Parameters:
filename (str) – URL of the file to be downloaded
file_base (str) – Name of the output file to which the data will be written
- Return type:
None
- climind.fetchers.fetcher_jra55_grid.fetch(_, out_dir: Path, _filename) → None
Get JRA-55 files from UCAR. The required credentials are:
username, specified by the UCAR_USER entry in .env
password, specified by the UCAR_PSWD entry in .env
- Parameters:
_ – dummy input to match interface.
out_dir (Path) – Path of the directory to which the output will be written.
_filename (str) – Unused filename argument
- Return type:
None
- climind.fetchers.fetcher_jra55_grid.get_files(filelist: List[str], web_path: str) → None
For each file in a file list, check if it already exists on the system and if it does not, attempt to download it.
- Parameters:
filelist (List[str]) – List of files to be downloaded
web_path (str) – URL of the directory that contains the files.
- Return type:
None
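The check-then-download loop described for get_files can be sketched as below. It is a hypothetical stand-in, not the module's code: the local directory is passed explicitly as out_dir, and the actual transfer is delegated to a caller-supplied download function playing the role of download_file, so the loop can be exercised without network access.

```python
from pathlib import Path
from typing import Callable, List


def get_files_sketch(filelist: List[str], web_path: str, out_dir: Path,
                     download: Callable[[str, Path], None]) -> List[str]:
    """Download every file in filelist that is not already present in out_dir.

    download is a caller-supplied function taking (url, local_path); it stands in
    for the module's own download_file. Returns the names that were requested.
    """
    fetched = []
    for name in filelist:
        local_path = out_dir / name
        if local_path.exists():
            continue  # already on the system, so skip it
        # Join directory URL and filename with exactly one slash.
        download(f"{web_path.rstrip('/')}/{name}", local_path)
        fetched.append(name)
    return fetched
```

Injecting the download function keeps the existence check separate from the transport, which makes the skip logic easy to test.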
- climind.fetchers.fetcher_jra55_grid.make_file_list(first_year, final_year) → List[str]
Make a list of annual archived filenames between the two specified years.
- Parameters:
first_year (int) – Year to start generation
final_year (int) – Year to end generation
- Returns:
List of filenames for archived data between the specified years
- Return type:
List[str]
- climind.fetchers.fetcher_jra55_grid.make_realtime_file_list(first_year: int, final_year: int) → List[str]
Make a list of monthly real-time filenames between the two specified years.
- Parameters:
first_year (int) – Year to start generation
final_year (int) – Year to end generation
- Returns:
List of filenames for real-time data between the specified years
- Return type:
List[str]
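The difference between the annual archive list and the monthly real-time list can be sketched with two small generators. The real JRA-55 filenames are not given here, so the templates below are hypothetical placeholders, and treating both end years as inclusive is an assumption.

```python
from typing import List


def make_file_list_sketch(first_year: int, final_year: int,
                          template: str = "archive_{year}.grb") -> List[str]:
    # One archived file per year; both end years are included (assumption).
    # The default template is a hypothetical placeholder, not the real filename.
    return [template.format(year=y) for y in range(first_year, final_year + 1)]


def make_realtime_file_list_sketch(first_year: int, final_year: int,
                                   template: str = "realtime_{year}{month:02d}.grb") -> List[str]:
    # One real-time file per month for every year in the range.
    return [template.format(year=y, month=m)
            for y in range(first_year, final_year + 1)
            for m in range(1, 13)]
```

A two-year range therefore yields two archived names but twenty-four real-time names.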
climind.fetchers.fetcher_no_url module
climind.fetchers.fetcher_noaaglobaltemp module
- climind.fetchers.fetcher_noaaglobaltemp.fetch(url: str, outdir: Path, _) → None
Fetch NOAAGlobalTemp data. The script scrapes the directory specified in the URL for a file that matches the pattern specified in the URL.
- Parameters:
url (str) – URL of the file to be downloaded, containing wildcards for information that needs to be matched on a case-by-case basis.
outdir (Path) – Path of the directory to which the output will be written
- Return type:
None
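Scraping a directory listing for a file that matches a wildcard pattern, as described for the NOAAGlobalTemp and GPCC quantile fetchers, can be sketched with the standard library. Assumptions: the listing is an HTML page of links, the wildcards are shell-style (as understood by fnmatch), and when several files match, the lexically last one is taken as the newest because the filename embeds a creation date.

```python
from fnmatch import fnmatch
from html.parser import HTMLParser
from typing import List, Optional


class _LinkCollector(HTMLParser):
    """Collect the href targets of all <a> tags in an HTML page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def find_matching_file(listing_html: str, pattern: str) -> Optional[str]:
    """Return the lexically last link matching a wildcard pattern, or None."""
    parser = _LinkCollector()
    parser.feed(listing_html)
    # Assumption: a date embedded in the filename makes lexical order chronological.
    matches = sorted(link for link in parser.links if fnmatch(link, pattern))
    return matches[-1] if matches else None
```

The matched name would then be appended to the directory part of the URL and downloaded as an ordinary file.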
climind.fetchers.fetcher_promice module
- climind.fetchers.fetcher_promice.fetch(url: str, outdir: Path, _)
Fetch Greenland mass balance data. The script scrapes a webpage in order to find the specific URLs of the latest version of the dataset (these change daily). These files are then downloaded. There should be two files: a daily file and an annual file.
- Parameters:
url (str) – URL of the directory that contains the files to be downloaded
outdir (Path) – Path of the directory to which the output will be written
- Return type:
None
climind.fetchers.fetcher_standard_url module
- climind.fetchers.fetcher_standard_url.fetch(url: str, outdir: Path, filename: str) → None
Fetcher for a standard URL that can be accessed without restrictions, credentials, or any other tomfoolery.
- Parameters:
url (str) – URL of the file to be downloaded.
outdir (Path) – Path of the directory to which the output will be written
filename (str) – Filename to save file as locally
- Return type:
None
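A fetcher for an unrestricted URL needs little more than an HTTP GET written to disk. The sketch below uses only urllib.request and shutil; it is not the package's implementation (which may use a different HTTP library), and unlike the documented fetch it returns the written path for convenience.

```python
import shutil
import tempfile
import urllib.request
from pathlib import Path


def fetch_sketch(url: str, outdir: Path, filename: str) -> Path:
    """Download url and save it as outdir/filename, returning the local path."""
    outdir.mkdir(parents=True, exist_ok=True)
    target = outdir / filename
    # Stream the response body straight into the output file.
    with urllib.request.urlopen(url) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)
    return target


if __name__ == "__main__":
    # A data: URL exercises the code path without touching the network.
    path = fetch_sketch("data:,hello", Path(tempfile.mkdtemp()), "hello.txt")
    print(path.read_text())
```

Any URL scheme understood by urllib.request.urlopen works here, which is why the demo can use a data: URL instead of a real server.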
climind.fetchers.fetcher_standard_url_with_rename module
- climind.fetchers.fetcher_standard_url_with_rename.fetch(url: str, outdir: Path, filename: str) → None
Fetcher for a standard URL that can be accessed without restrictions, credentials, or any other tomfoolery.
- Parameters:
url (str) – URL of the file to be downloaded.
outdir (Path) – Path of the directory to which the output will be written
filename (str) – Filename to save file as locally
- Return type:
None
climind.fetchers.fetcher_url_with_backsearch module
- climind.fetchers.fetcher_url_with_backsearch.fetch(url: str, out_dir: Path, _) → None
Fetch a file using a backsearch. Starting with the most recent month, the year and month are substituted for the year (YYYY) and month (MMMM) placeholders in the specified URL, and the fetcher tries to download the resulting file. The search proceeds backwards for 24 months from today’s date.
- Parameters:
url (str) – URL of the file containing placeholders for the year (YYYY) and month (MMMM)
out_dir (Path) – Path to which the output will be written
- Return type:
None
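The candidate URLs tried by the backsearch can be generated as below. Two details are assumptions: that the search includes the current month as its first candidate, and that the MMMM placeholder is filled with the zero-padded two-digit month number.

```python
from datetime import date
from typing import List


def candidate_urls(url: str, today: date, n_months: int = 24) -> List[str]:
    """List URLs to try, newest month first, by filling the YYYY and MMMM placeholders."""
    urls = []
    year, month = today.year, today.month
    for _ in range(n_months):
        # Assumption: MMMM is replaced by the zero-padded two-digit month.
        urls.append(url.replace("YYYY", f"{year:04d}").replace("MMMM", f"{month:02d}"))
        # Step back one month, rolling over into the previous year after January.
        month -= 1
        if month == 0:
            year, month = year - 1, 12
    return urls
```

A fetcher would then attempt each candidate in order.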
climind.fetchers.fetcher_utils module
Contains a set of routines used by the fetchers to perform various standard tasks such as extracting the filename from a URL.
- climind.fetchers.fetcher_utils.dir_and_filename_from_url(url: str) → Tuple[str, str]
Split a URL into its directory part (the URL up to, but not including, the filename) and the filename.
- Parameters:
url (str) – URL to be split
- Returns:
The directory part of the URL and the filename
- Return type:
Tuple[str, str]
- climind.fetchers.fetcher_utils.filename_from_url(url: str) → str
Given a URL, return the filename, or an empty string if there is no filename.
- Parameters:
url (str) – URL to be parsed
- Returns:
Return the filename part of the URL
- Return type:
str
- climind.fetchers.fetcher_utils.get_ftp_host_and_directory_from_url(url: str) → Tuple[str, List[str]]
From a URL, extract the host name and the directory, with the directory broken down into a list of subdirectories.
- Parameters:
url (str) – URL to extract information from
- Returns:
str – The host name
list – A list of directories
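Splitting an FTP URL into a host and a list of subdirectories can be sketched with urllib.parse.urlparse. This assumes the URL's final path component is a filename (empty when the URL ends in "/") and drops it; the real function may treat that component differently.

```python
from typing import List, Optional, Tuple
from urllib.parse import urlparse


def ftp_host_and_dirs(url: str) -> Tuple[Optional[str], List[str]]:
    """Return the host name and the list of directories leading to the file."""
    parsed = urlparse(url)
    parts = parsed.path.split("/")
    # The final element is the filename ("" when the URL ends in "/"); drop it,
    # then discard the empty strings produced by leading or doubled slashes.
    directories = [p for p in parts[:-1] if p]
    return parsed.hostname, directories
```

With ftplib, a caller could connect to the host and call FTP.cwd once per directory in the list before retrieving the file.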
- climind.fetchers.fetcher_utils.get_n_months_back(y: int, m: int, back: int = 12) → Tuple[int, int]
Get the year and month such that the period from that month to the specified month, inclusive, is back months long.
- Parameters:
y (int) – Year
m (int) – Month
back (int) – Number of months to include, counting the specified month
- Return type:
Tuple[int, int]
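The month arithmetic implied by get_n_months_back can be sketched by converting the year and month to a single month count, which avoids special-casing year boundaries. This follows the reading that the specified month counts as the first of the back months; it is an illustration, not the package's code.

```python
from typing import Tuple


def n_months_back_sketch(y: int, m: int, back: int = 12) -> Tuple[int, int]:
    """Return the (year, month) that starts a back-month span ending at (y, m)."""
    # Count months from year 0 so that divmod handles year boundaries.
    index = y * 12 + (m - 1) - (back - 1)
    year, month0 = divmod(index, 12)
    return year, month0 + 1
```

For example, n_months_back_sketch(2023, 3) gives (2022, 4): April 2022 to March 2023 inclusive is twelve months.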
- climind.fetchers.fetcher_utils.url_from_filename(url: str, filename: str) → str
Given a URL and a filename, replace the filename in the URL with the input filename.
- Parameters:
url (str) – URL specifying a file, for which the filename will be changed.
filename (str) – New filename to be used in the output URL
- Returns:
Returns the URL with the new filename
- Return type:
str
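The three filename helpers in fetcher_utils can be sketched together, since each builds on splitting the URL at its last "/". These are illustrative re-implementations that assume URLs without query strings or fragments; a bare host such as "https://example.org" is treated as having no filename.

```python
from typing import Tuple
from urllib.parse import urlparse


def filename_from_url_sketch(url: str) -> str:
    # Only the path component is considered, so a bare host has no filename.
    return urlparse(url).path.split("/")[-1]


def dir_and_filename_from_url_sketch(url: str) -> Tuple[str, str]:
    filename = filename_from_url_sketch(url)
    # The directory is the URL up to, but not including, the filename.
    directory = url[: len(url) - len(filename)]
    return directory, filename


def url_from_filename_sketch(url: str, filename: str) -> str:
    # Keep the directory part and swap in the new filename.
    directory, _ = dir_and_filename_from_url_sketch(url)
    return directory + filename
```

Defining the other two helpers in terms of filename_from_url_sketch keeps their notions of "filename" consistent.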
Module contents
The fetchers package contains all the scripts needed to download the data sets. There are specific fetchers for regular URLs, FTP sites and datasets that are not available online.
In addition, some websites and data sets have peculiarities that mean a standard reader isn’t adequate. For example, NOAAGlobalTemp has a fixed directory, but the filename changes unpredictably as it contains the date of its creation. The fetcher for NOAAGlobalTemp therefore has to parse the webpage to find the appropriate file and then download that.
Other datasets require credentials, which are stored in a .env file that is also in the fetchers directory but is not part of the package.
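The credential entries named in the fetcher descriptions (CMEMS_USER/CMEMS_PSWD, PODAAC_USER/PODAAC_PSWD, UCAR_USER/UCAR_PSWD) follow a common SERVICE_USER / SERVICE_PSWD pattern. The sketch below reads such entries from .env text using only the standard library; the package may well use a dedicated library for this, and the helper names here are hypothetical.

```python
from typing import Dict, Tuple


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Remove surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_credentials(env: Dict[str, str], service: str) -> Tuple[str, str]:
    # Key names follow the <SERVICE>_USER / <SERVICE>_PSWD convention.
    return env[f"{service}_USER"], env[f"{service}_PSWD"]
```

In use, the text would come from reading the .env file in the fetchers directory, for example with Path.read_text().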