nourish.load_dataset

nourish.load_dataset(name, *, version='latest', download=True, subdatasets=None)

High-level function that wraps the load and download functionality of the dataset.Dataset class. It downloads to and loads from the directory DATADIR/dataset_schemata_name/name/version, where DATADIR is given by nourish.get_config().DATADIR. DATADIR can be changed by calling init().
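The on-disk location described above can be sketched with pathlib. This is a minimal illustration, assuming the directory is built from exactly the four components named above; dataset_dir is a hypothetical helper (not part of Nourish), and the DATADIR, schemata name and version values are placeholders.

```python
from pathlib import Path


def dataset_dir(datadir: str, schemata_name: str, name: str, version: str) -> Path:
    """Build DATADIR/dataset_schemata_name/name/version (hypothetical helper)."""
    # Each component becomes one directory level below DATADIR.
    return Path(datadir) / schemata_name / name / version


# Placeholder values only; the real ones come from Nourish's config and schemata.
print(dataset_dir("/path/to/datadir", "example_schemata", "noaa_jfk", "1.0.0").as_posix())
# → /path/to/datadir/example_schemata/noaa_jfk/1.0.0
```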

Parameters
  • name (str) – Name of the dataset to load, chosen from Nourish’s available datasets. You can get a list of these datasets by calling list_all_datasets().

  • version (str) – Version of the dataset to load. The latest version is used by default. You can get a list of all available versions for a dataset by calling list_all_datasets().

  • download (bool) – Whether or not the dataset should be downloaded before loading.

  • subdatasets (Optional[Iterable[str]]) – An iterable containing the subdatasets to load. None means all subdatasets.
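The subdatasets semantics (None selects every subdataset, otherwise only the named ones) can be illustrated with a small filter. This is a sketch of the documented behavior, not Nourish's implementation: select_subdatasets is hypothetical, and rejecting unknown names with ValueError is an assumption of this sketch.

```python
from typing import Iterable, List, Optional


def select_subdatasets(available: List[str],
                       subdatasets: Optional[Iterable[str]] = None) -> List[str]:
    """Return the subdataset names to load (hypothetical helper)."""
    if subdatasets is None:
        # None means "load every subdataset".
        return list(available)
    requested = list(subdatasets)
    unknown = [s for s in requested if s not in available]
    if unknown:
        # Assumption: this sketch rejects names the dataset does not provide.
        raise ValueError(f"unknown subdatasets: {unknown}")
    return requested
```

With the real function, restricting the load to a single subdataset would look like load_dataset('noaa_jfk', subdatasets=['jfk_weather_cleaned']).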

Raises

FileNotFoundError – The dataset files were not previously downloaded or can’t be found, and download is False.
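Because FileNotFoundError is raised only when download is False, a caller can try a local load first and download only if the files are missing. The sketch below takes the loader as a parameter so it runs without Nourish installed; passing nourish.load_dataset as loader is the intended use, but load_offline_first is a hypothetical helper and the fallback policy is an assumption, not library behavior.

```python
from typing import Any, Callable, Dict


def load_offline_first(loader: Callable[..., Dict[str, Any]], name: str,
                       **kwargs: Any) -> Dict[str, Any]:
    """Load from disk if possible, downloading only when files are missing."""
    try:
        # Use files already under DATADIR without touching the network.
        return loader(name, download=False, **kwargs)
    except FileNotFoundError:
        # Files are not available locally: retry and let the loader download them.
        return loader(name, download=True, **kwargs)
```

Usage, assuming Nourish is installed: data = load_offline_first(nourish.load_dataset, 'noaa_jfk').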

Returns

A dictionary that holds the loaded subdatasets, keyed by subdataset name.

Return type

Dict[str, Any]

Example:

>>> from nourish import load_dataset
>>> data = load_dataset('noaa_jfk')
>>> data['jfk_weather_cleaned'][['DATE', 'HOURLYVISIBILITY', 'HOURLYDRYBULBTEMPF']].head(3)
                 DATE  HOURLYVISIBILITY  HOURLYDRYBULBTEMPF
0 2010-01-01 01:00:00               6.0                33.0
1 2010-01-01 02:00:00               6.0                33.0
2 2010-01-01 03:00:00               5.0                33.0
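Since the values of the returned dictionary are typed as Any, code that consumes the result may want to check what each subdataset actually is before using it. The helper below is a hypothetical sketch that reports the type name of every entry; judging by the .head() call above, the jfk_weather_cleaned entry appears to be a pandas DataFrame, but other datasets may return other types.

```python
from typing import Any, Dict


def describe_subdatasets(data: Dict[str, Any]) -> Dict[str, str]:
    """Map each subdataset name to the type name of its loaded value."""
    # Values are typed as Any, so report what each one actually is.
    return {name: type(value).__name__ for name, value in data.items()}
```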