nourish.dataset.Dataset

class nourish.dataset.Dataset(schema, data_dir, *, mode=<InitializationMode.LAZY: 0>)

Bases: object

Models a particular dataset version and provides functionality to download and load it.

Parameters
  • schema – nourish.schema.SchemaDict of a particular dataset version.

  • data_dir – Directory to which the dataset is downloaded and from which it is loaded. The path can be either absolute or relative to the current working directory; it is converted to an absolute path immediately upon initialization.

  • mode – Mode with which to initialize the dataset. Available options are: Dataset.InitializationMode.LAZY, Dataset.InitializationMode.DOWNLOAD_ONLY, Dataset.InitializationMode.LOAD_ONLY, and Dataset.InitializationMode.DOWNLOAD_AND_LOAD.

Raises

ValueError – An invalid mode was specified for handling the dataset.
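The parameters and the ValueError above can be read together as a small validation step performed by the constructor. The sketch below is a hypothetical stand-in, not nourish's implementation: the `InitializationMode` enum mirrors the documented option names (only `LAZY = 0` is confirmed by the signature; the other values are assumptions), `plan_initialization` is an illustrative helper, and the mapping from each mode to download/load steps is inferred from the option names rather than taken from the library.

```python
import os
from enum import Enum


class InitializationMode(Enum):
    """Hypothetical mirror of Dataset.InitializationMode (only LAZY = 0 is documented)."""
    LAZY = 0
    DOWNLOAD_ONLY = 1
    LOAD_ONLY = 2
    DOWNLOAD_AND_LOAD = 3


def plan_initialization(data_dir, mode=InitializationMode.LAZY):
    """Hypothetical helper: normalize data_dir and decide which steps run at construction.

    Returns (absolute_data_dir, run_download, run_load).
    """
    # Reject anything that is not one of the four documented modes.
    if not isinstance(mode, InitializationMode):
        raise ValueError(f'{mode!r} is not a valid InitializationMode')
    # Relative paths are resolved against the current working directory.
    abs_dir = os.path.abspath(data_dir)
    # Assumed semantics: the mode name says which steps run immediately.
    run_download = mode in (InitializationMode.DOWNLOAD_ONLY,
                            InitializationMode.DOWNLOAD_AND_LOAD)
    run_load = mode in (InitializationMode.LOAD_ONLY,
                        InitializationMode.DOWNLOAD_AND_LOAD)
    return abs_dir, run_download, run_load


print(plan_initialization('data', InitializationMode.DOWNLOAD_ONLY)[1:])  # (True, False)
```

Under this reading, LAZY defers all work until download() or load() is called explicitly, which matches the Example below.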

Example:

>>> from tempfile import TemporaryDirectory
>>> from nourish.dataset import Dataset
>>> from nourish import schema
>>> dataset_schemata = schema.DatasetSchemata('./tests/schemata/datasets.yaml')
>>> jfk_schema = dataset_schemata.export_schema('datasets', 'noaa_jfk', '1.1.4')
>>> jfk_data_dir = TemporaryDirectory()
>>> jfk_dataset = Dataset(schema=jfk_schema, data_dir=jfk_data_dir.name)
>>> jfk_dataset.download()
>>> data = jfk_dataset.load()
>>> data['jfk_weather_cleaned'].shape
(75119, 16)
>>> jfk_dataset.delete()  # The directory jfk_data_dir.name is deleted here
>>> jfk_dataset.is_downloaded()
False

Methods

delete(*[, force])

Clear the data directory.

download([check])

Download and extract the dataset archive, then remove the archive.

is_downloaded()

Check whether the dataset has been downloaded.

load([subdatasets, format_loader_map, check])

Load data files into RAM.

Attributes

data

Access loaded data objects.
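The methods and the data attribute above describe a simple lifecycle: download, check, load, read, delete. The toy class below is a hypothetical stand-in, not nourish's implementation, that mimics this lifecycle on the local filesystem so it runs without network access. The class name `ToyDataset`, the `.downloaded` marker file, and the dict returned by `load` (keyed by file name, echoing `data['jfk_weather_cleaned']` in the Example) are assumptions; the `check`, `force`, and `format_loader_map` parameters are omitted.

```python
import os
import shutil
import tempfile


class ToyDataset:
    """Hypothetical stand-in mimicking the documented lifecycle; not nourish code."""

    _MARKER = '.downloaded'  # assumed marker file signalling a completed download

    def __init__(self, files, data_dir):
        self._files = files  # name -> text; stands in for the remote archive contents
        self._data_dir = os.path.abspath(data_dir)
        self._data = None

    def download(self):
        # "Download and extract": write each file, then write the marker last.
        os.makedirs(self._data_dir, exist_ok=True)
        for name, text in self._files.items():
            with open(os.path.join(self._data_dir, name), 'w') as f:
                f.write(text)
        open(os.path.join(self._data_dir, self._MARKER), 'w').close()

    def is_downloaded(self):
        return os.path.exists(os.path.join(self._data_dir, self._MARKER))

    def load(self, subdatasets=None):
        # Read the requested files (all of them by default) into memory.
        names = subdatasets if subdatasets is not None else list(self._files)
        loaded = {}
        for name in names:
            with open(os.path.join(self._data_dir, name)) as f:
                loaded[name] = f.read()
        self._data = loaded
        return loaded

    @property
    def data(self):
        # Read-only access to whatever the last load() produced (None before loading).
        return self._data

    def delete(self):
        # Remove the whole data directory, as in the Example's final step.
        shutil.rmtree(self._data_dir, ignore_errors=True)


with tempfile.TemporaryDirectory() as tmp:
    # Use a subdirectory so delete() does not remove the TemporaryDirectory itself.
    ds = ToyDataset({'weather.csv': 'date,temp\n'}, os.path.join(tmp, 'jfk'))
    ds.download()
    print(ds.is_downloaded())  # True
    ds.load()
    print(sorted(ds.data))     # ['weather.csv']
    ds.delete()
    print(ds.is_downloaded())  # False
```

Writing the marker file last means a partially extracted directory is not reported as downloaded; whether nourish relies on a similar mechanism is not stated on this page.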