nourish.dataset.Dataset

class nourish.dataset.Dataset(schema, data_dir, *, mode=<InitializationMode.LAZY: 0>)
Bases: object

Models a particular dataset version along with download & load functionality.
- Parameters
    schema – nourish.schema.SchemaDict of a particular dataset version.
    data_dir – Directory to which the dataset is downloaded and from which it is loaded. The path can be either absolute or relative to the current working directory, but it is converted to an absolute path immediately upon initialization.
    mode – Mode with which to treat the dataset. Available options are: Dataset.InitializationMode.LAZY, Dataset.InitializationMode.DOWNLOAD_ONLY, Dataset.InitializationMode.LOAD_ONLY, and Dataset.InitializationMode.DOWNLOAD_AND_LOAD.
- Raises
    ValueError – An invalid mode was specified for handling the dataset.
Example:
>>> from tempfile import TemporaryDirectory
>>> import nourish
>>> from nourish import schema
>>> from nourish.dataset import Dataset
>>> dataset_schemata = schema.DatasetSchemata('./tests/schemata/datasets.yaml')
>>> jfk_schema = dataset_schemata.export_schema('datasets', 'noaa_jfk', '1.1.4')
>>> jfk_data_dir = TemporaryDirectory()
>>> jfk_dataset = Dataset(schema=jfk_schema, data_dir=jfk_data_dir.name)
>>> jfk_dataset.download()
>>> data = jfk_dataset.load()
>>> data['jfk_weather_cleaned'].shape
(75119, 16)
>>> jfk_dataset.delete()  # The directory jfk_data_dir is deleted here
>>> jfk_dataset.is_downloaded()
False
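The two initialization behaviors described above (data_dir is converted to an absolute path, and an invalid mode raises ValueError) can be illustrated with a small stand-in class. This is only a sketch and not the library's implementation: DatasetSketch is a hypothetical name, the numeric values of every mode except LAZY (shown as 0 in the signature) are assumed, and the use of pathlib and isinstance for the checks is a guess at one reasonable approach.

```python
import enum
import pathlib


class InitializationMode(enum.Enum):
    # Only LAZY = 0 appears in the documented signature; the other values are assumed.
    LAZY = 0
    DOWNLOAD_ONLY = 1
    LOAD_ONLY = 2
    DOWNLOAD_AND_LOAD = 3


class DatasetSketch:
    """Hypothetical stand-in for Dataset; it never downloads or loads anything."""

    def __init__(self, schema, data_dir, *, mode=InitializationMode.LAZY):
        # Reject anything that is not one of the four enum members.
        if not isinstance(mode, InitializationMode):
            raise ValueError(f'{mode!r} is not a valid initialization mode')
        self.schema = schema
        # Resolve a relative path against the working directory at construction time.
        self.data_dir = pathlib.Path(data_dir).absolute()
        self.mode = mode


sketch = DatasetSketch(schema={}, data_dir='data/jfk')
print(sketch.data_dir.is_absolute())  # True

try:
    DatasetSketch(schema={}, data_dir='data/jfk', mode='eager')
except ValueError as error:
    print(f'rejected: {error}')
```

Because the path is resolved in the constructor, changing the working directory later would not change where the stand-in object points.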
Methods
delete(*[, force]) – Clear the data directory.
download([check]) – Downloads, extracts, and removes dataset archive.
is_downloaded() – Check to see if the dataset was downloaded.
load([subdatasets, format_loader_map, check]) – Load data files to RAM.
Attributes
Access loaded data objects.