Nourish (Under Development)


Nourish is a Python API that enables data consumers and distributors to easily use and share datasets, and establishes a standard for exchanging data assets. It enables:

  • a data scientist to have a simpler and more unified way to begin working with a wide range of datasets, and

  • a data distributor to have a consistent, safe, and open source way to share datasets with interested communities.

Install the Package & its Dependencies

To install the latest version of Nourish, run

$ pip install nourish

Alternatively, if you have downloaded the source, switch to the source directory (the directory that contains this README file, e.g. cd /path/to/nourish-source) and run

$ pip install -U .

Quick Start

Import the package and load a dataset. Nourish downloads the WikiText-103 dataset (version 1.0.1) if it has not already been downloaded, and then loads it.

import nourish
wikitext103_data = nourish.load_dataset('wikitext103')

View available Nourish datasets and their versions.

>>> nourish.list_all_datasets()
{'claim_sentences_search': ('1.0.2',), ..., 'wikitext103': ('1.0.1',)}
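The mapping shown above pairs each dataset name with a tuple of version strings. As a minimal sketch of how one might pick the newest version per dataset, the helper below works on a plain dict of that shape. `latest_versions` is a hypothetical helper, not part of the Nourish API, and it assumes versions are purely numeric dotted strings such as '1.0.2'.

```python
from typing import Dict, Tuple


def latest_versions(datasets: Dict[str, Tuple[str, ...]]) -> Dict[str, str]:
    """Map each dataset name to its highest version string.

    Hypothetical helper, not part of Nourish. Assumes every version is a
    purely numeric dotted string such as '1.0.2'.
    """
    def version_key(version: str) -> Tuple[int, ...]:
        # Compare versions numerically so that '1.0.10' sorts after '1.0.2'.
        return tuple(int(part) for part in version.split('.'))

    return {name: max(versions, key=version_key)
            for name, versions in datasets.items()}


# Sample mapping in the shape shown above; in practice this would be the
# return value of nourish.list_all_datasets().
sample = {'claim_sentences_search': ('1.0.2',), 'wikitext103': ('1.0.1',)}
print(latest_versions(sample))
```

Comparing integer tuples rather than raw strings avoids the lexicographic ordering trap in which '1.0.10' would sort before '1.0.2'.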

To view your globally set configs for Nourish, such as your default data directory, use nourish.get_config().

>>> nourish.get_config()
Config(DATADIR=PosixPath('dir/to/download/load/from'), ..., DATASET_SCHEMATA_URL='file/to/load/datasets/from')

By default, nourish.load_dataset() downloads to and loads from ~/.nourish/data/<dataset-name>/<dataset-version>/. To change the default data directory, use nourish.init().

nourish.init(DATADIR='new/dir/to/download/load/from')

Load a previously downloaded dataset using nourish.load_dataset(). With the new default data directory set, Nourish now searches for the Groningen Meaning Bank dataset (version 1.0.2) in new/dir/to/download/load/from/gmb/1.0.2/.

gmb_data = nourish.load_dataset('gmb', version='1.0.2', download=False)  # assuming GMB dataset was already downloaded
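The directory layout described above (data directory, then dataset name, then dataset version) can be sketched with pathlib. This is only an illustration of that layout: `expected_dataset_dir` is a hypothetical helper, not a Nourish function, and the library's real path handling may differ. A check like this can help confirm that a dataset is on disk before passing download=False.

```python
from pathlib import Path


def expected_dataset_dir(datadir: str, name: str, version: str) -> Path:
    """Build <datadir>/<dataset-name>/<dataset-version>.

    Hypothetical helper that mirrors the layout described in this README;
    it is not part of the Nourish API.
    """
    # expanduser() resolves a leading '~' to the current user's home directory.
    return Path(datadir).expanduser() / name / version


# Default data directory and dataset taken from the examples above.
target = expected_dataset_dir('~/.nourish/data', 'wikitext103', '1.0.1')
print(target, 'exists' if target.is_dir() else 'not found')
```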

Notebooks

For a more extensive look at Nourish functionality, check out these notebooks:

User Guide

API References

nourish

Nourish package and high level functions.

nourish.dataset

Dataset downloading and loading functionality.

nourish.exceptions

Custom exceptions used by this package.

nourish.schema

Schemata parsing and loading functionality.

Loaders

nourish.loaders

Loaders subpackage.

nourish.loaders.table

Tabular data loaders.

nourish.loaders.text

Text file loaders.