tlt.datasets.hf_dataset.HFDataset
class tlt.datasets.hf_dataset.HFDataset(dataset_dir, dataset_name='', dataset_catalog='')
Base class to represent a Hugging Face dataset.
Methods
__init__(dataset_dir[, dataset_name, ...])    Class constructor.
get_batch([subset])    Get a single batch of examples and labels from the dataset.
get_inc_dataloaders()
get_text(input_ids)    Helper function to decode the input_ids to text.
preprocess(model_name[, batch_size, ...])    Preprocess the text dataset: tokenize it and apply padding and truncation.
shuffle_split([train_pct, val_pct, ...])    Randomly split the dataset into train, validation, and test subsets, with an optional seed for reproducible pseudo-random splits.
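shuffle_split divides the data by percentage and accepts a seed so the split can be reproduced. The sketch below illustrates that idea in plain Python; it is not the tlt implementation. The function name seeded_split, its test_pct parameter, the rule of flooring each subset size, and returning index lists instead of dataset subsets are all assumptions made for illustration.

```python
import random
from typing import List, Optional, Tuple


def seeded_split(
    num_examples: int,
    train_pct: float,
    val_pct: float,
    test_pct: float,
    seed: Optional[int] = None,
) -> Tuple[List[int], List[int], List[int]]:
    """Hypothetical helper (not part of tlt): shuffle example indices and cut
    them into train/validation/test index lists by percentage."""
    if min(train_pct, val_pct, test_pct) < 0:
        raise ValueError("percentages must be non-negative")
    # Small tolerance so that e.g. 0.7 + 0.2 + 0.1 is accepted despite float error.
    if train_pct + val_pct + test_pct > 1.0 + 1e-9:
        raise ValueError("percentages must sum to at most 1.0")

    indices = list(range(num_examples))
    # A dedicated Random instance makes the shuffle reproducible for a given
    # seed without touching the global random state.
    random.Random(seed).shuffle(indices)

    # Each subset size is floored; leftover indices are simply not used.
    train_end = int(num_examples * train_pct)
    val_end = train_end + int(num_examples * val_pct)
    test_end = val_end + int(num_examples * test_pct)
    return indices[:train_end], indices[train_end:val_end], indices[val_end:test_end]


train, val, test = seeded_split(10, train_pct=0.7, val_pct=0.2, test_pct=0.1, seed=42)
print(len(train), len(val), len(test))
```

Calling seeded_split again with seed=42 returns the same three lists; with seed=None, random.Random seeds itself from system sources, so each call produces a different split. Because sizes are floored, percentages that sum to less than 1.0 leave some examples out of every subset.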
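preprocess tokenizes the text and applies padding and truncation. With Hugging Face models these steps are normally performed together by the model's tokenizer; the hypothetical helper pad_and_truncate below sketches only the padding and truncation half, on lists of already-tokenized IDs, to show what happens to each example. The pad ID of 0 and right-side padding are assumptions (a real tokenizer takes its pad token and padding side from its own configuration), and unlike a real tokenizer this sketch does not reserve room for special tokens when it truncates.

```python
from typing import List, Tuple


def pad_and_truncate(
    batch: List[List[int]], max_length: int, pad_id: int = 0
) -> Tuple[List[List[int]], List[List[int]]]:
    """Hypothetical helper (not part of tlt): cut each token-ID sequence to
    max_length and right-pad shorter ones with pad_id.

    Returns (input_ids, attention_mask), where the mask is 1 for real tokens
    and 0 for padding."""
    if max_length < 1:
        raise ValueError("max_length must be positive")
    input_ids, attention_mask = [], []
    for ids in batch:
        ids = ids[:max_length]            # truncation: drop tokens past max_length
        n_pad = max_length - len(ids)     # how many pad tokens are needed
        input_ids.append(ids + [pad_id] * n_pad)
        attention_mask.append([1] * len(ids) + [0] * n_pad)
    return input_ids, attention_mask


ids, mask = pad_and_truncate([[5, 6, 7], [5, 6, 8, 9, 10]], max_length=4)
print(ids)
print(mask)
```

After this step every example has the same length, which is what allows examples to be stacked into fixed-shape batches; the attention mask lets the model ignore the padded positions.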
Attributes
dataset    The framework dataset object.
dataset_catalog    The string name of the dataset catalog (or None).
dataset_dir    Host directory containing the dataset files.
dataset_name    Name of the dataset.
test_loader
test_subset    A subset of the dataset held out for final testing/evaluation.
train_loader
train_subset    A subset of the dataset used for training.
validation_loader
validation_subset    A subset of the dataset used for validation/evaluation.