tlt.datasets.hf_dataset.HFDataset
- class tlt.datasets.hf_dataset.HFDataset(dataset_dir, dataset_name='', dataset_catalog='')
Base class to represent a Hugging Face Dataset.
Methods
- __init__(dataset_dir[, dataset_name, ...]): Class constructor.
- get_batch([subset]): Get a single batch of images and labels from the dataset.
- get_inc_dataloaders()
- get_text(input_ids): Helper function to decode input_ids into text.
- preprocess(model_name[, batch_size, ...]): Preprocess the text dataset by tokenizing it and applying padding and truncation.
- shuffle_split([train_pct, val_pct, ...]): Randomly split the dataset into train, validation, and test subsets, with an option to set a pseudo-random seed.
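The preprocess entry mentions tokenization, padding, and truncation. The framework-free sketch below illustrates only the padding and truncation part. It assumes the text has already been tokenized into lists of integer token IDs; the helper name pad_and_truncate and the default pad_id of 0 are hypothetical and not part of the tlt API. The attention mask follows the usual Hugging Face convention of 1 for real tokens and 0 for padding. In the Hugging Face ecosystem, a tokenizer normally performs this step itself when called with padding and truncation enabled.

```python
from typing import List, Tuple


def pad_and_truncate(
    batch: List[List[int]], max_length: int, pad_id: int = 0
) -> Tuple[List[List[int]], List[List[int]]]:
    """Hypothetical helper (not the tlt API): fit each token-ID list to max_length.

    Returns (input_ids, attention_mask). The mask is 1 for real tokens
    and 0 for padding positions. pad_id=0 is an assumed default.
    """
    if max_length <= 0:
        raise ValueError("max_length must be positive")
    input_ids: List[List[int]] = []
    attention_mask: List[List[int]] = []
    for ids in batch:
        kept = ids[:max_length]          # truncation: drop tokens past max_length
        n_pad = max_length - len(kept)   # how many pad tokens are still needed
        input_ids.append(kept + [pad_id] * n_pad)
        attention_mask.append([1] * len(kept) + [0] * n_pad)
    return input_ids, attention_mask


# Usage: one sequence is too long, the other too short.
ids, mask = pad_and_truncate([[12, 47, 9, 31, 5], [8, 3]], max_length=4)
print(ids)   # [[12, 47, 9, 31], [8, 3, 0, 0]]
print(mask)  # [[1, 1, 1, 1], [1, 1, 0, 0]]
```

Padding every sequence to the same length lets a batch be stacked into a single rectangular tensor, and the mask tells the model which positions to ignore.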
Attributes
- dataset: The framework dataset object
- dataset_catalog: The string name of the dataset catalog (or None)
- dataset_dir: Host directory containing the dataset files
- dataset_name: Name of the dataset
- test_loader
- test_subset: A subset of the dataset held out for final testing/evaluation
- train_loader
- train_subset: A subset of the dataset used for training
- validation_loader
- validation_subset: A subset of the dataset used for validation/evaluation
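shuffle_split randomly divides the dataset into train, validation, and test subsets with an optional seed. The stdlib-only sketch below shows one way a seeded percentage split can work on example indices. The helper name split_indices, its signature, and the rule that the test subset receives whatever remains after the train and validation shares are assumptions for illustration; they are not taken from the tlt implementation.

```python
import random
from typing import List, Optional, Tuple


def split_indices(
    n: int, train_pct: float, val_pct: float, seed: Optional[int] = None
) -> Tuple[List[int], List[int], List[int]]:
    """Hypothetical helper (not the tlt API): shuffle indices 0..n-1 and split them.

    The test subset receives the remaining 1 - train_pct - val_pct share.
    Passing the same seed reproduces the same split.
    """
    if train_pct < 0 or val_pct < 0 or train_pct + val_pct > 1:
        raise ValueError("percentages must be non-negative and sum to at most 1")
    indices = list(range(n))
    rng = random.Random(seed)  # private generator; global random state is untouched
    rng.shuffle(indices)
    n_train = int(n * train_pct)  # int() truncates, so rounding leftovers go to test
    n_val = int(n * val_pct)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test


# Usage: 8 examples split 50% / 25% / remainder.
train, val, test = split_indices(8, train_pct=0.5, val_pct=0.25, seed=42)
print(len(train), len(val), len(test))  # 4 2 2
```

Using a dedicated random.Random instance keeps the split reproducible for a given seed without reseeding Python's global generator.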