tlt.datasets.hf_dataset.HFDataset

class tlt.datasets.hf_dataset.HFDataset(dataset_dir, dataset_name='', dataset_catalog='')

Base class for representing a Hugging Face dataset.

__init__(dataset_dir, dataset_name='', dataset_catalog='')

Class constructor

Methods

__init__(dataset_dir[, dataset_name, ...])

Class constructor

get_batch([subset])

Get a single batch of examples and labels from the dataset.

get_inc_dataloaders()

get_text(input_ids)

Helper function that decodes input_ids back to text.

preprocess(model_name[, batch_size, ...])

Preprocess the text dataset by tokenizing it and applying padding and truncation.

shuffle_split([train_pct, val_pct, ...])

Randomly split the dataset into train, validation, and test subsets, with an optional seed for reproducible results.
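
As a rough illustration of what preprocess describes (tokenizing, then padding and truncating to a fixed length), here is a minimal pure-Python sketch. It does not call tlt or a Hugging Face tokenizer and is not the library's implementation. The names toy_tokenize, pad_and_truncate, PAD_ID, UNK_ID, the whitespace tokenizer, and the tiny vocabulary are all hypothetical and exist only for this example.

```python
from typing import Dict, List

PAD_ID = 0  # hypothetical id used for padding positions
UNK_ID = 1  # hypothetical id used for words missing from the vocabulary


def toy_tokenize(text: str, vocab: Dict[str, int]) -> List[int]:
    """Map lowercased, whitespace-separated words to ids (unknown words -> UNK_ID)."""
    return [vocab.get(word, UNK_ID) for word in text.lower().split()]


def pad_and_truncate(ids: List[int], max_length: int) -> Dict[str, List[int]]:
    """Truncate to max_length, then right-pad with PAD_ID.

    The attention mask marks real tokens with 1 and padding with 0.
    """
    kept = ids[:max_length]
    n_pad = max_length - len(kept)
    return {
        "input_ids": kept + [PAD_ID] * n_pad,
        "attention_mask": [1] * len(kept) + [0] * n_pad,
    }


vocab = {"the": 2, "movie": 3, "was": 4, "great": 5}

# A short input is padded up to max_length.
short = pad_and_truncate(toy_tokenize("The movie", vocab), max_length=4)

# A long input is cut down to max_length; "really" is not in the vocabulary.
long = pad_and_truncate(toy_tokenize("The movie was really great", vocab), max_length=4)
```

After this step every example has the same length, which is what allows examples to be stacked into fixed-shape batches.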

Attributes

dataset

The framework dataset object

dataset_catalog

The string name of the dataset catalog (or None)

dataset_dir

Host directory containing the dataset files

dataset_name

Name of the dataset

test_loader

test_subset

A subset of the dataset held out for final testing/evaluation

train_loader

train_subset

A subset of the dataset used for training

validation_loader

validation_subset

A subset of the dataset used for validation/evaluation
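
The train, validation, and test subsets listed above are the kind of partition that shuffle_split describes. The following is a minimal sketch of a seeded random split over example indices, written in plain Python and not taken from the tlt source. Only train_pct and val_pct appear in the documented signature; the function name shuffle_split_indices, the test_pct and seed parameters, and the default values are assumptions made for this example.

```python
import random
from typing import List, Optional, Tuple


def shuffle_split_indices(
    n_items: int,
    train_pct: float = 0.75,   # assumed default, for illustration only
    val_pct: float = 0.25,     # assumed default, for illustration only
    test_pct: float = 0.0,     # hypothetical parameter
    seed: Optional[int] = None,  # hypothetical parameter
) -> Tuple[List[int], List[int], List[int]]:
    """Shuffle the indices 0..n_items-1 and cut them into three disjoint subsets."""
    if train_pct + val_pct + test_pct > 1.0 + 1e-9:
        raise ValueError("The subset percentages must not sum to more than 1.0")

    indices = list(range(n_items))
    # A dedicated Random instance keeps the split reproducible for a given
    # seed without touching the global random state.
    random.Random(seed).shuffle(indices)

    n_train = int(n_items * train_pct)
    n_val = int(n_items * val_pct)
    n_test = int(n_items * test_pct)

    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:n_train + n_val + n_test]
    return train, val, test


train_idx, val_idx, test_idx = shuffle_split_indices(
    20, train_pct=0.5, val_pct=0.25, test_pct=0.25, seed=10
)
```

Calling the function again with the same seed yields the same three index lists, which is the point of exposing a seed: a reported validation or test score can be traced back to one specific partition.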