tlt.datasets.text_generation.text_generation_dataset.TextGenerationDataset

class tlt.datasets.text_generation.text_generation_dataset.TextGenerationDataset(dataset_dir, dataset_name='', dataset_catalog='')

Base class for a text generation dataset

__init__(dataset_dir, dataset_name='', dataset_catalog='')

Class constructor

Methods

__init__(dataset_dir[, dataset_name, ...])

Class constructor

get_batch()

Get a single batch of examples from the dataset

preprocess(model_name[, batch_size, ...])

Preprocess the text dataset by tokenizing it and applying padding and truncation.
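
The three steps named in the preprocess() summary (tokenize, truncate, pad) can be illustrated with a minimal pure-Python sketch. This is not the library's implementation: the whitespace tokenizer, the helper names build_vocab and encode, the max_length parameter, and the reserved ids PAD_ID and UNK_ID are all assumptions made for illustration. A real tokenizer would normally be selected from the model_name argument.

```python
from typing import Dict, List

PAD_ID = 0  # hypothetical id reserved for padding positions
UNK_ID = 1  # hypothetical id for words missing from the vocabulary


def build_vocab(texts: List[str]) -> Dict[str, int]:
    """Assign an integer id to every whitespace-separated word."""
    vocab: Dict[str, int] = {}
    for text in texts:
        for word in text.lower().split():
            # ids 0 and 1 are reserved, so real words start at 2
            vocab.setdefault(word, len(vocab) + 2)
    return vocab


def encode(text: str, vocab: Dict[str, int], max_length: int) -> Dict[str, List[int]]:
    """Tokenize, truncate to max_length, then right-pad to max_length."""
    ids = [vocab.get(word, UNK_ID) for word in text.lower().split()]
    ids = ids[:max_length]            # truncation
    mask = [1] * len(ids)             # 1 marks a real token
    pad = max_length - len(ids)       # number of padding positions needed
    return {
        "input_ids": ids + [PAD_ID] * pad,
        "attention_mask": mask + [0] * pad,
    }


texts = ["the cat sat", "the cat sat on the mat today"]
vocab = build_vocab(texts)
encoded = [encode(t, vocab, max_length=5) for t in texts]
```

After this runs, every entry in encoded has exactly five ids: the short sentence is padded with PAD_ID and the long one is cut off after its fifth word, so the examples can be stacked into a fixed-shape batch.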

Attributes

dataset

The framework dataset object

dataset_catalog

The string name of the dataset catalog (or None)

dataset_dir

Host directory containing the dataset files

dataset_name

Name of the dataset

test_subset

A subset of the dataset held out for final testing/evaluation

train_subset

A subset of the dataset used for training

validation_subset

A subset of the dataset used for validation/evaluation
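
The relationship between train_subset, validation_subset and test_subset can be pictured as three disjoint slices of one shuffled index list. The sketch below is an assumption-laden illustration, not the class's own splitting logic: the function name split_indices, its percentage parameters, and the seed argument are hypothetical and do not appear in this reference page.

```python
import random
from typing import List, Optional, Tuple


def split_indices(
    num_examples: int,
    train_pct: float,
    val_pct: float,
    test_pct: float = 0.0,
    seed: Optional[int] = None,
) -> Tuple[List[int], List[int], List[int]]:
    """Return disjoint (train, validation, test) index lists."""
    if train_pct + val_pct + test_pct > 1.0 + 1e-9:
        raise ValueError("split percentages must not sum to more than 1.0")

    indices = list(range(num_examples))
    # A dedicated Random instance keeps the shuffle reproducible per seed
    random.Random(seed).shuffle(indices)

    n_train = int(num_examples * train_pct)
    n_val = int(num_examples * val_pct)
    n_test = int(num_examples * test_pct)

    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:n_train + n_val + n_test]
    return train, val, test


train_idx, val_idx, test_idx = split_indices(10, 0.7, 0.2, 0.1, seed=42)
```

Because the three slices come from a single shuffled list, no example can land in more than one subset, and reusing the same seed reproduces the same split.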