tlt.datasets.text_generation.text_generation_dataset.TextGenerationDataset
class tlt.datasets.text_generation.text_generation_dataset.TextGenerationDataset(dataset_dir, dataset_name='', dataset_catalog='')
Base class for a text generation dataset
Methods
__init__(dataset_dir[, dataset_name, ...])   Class constructor
get_batch()                                  Get a single batch of images and labels from the dataset
preprocess(model_name[, batch_size, ...])    Preprocess the textual dataset by tokenizing it and applying padding and truncation.
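As a rough illustration of what tokenization, truncation, and padding do to a batch of text, here is a minimal pure-Python sketch. It is not the tlt implementation: the whitespace tokenizer, the PAD_ID value, and the build_vocab and encode helpers are all hypothetical names invented for this example. The real preprocess method takes a model_name, which suggests it relies on a tokenizer matched to that model rather than anything this simple.

```python
from typing import Dict, List

PAD_ID = 0  # hypothetical id reserved for padding positions


def build_vocab(texts: List[str]) -> Dict[str, int]:
    """Assign an integer id (starting at 1) to each whitespace-separated token."""
    vocab: Dict[str, int] = {}
    for text in texts:
        for token in text.lower().split():
            vocab.setdefault(token, len(vocab) + 1)
    return vocab


def encode(text: str, vocab: Dict[str, int], max_length: int) -> Dict[str, List[int]]:
    """Tokenize, truncate to max_length, then right-pad up to max_length."""
    ids = [vocab[token] for token in text.lower().split()]  # tokenization
    ids = ids[:max_length]                                   # truncation
    mask = [1] * len(ids)                                    # 1 marks a real token
    n_pad = max_length - len(ids)
    return {
        "input_ids": ids + [PAD_ID] * n_pad,                 # padding
        "attention_mask": mask + [0] * n_pad,                # 0 marks padding
    }


texts = ["the cat sat", "the cat sat on the warm mat today"]
vocab = build_vocab(texts)
batch = [encode(t, vocab, max_length=5) for t in texts]

for row in batch:
    print(row["input_ids"], row["attention_mask"])
```

After this step every example has the same length, which is what allows examples to be stacked into fixed-shape batches: the short sentence is padded and the long one is cut off at five tokens.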
Attributes
dataset              The framework dataset object
dataset_catalog      The string name of the dataset catalog (or None)
dataset_dir          Host directory containing the dataset files
dataset_name         Name of the dataset
test_subset          A subset of the dataset held out for final testing/evaluation
train_subset         A subset of the dataset used for training
validation_subset    A subset of the dataset used for validation/evaluation