tlt.datasets.text_classification.tfds_text_classification_dataset.TFDSTextClassificationDataset

class tlt.datasets.text_classification.tfds_text_classification_dataset.TFDSTextClassificationDataset(dataset_dir, dataset_name, split=['train'], shuffle_files=True, **kwargs)

A text classification dataset from the TensorFlow Datasets (TFDS) catalog.

__init__(dataset_dir, dataset_name, split=['train'], shuffle_files=True, **kwargs)

Class constructor
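A minimal usage sketch based only on the signatures listed on this page. The dataset name "imdb_reviews", the directory "/tmp/tfds_data", the helper names load_imdb_reviews and prepare_for_training, and the order of the preprocess and shuffle_split calls are assumptions made for illustration. Constructing the dataset requires the tlt package and may download data.

```python
def load_imdb_reviews(dataset_dir="/tmp/tfds_data"):
    """Hypothetical helper: build the dataset object from a TFDS catalog name."""
    # Imported lazily so that defining this helper does not require tlt.
    from tlt.datasets.text_classification.tfds_text_classification_dataset import (
        TFDSTextClassificationDataset,
    )

    # "imdb_reviews" is an illustrative catalog name, not one named on this page.
    return TFDSTextClassificationDataset(
        dataset_dir=dataset_dir,
        dataset_name="imdb_reviews",
        split=["train"],
        shuffle_files=True,
    )


def prepare_for_training(dataset, batch_size=32):
    """Hypothetical helper: batch the dataset, then split it into subsets."""
    # Call order is an assumption; only the method and keyword names
    # come from the method summary on this page.
    dataset.preprocess(batch_size=batch_size)
    dataset.shuffle_split(train_pct=0.75, val_pct=0.25)
    return dataset
```

With tlt installed, prepare_for_training(load_imdb_reviews()) would be expected to return a dataset whose train_subset and validation_subset attributes are populated.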

Methods

__init__(dataset_dir, dataset_name[, split, ...])

Class constructor

get_batch([subset])

Get a single batch of text samples and labels from the dataset.

get_inc_dataloaders(hub_name, max_seq_length)

get_str_label(numerical_value)

Returns the string label (class name) associated with the specified numerical value.

preprocess(batch_size)

Batch the dataset using the specified batch size.

shuffle_split([train_pct, val_pct, ...])

Randomly split the dataset into train, validation, and test subsets with a pseudo-random seed option.
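The idea behind a seeded percentage split, as described for shuffle_split, can be sketched in plain Python. This is an illustration of the general technique, not the library's implementation; the function split_indices, its test_pct and seed parameter names, and the default percentages are assumptions of this sketch.

```python
import random


def split_indices(num_items, train_pct=0.75, val_pct=0.25, test_pct=0.0, seed=None):
    """Shuffle item indices and cut them into train/validation/test lists."""
    # Small tolerance guards against floating-point rounding in the sum.
    if train_pct + val_pct + test_pct > 1.0 + 1e-9:
        raise ValueError("Split percentages must not sum to more than 1.0")

    indices = list(range(num_items))
    # A dedicated Random instance makes the shuffle reproducible for a given
    # seed without touching the global random state.
    random.Random(seed).shuffle(indices)

    train_end = int(num_items * train_pct)
    val_end = train_end + int(num_items * val_pct)
    test_end = val_end + int(num_items * test_pct)
    return indices[:train_end], indices[train_end:val_end], indices[val_end:test_end]


train_idx, val_idx, test_idx = split_indices(100, seed=42)
print(len(train_idx), len(val_idx), len(test_idx))  # 75 25 0
```

Passing the same seed twice yields the same partition, which is the usual reason to expose a seed option on a random split.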

Attributes

class_names

dataset

The framework dataset object

dataset_catalog

The string name of the dataset catalog (or None)

dataset_dir

Host directory containing the dataset files

dataset_name

Name of the dataset

info

test_subset

A subset of the dataset held out for final testing/evaluation

train_subset

A subset of the dataset used for training

validation_subset

A subset of the dataset used for validation/evaluation
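The relationship between a list of class names and the numerical labels that get_str_label translates can be sketched as a simple index lookup. This is a hedged illustration, not the library's code: the function str_label and the sample class names are hypothetical, and it assumes labels are zero-based positions in the class name list.

```python
def str_label(class_names, numerical_value):
    """Return the class name at the position given by a numerical label."""
    index = int(numerical_value)  # also accepts integer-like values such as NumPy ints
    if not 0 <= index < len(class_names):
        raise ValueError(
            f"Label {numerical_value} is outside the range 0..{len(class_names) - 1}"
        )
    return class_names[index]


# Hypothetical class names for a binary sentiment dataset.
print(str_label(["negative", "positive"], 1))  # positive
```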