tlt.datasets.image_classification.tf_custom_image_classification_dataset.TFCustomImageClassificationDataset

class tlt.datasets.image_classification.tf_custom_image_classification_dataset.TFCustomImageClassificationDataset(dataset_dir, dataset_name=None, color_mode='rgb', shuffle_files=True, seed=None, **kwargs)

A custom image classification dataset for use with TensorFlow models. The image directory is expected to contain one subfolder per image class, and each subfolder should contain the .jpg images for that class. The subfolder name is used as the class label.

dataset_dir
  ├── class_a
  ├── class_b
  └── class_c
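
The layout above can be reproduced for experimentation with a short script. This is a minimal sketch that uses only the Python standard library: the helpers `make_layout` and `list_class_labels` are hypothetical names and are not part of tlt, the placeholder files are empty rather than real JPEG images, and the alphabetical ordering of labels is an assumption made for this example.

```python
import tempfile
from pathlib import Path


def make_layout(root, class_names, images_per_class=2):
    """Create one subfolder per class, each holding empty placeholder .jpg files."""
    root = Path(root)
    for name in class_names:
        class_dir = root / name
        class_dir.mkdir(parents=True, exist_ok=True)
        for i in range(images_per_class):
            # Empty placeholder file; a real dataset needs valid JPEG data.
            (class_dir / f"img_{i}.jpg").touch()
    return root


def list_class_labels(root):
    """Return the subfolder names (sorted here by assumption) as class labels."""
    return sorted(p.name for p in Path(root).iterdir() if p.is_dir())


with tempfile.TemporaryDirectory() as tmp:
    dataset_dir = make_layout(Path(tmp) / "dataset_dir",
                              ["class_a", "class_b", "class_c"])
    labels = list_class_labels(dataset_dir)
    print(labels)  # ['class_a', 'class_b', 'class_c']
```

Running a check like `list_class_labels` before loading makes it easy to spot a misnamed or missing class folder.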

For a user-defined split into train, validation, and test subsets, place the class subfolders inside subfolders named after each subset. The only accepted subset names are ‘train’, ‘validation’, and ‘test’, and any combination of them may be present.

dataset_dir
  ├── train
  |   ├── class_a
  |   ├── class_b
  |   └── class_c
  ├── validation
  |   ├── class_a
  |   ├── class_b
  |   └── class_c
  └── test
      ├── class_a
      ├── class_b
      └── class_c
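
Whether a directory follows the user-defined split layout can be checked before loading it. The following is a sketch under stated assumptions, not the library's own detection logic: `find_predefined_splits` and `ALLOWED_SPLITS` are hypothetical names introduced for this example, and the check only looks for top-level folders with the three accepted names.

```python
import tempfile
from pathlib import Path

# The only subset folder names the documentation accepts.
ALLOWED_SPLITS = ("train", "validation", "test")


def find_predefined_splits(dataset_dir):
    """Return the accepted split names that exist as top-level subfolders."""
    root = Path(dataset_dir)
    return [name for name in ALLOWED_SPLITS if (root / name).is_dir()]


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "dataset_dir"
    # Build a layout with only 'train' and 'validation' subsets.
    for split in ("train", "validation"):
        for cls in ("class_a", "class_b"):
            (root / split / cls).mkdir(parents=True)
    splits = find_predefined_splits(root)
    print(splits)  # ['train', 'validation']
```

An empty result suggests the flat layout, where the class subfolders sit directly under `dataset_dir`.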

Parameters
  • dataset_dir (str) – Directory where the data is located. It should contain subdirectories with images for each class.

  • dataset_name (str) – optional; Name of the dataset. If omitted, the name of the dataset_dir folder is used as the dataset name.

  • color_mode (str) – optional; Specify the color mode as “greyscale”, “rgb”, or “rgba”. Defaults to “rgb”.

  • shuffle_files (bool) – optional; Whether to shuffle the data. Defaults to True.

  • seed (int) – optional; Random seed for shuffling. Defaults to None.

Raises

FileNotFoundError – if the dataset directory does not exist
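
Two of the documented rules, the FileNotFoundError for a missing directory and the fallback of dataset_name to the folder name, can be illustrated without TensorFlow. This sketch mirrors the behavior described above and is not the class's actual implementation; `resolve_dataset_name` is a hypothetical helper and the error message text is an assumption.

```python
import os
import tempfile


def resolve_dataset_name(dataset_dir, dataset_name=None):
    """Mimic the documented rules for a missing directory and a missing name."""
    if not os.path.isdir(dataset_dir):
        # Message wording is an assumption made for this sketch.
        raise FileNotFoundError(
            f"The dataset directory ({dataset_dir}) does not exist")
    if dataset_name:
        return dataset_name
    # normpath drops any trailing separator so basename is never empty.
    return os.path.basename(os.path.normpath(dataset_dir))


with tempfile.TemporaryDirectory() as tmp:
    flowers = os.path.join(tmp, "flowers")
    os.mkdir(flowers)

    default_name = resolve_dataset_name(flowers)
    custom_name = resolve_dataset_name(flowers, dataset_name="my_flowers")

    try:
        resolve_dataset_name(os.path.join(tmp, "missing"))
        raised = False
    except FileNotFoundError:
        raised = True

print(default_name, custom_name, raised)  # flowers my_flowers True
```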

__init__(dataset_dir, dataset_name=None, color_mode='rgb', shuffle_files=True, seed=None, **kwargs)

Class constructor

Methods

__init__(dataset_dir[, dataset_name, ...])

Class constructor

get_batch([subset])

Get a single batch of images and labels from the dataset.

get_inc_dataloaders()

preprocess(image_size, batch_size[, ...])

Preprocess the dataset: convert the images to float32, then resize, normalize, and batch them.

shuffle_split([train_pct, val_pct, ...])

Randomly split the dataset into train, validation, and test subsets with a pseudo-random seed option.
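
The idea behind a seeded random split can be sketched in plain Python. This illustrates the general technique only and is not the implementation of shuffle_split: `split_indices` is a hypothetical function, it works on item indices rather than a tf.data.Dataset, and assigning the remainder to the test subset is an assumption of this sketch.

```python
import random


def split_indices(num_items, train_pct, val_pct, seed=None):
    """Shuffle item indices with an optional seed and cut them into three parts."""
    if train_pct < 0 or val_pct < 0 or train_pct + val_pct > 1 + 1e-9:
        raise ValueError(
            "train_pct and val_pct must be non-negative and sum to at most 1")
    indices = list(range(num_items))
    # A dedicated Random instance keeps the global random state untouched.
    random.Random(seed).shuffle(indices)
    train_end = int(num_items * train_pct)
    val_end = train_end + int(num_items * val_pct)
    # Whatever is left after train and validation goes to test (assumption).
    return indices[:train_end], indices[train_end:val_end], indices[val_end:]


train_idx, val_idx, test_idx = split_indices(20, 0.5, 0.25, seed=10)
repeat = split_indices(20, 0.5, 0.25, seed=10)

print(len(train_idx), len(val_idx), len(test_idx))  # 10 5 5
```

Passing the same seed reproduces the same partition, which is what makes a pseudo-random split repeatable across runs.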

Attributes

class_names

Returns the list of class names

dataset

Returns the framework dataset object (tf.data.Dataset)

dataset_catalog

The string name of the dataset catalog (or None)

dataset_dir

Host directory containing the dataset files

dataset_name

Name of the dataset

info

Returns a dictionary of information about the dataset

test_subset

A subset of the dataset held out for final testing/evaluation

train_subset

A subset of the dataset used for training

validation_subset

A subset of the dataset used for validation/evaluation