tlt.datasets.image_classification.pytorch_custom_image_classification_dataset.PyTorchCustomImageClassificationDataset

class tlt.datasets.image_classification.pytorch_custom_image_classification_dataset.PyTorchCustomImageClassificationDataset(dataset_dir, dataset_name=None, num_workers=0, shuffle_files=True)

A custom image classification dataset for use with PyTorch models. The image directory must contain one subfolder per image class, and each subfolder should hold the .jpg images for that class. The subfolder name is used as the class label.

dataset_dir
  ├── class_a
  ├── class_b
  └── class_c
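The layout above can be checked before handing a directory to the dataset class. The following is a minimal standard-library sketch, not the library's own loading code: the helper names `list_class_labels` and `count_jpg_images` are hypothetical, and the labels are sorted here only to get a stable order.

```python
import tempfile
from pathlib import Path


def list_class_labels(dataset_dir):
    """Return the subfolder names, which serve as the class labels."""
    root = Path(dataset_dir)
    # Sorted only so the order is stable across platforms.
    return sorted(p.name for p in root.iterdir() if p.is_dir())


def count_jpg_images(dataset_dir):
    """Map each class label to the number of .jpg files in its subfolder."""
    root = Path(dataset_dir)
    return {
        label: len(list((root / label).glob("*.jpg")))
        for label in list_class_labels(root)
    }


# Build a throwaway dataset directory with one subfolder per class.
with tempfile.TemporaryDirectory() as tmp:
    for label in ("class_a", "class_b", "class_c"):
        class_dir = Path(tmp) / label
        class_dir.mkdir()
        (class_dir / "example_0.jpg").touch()  # empty placeholder file

    labels = list_class_labels(tmp)
    counts = count_jpg_images(tmp)

print(labels)
print(counts)
```

A class subfolder that reports zero images is worth investigating before training.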

To define your own train, validation, and test subsets, place the class subfolders inside split subfolders. The only accepted split names are ‘train’, ‘validation’, and ‘test’; any combination of them may be present.

dataset_dir
  ├── train
  │   ├── class_a
  │   ├── class_b
  │   └── class_c
  ├── validation
  │   ├── class_a
  │   ├── class_b
  │   └── class_c
  └── test
      ├── class_a
      ├── class_b
      └── class_c
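To tell a pre-split directory from a flat one, it is enough to look for the three accepted split names at the top level. This is an illustrative sketch using only the standard library; `find_split_dirs` is a hypothetical helper and does not reflect how the class itself detects splits.

```python
import tempfile
from pathlib import Path

# The only split folder names the documentation accepts.
SPLIT_NAMES = ("train", "validation", "test")
CLASS_NAMES = ("class_a", "class_b", "class_c")


def find_split_dirs(dataset_dir):
    """Return the accepted split names that exist as top-level subfolders."""
    root = Path(dataset_dir)
    return [name for name in SPLIT_NAMES if (root / name).is_dir()]


# A directory with a user-defined split.
with tempfile.TemporaryDirectory() as tmp:
    for split in SPLIT_NAMES:
        for label in CLASS_NAMES:
            (Path(tmp) / split / label).mkdir(parents=True)
    found_split = find_split_dirs(tmp)

# A flat directory with class subfolders only.
with tempfile.TemporaryDirectory() as tmp:
    for label in CLASS_NAMES:
        (Path(tmp) / label).mkdir()
    found_flat = find_split_dirs(tmp)

print(found_split)
print(found_flat)
```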

Parameters

dataset_dir (str) – Directory where the data is located. It should contain subdirectories with images for each class.
dataset_name (str) – optional; Name of the dataset. If no dataset name is given, the dataset_dir folder name will be used as the dataset name.
num_workers (int) – optional; Number of processes to use for data loading. Defaults to 0.
shuffle_files (bool) – optional; Whether to shuffle the data. Defaults to True.

Raises

FileNotFoundError – if the dataset directory does not exist
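A hedged usage sketch of the constructor, using only the parameters and the exception documented here. The wrapper `load_custom_dataset` is a hypothetical helper, and the import is done lazily so the function can be defined even where the `tlt` package is not installed.

```python
def load_custom_dataset(dataset_dir):
    """Try to construct the dataset; return None if the directory is missing."""
    # Imported inside the function so defining this sketch does not require tlt.
    from tlt.datasets.image_classification.pytorch_custom_image_classification_dataset import (
        PyTorchCustomImageClassificationDataset,
    )

    try:
        return PyTorchCustomImageClassificationDataset(
            dataset_dir,
            dataset_name=None,   # the dataset_dir folder name is used instead
            num_workers=0,       # documented default
            shuffle_files=True,  # documented default
        )
    except FileNotFoundError:
        # Documented behaviour when the dataset directory does not exist.
        print(f"Dataset directory not found: {dataset_dir}")
        return None
```

With the package installed, `dataset = load_custom_dataset("/path/to/dataset_dir")` (a placeholder path) would return the dataset object, after which attributes such as `dataset.class_names` and `dataset.dataset_name` can be inspected.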

__init__(dataset_dir, dataset_name=None, num_workers=0, shuffle_files=True): Class constructor

Methods

`__init__`(dataset_dir[, dataset_name, ...])	Class constructor
`get_batch`([subset])	Get a single batch of images and labels from the dataset.
`get_inc_dataloaders`()
`preprocess`([image_size, batch_size, add_aug])	Preprocess the dataset to resize, normalize, and batch the images.
`shuffle_split`([train_pct, val_pct, ...])	Randomly split the dataset into train, validation, and test subsets with a pseudo-random seed option.
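The idea behind a seeded random split can be sketched in plain Python. This is not the implementation of `shuffle_split`: the function `split_indices` is hypothetical, `train_pct` and `val_pct` are borrowed from the method summary above, and `test_pct` and `seed` are assumed names used only for illustration.

```python
import random


def split_indices(num_items, train_pct, val_pct, test_pct=0.0, seed=None):
    """Shuffle item indices and cut them into train/validation/test lists."""
    if not 0 <= train_pct + val_pct + test_pct <= 1:
        raise ValueError("Split percentages must sum to a value between 0 and 1")

    indices = list(range(num_items))
    # A dedicated Random instance keeps the split reproducible for a given seed.
    random.Random(seed).shuffle(indices)

    train_end = int(train_pct * num_items)
    val_end = train_end + int(val_pct * num_items)
    test_end = val_end + int(test_pct * num_items)
    return indices[:train_end], indices[train_end:val_end], indices[val_end:test_end]


train_idx, val_idx, test_idx = split_indices(
    100, train_pct=0.5, val_pct=0.25, test_pct=0.25, seed=42
)
print(len(train_idx), len(val_idx), len(test_idx))
```

Passing the same seed again yields the same three lists, which is what makes experiments repeatable; leaving the seed unset gives a different split on each call.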

Attributes

`class_names`	Returns the list of class names
`data_loader`	A data loader object corresponding to the dataset
`dataset`	Returns the framework dataset object
`dataset_catalog`	The string name of the dataset catalog (or None)
`dataset_dir`	Host directory containing the dataset files
`dataset_name`	Name of the dataset
`info`	Returns a dictionary of information about the dataset
`test_loader`	A data loader object corresponding to the test subset
`test_subset`	A subset of the dataset held out for final testing/evaluation
`train_loader`	A data loader object corresponding to the training subset
`train_subset`	A subset of the dataset used for training
`validation_loader`	A data loader object corresponding to the validation subset
`validation_subset`	A subset of the dataset used for validation/evaluation