tlt.datasets.image_classification.pytorch_custom_image_classification_dataset.PyTorchCustomImageClassificationDataset
- class tlt.datasets.image_classification.pytorch_custom_image_classification_dataset.PyTorchCustomImageClassificationDataset(dataset_dir, dataset_name=None, num_workers=0, shuffle_files=True)
A custom image classification dataset that can be used with PyTorch models. Note that the directory of images is expected to be organized with subfolders for each image class. Each subfolder should contain .jpg images for the class. The name of the subfolder will be used as the class label.
dataset_dir
├── class_a
├── class_b
└── class_c
For a user-defined split into train, validation, and test subsets, place the class subfolders inside split subfolders named ‘train’, ‘validation’, and/or ‘test’. These are the only accepted split names.
dataset_dir
├── train
|   ├── class_a
|   ├── class_b
|   └── class_c
├── validation
|   ├── class_a
|   ├── class_b
|   └── class_c
└── test
    ├── class_a
    ├── class_b
    └── class_c
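The two layouts described above can be created and inspected with a short standard-library script. This is a minimal sketch for illustration only: `make_layout` and `describe_layout` are hypothetical helpers, not part of tlt, and the folders are left empty rather than filled with .jpg images.

```python
import tempfile
from pathlib import Path

# The only split folder names the documentation accepts.
SPLIT_NAMES = ("train", "validation", "test")


def make_layout(root, class_names, splits=None):
    """Create empty class subfolders under root, optionally nested in split folders."""
    root = Path(root)
    parents = [root / s for s in splits] if splits else [root]
    for parent in parents:
        for name in class_names:
            (parent / name).mkdir(parents=True, exist_ok=True)
    return root


def describe_layout(root):
    """Return (splits, class_names) found directly under root."""
    root = Path(root)
    subdirs = sorted(p.name for p in root.iterdir() if p.is_dir())
    splits = [s for s in SPLIT_NAMES if s in subdirs]
    if splits:
        # Class names are read from the first split folder that is present.
        classes = sorted(p.name for p in (root / splits[0]).iterdir() if p.is_dir())
    else:
        classes = subdirs
    return splits, classes


classes = ["class_a", "class_b", "class_c"]
with tempfile.TemporaryDirectory() as tmp:
    flat = make_layout(Path(tmp) / "flat", classes)
    split = make_layout(Path(tmp) / "split", classes, splits=list(SPLIT_NAMES))
    flat_result = describe_layout(flat)
    split_result = describe_layout(split)

print(flat_result)
print(split_result)
```

A check like `describe_layout` can be handy before loading a dataset, since a misspelled split folder (for example `val` instead of `validation`) would otherwise be read as a class name.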
- Parameters
dataset_dir (str) – Directory where the data is located. It should contain subdirectories with images for each class.
dataset_name (str) – optional; Name of the dataset. If no dataset name is given, the dataset_dir folder name will be used as the dataset name.
num_workers (int) – optional; Number of processes to use for data loading. Defaults to 0.
shuffle_files (bool) – optional; Whether to shuffle the data. Defaults to True.
- Raises
FileNotFoundError – if dataset directory does not exist
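The parameters and the documented exception can be exercised through a thin wrapper. The sketch below is an assumption-laden illustration: `resolve_dataset_name` and `load_custom_dataset` are hypothetical helpers, the name fallback only imitates the documented default (the library's own logic may differ), and the tlt import is deferred so the module loads even where tlt is not installed.

```python
import os


def resolve_dataset_name(dataset_dir, dataset_name=None):
    """Imitate the documented default: fall back to the dataset_dir folder name."""
    if dataset_name is not None:
        return dataset_name
    # normpath drops a trailing separator so basename returns the folder name.
    return os.path.basename(os.path.normpath(dataset_dir))


def load_custom_dataset(dataset_dir, dataset_name=None, num_workers=0, shuffle_files=True):
    """Hypothetical helper: validate the directory, then build the dataset object."""
    if not os.path.isdir(dataset_dir):
        # The class documents the same exception type; checking here fails fast.
        raise FileNotFoundError(f"Dataset directory not found: {dataset_dir}")
    # Deferred import: only needed once the directory check has passed.
    from tlt.datasets.image_classification.pytorch_custom_image_classification_dataset import (
        PyTorchCustomImageClassificationDataset,
    )
    return PyTorchCustomImageClassificationDataset(
        dataset_dir,
        dataset_name=dataset_name,
        num_workers=num_workers,
        shuffle_files=shuffle_files,
    )
```

Keeping `num_workers=0` loads data in the main process, which is usually the easiest setting to debug; raising it is a tuning decision that depends on the host.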
- __init__(dataset_dir, dataset_name=None, num_workers=0, shuffle_files=True)
Class constructor
Methods
__init__(dataset_dir[, dataset_name, ...])
Class constructor
get_batch([subset])
Get a single batch of images and labels from the dataset.
get_inc_dataloaders()
preprocess([image_size, batch_size, add_aug])
Preprocess the dataset to resize, normalize, and batch the images.
shuffle_split([train_pct, val_pct, ...])
Randomly split the dataset into train, validation, and test subsets with a pseudo-random seed option.
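The idea behind a seeded random split can be sketched in plain Python. This is not the library's implementation of shuffle_split: `shuffle_split_indices` is a hypothetical function, the parameter names `test_pct` and `seed` and all default values are assumptions chosen for illustration, and it operates on integer indices rather than on images.

```python
import random


def shuffle_split_indices(n_items, train_pct=0.75, val_pct=0.25, test_pct=0.0, seed=None):
    """Split range(n_items) into shuffled train/validation/test index lists."""
    if train_pct + val_pct + test_pct > 1.0 + 1e-9:
        raise ValueError("Split percentages must not sum to more than 1.0")
    indices = list(range(n_items))
    # A private Random instance makes the shuffle reproducible for a given seed
    # without touching the global random state.
    random.Random(seed).shuffle(indices)
    # Subset sizes are rounded down; leftover items are simply not assigned.
    n_train = int(n_items * train_pct)
    n_val = int(n_items * val_pct)
    n_test = int(n_items * test_pct)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:n_train + n_val + n_test]
    return train, val, test


train_idx, val_idx, test_idx = shuffle_split_indices(100, 0.5, 0.25, 0.25, seed=7)
print(len(train_idx), len(val_idx), len(test_idx))
```

Passing the same seed yields the same three subsets on every run, which keeps validation and test results comparable across experiments.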
Attributes
class_names
Returns the list of class names
data_loader
A data loader object corresponding to the dataset
dataset
Returns the framework dataset object
dataset_catalog
The string name of the dataset catalog (or None)
dataset_dir
Host directory containing the dataset files
dataset_name
Name of the dataset
info
Returns a dictionary of information about the dataset
test_loader
A data loader object corresponding to the test subset
test_subset
A subset of the dataset held out for final testing/evaluation
train_loader
A data loader object corresponding to the training subset
train_subset
A subset of the dataset used for training
validation_loader
A data loader object corresponding to the validation subset
validation_subset
A subset of the dataset used for validation/evaluation