tlt.datasets.image_anomaly_detection.pytorch_custom_image_anomaly_detection_dataset.PyTorchCustomImageAnomalyDetectionDataset

class tlt.datasets.image_anomaly_detection.pytorch_custom_image_anomaly_detection_dataset.PyTorchCustomImageAnomalyDetectionDataset(dataset_dir, dataset_name=None, num_workers=56, shuffle_files=True, defects=None)

A custom image anomaly detection dataset that can be used with PyTorch models. Note that the directory of images is expected to be organized in one of two ways.

Method 1: With one subfolder named good and at least one other subfolder of defective examples. The names and number of the other subfolders do not matter, as long as there is at least one. All images in the non-good subfolders are labeled bad and are used only for validation/testing (not training).

dataset_dir
  ├── good
  ├── defective_type_a
  └── defective_type_b
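
The labeling rule for Method 1 can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the library's implementation: the helper names code_labels and make_method1_layout and the placeholder file names are hypothetical.

```python
import tempfile
from pathlib import Path


def code_labels(dataset_dir):
    """Map each file (relative path) to 'good' or 'bad' by its parent folder.

    Hypothetical helper: every subfolder not named 'good' is coded as 'bad'.
    """
    root = Path(dataset_dir)
    labels = {}
    for subfolder in sorted(root.iterdir()):
        if not subfolder.is_dir():
            continue
        label = "good" if subfolder.name == "good" else "bad"
        for image in sorted(subfolder.iterdir()):
            labels[image.relative_to(root).as_posix()] = label
    return labels


def make_method1_layout(root):
    """Create a tiny Method 1 layout with empty placeholder files."""
    files = [("good", "a.png"),
             ("defective_type_a", "b.png"),
             ("defective_type_b", "c.png")]
    for folder, name in files:
        (Path(root) / folder).mkdir(parents=True, exist_ok=True)
        (Path(root) / folder / name).touch()


with tempfile.TemporaryDirectory() as tmp:
    make_method1_layout(tmp)
    example_labels = code_labels(tmp)
```

Both defective folders collapse into the single bad label, which is why their names do not matter.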

Method 2: With subfolders named train and either validation or test. The train subdirectory should contain a folder named good with training samples, and the test/validation subdirectory should contain a folder named good and at least one other folder of defective examples for validation.

dataset_dir
  ├── train
  │   └── good
  └── test
      ├── good
      ├── defective_type_a
      └── defective_type_b
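
One way to tell the two layouts apart is to inspect the top-level subfolder names. The sketch below is an assumption about how such a check could look, not the class's actual logic; detect_layout, make_dirs, and the returned strings are hypothetical.

```python
import tempfile
from pathlib import Path


def detect_layout(dataset_dir):
    """Return 'method_2', 'method_1', or None based on top-level folders."""
    names = {p.name for p in Path(dataset_dir).iterdir() if p.is_dir()}
    # Method 2: a train folder plus a validation or test folder.
    if "train" in names and names & {"validation", "test"}:
        return "method_2"
    # Method 1: a good folder plus at least one other (defect) folder.
    if "good" in names and len(names) >= 2:
        return "method_1"
    return None


def make_dirs(root, relative_paths):
    """Create each relative directory path under root."""
    for rel in relative_paths:
        (Path(root) / rel).mkdir(parents=True, exist_ok=True)


with tempfile.TemporaryDirectory() as tmp:
    make_dirs(tmp, [
        "m1/good", "m1/defective_type_a",
        "m2/train/good", "m2/test/good", "m2/test/defective_type_a",
        "empty",
    ])
    layout_1 = detect_layout(Path(tmp) / "m1")
    layout_2 = detect_layout(Path(tmp) / "m2")
    layout_none = detect_layout(Path(tmp) / "empty")
```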

Parameters

dataset_dir (str) – Directory where the data is located. It should contain subdirectories with images for each class.
dataset_name (str) – optional; Name of the dataset. If no dataset name is given, the dataset_dir folder name will be used as the dataset name.
num_workers (int) – optional; Number of processes to use for data loading. Defaults to 56.
shuffle_files (bool) – optional; Whether to shuffle the data. Defaults to True.
defects (list[str]) – optional; Specific defects or category names to use for validation. Defaults to None, in which case all subfolders in the dataset directory are used.
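
The defaults described for dataset_name and defects can be illustrated with a short sketch. The helpers resolve_dataset_name and resolve_defects are hypothetical and only mirror the documented behavior; excluding the good folder from the defect list is an assumption made here.

```python
import os
import tempfile


def resolve_dataset_name(dataset_dir, dataset_name=None):
    """Fall back to the dataset_dir folder name when no name is given."""
    if dataset_name:
        return dataset_name
    return os.path.basename(os.path.normpath(dataset_dir))


def resolve_defects(dataset_dir, defects=None):
    """Return the defect folders to use for validation.

    Assumption: 'good' is never treated as a defect folder.
    """
    available = sorted(
        name for name in os.listdir(dataset_dir)
        if os.path.isdir(os.path.join(dataset_dir, name)) and name != "good"
    )
    if defects is None:
        return available  # use every non-good subfolder
    return [name for name in available if name in defects]


with tempfile.TemporaryDirectory() as tmp:
    for folder in ("good", "scratch", "dent"):
        os.makedirs(os.path.join(tmp, folder))
    all_defects = resolve_defects(tmp)
    only_dent = resolve_defects(tmp, defects=["dent"])

default_name = resolve_dataset_name(os.path.join("data", "bottle"))
custom_name = resolve_dataset_name(os.path.join("data", "bottle"), "my_bottles")
```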

Raises

FileNotFoundError – if the dataset directory does not exist or if a subdirectory named good is not found
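
The two failure conditions can be sketched as a standalone check. This is an illustrative assumption rather than the constructor's real code: check_dataset_dir and raises_file_not_found are hypothetical names, and looking for good under either the top level or train is a guess based on the two layouts described above.

```python
import os
import tempfile


def check_dataset_dir(dataset_dir):
    """Raise FileNotFoundError for a missing directory or missing good folder."""
    if not os.path.isdir(dataset_dir):
        raise FileNotFoundError(f"Dataset directory not found: {dataset_dir}")
    # Method 1 keeps good at the top level; Method 2 keeps it under train.
    candidates = [
        os.path.join(dataset_dir, "good"),
        os.path.join(dataset_dir, "train", "good"),
    ]
    if not any(os.path.isdir(path) for path in candidates):
        raise FileNotFoundError(f"No 'good' subdirectory under: {dataset_dir}")


def raises_file_not_found(path):
    """Return True if the check rejects the path, False if it passes."""
    try:
        check_dataset_dir(path)
    except FileNotFoundError:
        return True
    return False


with tempfile.TemporaryDirectory() as tmp:
    missing_dir = raises_file_not_found(os.path.join(tmp, "does_not_exist"))
    no_good = raises_file_not_found(tmp)
    os.makedirs(os.path.join(tmp, "good"))
    with_good = raises_file_not_found(tmp)
```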

__init__(dataset_dir, dataset_name=None, num_workers=56, shuffle_files=True, defects=None): Class constructor

Methods

`__init__`(dataset_dir[, dataset_name, ...])	Class constructor
`get_batch`([subset, simsiam, cutpaste])	Get a single batch of images and labels from the dataset.
`get_inc_dataloaders`()
`preprocess`([image_size, batch_size, ...])	Preprocess the dataset to resize, normalize, and batch the images.
`shuffle_split`([train_pct, val_pct, ...])	Randomly split the good examples into train, validation, and test subsets with a pseudo-random seed option.
`simsiam_transform`(image_size)	Perform TwoCropsTransform and GaussianBlur on the dataset for SIMSIAM training.
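
The idea behind shuffle_split, a seeded random partition of the good examples, can be sketched in plain Python. This is not the method's implementation: split_good_examples is a hypothetical function, train_pct and val_pct follow the listed signature, and the test_pct and seed parameter names are assumptions.

```python
import random


def split_good_examples(items, train_pct=0.75, val_pct=0.25,
                        test_pct=0.0, seed=None):
    """Shuffle items with a seeded RNG and slice into three subsets."""
    if train_pct + val_pct + test_pct > 1.0 + 1e-9:
        raise ValueError("Percentages must not sum to more than 1.0")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # same seed gives the same order
    n_train = int(len(shuffled) * train_pct)
    n_val = int(len(shuffled) * val_pct)
    n_test = int(len(shuffled) * test_pct)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:n_train + n_val + n_test]
    return train, val, test


good_files = [f"good/{i:03d}.png" for i in range(20)]
train, val, test = split_good_examples(
    good_files, train_pct=0.5, val_pct=0.25, test_pct=0.25, seed=10)
train_again, _, _ = split_good_examples(
    good_files, train_pct=0.5, val_pct=0.25, test_pct=0.25, seed=10)
```

Fixing the seed makes the split reproducible across runs, which helps when comparing models trained on the same subsets.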

Attributes

`class_names`	Returns the list of class names
`data_loader`	A data loader object corresponding to the dataset
`dataset`	Returns the framework dataset object (torch.utils.data.Dataset)
`dataset_catalog`	The string name of the dataset catalog (or None)
`dataset_dir`	Host directory containing the dataset files
`dataset_name`	Name of the dataset
`defect_names`	Returns the list of defect names
`info`	Returns a dictionary of information about the dataset
`test_loader`	A data loader object corresponding to the test subset
`test_subset`	A subset of the dataset held out for final testing/evaluation
`train_loader`	A data loader object corresponding to the training subset
`train_subset`	A subset of the dataset used for training
`validation_loader`	A data loader object corresponding to the validation subset
`validation_subset`	A subset of the dataset used for validation/evaluation