neural_compressor.data.datasets.dataset

This is the base class for each framework.

Module Contents

Classes

TensorflowDatasets

The base class of Tensorflow datasets class.

PyTorchDatasets

The base class of PyTorch datasets class.

MXNetDatasets

The base class of MXNet datasets class.

ONNXRTQLDatasets

The base class of ONNXRT QLinear datasets class.

ONNXRTITDatasets

The base class of ONNXRT IT datasets class.

PytorchMxnetWrapDataset

The base class for PyTorch and MXNet frameworks.

PytorchMxnetWrapFunction

The helper class for PytorchMxnetWrapDataset.

Datasets

A base class for all framework datasets.

Dataset

The base class of dataset.

IterableDataset

An iterable Dataset.

CIFAR10

The CIFAR10 and CIFAR100 database.

PytorchCIFAR10

The PyTorch datasets for CIFAR10.

MXNetCIFAR10

The MXNet datasets for CIFAR10.

TensorflowCIFAR10

The Tensorflow datasets for CIFAR10.

CIFAR100

CIFAR100 database.

PytorchCIFAR100

The PyTorch datasets for CIFAR100.

MXNetCIFAR100

The MXNet datasets for CIFAR100.

TensorflowCIFAR100

The Tensorflow datasets for CIFAR100.

MNIST

Modified National Institute of Standards and Technology database and FashionMNIST database.

PytorchMNIST

The PyTorch datasets for MNIST.

MXNetMNIST

The MXNet datasets for MNIST.

TensorflowMNIST

The Tensorflow datasets for MNIST.

FashionMNIST

FashionMNIST database.

PytorchFashionMNIST

The PyTorch datasets for FashionMNIST.

MXNetFashionMNIST

The MXNet Dataset for FashionMNIST.

TensorflowFashionMNIST

The Tensorflow Dataset for FashionMNIST.

ImageFolder

The base class for ImageFolder.

MXNetImageFolder

The MXNet Dataset for image folder.

TensorflowImageFolder

The Tensorflow Dataset for image folder.

TensorflowTFRecordDataset

The Tensorflow TFRecord Dataset.

TensorflowImageRecord

The Tensorflow ImageNet database in tf record format.

TensorflowVOCRecord

The Tensorflow PASCAL VOC 2012 database in tf record format.

Functions

dataset_registry(dataset_type, framework[, dataset_format])

Register dataset subclasses.

download_url(url, root[, filename, md5])

Download from url.

gen_bar_updater()

Generate progress bar.

check_integrity(fpath, md5)

Check MD5 checksum.

calculate_md5(fpath[, chunk_size])

Generate MD5 checksum for a file.

Attributes

framework_datasets

The datasets supported by neural_compressor. They are model specific and can be configured by a yaml file.

class neural_compressor.data.datasets.dataset.TensorflowDatasets

Bases: object

The base class of Tensorflow datasets class.

class neural_compressor.data.datasets.dataset.PyTorchDatasets

Bases: object

The base class of PyTorch datasets class.

class neural_compressor.data.datasets.dataset.MXNetDatasets

Bases: object

The base class of MXNet datasets class.

class neural_compressor.data.datasets.dataset.ONNXRTQLDatasets

Bases: object

The base class of ONNXRT QLinear datasets class.

class neural_compressor.data.datasets.dataset.ONNXRTITDatasets

Bases: object

The base class of ONNXRT IT datasets class.

class neural_compressor.data.datasets.dataset.PytorchMxnetWrapDataset(datafunc)

The base class for PyTorch and MXNet frameworks.

Parameters:

datafunc – The datasets class of PyTorch or MXNet.

class neural_compressor.data.datasets.dataset.PytorchMxnetWrapFunction(dataset, transform, filter, *args, **kwargs)

The helper class for PytorchMxnetWrapDataset.

Parameters:
  • dataset (datasets class) – The datasets class of PyTorch or MXNet.

  • transform (transform object) – transform to process input data.

  • filter (Filter objects) – filter out examples according to specific conditions.
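
The constructor arguments above suggest a thin wrapper around a framework dataset class. The following is a minimal sketch of that idea, not the library's implementation: `WrappedDataset`, `PairList`, and `double_data` are hypothetical names, and instantiating the dataset class with the extra arguments and applying the transform on each access are assumptions. How the filter is applied is not described here, so the sketch only stores it.

```python
class WrappedDataset:
    """Hypothetical wrapper; mirrors the documented constructor arguments only."""

    def __init__(self, dataset, transform, filter, *args, **kwargs):
        # Assumption: the dataset class is instantiated with the extra arguments.
        self.dataset = dataset(*args, **kwargs)
        self.transform = transform
        self.filter = filter  # stored only; its use is not described in the text

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, index):
        sample = self.dataset[index]
        # Apply the transform, if any, to each sample on access.
        if self.transform is not None:
            sample = self.transform(sample)
        return sample


class PairList:
    """Stand-in for a PyTorch or MXNet dataset class."""

    def __init__(self, size):
        self.items = [(i, 0) for i in range(size)]

    def __len__(self):
        return len(self.items)

    def __getitem__(self, index):
        return self.items[index]


def double_data(sample):
    # Example transform: double the data part, keep the label.
    data, label = sample
    return data * 2, label


wrapped = WrappedDataset(PairList, double_data, None, 3)
```

Here `wrapped[2]` yields the transformed sample, while `len(wrapped)` is delegated to the underlying dataset.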

neural_compressor.data.datasets.dataset.framework_datasets

The datasets supported by neural_compressor. They are model specific and can be configured by a yaml file.

Users can add new datasets by implementing a new Dataset subclass under this directory. The name of the new subclass should follow a convention like ImageClassifier; the user can then select this dataset by setting the “imageclassifier” string in the tuning.strategy field of the yaml file.

Datasets variable is used to store all implemented Dataset subclasses to support model specific dataset.

class neural_compressor.data.datasets.dataset.Datasets(framework)

Bases: object

A base class for all framework datasets.

Parameters:

framework (str) – framework name, such as “tensorflow”, “tensorflow_itex”, “mxnet”, “onnxrt_qdq”, “onnxrt_qlinearops”, “onnxrt_integerops”, “pytorch”, “pytorch_ipex”, “pytorch_fx”, “onnxrt_qoperator”.

neural_compressor.data.datasets.dataset.dataset_registry(dataset_type, framework, dataset_format='')

Register dataset subclasses.

Parameters:
  • cls (class) – The class to register.

  • dataset_type (str) – The dataset registration name.

  • framework (str) – The framework name; supports 3 frameworks: ‘tensorflow’, ‘pytorch’, ‘mxnet’.

  • dataset_format (str) – The format in which the dataset is saved, e.g. ‘raw_image’, ‘tfrecord’.

Returns:

The registered class.

Return type:

cls
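
The parameter list (a `cls` that is registered and returned) reads like a decorator factory. The sketch below illustrates that general pattern only; it is not the library's code. The `registry` dict and `register_dataset` name are hypothetical, and both the comma-separated framework string and the key built by concatenating the type and format are assumptions.

```python
# Hypothetical registry: framework name -> {dataset name -> class}.
registry = {"tensorflow": {}, "pytorch": {}, "mxnet": {}}


def register_dataset(dataset_type, framework, dataset_format=""):
    """Return a class decorator that records the class under each framework."""

    def decorator(cls):
        # Assumption: several frameworks may be given as a comma-separated string.
        for name in (part.strip() for part in framework.split(",")):
            if name not in registry:
                raise ValueError(f"unsupported framework: {name}")
            key = dataset_type + dataset_format  # assumed key layout
            if key in registry[name]:
                raise ValueError(f"dataset already registered: {key}")
            registry[name][key] = cls
        return cls  # the class itself is returned unchanged

    return decorator


@register_dataset("toy", "pytorch, mxnet")
class ToyDataset:
    pass
```

After the decorator runs, `ToyDataset` can be looked up by name under each listed framework, and the class remains usable as before.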

class neural_compressor.data.datasets.dataset.Dataset

Bases: object

The base class of dataset.

Subclasses should override two methods: __getitem__ for indexing a data sample and __len__ for the size of the dataset.
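
As a minimal sketch of those two methods, the hypothetical `SquaresDataset` below keeps its samples in a list. It is written as a plain Python class so that it runs without neural_compressor installed; in real use it would presumably subclass `Dataset`. The class name and the `(data, label)` sample layout are assumptions made for illustration.

```python
class SquaresDataset:
    """Hypothetical map-style dataset; name and sample layout are illustrative."""

    def __init__(self, size):
        # Each sample is a (data, label) pair: here (i, i * i).
        self.samples = [(i, i * i) for i in range(size)]

    def __getitem__(self, index):
        # Indexing returns one data sample.
        return self.samples[index]

    def __len__(self):
        # The size of the dataset.
        return len(self.samples)


dataset = SquaresDataset(4)
```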

class neural_compressor.data.datasets.dataset.IterableDataset

Bases: object

An iterable Dataset.

Subclasses of the iterable dataset should also implement the __iter__ method for iterating over the samples of the dataset.
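
A minimal sketch of an iterable-style dataset follows. `CountingIterableDataset` is a hypothetical, standalone class (it does not import neural_compressor), and the `(data, label)` sample layout is an assumption.

```python
class CountingIterableDataset:
    """Hypothetical iterable-style dataset; not part of neural_compressor."""

    def __init__(self, limit):
        self.limit = limit

    def __iter__(self):
        # Yield (data, label) samples one at a time, without random access.
        for i in range(self.limit):
            yield i, i % 2


samples = list(CountingIterableDataset(3))
```

Unlike a map-style dataset, this one is consumed by iteration only, which suits streamed sources such as record files.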

neural_compressor.data.datasets.dataset.download_url(url, root, filename=None, md5=None)

Download from url.

Parameters:
  • url (str) – the address to download from.

  • root (str) – the path for saving.

  • filename (str) – the file name for saving.

  • md5 (str) – the md5 string.
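
The parameters above can be illustrated with a small standard-library sketch. `fetch` is a hypothetical helper, not the library function; falling back to the last URL path component when no filename is given, and raising on an MD5 mismatch, are assumptions. A file:// URL is used so the example runs without network access.

```python
import hashlib
import os
import tempfile
import urllib.request
from pathlib import Path


def fetch(url, root, filename=None, md5=None):
    """Hypothetical helper: save url under root and optionally verify its MD5."""
    os.makedirs(root, exist_ok=True)
    # Assumption: fall back to the last path component of the URL.
    target = os.path.join(root, filename or os.path.basename(url))
    with urllib.request.urlopen(url) as response:
        data = response.read()
    if md5 is not None and hashlib.md5(data).hexdigest() != md5:
        raise RuntimeError("MD5 mismatch for " + url)
    with open(target, "wb") as f:
        f.write(data)
    return target


# A local file stands in for the remote resource.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "source.txt"
    source.write_bytes(b"hello")
    saved = fetch(source.as_uri(), os.path.join(tmp, "data"), filename="copy.txt")
    saved_bytes = Path(saved).read_bytes()
```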

neural_compressor.data.datasets.dataset.gen_bar_updater()

Generate progress bar.

neural_compressor.data.datasets.dataset.check_integrity(fpath, md5)

Check MD5 checksum.

neural_compressor.data.datasets.dataset.calculate_md5(fpath, chunk_size=1024 * 1024)

Generate MD5 checksum for a file.
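
The `chunk_size` parameter implies that the file is hashed in pieces rather than read into memory at once. The sketch below shows one common way to do that with `hashlib`, along with an integrity check built on it. `md5_of_file` and `file_matches_md5` are hypothetical names, and treating a missing file as a failed check is an assumption.

```python
import hashlib
import os
import tempfile


def md5_of_file(fpath, chunk_size=1024 * 1024):
    """Compute the hex MD5 digest of a file, reading chunk_size bytes at a time."""
    digest = hashlib.md5()
    with open(fpath, "rb") as f:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def file_matches_md5(fpath, md5):
    """Return True only if the file exists and its digest equals md5."""
    return os.path.isfile(fpath) and md5_of_file(fpath) == md5


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.bin")
    with open(path, "wb") as f:
        f.write(b"hello")
    # A tiny chunk size forces several update() calls on a 5-byte file.
    checksum = md5_of_file(path, chunk_size=2)
    matches = file_matches_md5(path, checksum)
    missing = file_matches_md5(os.path.join(tmp, "absent.bin"), checksum)
```

The digest does not depend on the chunk size; a larger value only reduces the number of reads.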

class neural_compressor.data.datasets.dataset.CIFAR10(root, train=False, transform=None, filter=None, download=True)

Bases: Dataset

The CIFAR10 and CIFAR100 database.

For CIFAR10: If download is True, the dataset is downloaded to root/ and extracted automatically; otherwise the user can download the file from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz manually to root/ and extract it.

For CIFAR100: If download is True, the dataset is downloaded to root/ and extracted automatically; otherwise the user can download the file from https://www.cs.toronto.edu/~kriz/cifar-100-python.tar.gz manually to root/ and extract it.

Parameters:
  • root (str) – Root directory of dataset.

  • train (bool, default=False) – If True, creates dataset from train subset, otherwise from validation subset.

  • transform (transform object, default=None) – transform to process input data.

  • filter (Filter objects, default=None) – filter out examples according to specific conditions.

  • download (bool, default=True) – If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again.

load_meta()

Load meta.

download()

Download a file.

class neural_compressor.data.datasets.dataset.PytorchCIFAR10(root, train=False, transform=None, filter=None, download=True)

Bases: CIFAR10

The PyTorch datasets for CIFAR10.

class neural_compressor.data.datasets.dataset.MXNetCIFAR10(root, train=False, transform=None, filter=None, download=True)

Bases: CIFAR10

The MXNet datasets for CIFAR10.

class neural_compressor.data.datasets.dataset.TensorflowCIFAR10(root, train=False, transform=None, filter=None, download=True)

Bases: CIFAR10

The Tensorflow datasets for CIFAR10.

class neural_compressor.data.datasets.dataset.CIFAR100(root, train=False, transform=None, filter=None, download=True)

Bases: CIFAR10

CIFAR100 database.

For CIFAR100: If download is True, the dataset is downloaded to root/ and extracted automatically; otherwise the user can download the file from https://www.cs.toronto.edu/~kriz/cifar-100-python.tar.gz manually to root/ and extract it.

Parameters:
  • root (str) – Root directory of dataset.

  • train (bool, default=False) – If True, creates dataset from train subset, otherwise from validation subset.

  • transform (transform object, default=None) – transform to process input data.

  • filter (Filter objects, default=None) – filter out examples according to specific conditions.

  • download (bool, default=True) – If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again.

class neural_compressor.data.datasets.dataset.PytorchCIFAR100(root, train=False, transform=None, filter=None, download=True)

Bases: CIFAR100

The PyTorch datasets for CIFAR100.

class neural_compressor.data.datasets.dataset.MXNetCIFAR100(root, train=False, transform=None, filter=None, download=True)

Bases: CIFAR100

The MXNet datasets for CIFAR100.

class neural_compressor.data.datasets.dataset.TensorflowCIFAR100(root, train=False, transform=None, filter=None, download=True)

Bases: CIFAR100

The Tensorflow datasets for CIFAR100.

class neural_compressor.data.datasets.dataset.MNIST(root, train=False, transform=None, filter=None, download=True)

Bases: Dataset

Modified National Institute of Standards and Technology database and FashionMNIST database.

For MNIST: If download is True, the dataset is downloaded to root/MNIST/; otherwise the user should put mnist.npz under root/MNIST/ manually.

For FashionMNIST: If download is True, the dataset is downloaded to root/FashionMNIST/; otherwise the user should put train-labels-idx1-ubyte.gz, train-images-idx3-ubyte.gz, t10k-labels-idx1-ubyte.gz and t10k-images-idx3-ubyte.gz under root/FashionMNIST/ manually.

Parameters:
  • root (str) – Root directory of dataset.

  • train (bool, default=False) – If True, creates dataset from train subset, otherwise from validation subset.

  • transform (transform object, default=None) – transform to process input data.

  • filter (Filter objects, default=None) – filter out examples according to specific conditions.

  • download (bool, default=True) – If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again.

property class_to_idx

Return a dict mapping each class to its index.

read_data()

Read data from a file.

download()

Download a file.

class neural_compressor.data.datasets.dataset.PytorchMNIST(root, train=False, transform=None, filter=None, download=True)

Bases: MNIST

The PyTorch datasets for MNIST.

class neural_compressor.data.datasets.dataset.MXNetMNIST(root, train=False, transform=None, filter=None, download=True)

Bases: MNIST

The MXNet datasets for MNIST.

class neural_compressor.data.datasets.dataset.TensorflowMNIST(root, train=False, transform=None, filter=None, download=True)

Bases: MNIST

The Tensorflow datasets for MNIST.

class neural_compressor.data.datasets.dataset.FashionMNIST(root, train=False, transform=None, filter=None, download=True)

Bases: MNIST

FashionMNIST database.

For FashionMNIST: If download is True, the dataset is downloaded to root/FashionMNIST/; otherwise the user should put train-labels-idx1-ubyte.gz, train-images-idx3-ubyte.gz, t10k-labels-idx1-ubyte.gz and t10k-images-idx3-ubyte.gz under root/FashionMNIST/ manually.

Parameters:
  • root (str) – Root directory of dataset.

  • train (bool, default=False) – If True, creates dataset from train subset, otherwise from validation subset.

  • transform (transform object, default=None) – transform to process input data.

  • filter (Filter objects, default=None) – filter out examples according to specific conditions.

  • download (bool, default=True) – If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again.

read_data()

Read data from a file.

class neural_compressor.data.datasets.dataset.PytorchFashionMNIST(root, train=False, transform=None, filter=None, download=True)

Bases: FashionMNIST

The PyTorch datasets for FashionMNIST.

class neural_compressor.data.datasets.dataset.MXNetFashionMNIST(root, train=False, transform=None, filter=None, download=True)

Bases: FashionMNIST

The MXNet Dataset for FashionMNIST.

class neural_compressor.data.datasets.dataset.TensorflowFashionMNIST(root, train=False, transform=None, filter=None, download=True)

Bases: FashionMNIST

The Tensorflow Dataset for FashionMNIST.

class neural_compressor.data.datasets.dataset.ImageFolder(root, transform=None, filter=None)

Bases: Dataset

The base class for ImageFolder.

Expects the data folder to contain subfolders representing the classes to which its images belong.

Please arrange data in this way:

root/class_1/xxx.png
root/class_1/xxy.png
root/class_1/xxz.png
…
root/class_n/123.png
root/class_n/nsdf3.png
root/class_n/asd932_.png

Please put images of different categories into different folders.

Parameters:
  • root (str) – Root directory of dataset.

  • transform (transform object, default=None) – transform to process input data.

  • filter (Filter objects, default=None) – filter out examples according to specific conditions.
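
In this layout each subfolder of root stands for one class. A minimal sketch of how such a tree could be turned into (path, label) pairs follows. The helper name `list_image_folder` is hypothetical, and assigning integer labels by sorting the subfolder names is an assumption, not documented behavior of this class.

```python
import os
import tempfile


def list_image_folder(root):
    """Return (image_path, label) pairs plus the class-name-to-label mapping."""
    classes = sorted(
        d for d in os.listdir(root) if os.path.isdir(os.path.join(root, d))
    )
    # Assumption: labels follow the sorted order of the subfolder names.
    class_to_idx = {name: idx for idx, name in enumerate(classes)}
    samples = []
    for name in classes:
        folder = os.path.join(root, name)
        for fname in sorted(os.listdir(folder)):
            samples.append((os.path.join(folder, fname), class_to_idx[name]))
    return samples, class_to_idx


# Build a tiny tree of empty files to exercise the helper.
with tempfile.TemporaryDirectory() as root:
    layout = [("class_1", "xxx.png"), ("class_1", "xxy.png"), ("class_2", "123.png")]
    for cls_name, fname in layout:
        os.makedirs(os.path.join(root, cls_name), exist_ok=True)
        open(os.path.join(root, cls_name, fname), "wb").close()
    samples, class_to_idx = list_image_folder(root)
    labels = [label for _, label in samples]
```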

class neural_compressor.data.datasets.dataset.MXNetImageFolder(root, transform=None, filter=None)

Bases: ImageFolder

The MXNet Dataset for image folder.

Expects the data folder to contain subfolders representing the classes to which its images belong.

Please arrange data in this way:

root/class_1/xxx.png
root/class_1/xxy.png
root/class_1/xxz.png
…
root/class_n/123.png
root/class_n/nsdf3.png
root/class_n/asd932_.png

Please put images of different categories into different folders.

Parameters:
  • root (str) – Root directory of dataset.

  • transform (transform object, default=None) – transform to process input data.

  • filter (Filter objects, default=None) – filter out examples according to specific conditions.

class neural_compressor.data.datasets.dataset.TensorflowImageFolder(root, transform=None, filter=None)

Bases: ImageFolder

The Tensorflow Dataset for image folder.

Expects the data folder to contain subfolders representing the classes to which its images belong.

Please arrange data in this way:

root/class_1/xxx.png
root/class_1/xxy.png
root/class_1/xxz.png
…
root/class_n/123.png
root/class_n/nsdf3.png
root/class_n/asd932_.png

Please put images of different categories into different folders.

Parameters:
  • root (str) – Root directory of dataset.

  • transform (transform object, default=None) – transform to process input data.

  • filter (Filter objects, default=None) – filter out examples according to specific conditions.

class neural_compressor.data.datasets.dataset.TensorflowTFRecordDataset

Bases: IterableDataset

The Tensorflow TFRecord Dataset.

root is the full path to the tfrecord file, including the file name.

Parameters:
  • root (str) – filename of dataset.

  • transform (transform object, default=None) – transform to process input data.

  • filter (Filter objects, default=None) – filter out examples according to specific conditions.

class neural_compressor.data.datasets.dataset.TensorflowImageRecord

Bases: IterableDataset

The Tensorflow ImageNet database in tf record format.

Please arrange data in this way:

root/validation-000-of-100
root/validation-001-of-100
…
root/validation-099-of-100

The file name needs to follow this pattern: ‘* - * -of- *’

Parameters:
  • root (str) – Root directory of dataset.

  • transform (transform object, default=None) – transform to process input data.

  • filter (Filter objects, default=None) – filter out examples according to specific conditions.
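
To check whether a set of file names fits the shard naming scheme, a wildcard match is usually enough. The sketch below uses `fnmatch` from the standard library; writing the pattern without spaces and using wildcard matching at all are assumptions about how the names are checked, and `labels.txt` is a made-up non-matching name.

```python
import fnmatch

# File names taken from the layout above, plus one that should not match.
names = ["validation-000-of-100", "validation-099-of-100", "labels.txt"]

# Pattern from the text with the spaces removed: anything-anything-of-anything.
shards = sorted(fnmatch.filter(names, "*-*-of-*"))
```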

class neural_compressor.data.datasets.dataset.TensorflowVOCRecord

Bases: IterableDataset

The Tensorflow PASCAL VOC 2012 database in tf record format.

Please arrange data in this way:

root/val-00000-of-00004.tfrecord
root/val-00001-of-00004.tfrecord
…
root/val-00003-of-00004.tfrecord

The file name needs to follow this pattern: ‘val-*-of-*’

Parameters:
  • root (str) – Root directory of dataset.

  • transform (transform object, default=None) – transform to process input data.

  • filter (Filter objects, default=None) – filter out examples according to specific conditions.