Dataset
=======

1. [Introduction](#introduction)
2. [Supported Framework Dataset Matrix](#supported-framework-dataset-matrix)
3. [Get start with Dataset API](#get-start-with-dataset-api)
4. [Examples](#examples)

## Introduction

To adapt to its internal dataloader API, Intel® Neural Compressor implements some built-in datasets.

A dataset is a container that holds all the data the dataloader can use, and it can be fetched by index or created as an iterator. To implement a specific dataset, inherit from the Dataset class and implement either the `__iter__` method or the `__getitem__` method; when implementing `__getitem__`, implementing `__len__` as well is recommended.

Users can use Neural Compressor built-in dataset objects as well as register their own datasets.

## Supported Framework Dataset Matrix

#### TensorFlow

| Dataset | Parameters | Comments | Usage |
| :------ | :------ | :------ | :------ |
| MNIST(root, train, transform, filter, download) | **root** (str): Root directory of dataset
**train** (bool, default=False): If True, creates dataset from train subset, otherwise from validation subset
**transform** (transform object, default=None): transform to process input data
**filter** (Filter objects, default=None): filter out examples according to specific conditions
**download** (bool, default=True): If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again. | If download is True, it will download dataset to root/MNIST/, otherwise user should put mnist.npz under root/MNIST/ manually. | **In yaml file:**
dataset:
   MNIST:
     root: /path/to/root
     train: False
     download: True
(transform and filter are not set under the dataset key)
**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['MNIST'] (root=root, train=False, transform=transform, filter=None, download=True) | | FashionMNIST(root, train, transform, filter, download) | **root** (str): Root directory of dataset
**train** (bool, default=False): If True, creates dataset from train subset, otherwise from validation subset
**transform** (transform object, default=None): transform to process input data
**filter** (Filter objects, default=None): filter out examples according to specific conditions
**download** (bool, default=True): If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again. | If download is True, it will download dataset to root/FashionMNIST/, otherwise user should put train-labels-idx1-ubyte.gz, train-images-idx3-ubyte.gz, t10k-labels-idx1-ubyte.gz and t10k-images-idx3-ubyte.gz under root/FashionMNIST/ manually.| **In yaml file:**
dataset:
   FashionMNIST:
     root: /path/to/root
     train: False
     download: True
(transform and filter are not set under the dataset key)
**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['FashionMNIST'] (root=root, train=False, transform=transform, filter=None, download=True) | | CIFAR10(root, train, transform, filter, download) | **root** (str): Root directory of dataset
**train** (bool, default=False): If True, creates dataset from train subset, otherwise from validation subset
**transform** (transform object, default=None): transform to process input data
**filter** (Filter objects, default=None): filter out examples according to specific conditions
**download** (bool, default=True): If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again. | If download is True, it will download dataset to root/ and extract it automatically, otherwise user can download file from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz manually to root/ and extract it. | **In yaml file:**
dataset:
   CIFAR10:
     root: /path/to/root
     train: False
     download: True
(transform and filter are not set under the dataset key)
**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['CIFAR10'] (root=root, train=False, transform=transform, filter=None, download=True) | | CIFAR100(root, train, transform, filter, download) | **root** (str): Root directory of dataset
**train** (bool, default=False): If True, creates dataset from train subset, otherwise from validation subset
**transform** (transform object, default=None): transform to process input data
**filter** (Filter objects, default=None): filter out examples according to specific conditions
**download** (bool, default=True): If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again. | If download is True, it will download dataset to root/ and extract it automatically, otherwise user can download file from https://www.cs.toronto.edu/~kriz/cifar-100-python.tar.gz manually to root/ and extract it. | **In yaml file:**
dataset:
   CIFAR100:
     root: /path/to/root
     train: False
     download: True
(transform and filter are not set under the dataset key)
**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['CIFAR100'] (root=root, train=False, transform=transform, filter=None, download=True) | | ImageRecord(root, transform, filter) | **root** (str): Root directory of dataset
**transform** (transform object, default=None): transform to process input data
**filter** (Filter objects, default=None): filter out examples according to specific conditions | Please arrange data in this way:
root/validation-000-of-100
root/validation-001-of-100
...
root/validation-099-of-100
The file name needs to follow this pattern: '* - * -of- *' | **In yaml file:**
dataset:
   ImageRecord:
     root: /path/to/root
**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['ImageRecord'] (root=root, transform=transform, filter=None)
| | ImageFolder(root, transform, filter) | **root** (str): Root directory of dataset
**transform** (transform object, default=None): transform to process input data
**filter** (Filter objects, default=None): filter out examples according to specific conditions | Please arrange data in this way:
root/class_1/xxx.png
root/class_1/xxy.png
root/class_1/xxz.png
...
root/class_n/123.png
root/class_n/nsdf3.png
root/class_n/asd932_.png
Please put images of different categories into different folders. | **In yaml file:**
dataset:
   ImageFolder:
     root: /path/to/root
**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['ImageFolder'] (root=root,transform=transform, filter=None) | | ImagenetRaw(data_path, image_list, transform, filter) | **data_path** (str): Root directory of dataset
**image_list** (str): data file, record image_names and their labels
**transform** (transform object, default=None): transform to process input data
**filter** (Filter objects, default=None): filter out examples according to specific conditions | Please arrange data in this way:
data_path/img1.jpg
data_path/img2.jpg
...
data_path/imgx.jpg
The dataset reads the name and label of each image from the image_list file. If image_list is set to None, it reads them from data_path/val_map.txt automatically. | **In yaml file:**
dataset:
   ImagenetRaw:
     data_path: /path/to/image
     image_list: /path/to/label
**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['ImagenetRaw'] (data_path, image_list, transform=transform, filter=None) | | COCORecord(root, num_cores, transform, filter) | **root** (str): Root directory of dataset
**num_cores** (int, default=28): The number of input Datasets to interleave from in parallel
**transform** (transform object, default=None): transform to process input data
**filter** (Filter objects, default=None): filter out examples according to specific conditions | Root is a full path to tfrecord file, which contains the file name.
**Please use Resize transform when batch_size > 1** | **In yaml file:**
dataset:
   COCORecord:
     root: /path/to/tfrecord
     num_cores: 28
**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['COCORecord'] (root, num_cores=28, transform=transform, filter=None) | | COCORaw(root, img_dir, anno_dir, transform, filter) | **root** (str): Root directory of dataset
**img_dir** (str, default='val2017'): image file directory
**anno_dir** (str, default='annotations/instances_val2017.json'): annotation file directory
**transform** (transform object, default=None): transform to process input data
**filter** (Filter objects, default=None): filter out examples according to specific conditions | Please arrange data in this way:
/root/img_dir/1.jpg
/root/img_dir/2.jpg
...
/root/img_dir/n.jpg
/root/anno_dir
**Please use Resize transform when batch_size > 1** | **In yaml file:**
dataset:
   COCORaw:
     root: /path/to/root
     img_dir: /path/to/image
     anno_dir: /path/to/annotation
**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['COCORaw'] (root, img_dir, anno_dir, transform=transform, filter=None)
If anno_dir is not set, the dataset will use default label map | | COCONpy(root, npy_dir, anno_dir) | **root** (str): Root directory of dataset
**npy_dir** (str, default='val2017'): npy file directory
**anno_dir** (str, default='annotations/instances_val2017.json'): annotation file directory | Please arrange data in this way:
/root/npy_dir/1.jpg.npy
/root/npy_dir/2.jpg.npy
...
/root/npy_dir/n.jpg.npy
/root/anno_dir
**Please use Resize transform when batch_size > 1** | **In yaml file:**
dataset:
   COCONpy:
     root: /path/to/root
     npy_dir: /path/to/npy
     anno_dir: /path/to/annotation
**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['COCONpy'] (root, npy_dir, anno_dir)
If anno_dir is not set, the dataset will use default label map | | dummy(shape, low, high, dtype, label, transform, filter) | **shape** (list or tuple): shape of all samples; the first dimension should be the sample count of the dataset. Multiple tensors are supported: pass a list of tuples, and a tensor of the given size is created for each tuple in the list.
**low** (list or float, default=-128.): scales the tensor value range from [0, 1] to [0, low], or to [low, 0] if low < 0. If a float, all tensors use the same low value.
**high** (list or float, default=127.): shifts the tensor values by adding high to every element. If a list, its length should match that of the shape list.
**dtype** (list or str, default='float32'): dtype of each tensor. If a list, its length should match that of the shape list; if a str, all tensors use the same dtype. Supported dtypes: 'float32', 'float16', 'uint8', 'int8', 'int32', 'int64', 'bool'.
**label** (bool, default=True): whether to return 0 as label
**transform** (transform object, default=None): the dummy dataset does not need a transform. If transform is not None, it is ignored.
**filter** (Filter objects, default=None): filter out examples according to specific conditions | This dataset constructs data of a specific shape; the value range is calculated from: low * stand_normal(0, 1) + high. | **In yaml file:**
dataset:
   dummy:
     shape: [3, 224, 224, 3]
     low: 0.0
     high: 127.0
     dtype: float32
     label: True
**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['dummy'] (shape, low, high, dtype, label, transform=None, filter=None) | | dummy_v2(input_shape, label_shape, low, high, dtype, transform, filter) | **input_shape** (list or tuple): creates one or more input tensors. A list represents the sample shape of the dataset, e.g. an image size should be represented as (224, 224, 3); a tuple containing multiple lists represents multiple input tensors.
**label_shape** (list or tuple): creates one or more label tensors. A list represents the sample shape of the label, e.g. a label size should be represented as (1,); a tuple containing multiple lists represents multiple label tensors. In yaml usage, (1,) is the default value.
**low** (list or float, default=-128.): scales the tensor value range from [0, 1] to [0, low], or to [low, 0] if low < 0. If a float, all tensors use the same low value.
**high** (list or float, default=127.): shifts the tensor values by adding high to every element. If a list, its length should match that of the shape list.
**dtype** (list or str, default='float32'): dtype of each tensor. If a list, its length should match that of the shape list; if a str, all tensors use the same dtype. Supported dtypes: 'float32', 'float16', 'uint8', 'int8', 'int32', 'int64', 'bool'.
**transform** (transform object, default=None): the dummy dataset does not need a transform. If transform is not None, it is ignored.
**filter** (Filter objects, default=None): filter out examples according to specific conditions | This dataset constructs data of a specific shape; the value range is calculated from: low * stand_normal(0, 1) + high. | **In yaml file:**
dataset:
   dummy_v2:
     input_shape: [224, 224, 3]
     label_shape: [1]
     low: 0.0
     high: 127.0
     dtype: float32

**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['dummy_v2'] (input_shape, label_shape, low, high, dtype, transform=None, filter=None) | | style_transfer(content_folder, style_folder, crop_ratio, resize_shape, image_format, transform, filter) | **content_folder** (str): Root directory of content images
**style_folder** (str): Root directory of style images
**crop_ratio** (float, default=0.1): cropped ratio to each side
**resize_shape** (tuple, default=(256, 256)): target size of image
**image_format** (str, default='jpg'): target image format
**transform** (transform object, default=None): transform to process input data
**filter** (Filter objects, default=None): filter out examples according to specific conditions | Dataset used for the style transfer task. It is constructed from two image folders: the content image folder and the style image folder. | **In yaml file:**
dataset:
   style_transfer:
     content_folder: /path/to/content_folder
     style_folder: /path/to/style_folder
     crop_ratio: 0.1
     resize_shape: [256, 256]
     image_format: 'jpg'
**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['style_transfer'] (content_folder, style_folder, crop_ratio, resize_shape, image_format, transform=transform, filter=None) | | TFRecordDataset(root, transform, filter) | **root** (str): filename of dataset
**transform** (transform object, default=None): transform to process input data
**filter** (Filter objects, default=None): filter out examples according to specific conditions |Root is a full path to tfrecord file, which contains the file name. | **In yaml file:**
dataset:
   TFRecordDataset:
     root: /path/to/tfrecord
**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['TFRecordDataset'] (root, transform=transform) | | bert(root, label_file, task, model_type, transform, filter) | **root** (str): path of dataset
**label_file** (str): path of label file
**task** (str, default='squad'): task type of model
**model_type** (str, default='bert'): model type, support 'bert'.
**transform** (transform object, default=None): transform to process input data
**filter** (Filter objects, default=None): filter out examples according to specific conditions | This dataset supports tfrecord data, please refer to [Guide](../examples/tensorflow/nlp/bert_large_squad/quantization/ptq/README.md) to create tfrecord file first. | **In yaml file:**
dataset:
   bert:
     root: /path/to/root
     label_file: /path/to/label_file
     task: squad
     model_type: bert
**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['bert'] (root, label_file, transform=transform) | | sparse_dummy_v2(dense_shape, label_shape, sparse_ratio, low, high, dtype, transform, filter) | **dense_shape** (list or tuple): creates one or more sparse tensors. A tuple represents the sample shape of the dataset, e.g. an image size should be represented as (224, 224, 3); a tuple containing multiple lists represents multiple input tensors.
**label_shape** (list or tuple): creates one or more label tensors. A list represents the sample shape of the label, e.g. a label size should be represented as (1,); a tuple containing multiple lists represents multiple label tensors. In yaml usage, (1,) is the default value.
**sparse_ratio** (float, default=0.5): the sparsity ratio, in the range [0, 1].
**low** (list or float, default=-128.): scales the tensor value range from [0, 1] to [0, low], or to [low, 0] if low < 0. If a float, all tensors use the same low value.
**high** (list or float, default=127.): shifts the tensor values by adding high to every element. If a list, its length should match that of the shape list.
**dtype** (list or str, default='float32'): dtype of each tensor. If a list, its length should match that of the shape list; if a str, all tensors use the same dtype. Supported dtypes: 'float32', 'float16', 'uint8', 'int8', 'int32', 'int64', 'bool'.
**transform** (transform object, default=None): the dummy dataset does not need a transform. If transform is not None, it is ignored.
**filter** (Filter objects, default=None): filter out examples according to specific conditions | This dataset constructs data of a specific shape; the value range is calculated from: low * stand_normal(0, 1) + high. | **In yaml file:**
dataset:
   sparse_dummy_v2:
     dense_shape: [224, 224, 3]
     label_shape: [1]
     sparse_ratio: 0.5
     low: 0.0
     high: 127.0
     dtype: float32

**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['sparse_dummy_v2'] (dense_shape, label_shape, sparse_ratio, low, high, dtype, transform=None, filter=None) |

#### PyTorch

| Dataset | Parameters | Comments | Usage |
| :------ | :------ | :------ | :------ |
| MNIST(root, train, transform, filter, download) | **root** (str): Root directory of dataset
**train** (bool, default=False): If True, creates dataset from train subset, otherwise from validation subset
**transform** (transform object, default=None): transform to process input data
**filter** (Filter objects, default=None): filter out examples according to specific conditions
**download** (bool, default=True): If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again. | If download is True, it will download dataset to root/MNIST/, otherwise user should put mnist.npz under root/MNIST/ manually. | **In yaml file:**
dataset:
   MNIST:
     root: /path/to/root
     train: False
     download: True
(transform and filter are not set under the dataset key)
**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['MNIST'] (root=root, train=False, transform=transform, filter=None, download=True) | | FashionMNIST(root, train, transform, filter, download) | **root** (str): Root directory of dataset
**train** (bool, default=False): If True, creates dataset from train subset, otherwise from validation subset
**transform** (transform object, default=None): transform to process input data
**filter** (Filter objects, default=None): filter out examples according to specific conditions
**download** (bool, default=True): If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again. | If download is True, it will download dataset to root/FashionMNIST/, otherwise user should put train-labels-idx1-ubyte.gz, train-images-idx3-ubyte.gz, t10k-labels-idx1-ubyte.gz and t10k-images-idx3-ubyte.gz under root/FashionMNIST/ manually.| **In yaml file:**
dataset:
   FashionMNIST:
     root: /path/to/root
     train: False
     download: True
(transform and filter are not set under the dataset key)
**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['FashionMNIST'] (root=root, train=False, transform=transform, filter=None, download=True) | | CIFAR10(root, train, transform, filter, download) | **root** (str): Root directory of dataset
**train** (bool, default=False): If True, creates dataset from train subset, otherwise from validation subset
**transform** (transform object, default=None): transform to process input data
**filter** (Filter objects, default=None): filter out examples according to specific conditions
**download** (bool, default=True): If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again. | If download is True, it will download dataset to root/ and extract it automatically, otherwise user can download file from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz manually to root/ and extract it. | **In yaml file:**
dataset:
   CIFAR10:
     root: /path/to/root
     train: False
     download: True
(transform and filter are not set under the dataset key)
**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['CIFAR10'] (root=root, train=False, transform=transform, filter=None, download=True) | | CIFAR100(root, train, transform, filter, download) | **root** (str): Root directory of dataset
**train** (bool, default=False): If True, creates dataset from train subset, otherwise from validation subset
**transform** (transform object, default=None): transform to process input data
**filter** (Filter objects, default=None): filter out examples according to specific conditions
**download** (bool, default=True): If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again. | If download is True, it will download dataset to root/ and extract it automatically, otherwise user can download file from https://www.cs.toronto.edu/~kriz/cifar-100-python.tar.gz manually to root/ and extract it. | **In yaml file:**
dataset:
   CIFAR100:
     root: /path/to/root
     train: False
     download: True
(transform and filter are not set under the dataset key)
**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['CIFAR100'] (root=root, train=False, transform=transform, filter=None, download=True) | | ImageFolder(root, transform, filter) | **root** (str): Root directory of dataset
**transform** (transform object, default=None): transform to process input data
**filter** (Filter objects, default=None): filter out examples according to specific conditions | Please arrange data in this way:
root/class_1/xxx.png
root/class_1/xxy.png
root/class_1/xxz.png
...
root/class_n/123.png
root/class_n/nsdf3.png
root/class_n/asd932_.png
Please put images of different categories into different folders. | **In yaml file:**
dataset:
   ImageFolder:
     root: /path/to/root
**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['ImageFolder'] (root=root,transform=transform, filter=None) | | ImagenetRaw(data_path, image_list, transform, filter) | **data_path** (str): Root directory of dataset
**image_list** (str): data file, record image_names and their labels
**transform** (transform object, default=None): transform to process input data
**filter** (Filter objects, default=None): filter out examples according to specific conditions | Please arrange data in this way:
data_path/img1.jpg
data_path/img2.jpg
...
data_path/imgx.jpg
The dataset reads the name and label of each image from the image_list file. If image_list is set to None, it reads them from data_path/val_map.txt automatically. | **In yaml file:**
dataset:
   ImagenetRaw:
     data_path: /path/to/image
     image_list: /path/to/label
**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['ImagenetRaw'] (data_path, image_list, transform=transform, filter=None) | | COCORaw(root, img_dir, anno_dir, transform, filter) | **root** (str): Root directory of dataset
**img_dir** (str, default='val2017'): image file directory
**anno_dir** (str, default='annotations/instances_val2017.json'): annotation file directory
**transform** (transform object, default=None): transform to process input data
**filter** (Filter objects, default=None): filter out examples according to specific conditions | Please arrange data in this way:
/root/img_dir/1.jpg
/root/img_dir/2.jpg
...
/root/img_dir/n.jpg
/root/anno_dir
**Please use Resize transform when batch_size>1**| **In yaml file:**
dataset:
   COCORaw:
     root: /path/to/root
     img_dir: /path/to/image
     anno_dir: /path/to/annotation
**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['COCORaw'] (root, img_dir, anno_dir, transform=transform, filter=None)
If anno_dir is not set, the dataset will use default label map | | dummy(shape, low, high, dtype, label, transform, filter) | **shape** (list or tuple): shape of all samples; the first dimension should be the sample count of the dataset. Multiple tensors are supported: pass a list of tuples, and a tensor of the given size is created for each tuple in the list.
**low** (list or float, default=-128.): scales the tensor value range from [0, 1] to [0, low], or to [low, 0] if low < 0. If a float, all tensors use the same low value.
**high** (list or float, default=127.): shifts the tensor values by adding high to every element. If a list, its length should match that of the shape list.
**dtype** (list or str, default='float32'): dtype of each tensor. If a list, its length should match that of the shape list; if a str, all tensors use the same dtype. Supported dtypes: 'float32', 'float16', 'uint8', 'int8', 'int32', 'int64', 'bool'.
**label** (bool, default=True): whether to return 0 as label
**transform** (transform object, default=None): the dummy dataset does not need a transform. If transform is not None, it is ignored.
**filter** (Filter objects, default=None): filter out examples according to specific conditions | This dataset constructs data of a specific shape; the value range is calculated from: low * stand_normal(0, 1) + high. | **In yaml file:**
dataset:
   dummy:
     shape: [3, 224, 224, 3]
     low: 0.0
     high: 127.0
     dtype: float32
     label: True
**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['dummy'] (shape, low, high, dtype, label, transform=None, filter=None) | | dummy_v2(input_shape, label_shape, low, high, dtype, transform, filter) | **input_shape** (list or tuple): creates one or more input tensors. A list represents the sample shape of the dataset, e.g. an image size should be represented as (224, 224, 3); a tuple containing multiple lists represents multiple input tensors.
**label_shape** (list or tuple): creates one or more label tensors. A list represents the sample shape of the label, e.g. a label size should be represented as (1,); a tuple containing multiple lists represents multiple label tensors. In yaml usage, (1,) is the default value.
**low** (list or float, default=-128.): scales the tensor value range from [0, 1] to [0, low], or to [low, 0] if low < 0. If a float, all tensors use the same low value.
**high** (list or float, default=127.): shifts the tensor values by adding high to every element. If a list, its length should match that of the shape list.
**dtype** (list or str, default='float32'): dtype of each tensor. If a list, its length should match that of the shape list; if a str, all tensors use the same dtype. Supported dtypes: 'float32', 'float16', 'uint8', 'int8', 'int32', 'int64', 'bool'.
**transform** (transform object, default=None): the dummy dataset does not need a transform. If transform is not None, it is ignored.
**filter** (Filter objects, default=None): filter out examples according to specific conditions | This dataset constructs data of a specific shape; the value range is calculated from: low * stand_normal(0, 1) + high. | **In yaml file:**
dataset:
   dummy_v2:
     input_shape: [224, 224, 3]
     label_shape: [1]
     low: 0.0
     high: 127.0
     dtype: float32

**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['dummy_v2'] (input_shape, label_shape, low, high, dtype, transform=None, filter=None) | | bert(dataset, task, model_type, transform, filter) | **dataset** (list): list of data
**task** (str): the task of the model; supports "classifier" and "squad"
**model_type** (str, default='bert'): model type; supports 'distilbert', 'bert', 'xlnet', 'xlm'
**transform** (transform object, default=None): transform to process input data
**filter** (Filter objects, default=None): filter out examples according to specific conditions | This dataset is constructed from the Bert TensorDataset and cannot be fully set up from a yaml config. The original repo link is: https://github.com/huggingface/transformers. To use this dataset, add it before you initialize your DataLoader. | **In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['bert'] (dataset, task, model_type, transform=transform, filter=None)
YAML configuration is not supported yet. | | sparse_dummy_v2(dense_shape, label_shape, sparse_ratio, low, high, dtype, transform, filter) | **dense_shape** (list or tuple): creates one or more sparse tensors. A tuple represents the sample shape of the dataset, e.g. an image size should be represented as (224, 224, 3); a tuple containing multiple lists represents multiple input tensors.
**label_shape** (list or tuple): creates one or more label tensors. A list represents the sample shape of the label, e.g. a label size should be represented as (1,); a tuple containing multiple lists represents multiple label tensors. In yaml usage, (1,) is the default value.
**sparse_ratio** (float, default=0.5): the sparsity ratio, in the range [0, 1].
**low** (list or float, default=-128.): scales the tensor value range from [0, 1] to [0, low], or to [low, 0] if low < 0. If a float, all tensors use the same low value.
**high** (list or float, default=127.): shifts the tensor values by adding high to every element. If a list, its length should match that of the shape list.
**dtype** (list or str, default='float32'): dtype of each tensor. If a list, its length should match that of the shape list; if a str, all tensors use the same dtype. Supported dtypes: 'float32', 'float16', 'uint8', 'int8', 'int32', 'int64', 'bool'.
**transform** (transform object, default=None): the dummy dataset does not need a transform. If transform is not None, it is ignored.
**filter** (Filter objects, default=None): filter out examples according to specific conditions | This dataset constructs data of a specific shape; the value range is calculated from: low * stand_normal(0, 1) + high. | **In yaml file:**
dataset:
   sparse_dummy_v2:
     dense_shape: [224, 224, 3]
     label_shape: [1]
     sparse_ratio: 0.5
     low: 0.0
     high: 127.0
     dtype: float32

**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['sparse_dummy_v2'] (dense_shape, label_shape, sparse_ratio, low, high, dtype, transform=None, filter=None) |

#### MXNet

| Dataset | Parameters | Comments | Usage |
| :------ | :------ | :------ | :------ |
| MNIST(root, train, transform, filter, download) | **root** (str): Root directory of dataset
**train** (bool, default=False): If True, creates dataset from train subset, otherwise from validation subset
**transform** (transform object, default=None): transform to process input data
**filter** (Filter objects, default=None): filter out examples according to specific conditions
**download** (bool, default=True): If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again. | If download is True, it will download dataset to root/MNIST/, otherwise user should put mnist.npz under root/MNIST/ manually. | **In yaml file:**
dataset:
   MNIST:
     root: /path/to/root
     train: False
     download: True
(transform and filter are not set within the dataset field)
**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['MNIST'] (root=root, train=False, transform=transform, filter=None, download=True) | | FashionMNIST(root, train, transform, filter, download) | **root** (str): Root directory of dataset
**train** (bool, default=False): If True, creates dataset from train subset, otherwise from validation subset
**transform** (transform object, default=None): transform to process input data
**filter** (Filter objects, default=None): filter out examples according to specific conditions
**download** (bool, default=True): If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again. | If download is True, it will download dataset to root/FashionMNIST/, otherwise user should put train-labels-idx1-ubyte.gz, train-images-idx3-ubyte.gz, t10k-labels-idx1-ubyte.gz and t10k-images-idx3-ubyte.gz under root/FashionMNIST/ manually.| **In yaml file:**
dataset:
   FashionMNIST:
     root: /path/to/root
     train: False
     download: True
(transform and filter are not set within the dataset field)
**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['FashionMNIST'] (root=root, train=False, transform=transform, filter=None, download=True) | | CIFAR10(root, train, transform, filter, download) | **root** (str): Root directory of dataset
**train** (bool, default=False): If True, creates dataset from train subset, otherwise from validation subset
**transform** (transform object, default=None): transform to process input data
**filter** (Filter objects, default=None): filter out examples according to specific conditions
**download** (bool, default=True): If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again. | If download is True, it will download dataset to root/ and extract it automatically, otherwise user can download file from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz manually to root/ and extract it. | **In yaml file:**
dataset:
   CIFAR10:
     root: /path/to/root
     train: False
     download: True
(transform and filter are not set within the dataset field)
**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['CIFAR10'] (root=root, train=False, transform=transform, filter=None, download=True) | | CIFAR100(root, train, transform, filter, download) | **root** (str): Root directory of dataset
**train** (bool, default=False): If True, creates dataset from train subset, otherwise from validation subset
**transform** (transform object, default=None): transform to process input data
**filter** (Filter objects, default=None): filter out examples according to specific conditions
**download** (bool, default=True): If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again. | If download is True, it will download dataset to root/ and extract it automatically, otherwise user can download file from https://www.cs.toronto.edu/~kriz/cifar-100-python.tar.gz manually to root/ and extract it. | **In yaml file:**
dataset:
   CIFAR100:
     root: /path/to/root
     train: False
     download: True
(transform and filter are not set within the dataset field)
**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['CIFAR100'] (root=root, train=False, transform=transform, filter=None, download=True) | | ImageFolder(root, transform, filter) | **root** (str): Root directory of dataset
**transform** (transform object, default=None): transform to process input data
**filter** (Filter objects, default=None): filter out examples according to specific conditions | Please arrange data in this way:
root/class_1/xxx.png
root/class_1/xxy.png
root/class_1/xxz.png
...
root/class_n/123.png
root/class_n/nsdf3.png
root/class_n/asd932_.png
Please put images of different categories into different folders. | **In yaml file:**
dataset:
   ImageFolder:
     root: /path/to/root
**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['ImageFolder'] (root=root, transform=transform, filter=None) | | ImagenetRaw(data_path, image_list, transform, filter) | **data_path** (str): Root directory of dataset
**image_list** (str): data file, record image_names and their labels
**transform** (transform object, default=None): transform to process input data
**filter** (Filter objects, default=None): filter out examples according to specific conditions | Please arrange data in this way:
data_path/img1.jpg
data_path/img2.jpg
...
data_path/imgx.jpg
The dataset reads the name and label of each image from the image_list file; if image_list is set to None, it reads from data_path/val_map.txt automatically. | **In yaml file:**
dataset:
   ImagenetRaw:
     data_path: /path/to/image
     image_list: /path/to/label
**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['ImagenetRaw'] (data_path, image_list, transform=transform, filter=None) | | COCORaw(root, img_dir, anno_dir, transform, filter) | **root** (str): Root directory of dataset
**img_dir** (str, default='val2017'): image file directory
**anno_dir** (str, default='annotations/instances_val2017.json'): annotation file directory
**transform** (transform object, default=None): transform to process input data
**filter** (Filter objects, default=None): filter out examples according to specific conditions | Please arrange data in this way:
/root/img_dir/1.jpg
/root/img_dir/2.jpg
...
/root/img_dir/n.jpg
/root/anno_dir
**Please use Resize transform when batch_size > 1**| **In yaml file:**
dataset:
   COCORaw:
     root: /path/to/root
     img_dir: /path/to/image
     anno_dir: /path/to/annotation
**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['COCORaw'] (root, img_dir, anno_dir, transform=transform, filter=None)
If anno_dir is not set, the dataset will use the default label map. | | dummy(shape, low, high, dtype, label, transform, filter) | **shape** (list or tuple): shape of the total samples; the first dimension should be the sample count of the dataset. Multiple tensors of different shapes are supported: pass a list of tuples, and a tensor of the given size is created for each tuple in the list.
**low** (list or float, default=-128.): scales the tensor value range from [0, 1] to [0, low], or to [low, 0] if low < 0. If a float, the same low value is applied to all tensors.
**high** (list or float, default=127.): shifts the tensor values by adding high to every element. If a list, its length should match the length of the shape list.
**dtype** (list or str, default='float32'): supports per-tensor dtype settings. If a list, its length should match the length of the shape list; if a str, all tensors use the same dtype. Supported dtypes: 'float32', 'float16', 'uint8', 'int8', 'int32', 'int64', 'bool'.
**label** (bool, default=True): whether to return 0 as the label
**transform** (transform object, default=None): the dummy dataset does not need a transform; if transform is not None, it is ignored.
**filter** (Filter objects, default=None): filter out examples according to specific conditions | This dataset is constructed from a specific shape; the value range is calculated from: low * stand_normal(0, 1) + high. | **In yaml file:**
dataset:
   dummy:
     shape: [3, 224, 224, 3]
     low: 0.0
     high: 127.0
     dtype: float32
     label: True
**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['dummy'] (shape, low, high, dtype, label, transform=None, filter=None) | | dummy_v2(input_shape, label_shape, low, high, dtype, transform, filter) | **input_shape** (list or tuple): creates single or multiple input tensors. A list represents the sample shape of the dataset, e.g. an image size should be represented as (224, 224, 3); a tuple containing multiple lists represents multiple input tensors.
**label_shape** (list or tuple): creates single or multiple label tensors. A list represents the sample shape of the label, e.g. a label size should be represented as (1,); a tuple containing multiple lists represents multiple label tensors. In yaml usage, (1,) is the default value.
**low** (list or float, default=-128.): scales the tensor value range from [0, 1] to [0, low], or to [low, 0] if low < 0. If a float, the same low value is applied to all tensors.
**high** (list or float, default=127.): shifts the tensor values by adding high to every element. If a list, its length should match the length of the shape list.
**dtype** (list or str, default='float32'): supports per-tensor dtype settings. If a list, its length should match the length of the shape list; if a str, all tensors use the same dtype. Supported dtypes: 'float32', 'float16', 'uint8', 'int8', 'int32', 'int64', 'bool'.
**transform** (transform object, default=None): the dummy dataset does not need a transform; if transform is not None, it is ignored.
**filter** (Filter objects, default=None): filter out examples according to specific conditions | This dataset is constructed from a specific shape; the value range is calculated from: low * stand_normal(0, 1) + high. | **In yaml file:**
dataset:
   dummy_v2:
     input_shape: [224, 224, 3]
     label_shape: [1]
     low: 0.0
     high: 127.0
     dtype: float32

**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['dummy_v2'] (input_shape, label_shape, low, high, dtype, transform=None, filter=None) | | sparse_dummy_v2(dense_shape, label_shape, sparse_ratio, low, high, dtype, transform, filter) | **dense_shape** (list or tuple): creates single or multiple sparse tensors. A tuple represents the sample shape of the dataset, e.g. an image size should be represented as (224, 224, 3); a tuple containing multiple lists represents multiple input tensors.
**label_shape** (list or tuple): creates single or multiple label tensors. A list represents the sample shape of the label, e.g. a label size should be represented as (1,); a tuple containing multiple lists represents multiple label tensors. In yaml usage, (1,) is the default value.
**sparse_ratio** (float, default=0.5): the sparsity ratio, in the range [0, 1].
**low** (list or float, default=-128.): scales the tensor value range from [0, 1] to [0, low], or to [low, 0] if low < 0. If a float, the same low value is applied to all tensors.
**high** (list or float, default=127.): shifts the tensor values by adding high to every element. If a list, its length should match the length of the shape list.
**dtype** (list or str, default='float32'): supports per-tensor dtype settings. If a list, its length should match the length of the shape list; if a str, all tensors use the same dtype. Supported dtypes: 'float32', 'float16', 'uint8', 'int8', 'int32', 'int64', 'bool'.
**transform** (transform object, default=None): the dummy dataset does not need a transform; if transform is not None, it is ignored.
**filter** (Filter objects, default=None): filter out examples according to specific conditions | This dataset is constructed from a specific shape; the value range is calculated from: low * stand_normal(0, 1) + high. | **In yaml file:**
dataset:
   sparse_dummy_v2:
     dense_shape: [224, 224, 3]
     label_shape: [1]
     sparse_ratio: 0.5
     low: 0.0
     high: 127.0
     dtype: float32

**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['sparse_dummy_v2'] (dense_shape, label_shape, sparse_ratio, low, high, dtype, transform=None, filter=None) |

#### ONNXRT

| Dataset | Parameters | Comments | Usage |
| :------ | :------ | :------ | :------ |
| MNIST(root, train, transform, filter, download) | **root** (str): Root directory of dataset
**train** (bool, default=False): If True, creates dataset from train subset, otherwise from validation subset
**transform** (transform object, default=None): transform to process input data
**filter** (Filter objects, default=None): filter out examples according to specific conditions
**download** (bool, default=True): If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again. | If download is True, it will download dataset to root/MNIST/, otherwise user should put mnist.npz under root/MNIST/ manually. | **In yaml file:**
dataset:
   MNIST:
     root: /path/to/root
     train: False
     download: True
(transform and filter are not set within the dataset field)
**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['MNIST'] (root=root, train=False, transform=transform, filter=None, download=True) | | FashionMNIST(root, train, transform, filter, download) | **root** (str): Root directory of dataset
**train** (bool, default=False): If True, creates dataset from train subset, otherwise from validation subset
**transform** (transform object, default=None): transform to process input data
**filter** (Filter objects, default=None): filter out examples according to specific conditions
**download** (bool, default=True): If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again. | If download is True, it will download dataset to root/FashionMNIST/, otherwise user should put train-labels-idx1-ubyte.gz, train-images-idx3-ubyte.gz, t10k-labels-idx1-ubyte.gz and t10k-images-idx3-ubyte.gz under root/FashionMNIST/ manually.| **In yaml file:**
dataset:
   FashionMNIST:
     root: /path/to/root
     train: False
     download: True
(transform and filter are not set within the dataset field)
**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['FashionMNIST'] (root=root, train=False, transform=transform, filter=None, download=True) | | CIFAR10(root, train, transform, filter, download) | **root** (str): Root directory of dataset
**train** (bool, default=False): If True, creates dataset from train subset, otherwise from validation subset
**transform** (transform object, default=None): transform to process input data
**filter** (Filter objects, default=None): filter out examples according to specific conditions
**download** (bool, default=True): If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again. | If download is True, it will download dataset to root/ and extract it automatically, otherwise user can download file from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz manually to root/ and extract it. | **In yaml file:**
dataset:
   CIFAR10:
     root: /path/to/root
     train: False
     download: True
(transform and filter are not set within the dataset field)
**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['CIFAR10'] (root=root, train=False, transform=transform, filter=None, download=True) | | CIFAR100(root, train, transform, filter, download) | **root** (str): Root directory of dataset
**train** (bool, default=False): If True, creates dataset from train subset, otherwise from validation subset
**transform** (transform object, default=None): transform to process input data
**filter** (Filter objects, default=None): filter out examples according to specific conditions
**download** (bool, default=True): If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again. | If download is True, it will download dataset to root/ and extract it automatically, otherwise user can download file from https://www.cs.toronto.edu/~kriz/cifar-100-python.tar.gz manually to root/ and extract it. | **In yaml file:**
dataset:
   CIFAR100:
     root: /path/to/root
     train: False
     download: True
(transform and filter are not set within the dataset field)
**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['CIFAR100'] (root=root, train=False, transform=transform, filter=None, download=True) | | ImageFolder(root, transform, filter) | **root** (str): Root directory of dataset
**transform** (transform object, default=None): transform to process input data
**filter** (Filter objects, default=None): filter out examples according to specific conditions | Please arrange data in this way:
root/class_1/xxx.png
root/class_1/xxy.png
root/class_1/xxz.png
...
root/class_n/123.png
root/class_n/nsdf3.png
root/class_n/asd932_.png
Please put images of different categories into different folders. | **In yaml file:**
dataset:
   ImageFolder:
     root: /path/to/root
**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['ImageFolder'] (root=root, transform=transform, filter=None) | | ImagenetRaw(data_path, image_list, transform, filter) | **data_path** (str): Root directory of dataset
**image_list** (str): data file, record image_names and their labels
**transform** (transform object, default=None): transform to process input data
**filter** (Filter objects, default=None): filter out examples according to specific conditions | Please arrange data in this way:
data_path/img1.jpg
data_path/img2.jpg
...
data_path/imgx.jpg
The dataset reads the name and label of each image from the image_list file; if image_list is set to None, it reads from data_path/val_map.txt automatically. | **In yaml file:**
dataset:
   ImagenetRaw:
     data_path: /path/to/image
     image_list: /path/to/label
**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['ImagenetRaw'] (data_path, image_list, transform=transform, filter=None) | | COCORaw(root, img_dir, anno_dir, transform, filter) | **root** (str): Root directory of dataset
**img_dir** (str, default='val2017'): image file directory
**anno_dir** (str, default='annotations/instances_val2017.json'): annotation file directory
**transform** (transform object, default=None): transform to process input data
**filter** (Filter objects, default=None): filter out examples according to specific conditions | Please arrange data in this way:
/root/img_dir/1.jpg
/root/img_dir/2.jpg
...
/root/img_dir/n.jpg
/root/anno_dir
**Please use Resize transform when batch_size > 1**| **In yaml file:**
dataset:
   COCORaw:
     root: /path/to/root
     img_dir: /path/to/image
     anno_dir: /path/to/annotation
**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['COCORaw'] (root, img_dir, anno_dir, transform=transform, filter=None)
If anno_dir is not set, the dataset will use the default label map. | | dummy(shape, low, high, dtype, label, transform, filter) | **shape** (list or tuple): shape of the total samples; the first dimension should be the sample count of the dataset. Multiple tensors of different shapes are supported: pass a list of tuples, and a tensor of the given size is created for each tuple in the list.
**low** (list or float, default=-128.): scales the tensor value range from [0, 1] to [0, low], or to [low, 0] if low < 0. If a float, the same low value is applied to all tensors.
**high** (list or float, default=127.): shifts the tensor values by adding high to every element. If a list, its length should match the length of the shape list.
**dtype** (list or str, default='float32'): supports per-tensor dtype settings. If a list, its length should match the length of the shape list; if a str, all tensors use the same dtype. Supported dtypes: 'float32', 'float16', 'uint8', 'int8', 'int32', 'int64', 'bool'.
**label** (bool, default=True): whether to return 0 as the label
**transform** (transform object, default=None): the dummy dataset does not need a transform; if transform is not None, it is ignored.
**filter** (Filter objects, default=None): filter out examples according to specific conditions | This dataset is constructed from a specific shape; the value range is calculated from: low * stand_normal(0, 1) + high. | **In yaml file:**
dataset:
   dummy:
     shape: [3, 224, 224, 3]
     low: 0.0
     high: 127.0
     dtype: float32
     label: True
**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['dummy'] (shape, low, high, dtype, label, transform=None, filter=None) | | dummy_v2(input_shape, label_shape, low, high, dtype, transform, filter) | **input_shape** (list or tuple): creates single or multiple input tensors. A list represents the sample shape of the dataset, e.g. an image size should be represented as (224, 224, 3); a tuple containing multiple lists represents multiple input tensors.
**label_shape** (list or tuple): creates single or multiple label tensors. A list represents the sample shape of the label, e.g. a label size should be represented as (1,); a tuple containing multiple lists represents multiple label tensors. In yaml usage, (1,) is the default value.
**low** (list or float, default=-128.): scales the tensor value range from [0, 1] to [0, low], or to [low, 0] if low < 0. If a float, the same low value is applied to all tensors.
**high** (list or float, default=127.): shifts the tensor values by adding high to every element. If a list, its length should match the length of the shape list.
**dtype** (list or str, default='float32'): supports per-tensor dtype settings. If a list, its length should match the length of the shape list; if a str, all tensors use the same dtype. Supported dtypes: 'float32', 'float16', 'uint8', 'int8', 'int32', 'int64', 'bool'.
**transform** (transform object, default=None): the dummy dataset does not need a transform; if transform is not None, it is ignored.
**filter** (Filter objects, default=None): filter out examples according to specific conditions | This dataset is constructed from a specific shape; the value range is calculated from: low * stand_normal(0, 1) + high. | **In yaml file:**
dataset:
   dummy_v2:
     input_shape: [224, 224, 3]
     label_shape: [1]
     low: 0.0
     high: 127.0
     dtype: float32

**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['dummy_v2'] (input_shape, label_shape, low, high, dtype, transform=None, filter=None) | | GLUE(data_dir, model_name_or_path, max_seq_length, do_lower_case, task, model_type, dynamic_length, evaluate, transform, filter) | **data_dir** (str): The input data dir
**model_name_or_path** (str): Path to pre-trained student model or shortcut name
**max_seq_length** (int, default=128): The maximum total input sequence length after tokenization. Sequences longer than this will be truncated; shorter sequences will be padded.
**do_lower_case** (bool, default=True): Whether or not to lowercase the input.
**task** (str, default='mrpc'): The name of the task to fine-tune. Choices include mrpc, qqp, qnli, rte, sts-b, cola, mnli, wnli.
**model_type** (str, default='bert'): model type; supports 'distilbert', 'bert', 'mobilebert', 'roberta'.
**dynamic_length** (bool, default=False): Whether to use dynamic sequence length instead of a fixed one.
**evaluate** (bool, default=True): Whether to run evaluation or training.
**transform** (transform object, default=None): transform to process input data
**filter** (Filter objects, default=None): filter out examples according to specific conditions | Refer to [this example](/examples/onnxrt/language_translation/bert) on how to prepare dataset | **In yaml file:**
dataset:
   bert:
     data_dir: /path/to/data
     model_name_or_path: bert-base-uncased
(transform and filter are not set within the dataset field)
**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['bert'] (data_dir='/path/to/data/', model_name_or_path='bert-base-uncased', max_seq_length=128, task='mrpc', model_type='bert', dynamic_length=True, transform=None, filter=None) | | sparse_dummy_v2(dense_shape, label_shape, sparse_ratio, low, high, dtype, transform, filter) | **dense_shape** (list or tuple): creates single or multiple sparse tensors. A tuple represents the sample shape of the dataset, e.g. an image size should be represented as (224, 224, 3); a tuple containing multiple lists represents multiple input tensors.
**label_shape** (list or tuple): creates single or multiple label tensors. A list represents the sample shape of the label, e.g. a label size should be represented as (1,); a tuple containing multiple lists represents multiple label tensors. In yaml usage, (1,) is the default value.
**sparse_ratio** (float, default=0.5): the sparsity ratio, in the range [0, 1].
**low** (list or float, default=-128.): scales the tensor value range from [0, 1] to [0, low], or to [low, 0] if low < 0. If a float, the same low value is applied to all tensors.
**high** (list or float, default=127.): shifts the tensor values by adding high to every element. If a list, its length should match the length of the shape list.
**dtype** (list or str, default='float32'): supports per-tensor dtype settings. If a list, its length should match the length of the shape list; if a str, all tensors use the same dtype. Supported dtypes: 'float32', 'float16', 'uint8', 'int8', 'int32', 'int64', 'bool'.
**transform** (transform object, default=None): the dummy dataset does not need a transform; if transform is not None, it is ignored.
**filter** (Filter objects, default=None): filter out examples according to specific conditions | This dataset is constructed from a specific shape; the value range is calculated from: low * stand_normal(0, 1) + high. | **In yaml file:**
dataset:
   sparse_dummy_v2:
     dense_shape: [224, 224, 3]
     label_shape: [1]
     sparse_ratio: 0.5
     low: 0.0
     high: 127.0
     dtype: float32

**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['sparse_dummy_v2'] (dense_shape, label_shape, sparse_ratio, low, high, dtype, transform=None, filter=None) |

## Get start with Dataset API

### Config dataloader in a yaml file

```yaml
quantization:
  approach: post_training_static_quant
  calibration:
    dataloader:
      dataset:
        COCORaw:
          root: /path/to/calibration/dataset
      filter:
        LabelBalance:
          size: 1
      transform:
        Resize:
          size: 300

evaluation:
  accuracy:
    metric:
      ...
    dataloader:
      batch_size: 16
      dataset:
        COCORaw:
          root: /path/to/evaluation/dataset
      transform:
        Resize:
          size: 300
  performance:
    dataloader:
      batch_size: 16
      dataset:
        dummy_v2:
          input_shape: [224, 224, 3]
```

## User-specific dataset

Users can register their own datasets as follows:

```python
class Dataset(object):
    def __init__(self, samples):
        # init code here; as an illustration, keep a sequence of
        # (data, label) pairs prepared by the user
        self.samples = samples

    def __getitem__(self, idx):
        # use idx to get data and label
        data, label = self.samples[idx]
        return data, label

    def __len__(self):
        return len(self.samples)
```

After defining the dataset class, pass an instance of it to the quantizer:

```python
from neural_compressor.experimental import Quantization, common

# yaml_file, graph, eval_func and samples are supplied by the user
dataset = Dataset(samples)
quantizer = Quantization(yaml_file)
# users can pass more optional args to the dataloader, such as batch_size and collate_fn
quantizer.calib_dataloader = common.DataLoader(dataset)
quantizer.model = graph
quantizer.eval_func = eval_func
q_model = quantizer.fit()
```

## Examples

- Refer to this [example](https://github.com/intel/neural-compressor/tree/v1.14.2/examples/onnxrt/object_detection/onnx_model_zoo/DUC/quantization/ptq) to learn how to define a customised dataset.
- Refer to this [HelloWorld example](/examples/helloworld/tf_example6) to learn how to configure a built-in dataset.
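
The two dataset styles described in the Introduction (index-style with `__getitem__` and `__len__`, and iterator-style with `__iter__`) can be sketched without any framework dependency. This is a minimal illustration using only the Python standard library: the class names `RandomImageDataset` and `StreamDataset`, their parameters, and the uniform sampling are assumptions made for the example and are not part of the Neural Compressor API.

```python
import random


class RandomImageDataset:
    """Hypothetical index-style dataset: implements __getitem__ and __len__."""

    def __init__(self, num_samples, sample_shape, low=0.0, high=127.0, seed=0):
        self.num_samples = num_samples
        self.sample_shape = tuple(sample_shape)
        self.low = low
        self.high = high
        self.seed = seed

    def __len__(self):
        return self.num_samples

    def __getitem__(self, idx):
        if not 0 <= idx < self.num_samples:
            raise IndexError(idx)
        # Seed per index so the same idx always yields the same sample.
        rng = random.Random(self.seed + idx)
        size = 1
        for dim in self.sample_shape:
            size *= dim
        # Flattened sample values drawn uniformly from [low, high] (an assumption).
        data = [rng.uniform(self.low, self.high) for _ in range(size)]
        label = 0  # constant placeholder label
        return data, label


class StreamDataset:
    """Hypothetical iterator-style dataset: implements only __iter__."""

    def __init__(self, records):
        self.records = records

    def __iter__(self):
        for data, label in self.records:
            yield data, label


indexed = RandomImageDataset(num_samples=4, sample_shape=(2, 2, 3))
first_data, first_label = indexed[0]

streamed = StreamDataset([([1.0], 0), ([2.0], 1)])
pairs = list(streamed)
```

An instance of either class could then be wrapped in a dataloader in the same way as the `Dataset` class shown under "User-specific dataset"; the index-style variant also supports `len()`, which is why `__len__` is recommended alongside `__getitem__`.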