Transform
=========

1. [Introduction](#introduction)
2. [Transform Support List](#transform-support-list)
   2.1 [TensorFlow](#tensorflow)
   2.2 [PyTorch](#pytorch)
   2.3 [MXNet](#mxnet)
   2.4 [ONNXRT](#onnxrt)

## Introduction

Neural Compressor supports built-in preprocessing methods on different framework backends. Refer to [this HelloWorld example](/examples/helloworld/tf_example1) for how to configure a transform in a dataloader.
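For example, transforms are chained under a dataloader's `transform` field in the yaml file. The sketch below assumes an ImageNet-style calibration dataloader; the `ImageFolder` dataset and the paths are placeholders, so adapt them to your own configuration:

```yaml
quantization:
  calibration:
    dataloader:
      dataset:
        ImageFolder:            # placeholder dataset; substitute your own
          root: /path/to/calibration/dataset
      transform:                # applied in the listed order (see Compose below)
        Resize:
          size: 256
          interpolation: bilinear
        CenterCrop:
          size: 224
        ToArray: {}
        Normalize:
          mean: [0.0, 0.0, 0.0]
          std: [1.0, 1.0, 1.0]
```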
## Transform Support List

### TensorFlow

| Transform | Parameters | Comments | Usage (in yaml file) |
| :------ | :------ | :------ | :------ |
| Resize(size, interpolation) | **size** (list or int): Size of the result <br> **interpolation** (str, default='bilinear'): Desired interpolation type, supports 'bilinear', 'nearest', 'bicubic' | Resize the input image to the given size | Resize: <br>    size: 256 <br>    interpolation: bilinear |
| CenterCrop(size) | **size** (list or int): Size of the result | Crop the given image at the center to the given size | CenterCrop: <br>    size: [10, 10] # or size: 10 |
| RandomResizedCrop(size, scale, ratio, interpolation) | **size** (list or int): Size of the result <br> **scale** (tuple or list, default=(0.08, 1.0)): range of the size of the origin size cropped <br> **ratio** (tuple or list, default=(3. / 4., 4. / 3.)): range of aspect ratio of the origin aspect ratio cropped <br> **interpolation** (str, default='bilinear'): Desired interpolation type, supports 'bilinear', 'nearest' | Crop the given image to a random size and aspect ratio | RandomResizedCrop: <br>    size: [10, 10] # or size: 10 <br>    scale: [0.08, 1.0] <br>    ratio: [0.75, 1.33] <br>    interpolation: bilinear |
| Normalize(mean, std) | **mean** (list, default=[0.0]): means for each channel; if len(mean)=1, the mean is broadcast to each channel, otherwise its length should match the length of the image shape <br> **std** (list, default=[1.0]): stds for each channel; if len(std)=1, the std is broadcast to each channel, otherwise its length should match the length of the image shape | Normalize an image with mean and standard deviation | Normalize: <br>    mean: [0.0, 0.0, 0.0] <br>    std: [1.0, 1.0, 1.0] |
| RandomCrop(size) | **size** (list or int): Size of the result | Crop the image at a random location to the given size | RandomCrop: <br>    size: [10, 10] # or size: 10 |
| Compose(transform_list) | **transform_list** (list of Transform objects): list of transforms to compose | Compose several transforms together | If transforms are configured in a yaml file, Neural Compressor automatically calls Compose to group the other transforms. <br> **In user code:** <br> from neural_compressor.experimental.data import TRANSFORMS <br> preprocess = TRANSFORMS(framework, 'preprocess') <br> resize = preprocess["Resize"] (\**args) <br> normalize = preprocess["Normalize"] (\**args) <br> compose = preprocess["Compose"] ([resize, normalize]) <br> sample = compose(sample) <br> # sample: image, label |
| CropResize(x, y, width, height, size, interpolation) | **x** (int): Left boundary of the cropping area <br> **y** (int): Top boundary of the cropping area <br> **width** (int): Width of the cropping area <br> **height** (int): Height of the cropping area <br> **size** (list or int): resize to the new size after cropping <br> **interpolation** (str, default='bilinear'): Desired interpolation type, supports 'bilinear', 'nearest' and 'bicubic' | Crop the input image at the given location and resize it | CropResize: <br>    x: 0 <br>    y: 5 <br>    width: 224 <br>    height: 224 <br>    size: [100, 100] # or size: 100 <br>    interpolation: bilinear |
| RandomHorizontalFlip() | None | Horizontally flip the given image randomly | RandomHorizontalFlip: {} |
| RandomVerticalFlip() | None | Vertically flip the given image randomly | RandomVerticalFlip: {} |
| DecodeImage() | None | Decode a JPEG-encoded image to a uint8 tensor | DecodeImage: {} |
| EncodeJpeg() | None | Encode an image to a Tensor of type string | EncodeJpeg: {} |
| Transpose(perm) | **perm** (list): A permutation of the dimensions of the input image | Transpose the image according to perm | Transpose: <br>    perm: [1, 2, 0] |
| ResizeWithRatio(min_dim, max_dim, padding) | **min_dim** (int, default=800): Resizes the image such that its smaller dimension == min_dim <br> **max_dim** (int, default=1365): Ensures that the image's longest side does not exceed this value <br> **padding** (bool, default=False): If true, pads the image with zeros so its size is max_dim x max_dim | Resize the image preserving aspect ratio and optionally pad it to the max shape. If the image is padded, the label is processed at the same time. The input image should be a np.array or tf.Tensor. | ResizeWithRatio: <br>    min_dim: 800 <br>    max_dim: 1365 <br>    padding: True |
| CropToBoundingBox(offset_height, offset_width, target_height, target_width) | **offset_height** (int): Vertical coordinate of the top-left corner of the result in the input <br> **offset_width** (int): Horizontal coordinate of the top-left corner of the result in the input <br> **target_height** (int): Height of the result <br> **target_width** (int): Width of the result | Crop an image to a specified bounding box | CropToBoundingBox: <br>    offset_height: 10 <br>    offset_width: 10 <br>    target_height: 224 <br>    target_width: 224 |
| Cast(dtype) | **dtype** (str, default='float32'): The dtype to convert the image to | Convert the image to the given dtype | Cast: <br>    dtype: float32 |
| ToArray() | None | Convert a PIL Image to a numpy array | ToArray: {} |
| Rescale() | None | Scale the values of the image to [0, 1] | Rescale: {} |
| AlignImageChannel(dim) | **dim** (int): The channel number of the result image | Align image channels; currently only [H,W]->[H,W,dim], [H,W,4]->[H,W,3] and [H,W,3]->[H,W] are supported. <br> This transform is going to be deprecated. | AlignImageChannel: <br>    dim: 3 |
| ParseDecodeImagenet() | None | Parse features in an Example proto | ParseDecodeImagenet: {} |
| ResizeCropImagenet(height, width, random_crop, resize_side, random_flip_left_right, mean_value, scale) | **height** (int): Height of the result <br> **width** (int): Width of the result <br> **random_crop** (bool, default=False): whether to apply a random crop <br> **resize_side** (int, default=256): desired shape after the resize operation <br> **random_flip_left_right** (bool, default=False): whether to randomly flip left and right <br> **mean_value** (list, default=[0.0, 0.0, 0.0]): means for each channel <br> **scale** (float, default=1.0): scale factor applied after mean subtraction | Combination of a series of transforms applicable to ImageNet images | ResizeCropImagenet: <br>    height: 224 <br>    width: 224 <br>    random_crop: False <br>    resize_side: 256 <br>    random_flip_left_right: False <br>    mean_value: [123.68, 116.78, 103.94] <br>    scale: 0.017 |
| QuantizedInput(dtype, scale) | **dtype** (str): desired image dtype, supports 'uint8', 'int8' <br> **scale** (float, default=None): scaling ratio of each point in the image | Convert the dtype of the input to quantize it | QuantizedInput: <br>    dtype: 'uint8' |
| LabelShift(label_shift) | **label_shift** (int, default=0): number of label shift | Convert label to label - label_shift | LabelShift: <br>    label_shift: 0 |
| BilinearImagenet(height, width, central_fraction, mean_value, scale) | **height** (int): Height of the result <br> **width** (int): Width of the result <br> **central_fraction** (float, default=0.875): fraction of size to crop <br> **mean_value** (list, default=[0.0, 0.0, 0.0]): means for each channel <br> **scale** (float, default=1.0): scale factor applied after mean subtraction | Combination of a series of transforms applicable to ImageNet images | BilinearImagenet: <br>    height: 224 <br>    width: 224 <br>    central_fraction: 0.875 <br>    mean_value: [0.0, 0.0, 0.0] <br>    scale: 1.0 |
| SquadV1(label_file, vocab_file, n_best_size, max_seq_length, max_query_length, max_answer_length, do_lower_case, doc_stride) | **label_file** (str): path of the label file <br> **vocab_file** (str): path of the vocabulary file <br> **n_best_size** (int, default=20): The total number of n-best predictions to generate in the nbest_predictions.json output file <br> **max_seq_length** (int, default=384): The maximum total input sequence length after WordPiece tokenization. Sequences longer than this will be truncated and sequences shorter than this will be padded <br> **max_query_length** (int, default=64): The maximum number of tokens for the question. Questions longer than this will be truncated to this length <br> **max_answer_length** (int, default=30): The maximum length of an answer that can be generated. This is needed because the start and end predictions are not conditioned on one another <br> **do_lower_case** (bool, default=True): Whether to lower case the input text. Should be True for uncased models and False for cased models <br> **doc_stride** (int, default=128): When splitting up a long document into chunks, how much stride to take between chunks | Postprocess the predictions of BERT on SQuAD | SquadV1: <br>    label_file: /path/to/label_file <br>    vocab_file: /path/to/vocab_file <br>    n_best_size: 20 <br>    max_seq_length: 384 <br>    max_query_length: 64 <br>    max_answer_length: 30 <br>    do_lower_case: True <br>    doc_stride: 128 |
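The **In user code** snippet in the Compose row above, expanded into a minimal runnable sketch. The `TRANSFORMS` registry and the transform names come from this table; the parameter values and the random placeholder image are illustrative only:

```python
import numpy as np
from neural_compressor.experimental.data import TRANSFORMS

# Look up the preprocessing transform registry for the TensorFlow backend.
preprocess = TRANSFORMS('tensorflow', 'preprocess')

# Instantiate two transforms with parameters from the table above.
resize = preprocess['Resize'](size=256, interpolation='bilinear')
normalize = preprocess['Normalize'](mean=[0.0, 0.0, 0.0], std=[1.0, 1.0, 1.0])

# Compose chains the transforms; each sample is an (image, label) tuple.
compose = preprocess['Compose']([resize, normalize])
image = np.random.rand(300, 300, 3).astype(np.float32)  # placeholder image
image_out, label_out = compose((image, 0))
```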
### PyTorch

| Transform | Parameters | Comments | Usage (in yaml file) |
| :------ | :------ | :------ | :------ |
| Resize(size, interpolation) | **size** (list or int): Size of the result <br> **interpolation** (str, default='bilinear'): Desired interpolation type, supports 'bilinear', 'nearest', 'bicubic' | Resize the input image to the given size | Resize: <br>    size: 256 <br>    interpolation: bilinear |
| CenterCrop(size) | **size** (list or int): Size of the result | Crop the given image at the center to the given size | CenterCrop: <br>    size: [10, 10] # or size: 10 |
| RandomResizedCrop(size, scale, ratio, interpolation) | **size** (list or int): Size of the result <br> **scale** (tuple or list, default=(0.08, 1.0)): range of the size of the origin size cropped <br> **ratio** (tuple or list, default=(3. / 4., 4. / 3.)): range of aspect ratio of the origin aspect ratio cropped <br> **interpolation** (str, default='bilinear'): Desired interpolation type, supports 'bilinear', 'nearest', 'bicubic' | Crop the given image to a random size and aspect ratio | RandomResizedCrop: <br>    size: [10, 10] # or size: 10 <br>    scale: [0.08, 1.0] <br>    ratio: [0.75, 1.33] <br>    interpolation: bilinear |
| Normalize(mean, std) | **mean** (list, default=[0.0]): means for each channel; if len(mean)=1, the mean is broadcast to each channel, otherwise its length should match the length of the image shape <br> **std** (list, default=[1.0]): stds for each channel; if len(std)=1, the std is broadcast to each channel, otherwise its length should match the length of the image shape | Normalize an image with mean and standard deviation | Normalize: <br>    mean: [0.0, 0.0, 0.0] <br>    std: [1.0, 1.0, 1.0] |
| RandomCrop(size) | **size** (list or int): Size of the result | Crop the image at a random location to the given size | RandomCrop: <br>    size: [10, 10] # or size: 10 |
| Compose(transform_list) | **transform_list** (list of Transform objects): list of transforms to compose | Compose several transforms together | If transforms are configured in a yaml file, Neural Compressor automatically calls Compose to group the other transforms. <br> **In user code:** <br> from neural_compressor.experimental.data import TRANSFORMS <br> preprocess = TRANSFORMS(framework, 'preprocess') <br> resize = preprocess["Resize"] (\**args) <br> normalize = preprocess["Normalize"] (\**args) <br> compose = preprocess["Compose"] ([resize, normalize]) <br> sample = compose(sample) <br> # sample: image, label |
| RandomHorizontalFlip() | None | Horizontally flip the given image randomly | RandomHorizontalFlip: {} |
| RandomVerticalFlip() | None | Vertically flip the given image randomly | RandomVerticalFlip: {} |
| Transpose(perm) | **perm** (list): A permutation of the dimensions of the input image | Transpose the image according to perm | Transpose: <br>    perm: [1, 2, 0] |
| CropToBoundingBox(offset_height, offset_width, target_height, target_width) | **offset_height** (int): Vertical coordinate of the top-left corner of the result in the input <br> **offset_width** (int): Horizontal coordinate of the top-left corner of the result in the input <br> **target_height** (int): Height of the result <br> **target_width** (int): Width of the result | Crop an image to a specified bounding box | CropToBoundingBox: <br>    offset_height: 10 <br>    offset_width: 10 <br>    target_height: 224 <br>    target_width: 224 |
| ToTensor() | None | Convert a PIL Image or numpy.ndarray to a tensor | ToTensor: {} |
| ToPILImage() | None | Convert a tensor or an ndarray to a PIL Image | ToPILImage: {} |
| Pad(padding, fill, padding_mode) | **padding** (int or tuple or list): Padding on each border <br> **fill** (int or str or tuple): Pixel fill value for constant fill. Default is 0 <br> **padding_mode** (str): Type of padding. Should be: constant, edge, reflect or symmetric. Default is constant | Pad the given image on all sides with the given "pad" value | Pad: <br>    padding: 0 <br>    fill: 0 <br>    padding_mode: constant |
| ColorJitter(brightness, contrast, saturation, hue) | **brightness** (float or tuple of float (min, max)): How much to jitter brightness. Default is 0 <br> **contrast** (float or tuple of float (min, max)): How much to jitter contrast. Default is 0 <br> **saturation** (float or tuple of float (min, max)): How much to jitter saturation. Default is 0 <br> **hue** (float or tuple of float (min, max)): How much to jitter hue. Default is 0 | Randomly change the brightness, contrast, saturation and hue of an image | ColorJitter: <br>    brightness: 0 <br>    contrast: 0 <br>    saturation: 0 <br>    hue: 0 |
| ToArray() | None | Convert a PIL Image to a numpy array | ToArray: {} |
| CropResize(x, y, width, height, size, interpolation) | **x** (int): Left boundary of the cropping area <br> **y** (int): Top boundary of the cropping area <br> **width** (int): Width of the cropping area <br> **height** (int): Height of the cropping area <br> **size** (list or int): resize to the new size after cropping <br> **interpolation** (str, default='bilinear'): Desired interpolation type, supports 'bilinear', 'nearest', 'bicubic' | Crop the input image at the given location and resize it | CropResize: <br>    x: 0 <br>    y: 5 <br>    width: 224 <br>    height: 224 <br>    size: [100, 100] # or size: 100 <br>    interpolation: bilinear |
| Cast(dtype) | **dtype** (str, default='float32'): The target data type | Convert the image to the given dtype | Cast: <br>    dtype: float32 |
| AlignImageChannel(dim) | **dim** (int): The channel number of the result image | Align image channels; currently only [H,W,4]->[H,W,3] and [H,W,3]->[H,W] are supported, and the input image must be a PIL Image. <br> This transform is going to be deprecated. | AlignImageChannel: <br>    dim: 3 |
| ResizeWithRatio(min_dim, max_dim, padding) | **min_dim** (int, default=800): Resizes the image such that its smaller dimension == min_dim <br> **max_dim** (int, default=1365): Ensures that the image's longest side does not exceed this value <br> **padding** (bool, default=False): If true, pads the image with zeros so its size is max_dim x max_dim | Resize the image preserving aspect ratio and optionally pad it to the max shape. If the image is padded, the label is processed at the same time. The input image should be a np.array. | ResizeWithRatio: <br>    min_dim: 800 <br>    max_dim: 1365 <br>    padding: True |
| LabelShift(label_shift) | **label_shift** (int, default=0): number of label shift | Convert label to label - label_shift | LabelShift: <br>    label_shift: 0 |
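These transforms are chained in a yaml file the same way as in the TensorFlow example. Below is a sketch of a typical ImageNet-style evaluation chain for the PyTorch backend; the dataset, the paths, and the mean/std statistics are placeholder values rather than defaults of this library:

```yaml
evaluation:
  accuracy:
    dataloader:
      dataset:
        ImageFolder:            # placeholder dataset; substitute your own
          root: /path/to/evaluation/dataset
      transform:
        Resize:
          size: 256
        CenterCrop:
          size: 224
        ToTensor: {}
        Normalize:              # common ImageNet statistics, example only
          mean: [0.485, 0.456, 0.406]
          std: [0.229, 0.224, 0.225]
```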
### MXNet

| Transform | Parameters | Comments | Usage (in yaml file) |
| :------ | :------ | :------ | :------ |
| Resize(size, interpolation) | **size** (list or int): Size of the result <br> **interpolation** (str, default='bilinear'): Desired interpolation type, supports 'bilinear', 'nearest', 'bicubic' | Resize the input image to the given size | Resize: <br>    size: 256 <br>    interpolation: bilinear |
| CenterCrop(size) | **size** (list or int): Size of the result | Crop the given image at the center to the given size | CenterCrop: <br>    size: [10, 10] # or size: 10 |
| RandomResizedCrop(size, scale, ratio, interpolation) | **size** (list or int): Size of the result <br> **scale** (tuple or list, default=(0.08, 1.0)): range of the size of the origin size cropped <br> **ratio** (tuple or list, default=(3. / 4., 4. / 3.)): range of aspect ratio of the origin aspect ratio cropped <br> **interpolation** (str, default='bilinear'): Desired interpolation type, supports 'bilinear', 'nearest', 'bicubic' | Crop the given image to a random size and aspect ratio | RandomResizedCrop: <br>    size: [10, 10] # or size: 10 <br>    scale: [0.08, 1.0] <br>    ratio: [0.75, 1.33] <br>    interpolation: bilinear |
| Normalize(mean, std) | **mean** (list, default=[0.0]): means for each channel; if len(mean)=1, the mean is broadcast to each channel, otherwise its length should match the length of the image shape <br> **std** (list, default=[1.0]): stds for each channel; if len(std)=1, the std is broadcast to each channel, otherwise its length should match the length of the image shape | Normalize an image with mean and standard deviation | Normalize: <br>    mean: [0.0, 0.0, 0.0] <br>    std: [1.0, 1.0, 1.0] |
| RandomCrop(size) | **size** (list or int): Size of the result | Crop the image at a random location to the given size | RandomCrop: <br>    size: [10, 10] # or size: 10 |
| Compose(transform_list) | **transform_list** (list of Transform objects): list of transforms to compose | Compose several transforms together | If transforms are configured in a yaml file, Neural Compressor automatically calls Compose to group the other transforms. <br> **In user code:** <br> from neural_compressor.experimental.data import TRANSFORMS <br> preprocess = TRANSFORMS(framework, 'preprocess') <br> resize = preprocess["Resize"] (\**args) <br> normalize = preprocess["Normalize"] (\**args) <br> compose = preprocess["Compose"] ([resize, normalize]) <br> sample = compose(sample) <br> # sample: image, label |
| CropResize(x, y, width, height, size, interpolation) | **x** (int): Left boundary of the cropping area <br> **y** (int): Top boundary of the cropping area <br> **width** (int): Width of the cropping area <br> **height** (int): Height of the cropping area <br> **size** (list or int): resize to the new size after cropping <br> **interpolation** (str, default='bilinear'): Desired interpolation type, supports 'bilinear', 'nearest', 'bicubic' | Crop the input image at the given location and resize it | CropResize: <br>    x: 0 <br>    y: 5 <br>    width: 224 <br>    height: 224 <br>    size: [100, 100] # or size: 100 <br>    interpolation: bilinear |
| RandomHorizontalFlip() | None | Horizontally flip the given image randomly | RandomHorizontalFlip: {} |
| RandomVerticalFlip() | None | Vertically flip the given image randomly | RandomVerticalFlip: {} |
| CropToBoundingBox(offset_height, offset_width, target_height, target_width) | **offset_height** (int): Vertical coordinate of the top-left corner of the result in the input <br> **offset_width** (int): Horizontal coordinate of the top-left corner of the result in the input <br> **target_height** (int): Height of the result <br> **target_width** (int): Width of the result | Crop an image to a specified bounding box | CropToBoundingBox: <br>    offset_height: 10 <br>    offset_width: 10 <br>    target_height: 224 <br>    target_width: 224 |
| ToArray() | None | Convert an NDArray to a numpy array | ToArray: {} |
| ToTensor() | None | Convert an image NDArray or a batch of image NDArrays to a tensor NDArray | ToTensor: {} |
| Cast(dtype) | **dtype** (str, default='float32'): The target data type | Convert the image to the given dtype | Cast: <br>    dtype: float32 |
| Transpose(perm) | **perm** (list): A permutation of the dimensions of the input image | Transpose the image according to perm | Transpose: <br>    perm: [1, 2, 0] |
| AlignImageChannel(dim) | **dim** (int): The channel number of the result image | Align image channels; currently only [H,W]->[H,W,dim], [H,W,4]->[H,W,3] and [H,W,3]->[H,W] are supported. <br> This transform is going to be deprecated. | AlignImageChannel: <br>    dim: 3 |
| ToNDArray() | None | Convert a np.array to an NDArray | ToNDArray: {} |
| ResizeWithRatio(min_dim, max_dim, padding) | **min_dim** (int, default=800): Resizes the image such that its smaller dimension == min_dim <br> **max_dim** (int, default=1365): Ensures that the image's longest side does not exceed this value <br> **padding** (bool, default=False): If true, pads the image with zeros so its size is max_dim x max_dim | Resize the image preserving aspect ratio and optionally pad it to the max shape. If the image is padded, the label is processed at the same time. The input image should be a np.array. | ResizeWithRatio: <br>    min_dim: 800 <br>    max_dim: 1365 <br>    padding: True |
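A short programmatic sketch for the MXNet backend, mirroring the Compose row above. It assumes the CropResize and ToArray entries accept the keyword names listed in the table; the input NDArray is a random placeholder:

```python
import mxnet as mx
from neural_compressor.experimental.data import TRANSFORMS

# Preprocessing transform registry for the MXNet backend.
preprocess = TRANSFORMS('mxnet', 'preprocess')

# Crop a 224x224 region at (x=0, y=5), resize it to 100x100,
# then convert the resulting NDArray to a numpy array.
crop_resize = preprocess['CropResize'](x=0, y=5, width=224, height=224, size=100)
to_array = preprocess['ToArray']()
compose = preprocess['Compose']([crop_resize, to_array])

image = mx.nd.random.uniform(shape=(300, 300, 3))  # placeholder HWC image
image_out, label_out = compose((image, 0))
```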
### ONNXRT

| Transform | Parameters | Comments | Usage (in yaml file) |
| :------ | :------ | :------ | :------ |
| Resize(size, interpolation) | **size** (list or int): Size of the result <br> **interpolation** (str, default='bilinear'): Desired interpolation type, supports 'bilinear', 'nearest', 'bicubic' | Resize the input image to the given size | Resize: <br>    size: 256 <br>    interpolation: bilinear |
| CenterCrop(size) | **size** (list or int): Size of the result | Crop the given image at the center to the given size | CenterCrop: <br>    size: [10, 10] # or size: 10 |
| RandomResizedCrop(size, scale, ratio, interpolation) | **size** (list or int): Size of the result <br> **scale** (tuple or list, default=(0.08, 1.0)): range of the size of the origin size cropped <br> **ratio** (tuple or list, default=(3. / 4., 4. / 3.)): range of aspect ratio of the origin aspect ratio cropped <br> **interpolation** (str, default='bilinear'): Desired interpolation type, supports 'bilinear', 'nearest' | Crop the given image to a random size and aspect ratio | RandomResizedCrop: <br>    size: [10, 10] # or size: 10 <br>    scale: [0.08, 1.0] <br>    ratio: [0.75, 1.33] <br>    interpolation: bilinear |
| Normalize(mean, std) | **mean** (list, default=[0.0]): means for each channel; if len(mean)=1, the mean is broadcast to each channel, otherwise its length should match the length of the image shape <br> **std** (list, default=[1.0]): stds for each channel; if len(std)=1, the std is broadcast to each channel, otherwise its length should match the length of the image shape | Normalize an image with mean and standard deviation | Normalize: <br>    mean: [0.0, 0.0, 0.0] <br>    std: [1.0, 1.0, 1.0] |
| RandomCrop(size) | **size** (list or int): Size of the result | Crop the image at a random location to the given size | RandomCrop: <br>    size: [10, 10] # or size: 10 |
| Compose(transform_list) | **transform_list** (list of Transform objects): list of transforms to compose | Compose several transforms together | If transforms are configured in a yaml file, Neural Compressor automatically calls Compose to group the other transforms. <br> **In user code:** <br> from neural_compressor.experimental.data import TRANSFORMS <br> preprocess = TRANSFORMS(framework, 'preprocess') <br> resize = preprocess["Resize"] (\**args) <br> normalize = preprocess["Normalize"] (\**args) <br> compose = preprocess["Compose"] ([resize, normalize]) <br> sample = compose(sample) <br> # sample: image, label |
| CropResize(x, y, width, height, size, interpolation) | **x** (int): Left boundary of the cropping area <br> **y** (int): Top boundary of the cropping area <br> **width** (int): Width of the cropping area <br> **height** (int): Height of the cropping area <br> **size** (list or int): resize to the new size after cropping <br> **interpolation** (str, default='bilinear'): Desired interpolation type, supports 'bilinear', 'nearest' | Crop the input image at the given location and resize it | CropResize: <br>    x: 0 <br>    y: 5 <br>    width: 224 <br>    height: 224 <br>    size: [100, 100] # or size: 100 <br>    interpolation: bilinear |
| RandomHorizontalFlip() | None | Horizontally flip the given image randomly | RandomHorizontalFlip: {} |
| RandomVerticalFlip() | None | Vertically flip the given image randomly | RandomVerticalFlip: {} |
| CropToBoundingBox(offset_height, offset_width, target_height, target_width) | **offset_height** (int): Vertical coordinate of the top-left corner of the result in the input <br> **offset_width** (int): Horizontal coordinate of the top-left corner of the result in the input <br> **target_height** (int): Height of the result <br> **target_width** (int): Width of the result | Crop an image to a specified bounding box | CropToBoundingBox: <br>    offset_height: 10 <br>    offset_width: 10 <br>    target_height: 224 <br>    target_width: 224 |
| ToArray() | None | Convert a PIL Image to a numpy array | ToArray: {} |
| Rescale() | None | Scale the values of the image to [0, 1] | Rescale: {} |
| AlignImageChannel(dim) | **dim** (int): The channel number of the result image | Align image channels; currently only [H,W]->[H,W,dim], [H,W,4]->[H,W,3] and [H,W,3]->[H,W] are supported. <br> This transform is going to be deprecated. | AlignImageChannel: <br>    dim: 3 |
| ResizeCropImagenet(height, width, random_crop, resize_side, random_flip_left_right, mean_value, scale) | **height** (int): Height of the result <br> **width** (int): Width of the result <br> **random_crop** (bool, default=False): whether to apply a random crop <br> **resize_side** (int, default=256): desired shape after the resize operation <br> **random_flip_left_right** (bool, default=False): whether to randomly flip left and right <br> **mean_value** (list, default=[0.0, 0.0, 0.0]): means for each channel <br> **scale** (float, default=1.0): scale factor applied after mean subtraction | Combination of a series of transforms applicable to ImageNet images | ResizeCropImagenet: <br>    height: 224 <br>    width: 224 <br>    random_crop: False <br>    resize_side: 256 <br>    random_flip_left_right: False <br>    mean_value: [123.68, 116.78, 103.94] <br>    scale: 0.017 |
| Cast(dtype) | **dtype** (str, default='float32'): The target data type | Convert the image to the given dtype | Cast: <br>    dtype: float32 |
| ResizeWithRatio(min_dim, max_dim, padding) | **min_dim** (int, default=800): Resizes the image such that its smaller dimension == min_dim <br> **max_dim** (int, default=1365): Ensures that the image's longest side does not exceed this value <br> **padding** (bool, default=False): If true, pads the image with zeros so its size is max_dim x max_dim | Resize the image preserving aspect ratio and optionally pad it to the max shape. If the image is padded, the label is processed at the same time. The input image should be a np.array. | ResizeWithRatio: <br>    min_dim: 800 <br>    max_dim: 1365 <br>    padding: True |
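To close, a sketch of how the ONNXRT ResizeCropImagenet entry above might be wired into a calibration dataloader. The `ImagenetRaw` dataset name and the paths are placeholders taken from common ImageNet recipes, not required values:

```yaml
quantization:
  calibration:
    dataloader:
      dataset:
        ImagenetRaw:            # placeholder dataset; substitute your own
          data_path: /path/to/calibration/dataset
          image_list: /path/to/calibration/label
      transform:
        ResizeCropImagenet:     # values mirror the Usage column above
          height: 224
          width: 224
          resize_side: 256
          mean_value: [123.68, 116.78, 103.94]
          scale: 0.017
```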