Transform
=========

1. Introduction
2. Transform Support List
   2.1 TensorFlow
   2.2 PyTorch
   2.3 MXNet
   2.4 ONNXRT
## Introduction
Neural Compressor supports built-in preprocessing methods on different framework backends. Refer to the HelloWorld example for how to configure a transform in a dataloader.
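For orientation, here is a minimal sketch of how transforms appear in the dataloader section of a yaml config. Neural Compressor applies the listed transforms in order, grouping them with Compose as described in the tables below; the dataset name, paths, and parameter values in this sketch are illustrative assumptions, not prescribed defaults.

```yaml
quantization:
  calibration:
    dataloader:
      dataset:
        ImageFolder:
          root: /path/to/calibration/dataset   # hypothetical path
      transform:
        Resize:              # transforms run in the order listed
          size: 256
        CenterCrop:
          size: 224
        ToArray: {}
        Normalize:
          mean: [0.485, 0.456, 0.406]
          std: [0.229, 0.224, 0.225]
```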
## Transform Support List
### TensorFlow
Transform | Parameters | Comments | Usage (in yaml file) |
---|---|---|---|
Resize(size, interpolation) | size (list or int): size of the result <br> interpolation (str, default='bilinear'): desired interpolation type, supports 'bilinear', 'nearest', 'bicubic' | Resize the input image to the given size | Resize: <br> &ensp;&ensp;size: 256 <br> &ensp;&ensp;interpolation: bilinear |
CenterCrop(size) | size (list or int): size of the result | Crop the given image at the center to the given size | CenterCrop: <br> &ensp;&ensp;size: [10, 10] # or size: 10 |
RandomResizedCrop(size, scale, ratio, interpolation) | size (list or int): size of the result <br> scale (tuple or list, default=(0.08, 1.0)): range of the size of the cropped region relative to the original size <br> ratio (tuple or list, default=(3./4., 4./3.)): range of the aspect ratio of the cropped region <br> interpolation (str, default='bilinear'): desired interpolation type, supports 'bilinear', 'nearest' | Crop the given image to a random size and aspect ratio | RandomResizedCrop: <br> &ensp;&ensp;size: [10, 10] # or size: 10 <br> &ensp;&ensp;scale: [0.08, 1.0] <br> &ensp;&ensp;ratio: [0.75, 1.33] <br> &ensp;&ensp;interpolation: bilinear |
Normalize(mean, std) | mean (list, default=[0.0]): mean for each channel; if len(mean) == 1, the mean is broadcast to every channel, otherwise its length should match the number of channels <br> std (list, default=[1.0]): std for each channel; if len(std) == 1, the std is broadcast to every channel, otherwise its length should match the number of channels | Normalize an image with mean and standard deviation | Normalize: <br> &ensp;&ensp;mean: [0.0, 0.0, 0.0] <br> &ensp;&ensp;std: [1.0, 1.0, 1.0] |
RandomCrop(size) | size (list or int): size of the result | Crop the image at a random location to the given size | RandomCrop: <br> &ensp;&ensp;size: [10, 10] # or size: 10 |
Compose(transform_list) | transform_list (list of Transform objects): list of transforms to compose | Compose several transforms together | If transforms are configured in a yaml file, Neural Compressor automatically calls Compose to group them. In user code: <br> from neural_compressor.experimental.data import TRANSFORMS <br> preprocess = TRANSFORMS(framework, 'preprocess') <br> resize = preprocess["Resize"](*args) <br> normalize = preprocess["Normalize"](*args) <br> compose = preprocess["Compose"]([resize, normalize]) <br> sample = compose(sample) # sample: (image, label) <br> See the runnable sketch after this table. |
CropResize(x, y, width, height, size, interpolation) | x (int): left boundary of the cropping area <br> y (int): top boundary of the cropping area <br> width (int): width of the cropping area <br> height (int): height of the cropping area <br> size (list or int): resize to the new size after cropping <br> interpolation (str, default='bilinear'): desired interpolation type, supports 'bilinear', 'nearest', 'bicubic' | Crop the input image at the given location and resize it | CropResize: <br> &ensp;&ensp;x: 0 <br> &ensp;&ensp;y: 5 <br> &ensp;&ensp;width: 224 <br> &ensp;&ensp;height: 224 <br> &ensp;&ensp;size: [100, 100] # or size: 100 <br> &ensp;&ensp;interpolation: bilinear |
RandomHorizontalFlip() | None | Horizontally flip the given image randomly | RandomHorizontalFlip: {} |
RandomVerticalFlip() | None | Vertically flip the given image randomly | RandomVerticalFlip: {} |
DecodeImage() | None | Decode a JPEG-encoded image to a uint8 tensor | DecodeImage: {} |
EncodeJped() | None | Encode the image to a tensor of type string | EncodeJped: {} |
Transpose(perm) | perm (list): a permutation of the dimensions of the input image | Transpose the image according to perm | Transpose: <br> &ensp;&ensp;perm: [1, 2, 0] |
ResizeWithRatio(min_dim, max_dim, padding) | min_dim (int, default=800): resize the image so that its smaller dimension == min_dim <br> max_dim (int, default=1365): ensure that the image's longest side does not exceed this value <br> padding (bool, default=False): if True, pad the image with zeros so its size is max_dim x max_dim | Resize the image keeping its aspect ratio, optionally padding it to the max shape. If the image is padded, the label is processed at the same time. The input image should be a np.array or tf.Tensor. | ResizeWithRatio: <br> &ensp;&ensp;min_dim: 800 <br> &ensp;&ensp;max_dim: 1365 <br> &ensp;&ensp;padding: True |
CropToBoundingBox(offset_height, offset_width, target_height, target_width) | offset_height (int): vertical coordinate of the top-left corner of the result in the input <br> offset_width (int): horizontal coordinate of the top-left corner of the result in the input <br> target_height (int): height of the result <br> target_width (int): width of the result | Crop an image to the specified bounding box | CropToBoundingBox: <br> &ensp;&ensp;offset_height: 10 <br> &ensp;&ensp;offset_width: 10 <br> &ensp;&ensp;target_height: 224 <br> &ensp;&ensp;target_width: 224 |
Cast(dtype) | dtype (str, default='float32'): the dtype to convert the image to | Convert the image to the given dtype | Cast: <br> &ensp;&ensp;dtype: float32 |
ToArray() | None | Convert a PIL Image to a numpy array | ToArray: {} |
Rescale() | None | Scale the values of the image to [0, 1] | Rescale: {} |
AlignImageChannel(dim) | dim (int): the channel number of the result image | Align the image channels; currently only [H,W]->[H,W,dim], [H,W,4]->[H,W,3], and [H,W,3]->[H,W] are supported. This transform is going to be deprecated. | AlignImageChannel: <br> &ensp;&ensp;dim: 3 |
ParseDecodeImagenet() | None | Parse features in an Example proto | ParseDecodeImagenet: {} |
ResizeCropImagenet(height, width, random_crop, resize_side, random_flip_left_right, mean_value, scale) | height (int): height of the result <br> width (int): width of the result <br> random_crop (bool, default=False): whether to crop randomly <br> resize_side (int, default=256): desired size of the smaller side after the resize operation <br> random_flip_left_right (bool, default=False): whether to randomly flip left and right <br> mean_value (list, default=[0.0, 0.0, 0.0]): mean for each channel <br> scale (float, default=1.0): scaling factor applied after mean subtraction | Combination of a series of transforms applicable to ImageNet images | ResizeCropImagenet: <br> &ensp;&ensp;height: 224 <br> &ensp;&ensp;width: 224 <br> &ensp;&ensp;random_crop: False <br> &ensp;&ensp;resize_side: 256 <br> &ensp;&ensp;random_flip_left_right: False <br> &ensp;&ensp;mean_value: [123.68, 116.78, 103.94] <br> &ensp;&ensp;scale: 0.017 |
QuantizedInput(dtype, scale) | dtype (str): desired image dtype, supports 'uint8', 'int8' <br> scale (float, default=None): scaling ratio of each point in the image | Convert the dtype of the input to quantize it | QuantizedInput: <br> &ensp;&ensp;dtype: 'uint8' |
LabelShift(label_shift) | label_shift (int, default=0): number of positions to shift the label by | Convert label to label - label_shift | LabelShift: <br> &ensp;&ensp;label_shift: 0 |
BilinearImagenet(height, width, central_fraction, mean_value, scale) | height (int): height of the result <br> width (int): width of the result <br> central_fraction (float, default=0.875): fraction of the size to crop <br> mean_value (list, default=[0.0, 0.0, 0.0]): mean for each channel <br> scale (float, default=1.0): scaling factor applied after mean subtraction | Combination of a series of transforms applicable to ImageNet images | BilinearImagenet: <br> &ensp;&ensp;height: 224 <br> &ensp;&ensp;width: 224 <br> &ensp;&ensp;central_fraction: 0.875 <br> &ensp;&ensp;mean_value: [0.0, 0.0, 0.0] <br> &ensp;&ensp;scale: 1.0 |
SquadV1(label_file, vocab_file, n_best_size, max_seq_length, max_query_length, max_answer_length, do_lower_case, doc_stride) | label_file (str): path of the label file <br> vocab_file (str): path of the vocabulary file <br> n_best_size (int, default=20): the total number of n-best predictions to generate in the nbest_predictions.json output file <br> max_seq_length (int, default=384): the maximum total input sequence length after WordPiece tokenization; longer sequences are truncated and shorter ones are padded <br> max_query_length (int, default=64): the maximum number of tokens for the question; longer questions are truncated to this length <br> max_answer_length (int, default=30): the maximum length of an answer that can be generated; needed because the start and end predictions are not conditioned on one another <br> do_lower_case (bool, default=True): whether to lower-case the input text; should be True for uncased models and False for cased models <br> doc_stride (int, default=128): how much stride to take between chunks when splitting a long document | Postprocess the predictions of BERT on SQuAD | SquadV1: <br> &ensp;&ensp;label_file: /path/to/label_file <br> &ensp;&ensp;vocab_file: /path/to/vocab_file <br> &ensp;&ensp;n_best_size: 20 <br> &ensp;&ensp;max_seq_length: 384 <br> &ensp;&ensp;max_query_length: 64 <br> &ensp;&ensp;max_answer_length: 30 <br> &ensp;&ensp;do_lower_case: True <br> &ensp;&ensp;doc_stride: 128 |
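The Compose row above can be exercised directly in user code. Below is a minimal runnable sketch under the assumption that TensorFlow and Neural Compressor are installed; the concrete sizes, mean/std values, and the dummy sample are illustrative, not prescribed.

```python
# Minimal sketch of composing TensorFlow-backend transforms in user code,
# following the Compose row above. The sizes, mean/std values, and the
# dummy sample are illustrative assumptions.
import numpy as np
from neural_compressor.experimental.data import TRANSFORMS

preprocess = TRANSFORMS("tensorflow", "preprocess")
resize = preprocess["Resize"](size=224)
normalize = preprocess["Normalize"](mean=[123.68, 116.78, 103.94],
                                    std=[1.0, 1.0, 1.0])
compose = preprocess["Compose"]([resize, normalize])

# Each transform consumes and returns a (image, label) pair.
image = np.random.rand(300, 300, 3).astype(np.float32)
image, label = compose((image, 0))
print(image.shape)  # expected: (224, 224, 3)
```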
### PyTorch
Transform | Parameters | Comments | Usage (in yaml file) |
---|---|---|---|
Resize(size, interpolation) | size (list or int): size of the result <br> interpolation (str, default='bilinear'): desired interpolation type, supports 'bilinear', 'nearest', 'bicubic' | Resize the input image to the given size | Resize: <br> &ensp;&ensp;size: 256 <br> &ensp;&ensp;interpolation: bilinear |
CenterCrop(size) | size (list or int): size of the result | Crop the given image at the center to the given size | CenterCrop: <br> &ensp;&ensp;size: [10, 10] # or size: 10 |
RandomResizedCrop(size, scale, ratio, interpolation) | size (list or int): size of the result <br> scale (tuple or list, default=(0.08, 1.0)): range of the size of the cropped region relative to the original size <br> ratio (tuple or list, default=(3./4., 4./3.)): range of the aspect ratio of the cropped region <br> interpolation (str, default='bilinear'): desired interpolation type, supports 'bilinear', 'nearest', 'bicubic' | Crop the given image to a random size and aspect ratio | RandomResizedCrop: <br> &ensp;&ensp;size: [10, 10] # or size: 10 <br> &ensp;&ensp;scale: [0.08, 1.0] <br> &ensp;&ensp;ratio: [0.75, 1.33] <br> &ensp;&ensp;interpolation: bilinear |
Normalize(mean, std) | mean (list, default=[0.0]): mean for each channel; if len(mean) == 1, the mean is broadcast to every channel, otherwise its length should match the number of channels <br> std (list, default=[1.0]): std for each channel; if len(std) == 1, the std is broadcast to every channel, otherwise its length should match the number of channels | Normalize an image with mean and standard deviation | Normalize: <br> &ensp;&ensp;mean: [0.0, 0.0, 0.0] <br> &ensp;&ensp;std: [1.0, 1.0, 1.0] |
RandomCrop(size) | size (list or int): size of the result | Crop the image at a random location to the given size | RandomCrop: <br> &ensp;&ensp;size: [10, 10] # or size: 10 |
Compose(transform_list) | transform_list (list of Transform objects): list of transforms to compose | Compose several transforms together | If transforms are configured in a yaml file, Neural Compressor automatically calls Compose to group them (see the example after this table). In user code: <br> from neural_compressor.experimental.data import TRANSFORMS <br> preprocess = TRANSFORMS(framework, 'preprocess') <br> resize = preprocess["Resize"](*args) <br> normalize = preprocess["Normalize"](*args) <br> compose = preprocess["Compose"]([resize, normalize]) <br> sample = compose(sample) # sample: (image, label) |
RandomHorizontalFlip() | None | Horizontally flip the given image randomly | RandomHorizontalFlip: {} |
RandomVerticalFlip() | None | Vertically flip the given image randomly | RandomVerticalFlip: {} |
Transpose(perm) | perm (list): a permutation of the dimensions of the input image | Transpose the image according to perm | Transpose: <br> &ensp;&ensp;perm: [1, 2, 0] |
CropToBoundingBox(offset_height, offset_width, target_height, target_width) | offset_height (int): vertical coordinate of the top-left corner of the result in the input <br> offset_width (int): horizontal coordinate of the top-left corner of the result in the input <br> target_height (int): height of the result <br> target_width (int): width of the result | Crop an image to the specified bounding box | CropToBoundingBox: <br> &ensp;&ensp;offset_height: 10 <br> &ensp;&ensp;offset_width: 10 <br> &ensp;&ensp;target_height: 224 <br> &ensp;&ensp;target_width: 224 |
ToTensor() | None | Convert a PIL Image or numpy.ndarray to a tensor | ToTensor: {} |
ToPILImage() | None | Convert a tensor or an ndarray to a PIL Image | ToPILImage: {} |
Pad(padding, fill, padding_mode) | padding (int or tuple or list): padding on each border <br> fill (int or str or tuple): pixel fill value for constant fill, default is 0 <br> padding_mode (str): type of padding, one of constant, edge, reflect, or symmetric; default is constant | Pad the given image on all sides with the given "pad" value | Pad: <br> &ensp;&ensp;padding: 0 <br> &ensp;&ensp;fill: 0 <br> &ensp;&ensp;padding_mode: constant |
ColorJitter(brightness, contrast, saturation, hue) | brightness (float or tuple of float (min, max)): how much to jitter brightness, default is 0 <br> contrast (float or tuple of float (min, max)): how much to jitter contrast, default is 0 <br> saturation (float or tuple of float (min, max)): how much to jitter saturation, default is 0 <br> hue (float or tuple of float (min, max)): how much to jitter hue, default is 0 | Randomly change the brightness, contrast, saturation, and hue of an image | ColorJitter: <br> &ensp;&ensp;brightness: 0 <br> &ensp;&ensp;contrast: 0 <br> &ensp;&ensp;saturation: 0 <br> &ensp;&ensp;hue: 0 |
ToArray() | None | Convert a PIL Image to a numpy array | ToArray: {} |
CropResize(x, y, width, height, size, interpolation) | x (int): left boundary of the cropping area <br> y (int): top boundary of the cropping area <br> width (int): width of the cropping area <br> height (int): height of the cropping area <br> size (list or int): resize to the new size after cropping <br> interpolation (str, default='bilinear'): desired interpolation type, supports 'bilinear', 'nearest', 'bicubic' | Crop the input image at the given location and resize it | CropResize: <br> &ensp;&ensp;x: 0 <br> &ensp;&ensp;y: 5 <br> &ensp;&ensp;width: 224 <br> &ensp;&ensp;height: 224 <br> &ensp;&ensp;size: [100, 100] # or size: 100 <br> &ensp;&ensp;interpolation: bilinear |
Cast(dtype) | dtype (str, default='float32'): the target data type | Convert the image to the given dtype | Cast: <br> &ensp;&ensp;dtype: float32 |
AlignImageChannel(dim) | dim (int): the channel number of the result image | Align the image channels; currently only [H,W,4]->[H,W,3] and [H,W,3]->[H,W] are supported, and the input must be a PIL Image. This transform is going to be deprecated. | AlignImageChannel: <br> &ensp;&ensp;dim: 3 |
ResizeWithRatio(min_dim, max_dim, padding) | min_dim (int, default=800): resize the image so that its smaller dimension == min_dim <br> max_dim (int, default=1365): ensure that the image's longest side does not exceed this value <br> padding (bool, default=False): if True, pad the image with zeros so its size is max_dim x max_dim | Resize the image keeping its aspect ratio, optionally padding it to the max shape. If the image is padded, the label is processed at the same time. The input image should be a np.array. | ResizeWithRatio: <br> &ensp;&ensp;min_dim: 800 <br> &ensp;&ensp;max_dim: 1365 <br> &ensp;&ensp;padding: True |
LabelShift(label_shift) | label_shift (int, default=0): number of positions to shift the label by | Convert label to label - label_shift | LabelShift: <br> &ensp;&ensp;label_shift: 0 |
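As a usage sketch, the PyTorch transforms above chain in a dataloader section the same way as for TensorFlow. The dataset name, path, and normalization values below are illustrative assumptions; the mean/std shown are the common torchvision ImageNet conventions, not defaults of these transforms.

```yaml
evaluation:
  accuracy:
    dataloader:
      dataset:
        ImageFolder:
          root: /path/to/evaluation/dataset   # hypothetical path
      transform:
        Resize:
          size: 256
        CenterCrop:
          size: 224
        ToTensor: {}
        Normalize:
          mean: [0.485, 0.456, 0.406]
          std: [0.229, 0.224, 0.225]
```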
### MXNet
Transform | Parameters | Comments | Usage (in yaml file) |
---|---|---|---|
Resize(size, interpolation) | size (list or int): size of the result <br> interpolation (str, default='bilinear'): desired interpolation type, supports 'bilinear', 'nearest', 'bicubic' | Resize the input image to the given size | Resize: <br> &ensp;&ensp;size: 256 <br> &ensp;&ensp;interpolation: bilinear |
CenterCrop(size) | size (list or int): size of the result | Crop the given image at the center to the given size | CenterCrop: <br> &ensp;&ensp;size: [10, 10] # or size: 10 |
RandomResizedCrop(size, scale, ratio, interpolation) | size (list or int): size of the result <br> scale (tuple or list, default=(0.08, 1.0)): range of the size of the cropped region relative to the original size <br> ratio (tuple or list, default=(3./4., 4./3.)): range of the aspect ratio of the cropped region <br> interpolation (str, default='bilinear'): desired interpolation type, supports 'bilinear', 'nearest', 'bicubic' | Crop the given image to a random size and aspect ratio | RandomResizedCrop: <br> &ensp;&ensp;size: [10, 10] # or size: 10 <br> &ensp;&ensp;scale: [0.08, 1.0] <br> &ensp;&ensp;ratio: [0.75, 1.33] <br> &ensp;&ensp;interpolation: bilinear |
Normalize(mean, std) | mean (list, default=[0.0]): mean for each channel; if len(mean) == 1, the mean is broadcast to every channel, otherwise its length should match the number of channels <br> std (list, default=[1.0]): std for each channel; if len(std) == 1, the std is broadcast to every channel, otherwise its length should match the number of channels | Normalize an image with mean and standard deviation | Normalize: <br> &ensp;&ensp;mean: [0.0, 0.0, 0.0] <br> &ensp;&ensp;std: [1.0, 1.0, 1.0] |
RandomCrop(size) | size (list or int): size of the result | Crop the image at a random location to the given size | RandomCrop: <br> &ensp;&ensp;size: [10, 10] # or size: 10 |
Compose(transform_list) | transform_list (list of Transform objects): list of transforms to compose | Compose several transforms together | If transforms are configured in a yaml file, Neural Compressor automatically calls Compose to group them (see the sketch after this table). In user code: <br> from neural_compressor.experimental.data import TRANSFORMS <br> preprocess = TRANSFORMS(framework, 'preprocess') <br> resize = preprocess["Resize"](*args) <br> normalize = preprocess["Normalize"](*args) <br> compose = preprocess["Compose"]([resize, normalize]) <br> sample = compose(sample) # sample: (image, label) |
CropResize(x, y, width, height, size, interpolation) | x (int): left boundary of the cropping area <br> y (int): top boundary of the cropping area <br> width (int): width of the cropping area <br> height (int): height of the cropping area <br> size (list or int): resize to the new size after cropping <br> interpolation (str, default='bilinear'): desired interpolation type, supports 'bilinear', 'nearest', 'bicubic' | Crop the input image at the given location and resize it | CropResize: <br> &ensp;&ensp;x: 0 <br> &ensp;&ensp;y: 5 <br> &ensp;&ensp;width: 224 <br> &ensp;&ensp;height: 224 <br> &ensp;&ensp;size: [100, 100] # or size: 100 <br> &ensp;&ensp;interpolation: bilinear |
RandomHorizontalFlip() | None | Horizontally flip the given image randomly | RandomHorizontalFlip: {} |
RandomVerticalFlip() | None | Vertically flip the given image randomly | RandomVerticalFlip: {} |
CropToBoundingBox(offset_height, offset_width, target_height, target_width) | offset_height (int): vertical coordinate of the top-left corner of the result in the input <br> offset_width (int): horizontal coordinate of the top-left corner of the result in the input <br> target_height (int): height of the result <br> target_width (int): width of the result | Crop an image to the specified bounding box | CropToBoundingBox: <br> &ensp;&ensp;offset_height: 10 <br> &ensp;&ensp;offset_width: 10 <br> &ensp;&ensp;target_height: 224 <br> &ensp;&ensp;target_width: 224 |
ToArray() | None | Convert an NDArray to a numpy array | ToArray: {} |
ToTensor() | None | Convert an image NDArray or a batch of image NDArrays to a tensor NDArray | ToTensor: {} |
Cast(dtype) | dtype (str, default='float32'): the target data type | Convert the image to the given dtype | Cast: <br> &ensp;&ensp;dtype: float32 |
Transpose(perm) | perm (list): a permutation of the dimensions of the input image | Transpose the image according to perm | Transpose: <br> &ensp;&ensp;perm: [1, 2, 0] |
AlignImageChannel(dim) | dim (int): the channel number of the result image | Align the image channels; currently only [H,W]->[H,W,dim], [H,W,4]->[H,W,3], and [H,W,3]->[H,W] are supported. This transform is going to be deprecated. | AlignImageChannel: <br> &ensp;&ensp;dim: 3 |
ToNDArray() | None | Convert a np.array to an NDArray | ToNDArray: {} |
ResizeWithRatio(min_dim, max_dim, padding) | min_dim (int, default=800): resize the image so that its smaller dimension == min_dim <br> max_dim (int, default=1365): ensure that the image's longest side does not exceed this value <br> padding (bool, default=False): if True, pad the image with zeros so its size is max_dim x max_dim | Resize the image keeping its aspect ratio, optionally padding it to the max shape. If the image is padded, the label is processed at the same time. The input image should be a np.array. | ResizeWithRatio: <br> &ensp;&ensp;min_dim: 800 <br> &ensp;&ensp;max_dim: 1365 <br> &ensp;&ensp;padding: True |
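The MXNet backend adds NDArray-specific conversions (ToNDArray, ToTensor). A small sketch of driving them through the transform registry, assuming MXNet and Neural Compressor are installed; the dummy sample and dtype choice are illustrative.

```python
# Sketch: MXNet-backend transforms via the registry, per the table above.
# The dummy image and the dtype choice are illustrative assumptions.
import numpy as np
from neural_compressor.experimental.data import TRANSFORMS

preprocess = TRANSFORMS("mxnet", "preprocess")
to_ndarray = preprocess["ToNDArray"]()       # np.array -> mx.nd.NDArray
cast = preprocess["Cast"](dtype="float32")
compose = preprocess["Compose"]([to_ndarray, cast])

# Each transform consumes and returns a (image, label) pair.
image, label = compose((np.random.rand(224, 224, 3), 0))
```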
### ONNXRT
Transform | Parameters | Comments | Usage (in yaml file) |
---|---|---|---|
Resize(size, interpolation) | size (list or int): size of the result <br> interpolation (str, default='bilinear'): desired interpolation type, supports 'bilinear', 'nearest', 'bicubic' | Resize the input image to the given size | Resize: <br> &ensp;&ensp;size: 256 <br> &ensp;&ensp;interpolation: bilinear |
CenterCrop(size) | size (list or int): size of the result | Crop the given image at the center to the given size | CenterCrop: <br> &ensp;&ensp;size: [10, 10] # or size: 10 |
RandomResizedCrop(size, scale, ratio, interpolation) | size (list or int): size of the result <br> scale (tuple or list, default=(0.08, 1.0)): range of the size of the cropped region relative to the original size <br> ratio (tuple or list, default=(3./4., 4./3.)): range of the aspect ratio of the cropped region <br> interpolation (str, default='bilinear'): desired interpolation type, supports 'bilinear', 'nearest' | Crop the given image to a random size and aspect ratio | RandomResizedCrop: <br> &ensp;&ensp;size: [10, 10] # or size: 10 <br> &ensp;&ensp;scale: [0.08, 1.0] <br> &ensp;&ensp;ratio: [0.75, 1.33] <br> &ensp;&ensp;interpolation: bilinear |
Normalize(mean, std) | mean (list, default=[0.0]): mean for each channel; if len(mean) == 1, the mean is broadcast to every channel, otherwise its length should match the number of channels <br> std (list, default=[1.0]): std for each channel; if len(std) == 1, the std is broadcast to every channel, otherwise its length should match the number of channels | Normalize an image with mean and standard deviation | Normalize: <br> &ensp;&ensp;mean: [0.0, 0.0, 0.0] <br> &ensp;&ensp;std: [1.0, 1.0, 1.0] |
RandomCrop(size) | size (list or int): size of the result | Crop the image at a random location to the given size | RandomCrop: <br> &ensp;&ensp;size: [10, 10] # or size: 10 |
Compose(transform_list) | transform_list (list of Transform objects): list of transforms to compose | Compose several transforms together | If transforms are configured in a yaml file, Neural Compressor automatically calls Compose to group them (see the example after this table). In user code: <br> from neural_compressor.experimental.data import TRANSFORMS <br> preprocess = TRANSFORMS(framework, 'preprocess') <br> resize = preprocess["Resize"](*args) <br> normalize = preprocess["Normalize"](*args) <br> compose = preprocess["Compose"]([resize, normalize]) <br> sample = compose(sample) # sample: (image, label) |
CropResize(x, y, width, height, size, interpolation) | x (int): left boundary of the cropping area <br> y (int): top boundary of the cropping area <br> width (int): width of the cropping area <br> height (int): height of the cropping area <br> size (list or int): resize to the new size after cropping <br> interpolation (str, default='bilinear'): desired interpolation type, supports 'bilinear', 'nearest' | Crop the input image at the given location and resize it | CropResize: <br> &ensp;&ensp;x: 0 <br> &ensp;&ensp;y: 5 <br> &ensp;&ensp;width: 224 <br> &ensp;&ensp;height: 224 <br> &ensp;&ensp;size: [100, 100] # or size: 100 <br> &ensp;&ensp;interpolation: bilinear |
RandomHorizontalFlip() | None | Horizontally flip the given image randomly | RandomHorizontalFlip: {} |
RandomVerticalFlip() | None | Vertically flip the given image randomly | RandomVerticalFlip: {} |
CropToBoundingBox(offset_height, offset_width, target_height, target_width) | offset_height (int): vertical coordinate of the top-left corner of the result in the input <br> offset_width (int): horizontal coordinate of the top-left corner of the result in the input <br> target_height (int): height of the result <br> target_width (int): width of the result | Crop an image to the specified bounding box | CropToBoundingBox: <br> &ensp;&ensp;offset_height: 10 <br> &ensp;&ensp;offset_width: 10 <br> &ensp;&ensp;target_height: 224 <br> &ensp;&ensp;target_width: 224 |
ToArray() | None | Convert a PIL Image to a numpy array | ToArray: {} |
Rescale() | None | Scale the values of the image to [0, 1] | Rescale: {} |
AlignImageChannel(dim) | dim (int): the channel number of the result image | Align the image channels; currently only [H,W]->[H,W,dim], [H,W,4]->[H,W,3], and [H,W,3]->[H,W] are supported. This transform is going to be deprecated. | AlignImageChannel: <br> &ensp;&ensp;dim: 3 |
ResizeCropImagenet(height, width, random_crop, resize_side, random_flip_left_right, mean_value, scale) | height (int): height of the result <br> width (int): width of the result <br> random_crop (bool, default=False): whether to crop randomly <br> resize_side (int, default=256): desired size of the smaller side after the resize operation <br> random_flip_left_right (bool, default=False): whether to randomly flip left and right <br> mean_value (list, default=[0.0, 0.0, 0.0]): mean for each channel <br> scale (float, default=1.0): scaling factor applied after mean subtraction | Combination of a series of transforms applicable to ImageNet images | ResizeCropImagenet: <br> &ensp;&ensp;height: 224 <br> &ensp;&ensp;width: 224 <br> &ensp;&ensp;random_crop: False <br> &ensp;&ensp;resize_side: 256 <br> &ensp;&ensp;random_flip_left_right: False <br> &ensp;&ensp;mean_value: [123.68, 116.78, 103.94] <br> &ensp;&ensp;scale: 0.017 |
Cast(dtype) | dtype (str, default='float32'): the target data type | Convert the image to the given dtype | Cast: <br> &ensp;&ensp;dtype: float32 |
ResizeWithRatio(min_dim, max_dim, padding) | min_dim (int, default=800): resize the image so that its smaller dimension == min_dim <br> max_dim (int, default=1365): ensure that the image's longest side does not exceed this value <br> padding (bool, default=False): if True, pad the image with zeros so its size is max_dim x max_dim | Resize the image keeping its aspect ratio, optionally padding it to the max shape. If the image is padded, the label is processed at the same time. The input image should be a np.array. | ResizeWithRatio: <br> &ensp;&ensp;min_dim: 800 <br> &ensp;&ensp;max_dim: 1365 <br> &ensp;&ensp;padding: True |
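For example, an ImageNet-style calibration pipeline for an ONNX model could be configured with ResizeCropImagenet from the table above. The dataset name and paths below are illustrative assumptions; the mean/scale values are the common ImageNet ones shown in the table, not mandated defaults.

```yaml
quantization:
  calibration:
    dataloader:
      dataset:
        ImagenetRaw:
          data_path: /path/to/calibration/images   # hypothetical path
          image_list: /path/to/val_map.txt          # hypothetical path
      transform:
        ResizeCropImagenet:
          height: 224
          width: 224
          resize_side: 256
          mean_value: [123.68, 116.78, 103.94]
          scale: 0.017
```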