`neural_compressor.adaptor.ox_utils.util`

Helper classes or functions for onnxrt adaptor.

Module Contents

Classes

`QuantType`	Represent QuantType value.
`ValueInfo`	Represents a casted tensor info.
`QuantizedValue`	Represents a linearly quantized value (input/output/initializer).
`QuantizedInitializer`	Represents a linearly quantized weight input from ONNX operators.
`QuantizationMode`	Represent QuantizationMode value.
`QuantizedValueType`	Represent QuantizedValueType value.
`QuantFormat`	Represent QuantFormat value.

Functions

`dtype_to_name`(dtype_mapping, dtype)	Map data type and its string representation.
`make_quant_node`(name, inputs, outputs)	Make a QuantizeLinear node.
`make_dquant_node`(name, inputs, outputs[, axis])	Make a DequantizeLinear node.
`is_B_transposed`(node)	Whether input B is transposed.
`split_shared_bias`(model)	Split shared tensor.
`float_to_float16`(tensor)	Convert float to float16.
`float_to_bfloat16`(tensor)	Convert float to bfloat16.
`cast_tensor`(tensor, dtype)	Convert a float tensor to the target dtype.
`remove_init_from_model_input`(model)	Remove initializer from model input.
`collate_preds`(results)	Collect model outputs.
`quantize_data_with_scale_zero`(data, qType, scheme, ...)	Quantize data with scale and zero point.
`calculate_scale_zp`(rmin, rmax, quantize_range, qType, ...)	Calculate scale and zero point.
`quantize_data`(data, quantize_range, qType, scheme)	Quantize data.
`quantize_data_per_channel`(data, axis, quantize_range, ...)	Quantize tensor per-channel.
`dequantize_data_with_scale_zero`(tensor_value, ...)	Dequantize tensor with scale and zero point.
`dequantize_data`(tensor_value, scale_value, zo_value[, ...])	Dequantize tensor.
`quantize_nparray`(qtype, arr, scale, zero_point[, low, ...])	Quantize numpy array.
`attribute_to_kwarg`(attribute)	Convert attribute to kwarg format for use with onnx.helper.make_node.
`find_by_name`(name, item_list)	Helper function to find item by name in a list.
`get_smooth_scales_per_op`(max_vals_per_channel, ...)	Get the smooth scales for weights.
`get_smooth_scales_per_input`(max_vals_per_channel, ...)	Get the smooth scales for weights.
`insert_smooth_mul_op_per_input`(scales, shape_infos, ...)	Insert the Mul layer after input.
`adjust_weights_per_op`(model, nodes, scales)	Adjust the weights per input scale.
`adjust_weights_per_input`(model, nodes, scales)	Adjust the weights per input scale.
`insert_smooth_mul_op_per_op`(scales, shape_infos, ...)	Insert the mul layer before op.
`trt_env_setup`(model)	Set environment variable for TensorRT Execution Provider.

neural_compressor.adaptor.ox_utils.util.dtype_to_name(dtype_mapping, dtype)[source]: Map data type and its string representation.

class neural_compressor.adaptor.ox_utils.util.QuantType[source]: Represent QuantType value.

neural_compressor.adaptor.ox_utils.util.make_quant_node(name, inputs, outputs)[source]: Make a QuantizeLinear node.

neural_compressor.adaptor.ox_utils.util.make_dquant_node(name, inputs, outputs, axis=None)[source]: Make a DequantizeLinear node.

neural_compressor.adaptor.ox_utils.util.is_B_transposed(node)[source]: Whether input B is transposed.

neural_compressor.adaptor.ox_utils.util.split_shared_bias(model)[source]: Split shared tensor.

neural_compressor.adaptor.ox_utils.util.float_to_float16(tensor)[source]: Convert float to float16.

neural_compressor.adaptor.ox_utils.util.float_to_bfloat16(tensor)[source]: Convert float to bfloat16.
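
As an illustration of what a float-to-bfloat16 conversion involves, the sketch below keeps the upper 16 bits of the IEEE-754 float32 pattern (sign, 8 exponent bits, top 7 mantissa bits). The helper names are hypothetical, and plain truncation is an assumption: the rounding mode actually used by `float_to_bfloat16` is not stated here.

```python
import struct


def float32_to_bfloat16_bits(value: float) -> int:
    """Hypothetical helper: bfloat16 bit pattern obtained by truncating float32."""
    # Pack as IEEE-754 float32 and reinterpret the bytes as an unsigned 32-bit int.
    (bits32,) = struct.unpack("<I", struct.pack("<f", value))
    # Keep the sign, the 8 exponent bits and the top 7 mantissa bits.
    return bits32 >> 16


def bfloat16_bits_to_float32(bits16: int) -> float:
    """Hypothetical helper: widen a bfloat16 bit pattern back to a Python float."""
    (value,) = struct.unpack("<f", struct.pack("<I", bits16 << 16))
    return value


print(hex(float32_to_bfloat16_bits(1.0)))  # 0x3f80
approx_pi = bfloat16_bits_to_float32(float32_to_bfloat16_bits(3.14159))
```

The round trip loses the low 16 mantissa bits, so `approx_pi` is only close to the original value.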

neural_compressor.adaptor.ox_utils.util.cast_tensor(tensor, dtype)[source]

Convert a float tensor to the target dtype.

Parameters:

tensor (TensorProto) – TensorProto object
dtype (int) – target data type

neural_compressor.adaptor.ox_utils.util.remove_init_from_model_input(model)[source]: Remove initializer from model input.

neural_compressor.adaptor.ox_utils.util.collate_preds(results)[source]: Collect model outputs.

neural_compressor.adaptor.ox_utils.util.quantize_data_with_scale_zero(data, qType, scheme, scale, zero_point)[source]

Quantize data with scale and zero point.

To pack weights, we compute a linear transformation:

when data type == uint8, from [rmin, rmax] -> [0, 2^b - 1], and
when data type == int8, from [-m, m] -> [-(2^{b-1}-1), 2^{b-1}-1], where
m = max(abs(rmin), abs(rmax)) and b is the number of bits (8 for both types).

Parameters:

data (np.array) – data to quantize
qType (int) – data type to quantize to. Supported types UINT8 and INT8
scheme (string) – sym or asym quantization.
scale (float) – computed scale of quantized data
zero_point (uint8 or int8) – computed zero point of quantized data
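
A minimal NumPy sketch of quantizing with a precomputed scale and zero point, using the common formula q = clip(round(r / S) + z, qmin, qmax). The helper name and its explicit `qmin`/`qmax`/`dtype` arguments are assumptions made for illustration; they do not mirror the signature of `quantize_data_with_scale_zero`.

```python
import numpy as np


def quantize_with_scale_zero_sketch(data, scale, zero_point, qmin, qmax, dtype):
    """Hypothetical helper: q = clip(round(r / S) + z, qmin, qmax)."""
    q = np.round(np.asarray(data, dtype=np.float32) / scale) + zero_point
    # Saturate to the integer range before casting to avoid wrap-around.
    return np.clip(q, qmin, qmax).astype(dtype)


# Asymmetric uint8: values map into [0, 255] around zero_point = 10.
q_u8 = quantize_with_scale_zero_sketch(
    [-5.0, 0.0, 0.5, 1.0, 200.0], scale=0.5, zero_point=10,
    qmin=0, qmax=255, dtype=np.uint8,
)

# Symmetric int8: zero_point = 0 and the range is [-127, 127].
q_s8 = quantize_with_scale_zero_sketch(
    [-100.0, -1.0, 0.3, 1.0], scale=0.5, zero_point=0,
    qmin=-127, qmax=127, dtype=np.int8,
)
```

Out-of-range inputs such as `200.0` and `-100.0` saturate at the ends of the integer range rather than overflowing.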

neural_compressor.adaptor.ox_utils.util.calculate_scale_zp(rmin, rmax, quantize_range, qType, scheme)[source]: Calculate scale and zero point.
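
One common way to derive a scale and zero point from a real range [rmin, rmax] is sketched below. The helper and its `qmin`/`qmax`/`symmetric` arguments are hypothetical and are not the signature of `calculate_scale_zp`. The asymmetric branch widens the range to include 0.0 so that zero is exactly representable, which is a convention assumed here rather than confirmed by this page.

```python
def scale_and_zero_point_sketch(rmin, rmax, qmin, qmax, symmetric):
    """Hypothetical helper returning (scale, zero_point) for a real range."""
    if symmetric:
        # Symmetric: map [-m, m] onto [qmin, qmax] with zero point 0.
        m = max(abs(rmin), abs(rmax))
        scale = (2.0 * m) / (qmax - qmin) if m > 0 else 1.0
        return scale, 0
    # Asymmetric: map [rmin, rmax] (widened to contain 0.0) onto [qmin, qmax].
    rmin, rmax = min(rmin, 0.0), max(rmax, 0.0)
    scale = (rmax - rmin) / (qmax - qmin) if rmax > rmin else 1.0
    zero_point = round(qmin - rmin / scale)
    return scale, zero_point


sym_scale, sym_zp = scale_and_zero_point_sketch(-127.0, 63.5, -127, 127, symmetric=True)
asym_scale, asym_zp = scale_and_zero_point_sketch(-51.0, 204.0, 0, 255, symmetric=False)
```

The guard clauses return a scale of 1.0 for a degenerate (all-zero) range so that later divisions by the scale stay well defined.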

neural_compressor.adaptor.ox_utils.util.quantize_data(data, quantize_range, qType, scheme)[source]

Quantize data.

To pack weights, we compute a linear transformation:

when data type == uint8, from [rmin, rmax] -> [0, 2^b - 1], and
when data type == int8, from [-m, m] -> [-(2^{b-1}-1), 2^{b-1}-1], where
m = max(abs(rmin), abs(rmax)) and b is the number of bits (8 for both types),

and add necessary intermediate nodes to transform quantized weight to full weight using the equation r = S(q-z), where

r: real original value
q: quantized value
S: scale
z: zero point

Parameters:

data (array) – data to quantize
quantize_range (list) – list of data to weight pack.
qType (int) – data type to quantize to. Supported types UINT8 and INT8
scheme (string) – sym or asym quantization.

neural_compressor.adaptor.ox_utils.util.quantize_data_per_channel(data, axis, quantize_range, qType, scheme)[source]: Quantize tensor per-channel.
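
Per-channel quantization computes one scale per slice along a chosen axis instead of one scale for the whole tensor. The sketch below shows the symmetric int8 case only; the helper name and return shape are assumptions and may differ from what `quantize_data_per_channel` returns.

```python
import numpy as np


def quantize_per_channel_sym_int8_sketch(data, axis):
    """Hypothetical helper: symmetric int8 quantization, one scale per channel."""
    data = np.asarray(data, dtype=np.float32)
    # Reduce over every axis except the channel axis.
    reduce_axes = tuple(i for i in range(data.ndim) if i != axis)
    m = np.max(np.abs(data), axis=reduce_axes, keepdims=True)
    # All-zero channels get scale 1.0 to avoid dividing by zero.
    scales = np.where(m > 0, m / 127.0, 1.0).astype(np.float32)
    q = np.clip(np.round(data / scales), -127, 127).astype(np.int8)
    return q, scales.reshape(-1)


w = np.array([[0.0, -2.0], [127.0, 12.7]], dtype=np.float32)
q, scales = quantize_per_channel_sym_int8_sketch(w, axis=0)
```

Here each row is a channel: the first row uses scale 2/127 and the second uses scale 1.0, so a small-magnitude channel is not forced onto the coarse grid of a large-magnitude one.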

neural_compressor.adaptor.ox_utils.util.dequantize_data_with_scale_zero(tensor_value, scale_value, zo_value)[source]: Dequantize tensor with scale and zero point.

neural_compressor.adaptor.ox_utils.util.dequantize_data(tensor_value, scale_value, zo_value, axis=0)[source]: Dequantize tensor.
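
Dequantization applies r = S(q - z). A hedged NumPy sketch follows, assuming that per-channel scales and zero points are 1-D arrays aligned with `axis`; the helper name is hypothetical and the handling inside `dequantize_data` may differ.

```python
import numpy as np


def dequantize_sketch(q, scale, zero_point, axis=0):
    """Hypothetical helper: r = S * (q - z), per-tensor or per-channel."""
    q = np.asarray(q)
    scale = np.atleast_1d(np.asarray(scale, dtype=np.float32))
    zero_point = np.broadcast_to(
        np.asarray(zero_point, dtype=np.float32), scale.shape
    )
    if scale.size > 1:
        # Reshape so the per-channel values broadcast along `axis`.
        shape = [1] * q.ndim
        shape[axis] = scale.size
        scale = scale.reshape(shape)
        zero_point = zero_point.reshape(shape)
    # Cast to float first: subtracting in uint8 would wrap around.
    return (q.astype(np.float32) - zero_point) * scale


per_tensor = dequantize_sketch(np.array([0, 10, 12], dtype=np.uint8), 0.5, 10)
per_channel = dequantize_sketch(
    np.array([[0, 10], [20, 30]], dtype=np.uint8),
    scale=[0.5, 2.0], zero_point=[10, 20], axis=0,
)
```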

class neural_compressor.adaptor.ox_utils.util.ValueInfo(tensor_name, dtype, new_dtype)[source]: Represents a casted tensor info.

class neural_compressor.adaptor.ox_utils.util.QuantizedValue(name, new_quantized_name, scale_name, zero_point_name, quantized_value_type, axis=None, qType=QuantType.QUInt8)[source]: Represents a linearly quantized value (input/output/initializer).

class neural_compressor.adaptor.ox_utils.util.QuantizedInitializer(name, initializer, rmins, rmaxs, zero_points, scales, data=[], quantized_data=[], axis=None, qType=QuantType.QUInt8)[source]: Represents a linearly quantized weight input from ONNX operators.

class neural_compressor.adaptor.ox_utils.util.QuantizationMode[source]: Represent QuantizationMode value.

class neural_compressor.adaptor.ox_utils.util.QuantizedValueType[source]: Represent QuantizedValueType value.

class neural_compressor.adaptor.ox_utils.util.QuantFormat[source]: Represent QuantFormat value.

neural_compressor.adaptor.ox_utils.util.quantize_nparray(qtype, arr, scale, zero_point, low=None, high=None)[source]: Quantize numpy array.

neural_compressor.adaptor.ox_utils.util.attribute_to_kwarg(attribute)[source]: Convert attribute to kwarg format for use with onnx.helper.make_node.

neural_compressor.adaptor.ox_utils.util.find_by_name(name, item_list)[source]: Helper function to find item by name in a list.
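
A name lookup of this kind can be sketched as a linear scan over objects that expose a `name` attribute, as ONNX nodes and initializers do. The function below is a stand-in written for illustration; returning the first match, and `None` when nothing matches, is an assumption about the behavior.

```python
from types import SimpleNamespace


def find_by_name_sketch(name, item_list):
    """Hypothetical stand-in: first item whose .name equals name, else None."""
    for item in item_list:
        if item.name == name:
            return item
    return None


# SimpleNamespace objects stand in for ONNX protos in this example.
initializers = [
    SimpleNamespace(name="conv1.weight"),
    SimpleNamespace(name="conv1.bias"),
]
match = find_by_name_sketch("conv1.bias", initializers)
missing = find_by_name_sketch("fc.weight", initializers)
```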

neural_compressor.adaptor.ox_utils.util.get_smooth_scales_per_op(max_vals_per_channel, input_tensors_2_weights, input_tensors_2_weights_nodes, alpha)[source]

Get the smooth scales for weights.

The ops with the same input will share one mul layer. TODO support individual scales for each layer.

Parameters:

max_vals_per_channel – Max values per channel after calibration
input_tensors_2_weights – A dict mapping each input tensor name to its corresponding weights
input_tensors_2_weights_nodes – A dict mapping each input tensor name to its corresponding weight nodes
alpha – the smoothing parameter alpha from the paper

Returns:

the smooth scales for weights; currently one input tensor has only one scale
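
Assuming "the paper" refers to SmoothQuant, the per-channel smooth scale is commonly written as s_j = max|X_j|^alpha / max|W_j|^(1 - alpha), where j indexes input channels. The sketch below implements that formula for a single weight matrix. The helper name, the `(in_channels, out_channels)` weight layout and the `eps` floor are all assumptions, not the library's implementation.

```python
import numpy as np


def smooth_scales_sketch(act_max_per_channel, weight, alpha=0.5, eps=1e-5):
    """Hypothetical helper: s_j = max|X_j|**alpha / max|W_j|**(1 - alpha)."""
    act_max = np.maximum(
        np.abs(np.asarray(act_max_per_channel, dtype=np.float64)), eps
    )
    # Assumed layout: weight is (in_channels, out_channels).
    weight = np.asarray(weight, dtype=np.float64)
    w_max = np.maximum(np.max(np.abs(weight), axis=1), eps)
    return act_max ** alpha / w_max ** (1.0 - alpha)


weight = np.array([[1.0, -1.0], [0.5, 4.0]])
scales = smooth_scales_sketch([4.0, 16.0], weight, alpha=0.5)
```

With `alpha=0.5` the activation and weight ranges are balanced equally; larger values of alpha shift more of the quantization difficulty from activations to weights.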

neural_compressor.adaptor.ox_utils.util.get_smooth_scales_per_input(max_vals_per_channel, input_tensors_2_weights, alpha)[source]

Get the smooth scales for weights.

The ops with the same input will share one mul layer. TODO support individual scales for each layer.

Parameters:

max_vals_per_channel – Max values per channel after calibration
input_tensors_2_weights – A dict mapping each input tensor name to its corresponding weights
alpha – the smoothing parameter alpha from the paper

Returns:

the smooth scales for weights; currently one input tensor has only one scale

neural_compressor.adaptor.ox_utils.util.insert_smooth_mul_op_per_input(scales, shape_infos, input_tensors_2_weights_nodes)[source]

Insert the Mul layer after input.

The ops with the same input will share one mul layer.

Parameters:

scales – The smooth scales
shape_infos – the input tensor shape information
input_tensors_2_weights_nodes – A dict

Returns:

new_added_mul_nodes: added Mul layers
new_init_tensors: added scales tensor

neural_compressor.adaptor.ox_utils.util.adjust_weights_per_op(model, nodes, scales)[source]

Adjust the weights per input scale.

Each op has one individual Mul layer.

Parameters:

model – The onnx model
nodes – The nodes whose weights need to be adjusted
scales – The input scales

neural_compressor.adaptor.ox_utils.util.adjust_weights_per_input(model, nodes, scales)[source]

Adjust the weights per input scale.

The ops with the same input will share one mul layer

Parameters:

model – The onnx model
nodes – The nodes whose weights need to be adjusted
scales – The input scales
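
The reason weights must be adjusted when a Mul layer is inserted can be shown numerically. If the activation is divided by the scale per input channel, multiplying the matching weight rows by the same scale leaves the matrix product unchanged. The direction of scaling (activation times 1/s, weight times s) and the `(in_channels, out_channels)` layout are assumptions in this sketch, and the helper name is hypothetical.

```python
import numpy as np


def smooth_pair_sketch(x, w, scales):
    """Hypothetical helper: scale activations down and weights up per channel."""
    x_smoothed = x * (1.0 / scales)    # the role played by the inserted Mul
    w_adjusted = w * scales[:, None]   # one scale per input channel (row of w)
    return x_smoothed, w_adjusted


x = np.array([[1.0, 8.0], [2.0, 16.0]])
w = np.array([[1.0, -1.0], [0.5, 4.0]])
scales = np.array([2.0, 4.0])

x_smoothed, w_adjusted = smooth_pair_sketch(x, w, scales)
same_output = np.allclose(x @ w, x_smoothed @ w_adjusted)
```

The outlier-heavy second activation column shrinks from a maximum of 16 to 4, while the product of the smoothed pair still matches the original output.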

neural_compressor.adaptor.ox_utils.util.insert_smooth_mul_op_per_op(scales, shape_infos, input_tensors_2_weights_nodes)[source]

Insert the mul layer before op.

Each op has one individual Mul layer.

Parameters:

scales – The smooth scales
shape_infos – the input tensor shape information
input_tensors_2_weights_nodes – A dict

Returns:

new_added_mul_nodes: added Mul layers
new_init_tensors: added scales tensor
name_2_nodes: a dict, key is the node name, value is the node

neural_compressor.adaptor.ox_utils.util.trt_env_setup(model)[source]: Set environment variable for TensorRT Execution Provider.
