neural_compressor.adaptor.ox_utils.util
Helper classes or functions for onnxrt adaptor.
Module Contents
Classes
QuantType – Represent QuantType value.
ValueInfo – Represents information about a cast tensor.
QuantizedValue – Represents a linearly quantized value (input/output/initializer).
QuantizedInitializer – Represents a linearly quantized weight input from ONNX operators.
QuantizationMode – Represent QuantizationMode value.
QuantizedValueType – Represent QuantizedValueType value.
QuantFormat – Represent QuantFormat value.
Functions
dtype_to_name – Map data type and its string representation.
make_quant_node – Make a QuantizeLinear node.
make_dquant_node – Make a DequantizeLinear node.
is_B_transposed – Whether input B is transposed.
Split shared tensor.
Convert float to float16.
float_to_bfloat16 – Convert float to bfloat16.
cast_tensor – Convert tensor float to target dtype.
remove_init_from_model_input – Remove initializer from model input.
Collect model outputs.
quantize_data_with_scale_zero – Quantize data with scale and zero point.
calculate_scale_zp – Calculate scale and zero point.
quantize_data – Quantize data.
quantize_data_per_channel – Quantize tensor per-channel.
dequantize_data_with_scale_zero – Dequantize tensor with scale and zero point.
dequantize_data – Dequantize tensor.
quantize_nparray – Quantize numpy array.
attribute_to_kwarg – Convert attribute to kwarg format for use with onnx.helper.make_node.
find_by_name – Helper function to find item by name in a list.
get_smooth_scales_per_op – Get the smooth scales for weights.
get_smooth_scales_per_input – Get the smooth scales for weights.
insert_smooth_mul_op_per_input – Insert the Mul layer after the input.
adjust_weights_per_op – Adjust the weights per input scale.
adjust_weights_per_input – Adjust the weights per input scale.
insert_smooth_mul_op_per_op – Insert the Mul layer before the op.
Set environment variable for TensorRT Execution Provider.
- neural_compressor.adaptor.ox_utils.util.dtype_to_name(dtype_mapping, dtype)[source]
Map data type and its string representation.
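As a rough illustration, a lookup from a type code back to its string name can be sketched in plain Python. The mapping contents and the helper name `name_for_dtype` are hypothetical; the layout of the real dtype_mapping argument is an assumption and may differ.

```python
# Hypothetical mapping from a string name to an integer type code.
# (Assumption: the dtype_mapping passed to dtype_to_name may be laid out differently.)
dtype_mapping = {"fp32": 1, "uint8": 2, "int8": 3}


def name_for_dtype(mapping, dtype):
    """Return the first key in `mapping` whose value equals `dtype`."""
    for name, code in mapping.items():
        if code == dtype:
            return name
    raise KeyError(f"no name registered for dtype {dtype!r}")


print(name_for_dtype(dtype_mapping, 3))  # int8
```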
- neural_compressor.adaptor.ox_utils.util.make_quant_node(name, inputs, outputs)[source]
Make a QuantizeLinear node.
- neural_compressor.adaptor.ox_utils.util.make_dquant_node(name, inputs, outputs, axis=None)[source]
Make a DequantizeLinear node.
- neural_compressor.adaptor.ox_utils.util.is_B_transposed(node)[source]
Whether input B is transposed.
Split shared tensor.
- neural_compressor.adaptor.ox_utils.util.float_to_bfloat16(tensor)[source]
Convert float to bfloat16.
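For background on the target format only: bfloat16 keeps the sign bit, the 8 exponent bits and the top 7 mantissa bits of a float32. The sketch below shows that relationship with simple truncation on a single scalar. It is not the library's implementation, which operates on tensors and may round or clamp differently; both helper names are hypothetical.

```python
import struct


def float32_to_bfloat16_bits(value):
    """Return the 16-bit bfloat16 pattern of `value`, truncating the low bits."""
    # Reinterpret the float32 encoding of `value` as an unsigned 32-bit integer.
    (bits32,) = struct.unpack("<I", struct.pack("<f", value))
    # Keep only the upper 16 bits: sign, exponent and 7 mantissa bits.
    return bits32 >> 16


def bfloat16_bits_to_float(bits16):
    """Expand a 16-bit bfloat16 pattern back to a Python float."""
    (value,) = struct.unpack("<f", struct.pack("<I", bits16 << 16))
    return value


bits = float32_to_bfloat16_bits(3.14159)
print(hex(bits), bfloat16_bits_to_float(bits))  # 0x4049 3.140625
```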
- neural_compressor.adaptor.ox_utils.util.cast_tensor(tensor, dtype)[source]
Convert tensor float to target dtype.
- Parameters:
tensor (TensorProto) – TensorProto object
dtype (int) – target data type
- neural_compressor.adaptor.ox_utils.util.remove_init_from_model_input(model)[source]
Remove initializer from model input.
- neural_compressor.adaptor.ox_utils.util.quantize_data_with_scale_zero(data, qType, scheme, scale, zero_point)[source]
Quantize data with scale and zero point.
To pack weights, we compute a linear transformation, where b is the number of bits (8 for both supported types):
- when data type == uint8, from [rmin, rmax] -> [0, 2^b - 1], and
- when data type == int8, from [-m, m] -> [-(2^{b-1}-1), 2^{b-1}-1], where m = max(abs(rmin), abs(rmax))
- Parameters:
data (np.array) – data to quantize
qType (int) – data type to quantize to. Supported types UINT8 and INT8
scheme (string) – sym or asym quantization.
scale (float) – computed scale of quantized data
zero_point (uint8 or int8) – computed zero point of quantized data
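A minimal pure-Python sketch of the mapping, assuming the usual linear rule q = clip(round(r / scale) + zero_point). The helper name `quantize_with_scale_zero` and the explicit `qmin`/`qmax` arguments are hypothetical, and the library's own rounding and clipping details may differ.

```python
def quantize_with_scale_zero(data, scale, zero_point, qmin, qmax):
    """Map each real value r to q = clip(round(r / scale) + zero_point, qmin, qmax)."""
    quantized = []
    for r in data:
        q = round(r / scale) + zero_point
        # Saturate values that fall outside the target integer range.
        quantized.append(max(qmin, min(qmax, q)))
    return quantized


# uint8 example: the scale and zero point are assumed to be known already.
q = quantize_with_scale_zero([-1.0, 0.0, 0.3, 100.0], scale=0.25,
                             zero_point=128, qmin=0, qmax=255)
print(q)  # [124, 128, 129, 255]
```

The last input saturates at 255, which shows why the scale has to be chosen to cover the data range.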
- neural_compressor.adaptor.ox_utils.util.calculate_scale_zp(rmin, rmax, quantize_range, qType, scheme)[source]
Calculate scale and zero point.
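The calculation can be sketched as follows, assuming the standard linear-quantization formulas. The function name `scale_and_zero_point` and its `qmin`/`qmax`/`symmetric` parameters are hypothetical and do not mirror the signature above; the symmetric branch further assumes a signed range centred on zero.

```python
def scale_and_zero_point(rmin, rmax, qmin, qmax, symmetric):
    """Return (scale, zero_point) for mapping [rmin, rmax] onto [qmin, qmax]."""
    # Extend the range to include 0.0 so that zero is exactly representable.
    rmin, rmax = min(rmin, 0.0), max(rmax, 0.0)
    if symmetric:
        # Assumption: signed range centred on zero, e.g. [-127, 127].
        m = max(abs(rmin), abs(rmax))
        scale = (2 * m) / (qmax - qmin) if m > 0 else 1.0
        zero_point = 0
    else:
        scale = (rmax - rmin) / (qmax - qmin) if rmax > rmin else 1.0
        zero_point = round(qmin - rmin / scale)
        zero_point = max(qmin, min(qmax, zero_point))
    return scale, zero_point


asym_scale, asym_zp = scale_and_zero_point(-1.0, 3.0, 0, 255, symmetric=False)
sym_scale, sym_zp = scale_and_zero_point(-0.5, 2.0, -127, 127, symmetric=True)
print(asym_zp, sym_zp)  # 64 0
```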
- neural_compressor.adaptor.ox_utils.util.quantize_data(data, quantize_range, qType, scheme)[source]
Quantize data.
To pack weights, we compute a linear transformation, where b is the number of bits (8 for both supported types):
- when data type == uint8, from [rmin, rmax] -> [0, 2^b - 1], and
- when data type == int8, from [-m, m] -> [-(2^{b-1}-1), 2^{b-1}-1], where m = max(abs(rmin), abs(rmax)),
and add the necessary intermediate nodes to transform the quantized weight back to the full weight using the equation r = S(q - z), where
r: real original value
q: quantized value
S: scale
z: zero point
- Parameters:
data (array) – data to quantize
quantize_range (list) – list of data to weight pack.
qType (int) – data type to quantize to. Supported types UINT8 and INT8
scheme (string) – sym or asym quantization.
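The int8 case described above, where the range [-m, m] is derived from the data itself, can be sketched in plain Python. The helper name `quantize_symmetric_int8` is hypothetical, and operating on a flat list of floats is a simplification of what the library does with arrays.

```python
def quantize_symmetric_int8(data):
    """Quantize a list of floats to int8 using the symmetric range [-m, m]."""
    m = max(abs(min(data)), abs(max(data)))
    # 127 = 2^(8-1) - 1, the largest magnitude used in the symmetric int8 range.
    scale = m / 127 if m > 0 else 1.0
    quantized = [max(-127, min(127, round(r / scale))) for r in data]
    # In the symmetric case the zero point is 0.
    return quantized, scale, 0


q_values, q_scale, q_zero = quantize_symmetric_int8([-2.0, -0.5, 0.0, 1.2])
print(q_values)  # [-127, -32, 0, 76]
```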
- neural_compressor.adaptor.ox_utils.util.quantize_data_per_channel(data, axis, quantize_range, qType, scheme)[source]
Quantize tensor per-channel.
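Per-channel quantization gives every channel its own scale, so a channel with small values is not crushed by a channel with large ones. A minimal sketch, assuming a 2-D weight stored as a list of rows with the channel axis at 0 and symmetric int8 quantization; the helper name is hypothetical.

```python
def quantize_rows_int8(rows):
    """Quantize each row (channel) of a 2-D list with its own symmetric scale."""
    scales, quantized = [], []
    for row in rows:
        m = max(abs(v) for v in row)
        scale = m / 127 if m > 0 else 1.0
        scales.append(scale)
        quantized.append([max(-127, min(127, round(v / scale))) for v in row])
    return quantized, scales


weights = [[0.1, -0.4, 0.05],   # channel with small magnitudes
           [10.0, -4.0, 2.0]]   # channel with large magnitudes
q_rows, row_scales = quantize_rows_int8(weights)
print(q_rows)  # [[32, -127, 16], [127, -51, 25]]
```

With a single per-tensor scale of 10 / 127, every value in the first row would round to an integer between -5 and 1 and lose most of its resolution.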
- neural_compressor.adaptor.ox_utils.util.dequantize_data_with_scale_zero(tensor_value, scale_value, zo_value)[source]
Dequantize tensor with scale and zero point.
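Dequantization applies the relation r = S(q - z) to every element. A minimal sketch on a flat list of integers, with a hypothetical helper name:

```python
def dequantize(q_values, scale, zero_point):
    """Recover approximate real values using r = scale * (q - zero_point)."""
    return [scale * (q - zero_point) for q in q_values]


restored = dequantize([124, 128, 129, 255], scale=0.25, zero_point=128)
print(restored)  # [-1.0, 0.0, 0.25, 31.75]
```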
- neural_compressor.adaptor.ox_utils.util.dequantize_data(tensor_value, scale_value, zo_value, axis=0)[source]
Dequantize tensor.
- class neural_compressor.adaptor.ox_utils.util.ValueInfo(tensor_name, dtype, new_dtype)[source]
Represents information about a cast tensor.
- class neural_compressor.adaptor.ox_utils.util.QuantizedValue(name, new_quantized_name, scale_name, zero_point_name, quantized_value_type, axis=None, qType=QuantType.QUInt8)[source]
Represents a linearly quantized value (input/output/initializer).
- class neural_compressor.adaptor.ox_utils.util.QuantizedInitializer(name, initializer, rmins, rmaxs, zero_points, scales, data=[], quantized_data=[], axis=None, qType=QuantType.QUInt8)[source]
Represents a linearly quantized weight input from ONNX operators.
- class neural_compressor.adaptor.ox_utils.util.QuantizationMode[source]
Represent QuantizationMode value.
- class neural_compressor.adaptor.ox_utils.util.QuantizedValueType[source]
Represent QuantizedValueType value.
- neural_compressor.adaptor.ox_utils.util.quantize_nparray(qtype, arr, scale, zero_point, low=None, high=None)[source]
Quantize numpy array.
- neural_compressor.adaptor.ox_utils.util.attribute_to_kwarg(attribute)[source]
Convert attribute to kwarg format for use with onnx.helper.make_node.
- neural_compressor.adaptor.ox_utils.util.find_by_name(name, item_list)[source]
Helper function to find item by name in a list.
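A lookup of this kind can be sketched as below, assuming each item exposes a `.name` attribute (as ONNX nodes and initializers do) and that None is returned when nothing matches. The helper name `find_item_by_name` and the return-None behaviour are assumptions for illustration.

```python
from types import SimpleNamespace


def find_item_by_name(name, item_list):
    """Return the first item whose `.name` equals `name`, or None if absent."""
    for item in item_list:
        if item.name == name:
            return item
    return None


# Stand-ins for objects with a `name` attribute.
items = [SimpleNamespace(name="conv1_weight"), SimpleNamespace(name="conv1_bias")]
print(find_item_by_name("conv1_bias", items).name)  # conv1_bias
print(find_item_by_name("missing", items))          # None
```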
- neural_compressor.adaptor.ox_utils.util.get_smooth_scales_per_op(max_vals_per_channel, input_tensors_2_weights, input_tensors_2_weights_nodes, alpha)[source]
Get the smooth scales for weights.
The ops with the same input will share one mul layer. TODO support individual scales for each layer.
- Parameters:
max_vals_per_channel – Max values per channel after calibration
input_tensors_2_weights – A dict mapping each input tensor name to its corresponding weights
input_tensors_2_weights_nodes – A dict mapping each input tensor name to its corresponding weight nodes
alpha – the smoothing alpha from the paper
- Returns:
the smooth scales for weights; currently each input tensor has only one scale
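For intuition, the sketch below assumes the SmoothQuant-style formula s_j = max|X_j|^alpha / max|W_j|^(1 - alpha), computed per input channel j. The helper name and its list-based arguments are hypothetical, and the library's handling of multiple weights per input, zero values and clipping is not reproduced here.

```python
def smooth_scales(act_max_per_channel, weight_max_per_channel, alpha):
    """Per-channel smoothing scales s_j = a_j**alpha / w_j**(1 - alpha)."""
    scales = []
    for a, w in zip(act_max_per_channel, weight_max_per_channel):
        scales.append((a ** alpha) / (w ** (1 - alpha)))
    return scales


# Channel 0 has large activations and small weights, so it gets a larger scale.
s = smooth_scales([16.0, 4.0], [1.0, 4.0], alpha=0.5)
print(s)
```

A larger alpha moves more of the quantization difficulty from the activations to the weights.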
- neural_compressor.adaptor.ox_utils.util.get_smooth_scales_per_input(max_vals_per_channel, input_tensors_2_weights, alpha)[source]
Get the smooth scales for weights.
The ops with the same input will share one mul layer. TODO support individual scales for each layer.
- Parameters:
max_vals_per_channel – Max values per channel after calibration
input_tensors_2_weights – A dict mapping each input tensor name to its corresponding weights
alpha – the smoothing alpha from the paper
- Returns:
the smooth scales for weights; currently each input tensor has only one scale
- neural_compressor.adaptor.ox_utils.util.insert_smooth_mul_op_per_input(scales, shape_infos, input_tensors_2_weights_nodes)[source]
Insert the Mul layer after the input.
The ops with the same input will share one Mul layer.
- Parameters:
scales – The smooth scales
shape_infos – the input tensor shape information
input_tensors_2_weights_nodes – A dict mapping each input tensor name to its corresponding weight nodes
- Returns:
new_added_mul_nodes: added Mul layers
new_init_tensors: added scales tensor
- neural_compressor.adaptor.ox_utils.util.adjust_weights_per_op(model, nodes, scales)[source]
Adjust the weights per input scale.
Each op has one individual Mul layer.
- Parameters:
model – The onnx model
nodes – The nodes whose weights need to be adjusted
scales – The input scales
- neural_compressor.adaptor.ox_utils.util.adjust_weights_per_input(model, nodes, scales)[source]
Adjust the weights per input scale.
The ops with the same input will share one Mul layer.
- Parameters:
model – The onnx model
nodes – The nodes whose weights need to be adjusted
scales – The input scales
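The reason the weights must be adjusted once a Mul layer rescales the input can be seen on a single dot product: if the input is divided by the per-channel scales, the weights have to be multiplied by the same scales so that the output is unchanged. The sketch assumes this direction of scaling (input divided, weights multiplied); the values and names are illustrative only.

```python
def dot(x, w):
    """Plain dot product of two equally long lists."""
    return sum(a * b for a, b in zip(x, w))


x = [8.0, 2.0]        # one input row, two channels
w = [0.5, 3.0]        # matching weight column
scales = [4.0, 0.5]   # per-channel smooth scales

x_smoothed = [a / s for a, s in zip(x, scales)]   # effect of the inserted Mul
w_adjusted = [b * s for b, s in zip(w, scales)]   # effect of adjusting weights

print(dot(x, w), dot(x_smoothed, w_adjusted))  # 10.0 10.0
```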
- neural_compressor.adaptor.ox_utils.util.insert_smooth_mul_op_per_op(scales, shape_infos, input_tensors_2_weights_nodes)[source]
Insert the mul layer before op.
Each op has one individual Mul layer.
- Parameters:
scales – The smooth scales
shape_infos – the input tensor shape information
input_tensors_2_weights_nodes – A dict mapping each input tensor name to its corresponding weight nodes
- Returns:
new_added_mul_nodes: added Mul layers
new_init_tensors: added scales tensor
name_2_nodes: a dict, key is the node name, value is the node