neural_compressor.adaptor.ox_utils.util¶
Helper classes and functions for the onnxrt adaptor.
Module Contents¶
Classes¶
- QuantType: Represent QuantType value.
- ValueInfo: Represents a casted tensor info.
- QuantizedValue: Represents a linearly quantized value (input/output/initializer).
- QuantizedInitializer: Represents a linearly quantized weight input from ONNX operators.
- QuantizationMode: Represent QuantizationMode value.
- QuantizedValueType: Represent QuantizedValueType value.
- QuantFormat: Represent QuantFormat value.
Functions¶
- dtype_to_name: Map a data type to its string representation.
- make_quant_node: Make a QuantizeLinear node.
- make_dquant_node: Make a DequantizeLinear node.
- is_B_transposed: Whether input B is transposed.
- Split shared tensor.
- cast_tensor: Convert tensor float to target dtype.
- remove_init_from_model_input: Remove initializer from model input.
- collate_preds: Collect model outputs.
- quantize_data_with_scale_zero: Quantize data with scale and zero point.
- calculate_scale_zp: Calculate scale and zero point.
- quantize_data: Quantize data.
- quantize_data_per_channel: Quantize tensor per-channel.
- dequantize_data_with_scale_zero: Dequantize tensor with scale and zero point.
- dequantize_data: Dequantize tensor.
- quantize_nparray: Quantize a numpy array.
- attribute_to_kwarg: Convert attribute to kwarg format for use with onnx.helper.make_node.
- find_by_name: Helper function to find an item by name in a list.
- neural_compressor.adaptor.ox_utils.util.dtype_to_name(dtype_mapping, dtype)¶
Map a data type to its string representation.
- class neural_compressor.adaptor.ox_utils.util.QuantType¶
Bases:
enum.Enum
Represent QuantType value.
- neural_compressor.adaptor.ox_utils.util.make_quant_node(name, inputs, outputs)¶
Make a QuantizeLinear node.
- neural_compressor.adaptor.ox_utils.util.make_dquant_node(name, inputs, outputs, axis=None)¶
Make a DequantizeLinear node.
- neural_compressor.adaptor.ox_utils.util.is_B_transposed(node)¶
Whether input B is transposed.
- Split shared tensor.
- neural_compressor.adaptor.ox_utils.util.cast_tensor(tensor, dtype)¶
Convert tensor float to target dtype.
- Parameters:
tensor (TensorProto) – TensorProto object
dtype (int) – target data type
- neural_compressor.adaptor.ox_utils.util.remove_init_from_model_input(model)¶
Remove initializer from model input.
- neural_compressor.adaptor.ox_utils.util.collate_preds(results)¶
Collect model outputs.
- neural_compressor.adaptor.ox_utils.util.quantize_data_with_scale_zero(data, qType, scheme, scale, zero_point)¶
Quantize data with scale and zero point.
- To pack weights, we compute a linear transformation:
when data type == uint8, from [rmin, rmax] -> [0, 2^b - 1], and
when data type == int8, from [-m, m] -> [-(2^{b-1} - 1), 2^{b-1} - 1], where
m = max(abs(rmin), abs(rmax))
- Parameters:
data (np.array) – data to quantize
qType (int) – data type to quantize to. Supported types UINT8 and INT8
scheme (string) – sym or asym quantization.
scale (float) – computed scale of quantized data
zero_point (uint8 or int8) – computed zero point of quantized data
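The transformation above reduces to q = clip(round(r / scale) + zero_point, qmin, qmax) once scale and zero point are known. A minimal numpy sketch of that step, for the asymmetric uint8 case only (the function name, default bounds, and output dtype here are illustrative assumptions, not the library's actual implementation):

```python
import numpy as np

def quantize_with_scale_zero(data, scale, zero_point, qmin=0, qmax=255):
    # q = clip(round(r / scale) + zero_point, qmin, qmax)
    q = np.round(np.asarray(data, dtype=np.float32) / scale) + zero_point
    return np.clip(q, qmin, qmax).astype(np.int64)

vals = np.array([-1.0, 0.0, 1.0, 200.0], dtype=np.float32)
q = quantize_with_scale_zero(vals, scale=0.5, zero_point=10)
# -1.0 -> round(-2) + 10 = 8; 0.0 -> 10; 1.0 -> 12; 200.0 -> 410, clipped to 255
```

Note that values mapping outside [qmin, qmax] saturate rather than wrap, which is the standard behavior for linear quantization.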
- neural_compressor.adaptor.ox_utils.util.calculate_scale_zp(rmin, rmax, quantize_range, qType, scheme)¶
Calculate scale and zero point.
- neural_compressor.adaptor.ox_utils.util.quantize_data(data, quantize_range, qType, scheme)¶
Quantize data.
- To pack weights, we compute a linear transformation:
when data type == uint8, from [rmin, rmax] -> [0, 2^b - 1], and
when data type == int8, from [-m, m] -> [-(2^{b-1} - 1), 2^{b-1} - 1], where
m = max(abs(rmin), abs(rmax)),
and add the intermediate nodes needed to transform the quantized weight back to the full-precision weight using the equation r = S(q - z), where
r: real original value, q: quantized value, S: scale, z: zero point
- Parameters:
data (array) – data to quantize
quantize_range (int) – quantization range, i.e. the number of representable steps (e.g. 255 for 8-bit).
qType (int) – data type to quantize to. Supported types UINT8 and INT8
scheme (string) – sym or asym quantization.
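The full quantize_data flow chains range estimation, scale/zero-point calculation, and the linear mapping. A minimal sketch of the asymmetric (uint8-style) path, assuming a scalar quantize_range; names and the exact clamping policy are illustrative, and the library's quantize_data/calculate_scale_zp handle more qType/scheme combinations:

```python
import numpy as np

def quantize_data_sketch(data, quantize_range=255):
    # 1. Estimate [rmin, rmax], widened so 0.0 stays exactly representable.
    data = np.asarray(data, dtype=np.float32)
    rmin = min(float(data.min()), 0.0)
    rmax = max(float(data.max()), 0.0)
    # 2. Derive scale and zero point mapping [rmin, rmax] -> [0, quantize_range].
    scale = (rmax - rmin) / quantize_range if rmax != rmin else 1.0
    zero_point = int(round(-rmin / scale))
    # 3. Apply q = clip(round(r / scale) + zero_point, 0, quantize_range).
    q = np.clip(np.round(data / scale) + zero_point, 0, quantize_range)
    return scale, zero_point, q.astype(np.int64)

scale, zp, q = quantize_data_sketch([0.0, 51.0, 255.0])
# rmin=0, rmax=255 -> scale=1.0, zero_point=0, q=[0, 51, 255]
```

The symmetric (int8) scheme instead uses m = max(abs(rmin), abs(rmax)) and fixes the zero point at 0.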
- neural_compressor.adaptor.ox_utils.util.quantize_data_per_channel(tensor_value, qType, scheme, scale_value, zo_value)¶
Quantize tensor per-channel.
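Per-channel quantization applies one (scale, zero_point) pair per slice along a chosen axis instead of a single pair for the whole tensor. A hedged numpy sketch of the idea (the broadcasting approach and function name are assumptions; the library version operates on TensorProto objects):

```python
import numpy as np

def quantize_per_channel_sketch(tensor, scales, zero_points, axis=0):
    # Reshape the per-channel parameters so they broadcast along `axis`.
    shape = [1] * tensor.ndim
    shape[axis] = -1
    s = np.asarray(scales, dtype=np.float32).reshape(shape)
    z = np.asarray(zero_points, dtype=np.float32).reshape(shape)
    return np.clip(np.round(tensor / s) + z, 0, 255).astype(np.int64)

w = np.array([[1.0, 2.0], [10.0, 20.0]], dtype=np.float32)
q = quantize_per_channel_sketch(w, scales=[1.0, 10.0], zero_points=[0, 0])
# row 0 uses scale 1.0 -> [1, 2]; row 1 uses scale 10.0 -> [1, 2]
```

Per-channel scales typically reduce quantization error for weights whose dynamic range varies strongly between output channels.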
- neural_compressor.adaptor.ox_utils.util.dequantize_data_with_scale_zero(tensor_value, scale_value, zo_value)¶
Dequantize tensor with scale and zero point.
- neural_compressor.adaptor.ox_utils.util.dequantize_data(tensor_value, scale_value, zo_value, axis=0)¶
Dequantize tensor.
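Dequantization inverts the linear mapping via r = S(q - z), as stated in the quantize_data notes above. A minimal sketch (function name assumed):

```python
import numpy as np

def dequantize_sketch(q, scale, zero_point):
    # r = S * (q - z): recover approximate real values from quantized ones.
    return (np.asarray(q, dtype=np.float32) - zero_point) * scale

r = dequantize_sketch([8, 10, 12], scale=0.5, zero_point=10)
# -> [-1.0, 0.0, 1.0]
```

The axis parameter in the library's dequantize_data plays the same role as in per-channel quantization: it selects the dimension along which per-channel scales and zero points are broadcast.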
- class neural_compressor.adaptor.ox_utils.util.ValueInfo(tensor_name, dtype, new_dtype)¶
Represents a casted tensor info.
- class neural_compressor.adaptor.ox_utils.util.QuantizedValue(name, new_quantized_name, scale_name, zero_point_name, quantized_value_type, axis=None, qType=QuantType.QUInt8)¶
Represents a linearly quantized value (input/output/initializer).
- class neural_compressor.adaptor.ox_utils.util.QuantizedInitializer(name, initializer, rmins, rmaxs, zero_points, scales, data=[], quantized_data=[], axis=None, qType=QuantType.QUInt8)¶
Represents a linearly quantized weight input from ONNX operators.
- class neural_compressor.adaptor.ox_utils.util.QuantizationMode¶
Bases:
enum.Enum
Represent QuantizationMode value.
- class neural_compressor.adaptor.ox_utils.util.QuantizedValueType¶
Bases:
enum.Enum
Represent QuantizedValueType value.
- class neural_compressor.adaptor.ox_utils.util.QuantFormat¶
Bases:
enum.Enum
Represent QuantFormat value.
- neural_compressor.adaptor.ox_utils.util.quantize_nparray(qtype, arr, scale, zero_point, low=None, high=None)¶
Quantize numpy array.
- neural_compressor.adaptor.ox_utils.util.attribute_to_kwarg(attribute)¶
Convert attribute to kwarg format for use with onnx.helper.make_node.
- neural_compressor.adaptor.ox_utils.util.find_by_name(name, item_list)¶
Helper function to find item by name in a list.
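Lookups like this one walk a list of ONNX objects (nodes, initializers) and match on the .name attribute. A small self-contained sketch of the assumed behavior, using a namedtuple as a stand-in for ONNX protobuf objects:

```python
from collections import namedtuple

def find_by_name_sketch(name, item_list):
    # Return the first item whose .name matches, else None (assumed behavior).
    matches = [item for item in item_list if item.name == name]
    return matches[0] if matches else None

Node = namedtuple("Node", "name op_type")
nodes = [Node("conv1", "Conv"), Node("relu1", "Relu")]
found = find_by_name_sketch("relu1", nodes)  # -> Node("relu1", "Relu")
```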