neural_compressor.adaptor.ox_utils.util
Helper classes and functions for the onnxrt adaptor.
Classes
- QuantType: Represent QuantType value.
- ValueInfo: Represents info about a cast tensor.
- QuantizedValue: Represents a linearly quantized value (input/output/initializer).
- QuantizedInitializer: Represents a linearly quantized weight input from ONNX operators.
- QuantizationMode: Represent QuantizationMode value.
- QuantizedValueType: Represent QuantizedValueType value.
- QuantFormat: Represent QuantFormat value.
Functions
- get_node_original_name: Get the original name of the given node.
- simple_progress_bar: Progress bar for cases where tqdm can't be used.
- dtype_to_name: Map data type and its string representation.
- make_quant_node: Make a QuantizeLinear node.
- make_dquant_node: Make a DequantizeLinear node.
- is_B_transposed: Whether input B is transposed.
- split_shared_bias: Split shared tensor.
- float_to_float16: Convert float to float16.
- float_to_bfloat16: Convert float to bfloat16.
- cast_tensor: Convert tensor float to target dtype.
- remove_init_from_model_input: Remove initializer from model input.
- Collect model outputs.
- quantize_data_with_scale_zero: Quantize data with scale and zero point.
- calculate_scale_zp: Calculate scale and zero point.
- quantize_data: Quantize data.
- quantize_data_per_channel: Quantize tensor per-channel.
- dequantize_data_with_scale_zero: Dequantize tensor with scale and zero point.
- dequantize_data: Dequantize tensor.
- quantize_nparray: Quantize numpy array.
- attribute_to_kwarg: Convert attribute to kwarg format for use with onnx.helper.make_node.
- find_by_name: Helper function to find item by name in a list.
- trt_env_setup: Set environment variable for TensorRT Execution Provider.
- to_numpy: Convert to numpy ndarrays.
- infer_shapes: Symbolic shape inference.
Module Contents
- neural_compressor.adaptor.ox_utils.util.get_node_original_name(node) → str [source]
Get the original name of the given node.
- neural_compressor.adaptor.ox_utils.util.simple_progress_bar(total, i)[source]
Progress bar for cases where tqdm can’t be used.
- neural_compressor.adaptor.ox_utils.util.dtype_to_name(dtype_mapping, dtype)[source]
Map data type and its string representation.
- neural_compressor.adaptor.ox_utils.util.make_quant_node(name, inputs, outputs, axis=None)[source]
Make a QuantizeLinear node.
- neural_compressor.adaptor.ox_utils.util.make_dquant_node(name, inputs, outputs, axis=None)[source]
Make a DequantizeLinear node.
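Both helpers build standard ONNX nodes. The sketch below shows the kind of nodes they produce using onnx.helper.make_node directly; the tensor and node names are illustrative, not the library's own naming scheme.

```python
import onnx

# QuantizeLinear: y = saturate(round(x / scale) + zero_point)
quant = onnx.helper.make_node(
    "QuantizeLinear",
    inputs=["x", "x_scale", "x_zero_point"],
    outputs=["x_quantized"],
    name="x_QuantizeLinear",
)

# DequantizeLinear: y = (x - zero_point) * scale; the optional axis
# attribute selects the per-channel dimension.
dequant = onnx.helper.make_node(
    "DequantizeLinear",
    inputs=["x_quantized", "x_scale", "x_zero_point"],
    outputs=["x_dequantized"],
    name="x_DequantizeLinear",
    axis=1,
)
```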
- neural_compressor.adaptor.ox_utils.util.is_B_transposed(node)[source]
Whether input B is transposed.
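For a Gemm node this check reduces to inspecting the transB attribute; a hypothetical equivalent (the real helper may handle more cases):

```python
import onnx

def gemm_b_is_transposed(node: onnx.NodeProto) -> bool:
    # Gemm computes alpha * A' * B' + beta * C; transB != 0 means the
    # B input is consumed in transposed form.
    return any(attr.name == "transB" and attr.i > 0 for attr in node.attribute)
```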
- neural_compressor.adaptor.ox_utils.util.split_shared_bias(model)[source]
Split shared tensor.
- neural_compressor.adaptor.ox_utils.util.float_to_float16(tensor)[source]
Convert float to float16.
- neural_compressor.adaptor.ox_utils.util.float_to_bfloat16(tensor)[source]
Convert float to bfloat16.
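bfloat16 keeps the sign bit, the full 8-bit float32 exponent, and the top 7 mantissa bits, so the conversion can be sketched as truncating each float32 bit pattern to its upper half (the library version may round to nearest rather than truncate):

```python
import numpy as np

def float_to_bfloat16_sketch(arr: np.ndarray) -> np.ndarray:
    # Reinterpret float32 values as uint32 and keep the upper 16 bits;
    # the result holds raw bfloat16 bit patterns, as ONNX stores them.
    bits = np.asarray(arr, dtype=np.float32).view(np.uint32)
    return (bits >> 16).astype(np.uint16)
```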
- neural_compressor.adaptor.ox_utils.util.cast_tensor(tensor, dtype, is_large_model=False)[source]
Convert tensor float to target dtype.
- Parameters:
tensor (TensorProto) – TensorProto object
dtype (int) – target data type
is_large_model (bool) – if the model is large, build the tensor with raw=True
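A sketch of the kind of conversion cast_tensor performs, using onnx.numpy_helper to rebuild an initializer with a new element type (tensor name and values are illustrative):

```python
import numpy as np
from onnx import TensorProto, numpy_helper

fp32_init = numpy_helper.from_array(
    np.array([[0.1, -0.2], [1.5, 3.0]], dtype=np.float32), name="weight"
)
# Decode the payload, cast it, and re-encode under the same name.
fp16_init = numpy_helper.from_array(
    numpy_helper.to_array(fp32_init).astype(np.float16), name="weight"
)
assert fp16_init.data_type == TensorProto.FLOAT16
```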
- neural_compressor.adaptor.ox_utils.util.remove_init_from_model_input(model)[source]
Remove initializer from model input.
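Older ONNX IR versions listed initializers under graph.input as well as graph.initializer; the cleanup can be sketched as the following assumed re-implementation:

```python
import onnx

def remove_init_from_model_input_sketch(model: onnx.ModelProto) -> None:
    # Keep only the graph inputs that are not backed by an initializer,
    # i.e. the true runtime inputs.
    init_names = {init.name for init in model.graph.initializer}
    runtime_inputs = [i for i in model.graph.input if i.name not in init_names]
    del model.graph.input[:]
    model.graph.input.extend(runtime_inputs)
```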
- neural_compressor.adaptor.ox_utils.util.quantize_data_with_scale_zero(data, qType, scheme, scale, zero_point)[source]
Quantize data with scale and zero point.
- To pack weights, we compute a linear transformation:
  - when data type == uint8, from [rmin, rmax] -> [0, 2^b - 1], and
  - when data type == int8, from [-m, m] -> [-(2^{b-1} - 1), 2^{b-1} - 1], where m = max(abs(rmin), abs(rmax)).
- Parameters:
data (np.array) – data to quantize
qType (int) – data type to quantize to. Supported types are UINT8 and INT8.
scheme (string) – sym or asym quantization.
scale (float) – computed scale of quantized data
zero_point (uint8 or int8) – computed zero point of quantized data
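For the asymmetric uint8 case this is the affine map q = saturate(round(r / S) + z). A minimal numpy sketch with assumed example values (the scale and zero point here were precomputed for [rmin, rmax] = [-1, 2]):

```python
import numpy as np

data = np.array([-1.0, 0.0, 0.6, 2.0], dtype=np.float32)
scale, zero_point = 3.0 / 255, 85

# q = clip(round(r / S) + z, 0, 255) for uint8
q = np.clip(np.round(data / scale) + zero_point, 0, 255).astype(np.uint8)
# q == [0, 85, 136, 255]
```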
- neural_compressor.adaptor.ox_utils.util.calculate_scale_zp(rmin, rmax, quantize_range, qType, scheme)[source]
Calculate scale and zero point.
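The standard 8-bit formulas, sketched with assumed inputs (the function's exact rounding and clipping conventions may differ):

```python
rmin, rmax = -1.0, 2.0

# Asymmetric (uint8): map [rmin, rmax] onto [0, 255].
scale = (rmax - rmin) / 255
zero_point = round(-rmin / scale)  # 85

# Symmetric (int8): map [-m, m] onto [-127, 127]; zero point is fixed at 0.
m = max(abs(rmin), abs(rmax))
sym_scale = m / 127
sym_zero_point = 0
```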
- neural_compressor.adaptor.ox_utils.util.quantize_data(data, quantize_range, qType, scheme)[source]
Quantize data.
- To pack weights, we compute a linear transformation:
  - when data type == uint8, from [rmin, rmax] -> [0, 2^b - 1], and
  - when data type == int8, from [-m, m] -> [-(2^{b-1} - 1), 2^{b-1} - 1], where m = max(abs(rmin), abs(rmax)),
- and add the necessary intermediate nodes to transform the quantized weight back to the full-precision weight using the equation r = S(q - z), where:
  - r: real original value
  - q: quantized value
  - S: scale
  - z: zero point
- Parameters:
data (array) – data to quantize
quantize_range (int) – quantization range of the target data type (for example, 255 for 8-bit).
qType (int) – data type to quantize to. Supported types are UINT8 and INT8.
scheme (string) – sym or asym quantization.
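quantize_data composes the two steps above: derive the scale and zero point from the data's own min/max, then apply the affine map. A sketch for the asymmetric uint8 scheme, with illustrative values:

```python
import numpy as np

data = np.array([-1.0, 0.0, 0.6, 2.0], dtype=np.float32)
rmin, rmax = float(data.min()), float(data.max())

scale = (rmax - rmin) / 255
zero_point = round(-rmin / scale)
q = np.clip(np.round(data / scale) + zero_point, 0, 255).astype(np.uint8)

# Recover the full-precision value with r = S(q - z).
r = scale * (q.astype(np.float32) - zero_point)
```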
- neural_compressor.adaptor.ox_utils.util.quantize_data_per_channel(data, axis, quantize_range, qType, scheme)[source]
Quantize tensor per-channel.
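Per-channel quantization repeats the per-tensor math once per slice along the chosen axis. A symmetric int8 sketch over axis 0; the axis choice and scheme are assumptions for illustration:

```python
import numpy as np

w = np.linspace(-2.0, 2.0, 12, dtype=np.float32).reshape(3, 4)

# One scale per output channel: m_c = max |w_c|, S_c = m_c / 127.
m = np.abs(w).max(axis=1, keepdims=True)
scales = m / 127.0
q = np.clip(np.round(w / scales), -127, 127).astype(np.int8)
```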
- neural_compressor.adaptor.ox_utils.util.dequantize_data_with_scale_zero(tensor_value, scale_value, zo_value)[source]
Dequantize tensor with scale and zero point.
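Dequantization inverts the affine map with r = S(q - z); the values below reuse the uint8 example from quantize_data_with_scale_zero above:

```python
import numpy as np

q = np.array([0, 85, 136, 255], dtype=np.uint8)
scale, zero_point = 3.0 / 255, 85

r = scale * (q.astype(np.float32) - zero_point)  # ~[-1.0, 0.0, 0.6, 2.0]
```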
- neural_compressor.adaptor.ox_utils.util.dequantize_data(tensor_value, scale_value, zo_value, axis=0)[source]
Dequantize tensor.
- class neural_compressor.adaptor.ox_utils.util.ValueInfo(tensor_name, dtype, new_dtype)[source]
Represents info about a cast tensor.
- class neural_compressor.adaptor.ox_utils.util.QuantizedValue(name, new_quantized_name, scale_name, zero_point_name, quantized_value_type, axis=None, qType=QuantType.QUInt8)[source]
Represents a linearly quantized value (input/output/initializer).
- class neural_compressor.adaptor.ox_utils.util.QuantizedInitializer(name, initializer, rmins, rmaxs, zero_points, scales, data=[], quantized_data=[], axis=None, qType=QuantType.QUInt8)[source]
Represents a linearly quantized weight input from ONNX operators.
- class neural_compressor.adaptor.ox_utils.util.QuantizationMode[source]
Represent QuantizationMode value.
- class neural_compressor.adaptor.ox_utils.util.QuantizedValueType[source]
Represent QuantizedValueType value.
- neural_compressor.adaptor.ox_utils.util.quantize_nparray(qtype, arr, scale, zero_point, low=None, high=None)[source]
Quantize numpy array.
- neural_compressor.adaptor.ox_utils.util.attribute_to_kwarg(attribute)[source]
Convert attribute to kwarg format for use with onnx.helper.make_node.
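A round trip showing the conversion: an AttributeProto such as alpha=0.2 becomes the keyword argument form that onnx.helper.make_node expects (the node here is illustrative):

```python
from onnx import helper

node = helper.make_node("LeakyRelu", ["x"], ["y"], alpha=0.2)

# Equivalent of attribute_to_kwarg for a single attribute.
attr = node.attribute[0]
kwargs = {attr.name: helper.get_attribute_value(attr)}
clone = helper.make_node(node.op_type, node.input, node.output, **kwargs)
```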
- neural_compressor.adaptor.ox_utils.util.find_by_name(name, item_list)[source]
Helper function to find item by name in a list.
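The lookup pattern this helper implements over ONNX protobuf lists, as a hypothetical re-implementation:

```python
def find_by_name_sketch(name, item_list):
    # Return the first item whose .name field matches, else None.
    return next((item for item in item_list if item.name == name), None)

# e.g. find_by_name_sketch("weight", model.graph.initializer)
```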