neural_compressor.adaptor.ox_utils.util

Helper classes and functions for the onnxrt adaptor.

Module Contents

Classes

QuantType

Represent QuantType value.

ValueInfo

Represents information about a cast tensor.

QuantizedValue

Represents a linearly quantized value (input/output/initializer).

QuantizedInitializer

Represents a linearly quantized weight input from ONNX operators.

QuantizationMode

Represent QuantizationMode value.

QuantizedValueType

Represent QuantizedValueType value.

QuantFormat

Represent QuantFormat value.

Functions

get_node_original_name(node) → str

Get the original name of the given node.

simple_progress_bar(total, i)

Progress bar for cases where tqdm can't be used.

dtype_to_name(dtype_mapping, dtype)

Map a data type to its string representation.

make_quant_node(name, inputs, outputs[, axis])

Make a QuantizeLinear node.

make_dquant_node(name, inputs, outputs[, axis])

Make a DequantizeLinear node.

is_B_transposed(node)

Whether input B is transposed.

split_shared_bias(model)

Split a shared bias tensor into per-node copies.

float_to_float16(tensor)

Convert float to float16.

float_to_bfloat16(tensor)

Convert float to bfloat16.

cast_tensor(tensor, dtype[, is_large_model])

Convert a float tensor to the target dtype.

remove_init_from_model_input(model)

Remove initializer from model input.

collate_preds(results)

Collect model outputs.

quantize_data_with_scale_zero(data, qType, scheme, ...)

Quantize data with scale and zero point.

calculate_scale_zp(rmin, rmax, quantize_range, qType, ...)

Calculate scale and zero point.

quantize_data(data, quantize_range, qType, scheme)

Quantize data.

quantize_data_per_channel(data, axis, quantize_range, ...)

Quantize tensor per-channel.

dequantize_data_with_scale_zero(tensor_value, ...)

Dequantize tensor with scale and zero point.

dequantize_data(tensor_value, scale_value, zo_value[, ...])

Dequantize tensor.

quantize_nparray(qtype, arr, scale, zero_point[, low, ...])

Quantize numpy array.

attribute_to_kwarg(attribute)

Convert attribute to kwarg format for use with onnx.helper.make_node.

find_by_name(name, item_list)

Helper function to find item by name in a list.

trt_env_setup(model)

Set the environment variable for the TensorRT Execution Provider.

to_numpy(data)

Convert to numpy ndarrays.

infer_shapes(in_mp[, int_max, auto_merge, ...])

Symbolic shape inference.

neural_compressor.adaptor.ox_utils.util.get_node_original_name(node) → str[source]

Get the original name of the given node.

neural_compressor.adaptor.ox_utils.util.simple_progress_bar(total, i)[source]

Progress bar for cases where tqdm can’t be used.

neural_compressor.adaptor.ox_utils.util.dtype_to_name(dtype_mapping, dtype)[source]

Map a data type to its string representation.

class neural_compressor.adaptor.ox_utils.util.QuantType[source]

Represent QuantType value.

neural_compressor.adaptor.ox_utils.util.make_quant_node(name, inputs, outputs, axis=None)[source]

Make a QuantizeLinear node.

neural_compressor.adaptor.ox_utils.util.make_dquant_node(name, inputs, outputs, axis=None)[source]

Make a DequantizeLinear node.
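
For illustration, the following sketch builds an equivalent QuantizeLinear/DequantizeLinear pair directly with onnx.helper.make_node; the tensor names are placeholders, and the optional axis attribute mirrors the axis parameter above:

    from onnx import helper

    # Quantize a float tensor to its integer representation.
    quant = helper.make_node(
        "QuantizeLinear",
        inputs=["input_fp32", "input_scale", "input_zero_point"],
        outputs=["input_quantized"],
        name="input_QuantizeLinear",
    )

    # Dequantize it back; `axis` selects the per-channel dimension.
    dequant = helper.make_node(
        "DequantizeLinear",
        inputs=["input_quantized", "input_scale", "input_zero_point"],
        outputs=["input_dequantized"],
        name="input_DequantizeLinear",
        axis=1,
    )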

neural_compressor.adaptor.ox_utils.util.is_B_transposed(node)[source]

Whether input B is transposed.
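
As a sketch of the check (assuming a Gemm node; the real helper may cover more cases), B is considered transposed when the node carries a transB attribute set to 1:

    from onnx import helper

    def b_is_transposed(node):
        # `transB` is an integer ONNX attribute; 1 means B is transposed.
        return any(attr.name == "transB" and attr.i == 1 for attr in node.attribute)

    gemm = helper.make_node("Gemm", ["A", "B", "C"], ["Y"], transB=1)
    assert b_is_transposed(gemm)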

neural_compressor.adaptor.ox_utils.util.split_shared_bias(model)[source]

Split a shared bias tensor into per-node copies.

neural_compressor.adaptor.ox_utils.util.float_to_float16(tensor)[source]

Convert float to float16.

neural_compressor.adaptor.ox_utils.util.float_to_bfloat16(tensor)[source]

Convert float to bfloat16.
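
A minimal numpy sketch of both conversions (the actual helpers operate on TensorProto objects and may clamp or round differently):

    import numpy as np

    data = np.array([0.1, -2.5, 65504.0], dtype=np.float32)

    # float32 -> float16: supported natively by numpy.
    fp16 = data.astype(np.float16)

    # float32 -> bfloat16: numpy has no bfloat16 dtype, so a common trick is
    # to keep only the upper 16 bits of the float32 bit pattern (truncation).
    bf16_bits = (data.view(np.uint32) >> 16).astype(np.uint16)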

neural_compressor.adaptor.ox_utils.util.cast_tensor(tensor, dtype, is_large_model=False)[source]

Convert a float tensor to the target dtype.

Parameters:
  • tensor (TensorProto) – TensorProto object

  • dtype (int) – target data type

  • is_large_model (bool) – if is large model, make tensor with raw=True
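
A hypothetical usage sketch, round-tripping an initializer through numpy to cast it to float16 (the real helper additionally handles raw data for large models):

    import numpy as np
    from onnx import TensorProto, numpy_helper

    weight = numpy_helper.from_array(
        np.random.rand(4, 4).astype(np.float32), name="weight"
    )
    # Cast by converting to a numpy array, changing dtype, and rebuilding.
    arr = numpy_helper.to_array(weight)
    casted = numpy_helper.from_array(arr.astype(np.float16), name="weight")
    assert casted.data_type == TensorProto.FLOAT16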

neural_compressor.adaptor.ox_utils.util.remove_init_from_model_input(model)[source]

Remove initializer from model input.

neural_compressor.adaptor.ox_utils.util.collate_preds(results)[source]

Collect model outputs.

neural_compressor.adaptor.ox_utils.util.quantize_data_with_scale_zero(data, qType, scheme, scale, zero_point)[source]

Quantize data with scale and zero point.

To pack weights, we compute a linear transformation (see the sketch after the parameter list):
  • when the data type is uint8, from [rmin, rmax] to [0, 2^b - 1], and

  • when the data type is int8, from [-m, m] to [-(2^{b-1} - 1), 2^{b-1} - 1], where

    m = max(abs(rmin), abs(rmax))

Parameters:
  • data (np.array) – data to quantize

  • qType (int) – data type to quantize to. Supported types are UINT8 and INT8.

  • scheme (string) – sym or asym quantization.

  • scale (float) – computed scale of quantized data

  • zero_point (uint8 or int8) – computed zero point of quantized data
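
A minimal numpy sketch of the transformation, assuming asymmetric uint8 quantization with illustrative scale and zero point values:

    import numpy as np

    data = np.array([-1.0, 0.0, 0.5, 2.0], dtype=np.float32)
    scale, zero_point = 3.0 / 255, 85  # illustrative values for [-1.0, 2.0]

    # q = round(r / S) + z, clipped to the uint8 range.
    q = np.clip(np.round(data / scale) + zero_point, 0, 255).astype(np.uint8)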

neural_compressor.adaptor.ox_utils.util.calculate_scale_zp(rmin, rmax, quantize_range, qType, scheme)[source]

Calculate scale and zero point.
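
A hedged sketch of the usual asymmetric (uint8) computation; the real helper also covers the symmetric (int8) scheme:

    rmin, rmax, quantize_range = -1.0, 3.0, 255

    # Widen the range to include zero so 0.0 is exactly representable.
    rmin, rmax = min(rmin, 0.0), max(rmax, 0.0)

    scale = (rmax - rmin) / quantize_range
    zero_point = round(-rmin / scale)  # maps rmin to 0 and rmax to 255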

neural_compressor.adaptor.ox_utils.util.quantize_data(data, quantize_range, qType, scheme)[source]

Quantize data.

To pack weights, we compute a linear transformation:
  • when the data type is uint8, from [rmin, rmax] to [0, 2^b - 1], and

  • when the data type is int8, from [-m, m] to [-(2^{b-1} - 1), 2^{b-1} - 1], where

    m = max(abs(rmin), abs(rmax))

and add the necessary intermediate nodes to transform the quantized weight back to the full-precision weight using the equation r = S(q - z) (see the sketch after the parameter list), where:

  • r: real original value

  • q: quantized value

  • S: scale

  • z: zero point

Parameters:
  • data (array) – data to quantize

  • quantize_range (list) – quantization range used for packing the weights.

  • qType (int) – data type to quantize to. Supported types are UINT8 and INT8.

  • scheme (string) – sym or asym quantization.
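
An end-to-end sketch under the same asymmetric uint8 assumptions, reconstructing the data with r = S(q - z):

    import numpy as np

    data = np.array([-1.0, 0.0, 1.5, 3.0], dtype=np.float32)
    rmin, rmax, quantize_range = float(data.min()), float(data.max()), 255

    scale = (rmax - rmin) / quantize_range
    zero_point = round(-rmin / scale)
    q = np.clip(np.round(data / scale) + zero_point, 0, 255).astype(np.uint8)

    # r = S(q - z): the reconstruction error is on the order of the scale.
    reconstructed = scale * (q.astype(np.float32) - zero_point)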

neural_compressor.adaptor.ox_utils.util.quantize_data_per_channel(data, axis, quantize_range, qType, scheme)[source]

Quantize tensor per-channel.
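
A per-channel sketch: one (scale, zero point) pair per slice along axis, shown here for a 2-D weight with axis=0 (asymmetric uint8 assumed):

    import numpy as np

    data = np.random.randn(3, 8).astype(np.float32)
    axis, quantize_range = 0, 255

    scales, zero_points, quantized = [], [], []
    for channel in np.moveaxis(data, axis, 0):
        rmin, rmax = min(channel.min(), 0.0), max(channel.max(), 0.0)
        scale = (rmax - rmin) / quantize_range
        zp = round(-rmin / scale)
        scales.append(scale)
        zero_points.append(zp)
        quantized.append(np.clip(np.round(channel / scale) + zp, 0, 255))
    q = np.stack(quantized).astype(np.uint8)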

neural_compressor.adaptor.ox_utils.util.dequantize_data_with_scale_zero(tensor_value, scale_value, zo_value)[source]

Dequantize tensor with scale and zero point.

neural_compressor.adaptor.ox_utils.util.dequantize_data(tensor_value, scale_value, zo_value, axis=0)[source]

Dequantize tensor.
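
A broadcasting sketch covering both dequantize helpers: a scalar scale and zero point for the plain case, or per-channel vectors reshaped to broadcast along axis:

    import numpy as np

    q = np.array([[0, 128, 255], [10, 120, 200]], dtype=np.uint8)
    scale = np.array([0.01, 0.05], dtype=np.float32)  # one value per channel
    zero_point = np.array([128, 120], dtype=np.int32)

    axis = 0
    shape = [1] * q.ndim
    shape[axis] = -1  # align the per-channel vectors with `axis`
    r = (q.astype(np.float32) - zero_point.reshape(shape)) * scale.reshape(shape)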

class neural_compressor.adaptor.ox_utils.util.ValueInfo(tensor_name, dtype, new_dtype)[source]

Represents information about a cast tensor.

class neural_compressor.adaptor.ox_utils.util.QuantizedValue(name, new_quantized_name, scale_name, zero_point_name, quantized_value_type, axis=None, qType=QuantType.QUInt8)[source]

Represents a linearly quantized value (input/output/initializer).

class neural_compressor.adaptor.ox_utils.util.QuantizedInitializer(name, initializer, rmins, rmaxs, zero_points, scales, data=[], quantized_data=[], axis=None, qType=QuantType.QUInt8)[source]

Represents a linearly quantized weight input from ONNX operators.

class neural_compressor.adaptor.ox_utils.util.QuantizationMode[source]

Represent QuantizationMode value.

class neural_compressor.adaptor.ox_utils.util.QuantizedValueType[source]

Represent QuantizedValueType value.

class neural_compressor.adaptor.ox_utils.util.QuantFormat[source]

Represent QuantFormat value.

neural_compressor.adaptor.ox_utils.util.quantize_nparray(qtype, arr, scale, zero_point, low=None, high=None)[source]

Quantize numpy array.
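
An int8 variant with explicit clipping bounds, mirroring the optional low/high parameters (values are illustrative):

    import numpy as np

    arr = np.array([-3.0, 0.0, 2.5], dtype=np.float32)
    scale, zero_point = 0.02, 0
    low, high = -127, 127  # reduced symmetric int8 range

    # Out-of-range values (here -3.0 / 0.02 = -150) are clipped to `low`.
    q = np.clip(np.round(arr / scale) + zero_point, low, high).astype(np.int8)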

neural_compressor.adaptor.ox_utils.util.attribute_to_kwarg(attribute)[source]

Convert attribute to kwarg format for use with onnx.helper.make_node.
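
A sketch of the idea using onnx.helper.get_attribute_value: collect a node's attributes as kwargs and rebuild an equivalent node:

    from onnx import helper

    node = helper.make_node("Gemm", ["A", "B"], ["Y"], alpha=0.5, transB=1)

    # Convert each attribute back into a keyword argument.
    kwargs = {attr.name: helper.get_attribute_value(attr) for attr in node.attribute}
    clone = helper.make_node(node.op_type, node.input, node.output, **kwargs)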

neural_compressor.adaptor.ox_utils.util.find_by_name(name, item_list)[source]

Helper function to find item by name in a list.

neural_compressor.adaptor.ox_utils.util.trt_env_setup(model)[source]

Set the environment variable for the TensorRT Execution Provider.
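
A hedged sketch of the intent: ORT_TENSORRT_INT8_ENABLE is a documented ONNX Runtime TensorRT EP variable, and a quantized model can be detected by the presence of QuantizeLinear nodes; the exact logic inside trt_env_setup may differ:

    import os

    def setup_trt_env(model):
        # Enable TensorRT INT8 mode if the model contains quantized nodes.
        is_int8 = any(node.op_type == "QuantizeLinear" for node in model.graph.node)
        os.environ["ORT_TENSORRT_INT8_ENABLE"] = "1" if is_int8 else "0"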

neural_compressor.adaptor.ox_utils.util.to_numpy(data)[source]

Convert to numpy ndarrays.

neural_compressor.adaptor.ox_utils.util.infer_shapes(in_mp, int_max=2**31 - 1, auto_merge=False, guess_output_rank=False, verbose=0, base_dir='')[source]

Symbolic shape inference.
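
Hypothetical usage (the model path is illustrative): load an ONNX model and run symbolic shape inference, letting auto_merge reconcile conflicting symbolic dimensions:

    import onnx
    from neural_compressor.adaptor.ox_utils.util import infer_shapes

    model = onnx.load("model.onnx")  # illustrative path
    inferred = infer_shapes(model, auto_merge=True)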