neural_compressor.onnxrt.utils.utility

Module Contents

Functions

find_by_name(name, item_list)

Helper function to find item by name in a list.

simple_progress_bar(total, i)

Progress bar for cases where tqdm can't be used.

register_algo(name)

Decorator function to register algorithms in the algos_mapping dictionary.

get_model_info(…) → List[Tuple[str, Callable]]

is_B_transposed(node)

Whether input B is transposed.

get_qrange_for_qType(qType[, reduce_range])

Helper function to get the quantization range for a type.

quantize_data(data, quantize_range, qType, scheme)

Quantize data.

check_model_with_infer_shapes(model)

Check if the model has been shape inferred.

Attributes

ONNXRT116_VERSION

ONNXRT1161_VERSION

algos_mapping

WHITE_MODULE_LIST

MAXIMUM_PROTOBUF

PRIORITY_RTN

PRIORITY_GPTQ

PRIORITY_AWQ

PRIORITY_SMOOTH_QUANT

dtype_mapping

neural_compressor.onnxrt.utils.utility.find_by_name(name, item_list)[source]

Helper function to find item by name in a list.
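
A minimal usage sketch (assumptions: item_list holds objects with a name attribute, such as ONNX graph initializers, and "fc1.weight" is a hypothetical name):

    import onnx

    model = onnx.load("model.onnx")
    # Returns the matching item from the list; the behavior on a miss
    # (e.g. returning None) is an assumption, not stated on this page.
    weight = find_by_name("fc1.weight", model.graph.initializer)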

neural_compressor.onnxrt.utils.utility.simple_progress_bar(total, i)[source]

Progress bar for cases where tqdm can’t be used.
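
A sketch of the presumed calling pattern, where total is the step count and i the current step (this pattern is an assumption, not stated on this page):

    total = 100
    for i in range(1, total + 1):
        # ... one unit of work per step ...
        simple_progress_bar(total, i)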

neural_compressor.onnxrt.utils.utility.register_algo(name)[source]

Decorator function to register algorithms in the algos_mapping dictionary.

Usage example:

    @register_algo(name="example_algo")
    def example_algo(
        model: Union[onnx.ModelProto, Path, str],
        quant_config: RTNConfig,
    ) -> onnx.ModelProto:
        ...

Parameters:

name (str) – The name under which the algorithm function will be registered.

Returns:

The decorator function to be used with algorithm functions.

Return type:

decorator
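
Once registered, the algorithm can presumably be retrieved from algos_mapping by the name it was registered under; a sketch, with model and quant_config standing in for the caller's objects:

    from neural_compressor.onnxrt.utils.utility import algos_mapping

    # Assumes example_algo was registered as in the usage example above.
    algo_fn = algos_mapping["example_algo"]
    quantized_model = algo_fn(model, quant_config)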

neural_compressor.onnxrt.utils.utility.is_B_transposed(node)[source]

Whether input B is transposed.
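
A sketch, assuming the check inspects a Gemm node's transB attribute (only "input B" is confirmed by this page):

    from onnx import helper

    gemm = helper.make_node("Gemm", ["A", "B", "C"], ["Y"], transB=1)
    # Expected to be True for transB=1 (an assumption based on the Gemm spec).
    print(is_B_transposed(gemm))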

neural_compressor.onnxrt.utils.utility.get_qrange_for_qType(qType, reduce_range=False)[source]

Helper function to get the quantization range for a type.

Parameters:
  • qType (int) – quantization data type.

  • reduce_range (bool, optional) – whether to use the reduced 7-bit range instead of the full 8-bit range. Defaults to False.
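
A sketch of the presumed behavior; the concrete return values (e.g. 255 for the full uint8 range) are assumptions based on standard 8-bit quantization, not statements from this page:

    from onnx import TensorProto

    full_range = get_qrange_for_qType(TensorProto.UINT8)
    reduced_range = get_qrange_for_qType(TensorProto.UINT8, reduce_range=True)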

neural_compressor.onnxrt.utils.utility.quantize_data(data, quantize_range, qType, scheme)[source]

Quantize data.

To pack weights, we compute a linear transformation:
  • when the data type is uint8, from [rmin, rmax] -> [0, 2^b - 1], and

  • when the data type is int8, from [-m, m] -> [-(2^{b-1} - 1), 2^{b-1} - 1], where m = max(|rmin|, |rmax|),

and add the necessary intermediate nodes to transform the quantized weight back to the full-precision weight using the equation r = S(q - z), where
  • r: real original value

  • q: quantized value

  • S: scale

  • z: zero point
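
As a worked example (illustrative numbers, not taken from the source): for uint8 asymmetric quantization of [rmin, rmax] = [-1.0, 3.0] with a quantization range of 255, the scale is S = (rmax - rmin) / 255 ≈ 0.0157 and the zero point is z = round(-rmin / S) = 64, so a quantized value q = 200 dequantizes to r = S(q - z) ≈ 2.13.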

Parameters:
  • data (array) – data to quantize.

  • quantize_range (int) – quantization range for the target type, e.g. the value returned by get_qrange_for_qType.

  • qType (int) – data type to quantize to. Supported types: UINT8 and INT8.

  • scheme (string) – "sym" or "asym" quantization.
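
A hedged usage sketch; only the argument order shown in the signature above is confirmed:

    import numpy as np
    from onnx import TensorProto

    weights = np.array([-1.0, 0.0, 3.0], dtype=np.float32)
    qrange = get_qrange_for_qType(TensorProto.UINT8)
    # The structure of the return value (e.g. scale, zero point, quantized
    # array) is an assumption, not stated on this page.
    result = quantize_data(weights, qrange, TensorProto.UINT8, "asym")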

neural_compressor.onnxrt.utils.utility.check_model_with_infer_shapes(model)[source]

Check if the model has been shape inferred.
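
A sketch of the intended workflow, assuming the check keys off the results of ONNX shape inference:

    import onnx
    from onnx import shape_inference

    model = onnx.load("model.onnx")
    inferred = shape_inference.infer_shapes(model)
    # The exact criterion (e.g. populated value_info) is an assumption.
    print(check_model_with_infer_shapes(model))     # plausibly False before inference
    print(check_model_with_infer_shapes(inferred))  # plausibly True afterwards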