neural_compressor.adaptor.ox_utils.quantizer

Quantizer for ONNX models.

Module Contents

Classes

Quantizer

Quantizer class.

class neural_compressor.adaptor.ox_utils.quantizer.Quantizer(model, q_config, mode, static, quantization_params, op_types_to_quantize, fallback_list=['fp32'], reduce_range=None)

Quantizer class.
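
A minimal usage sketch, assuming a prebuilt ONNX model and precomputed configs; the model path, q_config contents, mode string, and quantization_params below are illustrative placeholders, not values prescribed by the API:

    import onnx
    from neural_compressor.adaptor.ox_utils.quantizer import Quantizer

    model = onnx.load("model.onnx")           # placeholder model path
    quantizer = Quantizer(
        model,
        q_config={},                          # per-op quantization config (placeholder)
        mode="qdq",                           # assumed format string (e.g. QDQ vs. QOperator)
        static=True,                          # static quantization with precomputed params
        quantization_params={},               # tensor name -> quantization params (placeholder)
        op_types_to_quantize=["Conv", "MatMul"],
    )
    qmodel = quantizer.quantize_model()       # assumed to return the quantized model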

check_opset_version()

Check opset version.

should_quantize(node)

Check if node should be quantized.

quantize_model()

Quantize the ONNX model.

merge_dedicated_qdq_pair()

Merge dedicated Q/DQ pairs.

should_cast(node)

Check if node should be cast.

insert_qdq()

Insert Q/DQ pairs.
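
Inserted pairs are the standard ONNX QuantizeLinear/DequantizeLinear duo. A minimal sketch of one such pair built with onnx.helper; the tensor names and the scale/zero-point values are illustrative:

    import numpy as np
    from onnx import helper, numpy_helper

    # Scale and zero point stored as initializers (illustrative values).
    scale = numpy_helper.from_array(np.array(0.02, dtype=np.float32), "x_scale")
    zero_point = numpy_helper.from_array(np.array(0, dtype=np.int8), "x_zp")

    q_node = helper.make_node(
        "QuantizeLinear", ["x", "x_scale", "x_zp"], ["x_quantized"],
        name="x_QuantizeLinear",
    )
    dq_node = helper.make_node(
        "DequantizeLinear", ["x_quantized", "x_scale", "x_zp"], ["x_dequantized"],
        name="x_DequantizeLinear",
    )
    # Consumers of "x" are then rewired to read "x_dequantized" instead.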

should_convert(node)

Check if node should be converted.

convert_qdq_to_operator_oriented()

Convert QDQ to QOperator format.
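
In the QOperator format, a DequantizeLinear -> Conv -> QuantizeLinear chain collapses into a single fused operator such as the standard ONNX QLinearConv. A hedged sketch of the resulting node; the input names are illustrative:

    from onnx import helper

    # QLinearConv consumes quantized tensors plus their scales/zero points
    # directly, replacing the surrounding Q/DQ nodes.
    qlinear_conv = helper.make_node(
        "QLinearConv",
        inputs=[
            "x_quantized", "x_scale", "x_zp",   # quantized activation
            "w_quantized", "w_scale", "w_zp",   # quantized weight
            "y_scale", "y_zp",                  # output quantization params
            "bias_quantized",                   # optional int32 bias
        ],
        outputs=["y_quantized"],
        name="conv_quant",
    )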

remove_redundant_pairs()

Remove redundant Q/DQ and Cast/Cast pairs.

dtype_cast(node, cfg, keep_io_types=True)

Cast node dtype.
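
The cast itself is a standard ONNX Cast node. A sketch assuming an fp32-to-fp16 cast; with keep_io_types=True the graph inputs and outputs are expected to keep their original float32 types, with boundary casts inserted instead:

    from onnx import helper, TensorProto

    cast_node = helper.make_node(
        "Cast", ["x"], ["x_fp16"], to=TensorProto.FLOAT16, name="x_Cast"
    )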

quantize_outputs(node, initializer_use_weight_qType=True, direct_int8=False)

Quantize node outputs.

quantize_inputs(node, indices=None, initializer_use_weight_qType=True, direct_int8=False)

Quantize node inputs.

quantize_bias_tensor(node)

Quantize bias.

quantize_bias(bias_name, input_name, weight_name, beta=1.0)

Quantize the bias.

Zero Point == 0 and Scale == Input_Scale * Weight_Scale
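
A numeric sketch of that convention; the scales and bias values are illustrative, and the beta factor (when not 1.0) is omitted here:

    import numpy as np

    input_scale, weight_scale = 0.05, 0.002   # illustrative activation/weight scales
    bias = np.array([0.3, -0.12], dtype=np.float32)

    bias_scale = input_scale * weight_scale                 # Scale == Input_Scale * Weight_Scale
    q_bias = np.round(bias / bias_scale).astype(np.int32)   # Zero Point == 0
    # q_bias * bias_scale recovers approximately [0.3, -0.12]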

quantize_weights_per_channel(node, indices, weight_qType, scheme, axis)

Quantize weights per-channel.

quantize_weight_per_channel(weight_name, weight_qType, scheme, channel_axis)

Quantize weight per-channel.
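
A minimal numpy sketch of what per-channel quantization computes; the symmetric int8 scheme here is an assumption for illustration, while the actual dtype and scheme follow weight_qType and scheme:

    import numpy as np

    def quantize_per_channel_sym(weight: np.ndarray, channel_axis: int = 0):
        """Symmetric int8 per-channel quantization: one scale per channel."""
        # Max magnitude per channel, keeping dims for broadcasting.
        reduce_axes = tuple(i for i in range(weight.ndim) if i != channel_axis)
        max_abs = np.max(np.abs(weight), axis=reduce_axes, keepdims=True)
        scales = max_abs / 127.0
        scales = np.where(scales == 0, 1.0, scales)  # avoid divide-by-zero
        q = np.clip(np.round(weight / scales), -127, 127).astype(np.int8)
        return q, np.squeeze(scales)

    w = np.random.randn(8, 3, 3, 3).astype(np.float32)  # e.g. Conv weight (O,I,kH,kW)
    q, scales = quantize_per_channel_sym(w, channel_axis=0)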

static tensor_proto_to_array(initializer)

Convert TensorProto to array.
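
ONNX provides a helper for exactly this conversion; a round-trip sketch:

    import numpy as np
    from onnx import numpy_helper

    weights = np.array([[1.0, 2.0], [3.0, 4.0]], dtype=np.float32)
    initializer = numpy_helper.from_array(weights, name="W")
    arr = numpy_helper.to_array(initializer)   # back to the numpy array
    assert np.array_equal(arr, weights)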

get_bias_add_nodes(node, weight_name, last_output, quantized_bias_name)

Given a node (e.g. a Conv), handle its bias add by inserting a Reshape node on the bias and an Add node; see the sketch after the parameter list.

Parameters:
  • node (NodeProto) – current node (Conv)

  • weight_name (string) – weight name

  • last_output (string) – output name of the previous node (the input to the bias add)

  • quantized_bias_name (string) – name of the quantized bias
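
A hedged sketch of the inserted pattern, with illustrative tensor names: the bias is reshaped so it broadcasts over the Conv output's channel dimension, then added.

    from onnx import helper

    reshape_node = helper.make_node(
        "Reshape", ["bias_dequantized", "bias_new_shape"], ["bias_reshaped"],
        name="bias_Reshape",
    )
    add_node = helper.make_node(
        "Add", ["conv_output", "bias_reshaped"], ["conv_output_with_bias"],
        name="bias_Add",
    )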

is_valid_quantize_weight(weight_name)

Check if weight can be quantized.

dequantize_tensor(node, value_name)

Dequantize tensor.
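
Dequantization follows the standard affine formula x = (q - zero_point) * scale. A numeric sketch with illustrative values:

    import numpy as np

    scale, zero_point = 0.02, 5
    q = np.array([5, 55, -45], dtype=np.int8)
    x = (q.astype(np.float32) - zero_point) * scale   # -> [0.0, 1.0, -1.0]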