neural_compressor.adaptor.ox_utils.smooth_quant

SmoothQuant for onnxrt adaptor.

Module Contents

Classes

ORTSmoothQuant

Fake input channel quantization.

Functions

get_quant_dequant_output(model, input_data, ...)

Get loss between fp32 output and QDQ output.

make_sub_graph(node, inits, input_data, output_data, ...)

Build a model with the specific node.

quant_dequant_data(data[, reduce_range, qType, scheme])

Quantize and then dequantize data.

neural_compressor.adaptor.ox_utils.smooth_quant.get_quant_dequant_output(model, input_data, output_data, reduce_range, backend)[source]

Get loss between fp32 output and QDQ output.

Parameters:
  • model (object) – model

  • input_data (numpy.ndarray) – fp32 input

  • output_data (numpy.ndarray) – fp32 output

  • reduce_range (bool) – whether to restrict quantization to a 7-bit range

  • backend (str) – execution provider
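A minimal sketch of the kind of loss this function returns, assuming an MSE-style distance between the fp32 reference output and the quantize-dequantize output (the exact metric used internally is not shown here):

```python
import numpy as np

# fp32 reference output and the output after a quantize-dequantize pass
fp32_out = np.array([0.1, 0.2, 0.3], dtype=np.float32)
qdq_out = np.array([0.1, 0.21, 0.29], dtype=np.float32)

# Sum-of-squared-errors between the two outputs (assumed loss form)
loss = float(np.sum((fp32_out - qdq_out) ** 2))
```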

neural_compressor.adaptor.ox_utils.smooth_quant.make_sub_graph(node, inits, input_data, output_data, reduce_range, opset, ir_version)[source]

Build a model with the specific node.

Parameters:
  • node (object) – node

  • inits (list) – initializer inputs of this node

  • input_data (numpy.ndarray) – fp32 input

  • output_data (numpy.ndarray) – fp32 output

  • reduce_range (bool) – whether to restrict quantization to a 7-bit range

  • opset (object) – opset of the model

  • ir_version (object) – ir_version of the model

neural_compressor.adaptor.ox_utils.smooth_quant.quant_dequant_data(data, reduce_range=False, qType=3, scheme='sym')[source]

Quantize and then dequantize data.

Parameters:
  • data (numpy.ndarray) – target data

  • reduce_range (bool) – whether to restrict quantization to a 7-bit range

  • qType (int) – quantization data type (the default 3 is onnx.TensorProto.INT8)

  • scheme (str) – quantization scheme, 'sym' or 'asym'
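A self-contained sketch of symmetric int8 quantize-dequantize, approximating what this function does for `scheme='sym'` with `qType=3` (INT8); the exact clipping and rounding conventions of the real implementation may differ:

```python
import numpy as np

def quant_dequant_sym(data, reduce_range=False):
    """Symmetric int8 quantize-then-dequantize (illustrative sketch)."""
    # reduce_range=True is assumed to mean a 7-bit signed range
    qmax = 63 if reduce_range else 127
    scale = np.max(np.abs(data)) / qmax
    q = np.clip(np.round(data / scale), -qmax - 1, qmax)
    return (q * scale).astype(np.float32)

x = np.array([-1.0, -0.5, 0.0, 0.5, 1.0], dtype=np.float32)
xq = quant_dequant_sym(x)
```

The round trip introduces a small per-element error bounded by half a quantization step, which is exactly the error `get_quant_dequant_output` measures at the model-output level.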

class neural_compressor.adaptor.ox_utils.smooth_quant.ORTSmoothQuant(model, dataloader, reduce_range=False, backend='CPUExecutionProvider')[source]

Fake input channel quantization.

For more details please refer to:

[1] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
[2] SPIQ: Data-Free Per-Channel Static Input Quantization

Only inplace mode is supported, meaning the model weights are modified in place; call the recover function to restore the original weights if needed.
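The core idea from the SmoothQuant paper [1] can be sketched with its per-channel migration scale, s_j = max|X_j|^alpha / max|W_j|^(1-alpha): activations are divided by s and weights multiplied by s, leaving the matmul result unchanged while flattening activation outliers. This is an illustration of the math, not the ORTSmoothQuant API:

```python
import numpy as np

def smooth_scales(act_max, weight_max, alpha=0.5):
    """Per-channel SmoothQuant migration scales (paper formula, sketch)."""
    eps = 1e-8  # guard against zero-valued channels
    return np.power(act_max + eps, alpha) / np.power(weight_max + eps, 1.0 - alpha)

act_max = np.array([10.0, 2.0, 0.5])  # per-channel activation |max|
w_max = np.array([0.4, 0.4, 0.4])     # per-channel weight |max|
s = smooth_scales(act_max, w_max, alpha=0.5)
# Dividing X by s and multiplying W by s keeps X @ W numerically identical
# while shrinking the activation outlier channels before quantization.
```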