neural_compressor.adaptor.ox_utils.smooth_quant

SmoothQuant for onnxrt adaptor.

Module Contents

Classes

ORTSmoothQuant

Fake input channel quantization.

Functions

get_quant_dequant_output(model, input_data, ...)

Get loss between fp32 output and QDQ output.

make_sub_graph(node, inits, input_data, output_data, ...)

Build a model with the specific node.

quant_dequant_data(data[, reduce_range, qType, scheme])

Quantize and then dequantize data.

neural_compressor.adaptor.ox_utils.smooth_quant.get_quant_dequant_output(model, input_data, output_data, reduce_range, backend)[source]

Get loss between fp32 output and QDQ output.

Parameters:
  • model (object) – model

  • input_data (numpy.ndarray) – fp32 input

  • output_data (numpy.ndarray) – fp32 output

  • reduce_range (bool) – whether to restrict quantization to a 7-bit range

  • backend (str) – execution provider
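A minimal sketch of the kind of loss this function returns, assuming an MSE-style distance between the fp32 reference output and the quantize-dequantize output (the exact metric used internally is not shown here):

```python
import numpy as np

# fp32 reference output and the output after a quantize-dequantize pass
fp32_out = np.array([0.1, 0.2, 0.3], dtype=np.float32)
qdq_out = np.array([0.1, 0.21, 0.29], dtype=np.float32)

# Sum-of-squared-errors between the two outputs (assumed loss form)
loss = float(np.sum((fp32_out - qdq_out) ** 2))
```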

neural_compressor.adaptor.ox_utils.smooth_quant.make_sub_graph(node, inits, input_data, output_data, reduce_range, opset, ir_version)[source]

Build a model with the specific node.

Parameters:
  • node (object) – node

  • inits (list) – initializer inputs of this node

  • input_data (numpy.ndarray) – fp32 input

  • output_data (numpy.ndarray) – fp32 output

  • reduce_range (bool) – whether to restrict quantization to a 7-bit range

  • opset (object) – opset of the model

  • ir_version (object) – ir_version of the model

neural_compressor.adaptor.ox_utils.smooth_quant.quant_dequant_data(data, reduce_range=False, qType=3, scheme='sym')[source]

Quantize and then dequantize data.

Parameters:
  • data (numpy.ndarray) – target data

  • reduce_range (bool) – whether to restrict quantization to a 7-bit range

  • qType (int) – quantization data type (the default 3 is onnx.TensorProto.INT8)

  • scheme (str) – quantization scheme, 'sym' or 'asym'
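A self-contained sketch of symmetric int8 quantize-dequantize, approximating what this function does for `scheme='sym'` with `qType=3` (INT8); the exact clipping and rounding conventions of the real implementation may differ:

```python
import numpy as np

def quant_dequant_sym(data, reduce_range=False):
    """Symmetric int8 quantize-then-dequantize (illustrative sketch)."""
    # reduce_range=True is assumed to mean a 7-bit signed range
    qmax = 63 if reduce_range else 127
    scale = np.max(np.abs(data)) / qmax
    q = np.clip(np.round(data / scale), -qmax - 1, qmax)
    return (q * scale).astype(np.float32)

x = np.array([-1.0, -0.5, 0.0, 0.5, 1.0], dtype=np.float32)
xq = quant_dequant_sym(x)
```

The round trip introduces a small per-element error bounded by half a quantization step, which is exactly the error `get_quant_dequant_output` measures at the model-output level.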

class neural_compressor.adaptor.ox_utils.smooth_quant.ORTSmoothQuant(model, dataloader, reduce_range=False, backend='CPUExecutionProvider')[source]

Fake input channel quantization.

For more details please refer to:

[1] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
[2] SPIQ: Data-Free Per-Channel Static Input Quantization

Only inplace mode is supported, meaning the model weights are modified in place; call the recover function to restore the original weights if needed.
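The core idea from the SmoothQuant paper [1] can be sketched with its per-channel migration scale, s_j = max|X_j|^alpha / max|W_j|^(1-alpha): activations are divided by s and weights multiplied by s, leaving the matmul result unchanged while flattening activation outliers. This is an illustration of the math, not the ORTSmoothQuant API:

```python
import numpy as np

def smooth_scales(act_max, weight_max, alpha=0.5):
    """Per-channel SmoothQuant migration scales (paper formula, sketch)."""
    eps = 1e-8  # guard against zero-valued channels
    return np.power(act_max + eps, alpha) / np.power(weight_max + eps, 1.0 - alpha)

act_max = np.array([10.0, 2.0, 0.5])  # per-channel activation |max|
w_max = np.array([0.4, 0.4, 0.4])     # per-channel weight |max|
s = smooth_scales(act_max, w_max, alpha=0.5)
# Dividing X by s and multiplying W by s keeps X @ W numerically identical
# while shrinking the activation outlier channels before quantization.
```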