neural_compressor.onnxrt.algorithms.smoother.core

Smoother for onnxrt.

Module Contents

Classes

Smoother

Fake input channel quantization.

class neural_compressor.onnxrt.algorithms.smoother.core.Smoother(model: onnx.ModelProto | neural_compressor.onnxrt.utils.onnx_model.ONNXModel | pathlib.Path | str, dataloader: neural_compressor.onnxrt.quantization.calibrate.CalibrationDataReader, providers: List[str] = ['CPUExecutionProvider'])

Fake input channel quantization.

For more details, please refer to:

[1] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
[2] SPIQ: Data-Free Per-Channel Static Input Quantization

Only in-place mode is supported, which means the model weights will be changed. Call the recover function to restore the original weights if needed.
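
The idea behind the in-place smoothing and its recovery can be illustrated with a small pure-Python sketch. This is not the Smoother implementation: the helper names (matmul, smooth_scales, apply_smoothing, recover_weights) and the alpha parameter are assumptions made for illustration, following the per-channel scale formula from the SmoothQuant paper [1]. It assumes every channel has a nonzero maximum.

```python
# Illustrative sketch only; none of these helpers are neural_compressor APIs.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def smooth_scales(x, w, alpha=0.5):
    """Per-input-channel scales: s_j = max|X[:, j]|**alpha / max|W[j, :]|**(1 - alpha)."""
    act_max = [max(abs(v) for v in col) for col in zip(*x)]  # column j of X
    wgt_max = [max(abs(v) for v in row) for row in w]        # row j of W
    return [a ** alpha / g ** (1 - alpha) for a, g in zip(act_max, wgt_max)]


def apply_smoothing(x, w, scales):
    """Multiply row j of W by s_j in place and return X with column j divided by s_j."""
    for row, s in zip(w, scales):
        row[:] = [v * s for v in row]
    return [[v / s for v, s in zip(r, scales)] for r in x]


def recover_weights(w, scales):
    """Undo the in-place weight scaling by dividing row j of W by s_j."""
    for row, s in zip(w, scales):
        row[:] = [v / s for v in row]


x = [[1.0, 16.0], [-2.0, 8.0]]   # activations; the second channel has outliers
w = [[4.0, -1.0], [0.25, 0.5]]   # weights, shape (in_channels, out_channels)
original_w = [row[:] for row in w]
reference = matmul(x, w)         # output before smoothing

scales = smooth_scales(x, w)
x_smooth = apply_smoothing(x, w, scales)  # w is modified in place here
smoothed = matmul(x_smooth, w)            # mathematically equal to reference
```

Because the activations are divided by the same factors that the weights are multiplied by, the layer output is unchanged up to floating-point rounding, while the activation outliers are moved into the weights. Calling recover_weights(w, scales) afterwards restores the original weights, which mirrors the role of the recover function mentioned above.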