neural_compressor.adaptor.torch_utils.waq.smooth_quant

Module Contents

Classes

TorchSmoothQuant

Fake input channel quantization; for more details please refer to the class documentation below.

class neural_compressor.adaptor.torch_utils.waq.smooth_quant.TorchSmoothQuant(model, dataloader=None, example_inputs=None, q_func=None, traced_model=None)[source]

Fake input channel quantization. For more details, please refer to: [1] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models; [2] SPIQ: Data-Free Per-Channel Static Input Quantization. Currently, we only handle layers whose smooth scale can be absorbed; support for other layers will be added later.

Only in-place mode is supported, which means the model weights are modified directly; you can call the recover function to restore the original weights if needed.
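To illustrate the idea behind the class, here is a minimal sketch of SmoothQuant's per-channel scale migration in plain Python (not the library API): each input channel's activation range is divided by a scale s_j = act_max_j**alpha / weight_max_j**(1 - alpha), and the corresponding weight column is multiplied by s_j, leaving the linear layer's output mathematically unchanged. The function names and toy calibration values below are illustrative assumptions, not part of neural_compressor.

```python
def smooth_scales(act_max, weight_max, alpha=0.5):
    # Per-input-channel smoothing scale: s_j = A_j**alpha / W_j**(1 - alpha),
    # where A_j / W_j are per-channel absolute maxima from calibration.
    return [a ** alpha / w ** (1 - alpha) for a, w in zip(act_max, weight_max)]

def apply_smoothing(x, weight, scales):
    # Divide activations and multiply weight columns by the same scale,
    # so the product W @ x is preserved while activation outliers shrink.
    x_s = [xi / s for xi, s in zip(x, scales)]
    w_s = [[wij * s for wij, s in zip(row, scales)] for row in weight]
    return x_s, w_s

def linear(weight, x):
    # Plain matrix-vector product: y_i = sum_j W[i][j] * x[j]
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in weight]

# Toy calibration statistics and data (hypothetical values for illustration).
act_max = [8.0, 2.0, 0.5]   # per-channel activation maxima
w_max = [0.5, 1.0, 2.0]     # per-channel weight maxima
x = [4.0, -1.5, 0.25]
W = [[0.2, -0.4, 1.0], [0.1, 0.3, -0.5]]

s = smooth_scales(act_max, w_max, alpha=0.5)
x_s, W_s = apply_smoothing(x, W, s)
# The smoothed layer produces the same output: linear(W_s, x_s) == linear(W, x)
```

In the actual class, the division of activations by s is "absorbed" into the preceding layer (e.g. a LayerNorm or Linear), which is why only layers whose smooth scale can be absorbed are currently handled, and why the transformation modifies the model weights in place.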