neural_compressor.torch.algorithms.pt2e_quant.core

Module Contents

Classes

W8A8StaticQuantizer

The base quantizer for all algorithm quantizers.

class neural_compressor.torch.algorithms.pt2e_quant.core.W8A8StaticQuantizer(quant_config: Any | None = None)

The base quantizer for all algorithm quantizers.

The Quantizer unifies the interfaces across various quantization algorithms, including GPTQ, RTN, etc. Given a float model, the Quantizer applies the quantization algorithm to the model according to the quant_config.

To implement a new quantization algorithm, inherit from Quantizer and implement the following methods:
  • prepare: prepare a given model for conversion.

  • convert: convert a prepared model to a quantized model.

Note: quantize and execute are optional for new quantization algorithms.
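The prepare/convert contract described above can be illustrated with a minimal, self-contained sketch. It does not use neural_compressor or PyTorch and does not reproduce the library's implementation: BaseQuantizer, ToyModel, ToyW8A8StaticQuantizer, quantized_forward and run_fn are all hypothetical names. The optional quantize method is assumed here to simply chain prepare, a user-supplied calibration callback, and convert. The toy "W8A8 static" scheme uses symmetric per-tensor int8 scales for both weights and activations, with the activation scale fixed from calibration data.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable, Dict, List, Optional


class BaseQuantizer(ABC):
    """Hypothetical stand-in for the base Quantizer (not the library's code)."""

    def __init__(self, quant_config: Optional[Dict[str, Any]] = None):
        self.quant_config = quant_config or {}

    @abstractmethod
    def prepare(self, model, *args, **kwargs):
        """Prepare a float model for conversion (here: attach an observer)."""

    @abstractmethod
    def convert(self, model, *args, **kwargs):
        """Convert a prepared model into a quantized model."""

    def quantize(self, model, run_fn: Callable, *args, **kwargs):
        # Assumed optional flow: prepare -> calibrate with run_fn -> convert.
        prepared = self.prepare(model, *args, **kwargs)
        run_fn(prepared)
        return self.convert(prepared, *args, **kwargs)


class ToyModel:
    """Hypothetical one-layer model computing y = sum(w_i * x_i)."""

    def __init__(self, weight: List[float]):
        self.weight = list(weight)
        self.act_observer: Optional[Callable] = None  # set by prepare()
        self.act_max = 0.0  # largest |activation| seen during calibration

    def forward(self, x: List[float]) -> float:
        if self.act_observer is not None:
            self.act_observer(x)
        return sum(w * xi for w, xi in zip(self.weight, x))


def _int8_scale(max_abs: float) -> float:
    # Symmetric mapping of [-max_abs, max_abs] onto [-127, 127].
    return max_abs / 127.0 if max_abs > 0 else 1.0


def _quantize(values: List[float], scale: float) -> List[int]:
    # Round to the nearest integer step and clamp to the int8 range.
    return [max(-127, min(127, round(v / scale))) for v in values]


class ToyW8A8StaticQuantizer(BaseQuantizer):
    def prepare(self, model, *args, **kwargs):
        def observe(x):
            model.act_max = max(model.act_max, max(abs(v) for v in x))

        model.act_observer = observe
        return model

    def convert(self, model, *args, **kwargs):
        model.act_observer = None  # calibration is over
        model.w_scale = _int8_scale(max(abs(w) for w in model.weight))
        model.a_scale = _int8_scale(model.act_max)  # static: fixed from calibration
        model.qweight = _quantize(model.weight, model.w_scale)
        return model


def quantized_forward(model, x: List[float]) -> float:
    qx = _quantize(x, model.a_scale)  # reuse the static activation scale
    acc = sum(qw * q for qw, q in zip(model.qweight, qx))  # integer accumulation
    return acc * model.w_scale * model.a_scale  # dequantize the accumulator


# Usage: calibrate on a few inputs, then run the int8 path.
calib_data = [[1.0, 2.0, -0.5], [0.5, -1.5, 2.0]]
quantizer = ToyW8A8StaticQuantizer(quant_config={"dtype": "int8"})
qmodel = quantizer.quantize(
    ToyModel([0.5, -1.0, 0.25]),
    run_fn=lambda m: [m.forward(x) for x in calib_data],
)
```

Keeping calibration outside the quantizer (passed in as run_fn) mirrors the split between prepare and convert: the quantizer only decides where to observe and how to quantize, while the caller decides what data to run.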