neural_compressor.torch.algorithms.weight_only.teq
Module Contents
Classes
- `TrainableEquivalentTransformation`: Weight-only quantization, Trainable Equivalent Transformation (TEQ).
- `TEQuantizer`: The base quantizer for all algorithm quantizers.
- class neural_compressor.torch.algorithms.weight_only.teq.TrainableEquivalentTransformation(model, weight_config={}, absorb_to_layer={}, folding=True, example_inputs=None)[source]
Weight-only quantization, Trainable Equivalent Transformation (TEQ).
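The core idea behind TEQ can be illustrated with a minimal numeric sketch: a per-channel scale is applied to one layer's weights while its inverse is absorbed into the preceding layer, so the float output is mathematically unchanged while the scaled weights become friendlier to low-bit quantization. The sketch below uses plain Python instead of torch, and all names in it are illustrative, not part of the neural_compressor API.

```python
def matvec(W, x):
    # Dense layer as a plain matrix-vector product.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

W1 = [[1.0, 2.0], [3.0, 4.0]]    # first layer (2 -> 2)
W2 = [[0.5, -1.0], [2.0, 0.25]]  # second layer (2 -> 2)
x = [1.0, -2.0]

baseline = matvec(W2, matvec(W1, x))

# Trainable per-channel scales (illustrative values).
s = [4.0, 0.5]

# Scale the input channels (columns) of W2 by s ...
W2_scaled = [[w * si for w, si in zip(row, s)] for row in W2]
# ... and absorb 1/s into the output channels (rows) of W1.
W1_absorbed = [[w / si for w in row] for row, si in zip(W1, s)]

transformed = matvec(W2_scaled, matvec(W1_absorbed, x))

# The transformation is equivalent: outputs match the baseline.
assert all(abs(a - b) < 1e-9 for a, b in zip(baseline, transformed))
```

With `folding=True`, the scales are folded into the absorbing layers' weights as above, so no extra multiply remains in the deployed model.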
- class neural_compressor.torch.algorithms.weight_only.teq.TEQuantizer(quant_config, folding, absorb_to_layer, example_inputs)[source]
The base quantizer for all algorithm quantizers.
The Quantizer unifies the interfaces across various quantization algorithms, including GPTQ, RTN, etc. Given a float model, the Quantizer applies the quantization algorithm to the model according to the quant_config.
- To implement a new quantization algorithm, inherit from Quantizer and implement the following methods:
prepare: prepare a given model for conversion.
convert: convert a prepared model to a quantized model.
Note: quantize and execute are optional for new quantization algorithms.
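The prepare/convert contract described above can be sketched as follows. The `Quantizer` base class here is a stand-in that mirrors the described interface so the example runs without neural_compressor installed, and the subclass name and its round-to-nearest logic are invented for illustration.

```python
class Quantizer:
    """Stand-in mirroring the described base-quantizer interface."""

    def __init__(self, quant_config):
        self.quant_config = quant_config

    def prepare(self, model):
        raise NotImplementedError

    def convert(self, model):
        raise NotImplementedError

    def quantize(self, model):
        # Optional convenience: run prepare, then convert.
        return self.convert(self.prepare(model))


class MyRoundingQuantizer(Quantizer):
    """Hypothetical algorithm: round each float weight to the nearest integer."""

    def prepare(self, model):
        # Insert any observers/bookkeeping needed before conversion.
        model["prepared"] = True
        return model

    def convert(self, model):
        # Replace float weights with their quantized counterparts.
        model["weights"] = [round(w) for w in model["weights"]]
        return model


model = {"weights": [0.2, 1.7, -0.6]}
quantizer = MyRoundingQuantizer(quant_config={"bits": 8})
quantized = quantizer.quantize(model)
```

A real algorithm would register itself with the framework and operate on a torch model rather than a dict, but the division of labor is the same: prepare sets the model up, convert produces the quantized model.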