neural_compressor.onnxrt.algorithms
Package Contents
Classes
|
Fake input channel quantization. |
Functions
|
Apply round-to-nearest (RTN) quantization to an ONNX model. |
|
Apply GPTQ quantization to an ONNX model. |
|
Apply Activation-aware Weight Quantization (AWQ) to an ONNX model. |
|
Quantize the model layer by layer to save memory. |
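
To make the RTN entry above more concrete, here is a minimal NumPy sketch of group-wise, symmetric round-to-nearest weight quantization. It is an illustration only, not this package's implementation: the helper names `rtn_quantize` and `rtn_dequantize`, the parameters `num_bits` and `group_size`, and the choice of a symmetric scheme are all assumptions made for the example, and it works on a plain array rather than an ONNX model.

```python
import numpy as np


def rtn_quantize(weight, num_bits=4, group_size=32):
    """Hypothetical helper: symmetric round-to-nearest per group of rows."""
    rows, cols = weight.shape
    assert rows % group_size == 0, "rows must be divisible by group_size"
    # Split the first axis into groups; each (group, column) gets one scale.
    groups = weight.reshape(rows // group_size, group_size, cols)
    qmax = 2 ** (num_bits - 1) - 1  # 7 for 4-bit signed integers
    max_abs = np.abs(groups).max(axis=1, keepdims=True)
    # Guard against all-zero groups to avoid dividing by zero.
    scale = np.where(max_abs == 0, 1.0, max_abs / qmax)
    # Round to the nearest integer level and clamp to the signed range.
    q = np.clip(np.rint(groups / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale


def rtn_dequantize(q, scale):
    """Hypothetical helper: map integer levels back to floats."""
    return (q.astype(np.float32) * scale).reshape(-1, q.shape[-1])


rng = np.random.default_rng(0)
w = rng.standard_normal((64, 16)).astype(np.float32)
q, scale = rtn_quantize(w)
w_hat = rtn_dequantize(q, scale)
max_err = float(np.abs(w - w_hat).max())
print("max abs reconstruction error:", max_err)
```

Because every value is rounded to the nearest level, the per-element reconstruction error in this sketch is bounded by half the group's scale. RTN needs no calibration data, which is what distinguishes it from the GPTQ and AWQ entries listed above.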