neural_compressor.onnxrt.algorithms

Package Contents

Classes

Smoother

Fake input channel quantization.
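The Smoother class is summarized only as fake input channel quantization. As a hedged illustration, assuming it follows the SmoothQuant approach (per-input-channel scales that move activation outliers into the weights before quantization), the NumPy sketch below computes such scales and checks that the layer output is unchanged. The names `smooth_scales` and `apply_smoothing` and the `alpha=0.5` default are hypothetical, not part of this package's API.

```python
import numpy as np


def smooth_scales(act_absmax: np.ndarray, weight: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Per-input-channel scales s_j = max|X_j|**alpha / max|W_j|**(1 - alpha).

    Hypothetical helper. `weight` is laid out (in_features, out_features),
    as for an ONNX MatMul computing X @ W.
    """
    w_absmax = np.maximum(np.abs(weight).max(axis=1), 1e-5)  # one value per input channel
    scales = np.power(act_absmax, alpha) / np.power(w_absmax, 1.0 - alpha)
    return np.maximum(scales, 1e-5)  # keep scales strictly positive


def apply_smoothing(x: np.ndarray, weight: np.ndarray, scales: np.ndarray):
    """Return (X / s, diag(s) @ W); their product equals X @ W."""
    return x / scales, weight * scales[:, None]


rng = np.random.default_rng(0)
x = rng.normal(size=(8, 4))
x[:, 2] *= 50.0  # simulate an outlier input channel
w = rng.normal(size=(4, 3))

s = smooth_scales(np.abs(x).max(axis=0), w, alpha=0.5)
x_s, w_s = apply_smoothing(x, w, s)
print(np.allclose(x_s @ w_s, x @ w))  # True: the layer output is unchanged
```

Because the rescaling is mathematically a no-op, only the subsequent quantization of `x_s` and `w_s` changes the result; `alpha` controls how much of the activation range is shifted into the weights.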

Functions

apply_rtn_on_model(→ onnx.ModelProto)

Apply round-to-nearest (RTN) weight-only quantization to an ONNX model.

apply_gptq_on_model(→ onnx.ModelProto)

Apply GPTQ weight-only quantization to an ONNX model.

apply_awq_on_model(→ onnx.ModelProto)

Apply Activation-aware Weight Quantization (AWQ) to an ONNX model.

layer_wise_quant(...)

Quantize model layer by layer to save memory.
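For the `apply_rtn_on_model` entry, round-to-nearest means each weight is mapped to the closest point on a fixed integer grid, with no calibration data. The NumPy sketch below shows asymmetric, group-wise RTN on a single weight matrix. The bit width, the group size, grouping along the input dimension, and the function names are assumptions for illustration, not this package's defaults or API.

```python
import numpy as np


def rtn_quantize(weight: np.ndarray, bits: int = 4, group_size: int = 32):
    """Asymmetric round-to-nearest quantization, one scale per group of input rows.

    Hypothetical helper. weight: (in_features, out_features).
    Returns (q, scale, zero_point) with dequantized = (q - zero_point) * scale.
    """
    in_features, out_features = weight.shape
    assert in_features % group_size == 0, "sketch assumes in_features divisible by group_size"
    maxq = 2**bits - 1
    # Each block of `group_size` consecutive input rows shares one scale per output column.
    w = weight.reshape(in_features // group_size, group_size, out_features)
    w_min = np.minimum(w.min(axis=1, keepdims=True), 0.0)
    w_max = np.maximum(w.max(axis=1, keepdims=True), 0.0)
    scale = (w_max - w_min) / maxq
    scale = np.where(scale == 0, 1.0, scale)  # guard all-zero groups
    zero_point = np.round(-w_min / scale)
    q = np.clip(np.round(w / scale) + zero_point, 0, maxq)
    return q.astype(np.uint8), scale, zero_point


def rtn_dequantize(q, scale, zero_point):
    """Map integer codes back to floats and restore the (in, out) layout."""
    w = (q.astype(np.float64) - zero_point) * scale
    return w.reshape(-1, w.shape[-1])


rng = np.random.default_rng(0)
w = rng.normal(size=(64, 16))
q, s, z = rtn_quantize(w, bits=4, group_size=32)
w_hat = rtn_dequantize(q, s, z)
print(np.abs(w - w_hat).max() <= s.max() / 2 + 1e-9)  # True: error is at most half a step
```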
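GPTQ, named in the `apply_gptq_on_model` entry, differs from plain RTN by quantizing weight columns one at a time and pushing each column's rounding error onto the columns not yet quantized, weighted by the inverse Hessian of the layer's squared output error. The sketch below is a simplified, unblocked NumPy version for one matrix; the per-row 4-bit grid, the 1% dampening, the (out_features, in_features) layout and all names are assumptions, not this package's implementation.

```python
import numpy as np


def quantize_column(col, scale, zero, maxq):
    """Quantize-dequantize one column on a fixed per-row asymmetric grid."""
    q = np.clip(np.round(col / scale) + zero, 0, maxq)
    return (q - zero) * scale


def gptq_quantize(weight, x, bits=4, damp=0.01):
    """Hypothetical helper. weight: (out_features, in_features); x: (in_features, n_samples)."""
    w = weight.astype(np.float64).copy()
    maxq = 2**bits - 1
    # Per-output-row grid, fixed before the column sweep.
    w_min = np.minimum(w.min(axis=1), 0.0)
    w_max = np.maximum(w.max(axis=1), 0.0)
    scale = np.where(w_max > w_min, (w_max - w_min) / maxq, 1.0)
    zero = np.round(-w_min / scale)

    h = 2.0 * x @ x.T  # Hessian of ||W X - Q X||^2 with respect to each row
    h += damp * np.mean(np.diag(h)) * np.eye(h.shape[0])  # dampening for stability
    hinv = np.linalg.cholesky(np.linalg.inv(h)).T  # upper Cholesky factor of H^-1

    q = np.zeros_like(w)
    for i in range(w.shape[1]):
        q[:, i] = quantize_column(w[:, i], scale, zero, maxq)
        err = (w[:, i] - q[:, i]) / hinv[i, i]
        # Compensate the error on the columns that are still unquantized.
        w[:, i + 1:] -= np.outer(err, hinv[i, i + 1:])
    return q


rng = np.random.default_rng(0)
x = rng.normal(size=(32, 32)) @ rng.normal(size=(32, 256))  # correlated calibration inputs
w = rng.normal(size=(16, 32))
q = gptq_quantize(w, x)
print(np.linalg.norm((w - q) @ x))  # output reconstruction error on the calibration set
```

The production algorithm processes columns in blocks and can reorder them by Hessian diagonal for speed and accuracy; the loop above keeps only the core error-feedback step.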
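AWQ, named in the `apply_awq_on_model` entry, protects weights attached to large-magnitude activations by scaling input channels up before quantization and back down afterwards, choosing the scale from activation statistics by a small grid search. The NumPy sketch below assumes scales of the form `mean|x| ** alpha` and symmetric per-output-channel 4-bit quantization (the original method typically quantizes in groups); the helper names are hypothetical and not this package's API.

```python
import numpy as np


def fake_quant_per_output_channel(w, bits=4):
    """Symmetric RTN quantize-dequantize, one scale per output column of w (in, out)."""
    maxq = 2 ** (bits - 1) - 1
    scale = np.abs(w).max(axis=0, keepdims=True) / maxq
    scale = np.where(scale == 0, 1.0, scale)
    return np.clip(np.round(w / scale), -maxq - 1, maxq) * scale


def awq_search_scales(x, w, bits=4, n_grid=20):
    """Hypothetical helper: grid-search per-input-channel scales that minimise output MSE."""
    act_mean = np.abs(x).mean(axis=0)  # (in_features,)
    ref = x @ w
    best_err, best_scales = np.inf, np.ones_like(act_mean)
    for alpha in np.arange(n_grid) / n_grid:  # alpha = 0 reproduces plain RTN
        s = np.power(np.maximum(act_mean, 1e-4), alpha)
        s = s / np.sqrt(s.max() * s.min())  # normalise the scale range
        w_q = fake_quant_per_output_channel(w * s[:, None], bits) / s[:, None]
        err = np.mean((x @ w_q - ref) ** 2)
        if err < best_err:
            best_err, best_scales = err, s
    return best_scales, best_err


rng = np.random.default_rng(0)
x = rng.normal(size=(128, 64))
x[:, :4] *= 20.0  # a few salient input channels
w = rng.normal(size=(64, 32))
s, best_err = awq_search_scales(x, w)
rtn_err = np.mean((x @ fake_quant_per_output_channel(w) - x @ w) ** 2)
print(best_err <= rtn_err)  # True: the search includes plain RTN, so it can only match or improve
```

In a deployed model the inverse scale `1 / s` is folded into the preceding operator, so no extra runtime division is needed.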
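The memory saving described for `layer_wise_quant` comes from never holding every full-precision weight at once. A minimal sketch of that idea, under stated assumptions: each layer's weight is produced by a loader callable (standing in for reading one tensor from disk), quantized to int8 with a single per-tensor scale, and the float copy is released before the next layer is loaded. The loaders, layer names and int8 scheme are hypothetical and say nothing about how this package actually reads or stores tensors.

```python
from typing import Callable, Dict, Iterable, Tuple

import numpy as np


def quantize_int8(w: np.ndarray) -> Tuple[np.ndarray, float]:
    """Symmetric per-tensor int8 quantization."""
    scale = max(float(np.abs(w).max()) / 127.0, 1e-12)
    return np.round(w / scale).astype(np.int8), scale


def layer_wise_quantize(
    loaders: Iterable[Tuple[str, Callable[[], np.ndarray]]]
) -> Dict[str, Tuple[np.ndarray, float]]:
    """Hypothetical helper: quantize one layer at a time to bound peak memory."""
    quantized = {}
    for name, load in loaders:
        w = load()  # only this layer's float weight is resident
        quantized[name] = quantize_int8(w)
        del w  # drop the float copy before loading the next layer
    return quantized


def make_loader(seed: int, shape: Tuple[int, int]) -> Callable[[], np.ndarray]:
    # Stands in for reading one tensor from disk (hypothetical).
    return lambda: np.random.default_rng(seed).normal(size=shape).astype(np.float32)


loaders = [(f"layer{i}.weight", make_loader(i, (64, 64))) for i in range(4)]
q = layer_wise_quantize(loaders)
print(sorted(q))
```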