neural_compressor.onnxrt.quantization.algorithm_entry

Module Contents

Functions

smooth_quant_entry(→ onnx.ModelProto)

Apply SmoothQuant.

rtn_quantize_entry(→ onnx.ModelProto)

The main entry to apply RTN (round-to-nearest) quantization.

gptq_quantize_entry(→ onnx.ModelProto)

The main entry to apply GPTQ quantization.

awq_quantize_entry(→ onnx.ModelProto)

The main entry to apply AWQ quantization.

neural_compressor.onnxrt.quantization.algorithm_entry.smooth_quant_entry(model: pathlib.Path | str, quant_config: neural_compressor.onnxrt.quantization.config.SmoohQuantConfig, calibration_data_reader: neural_compressor.onnxrt.quantization.calibrate.CalibrationDataReader, *args, **kwargs) → onnx.ModelProto

Apply SmoothQuant.
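A minimal usage sketch. The model path, the calibration input name and shape, and relying on the default SmoohQuantConfig() arguments are assumptions for illustration (the class name is spelled SmoohQuantConfig in the config module referenced above); the data reader is assumed to only need the get_next()/rewind() protocol of CalibrationDataReader.

import numpy as np
import onnx

from neural_compressor.onnxrt.quantization.algorithm_entry import smooth_quant_entry
from neural_compressor.onnxrt.quantization.calibrate import CalibrationDataReader
from neural_compressor.onnxrt.quantization.config import SmoohQuantConfig


class RandomCalibrationReader(CalibrationDataReader):
    """Yields a few random batches; input name and shape are placeholders."""

    def __init__(self, num_batches: int = 4):
        self._batches = [
            {"input": np.random.rand(1, 3, 224, 224).astype(np.float32)}
            for _ in range(num_batches)
        ]
        self._iter = iter(self._batches)

    def get_next(self):
        # Return the next feed dict, or None when calibration data is exhausted.
        return next(self._iter, None)

    def rewind(self):
        # Restart iteration so the reader can be consumed more than once.
        self._iter = iter(self._batches)


sq_config = SmoohQuantConfig()  # library defaults (assumption)
q_model = smooth_quant_entry(
    "model.onnx",               # hypothetical path to a float ONNX model
    sq_config,
    RandomCalibrationReader(),
)                               # returns an onnx.ModelProto
onnx.save(q_model, "model_sq.onnx")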

neural_compressor.onnxrt.quantization.algorithm_entry.rtn_quantize_entry(model: pathlib.Path | str, quant_config: neural_compressor.onnxrt.quantization.config.RTNConfig, *args, **kwargs) → onnx.ModelProto

The main entry to apply RTN (round-to-nearest) quantization.
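A minimal sketch of calling the RTN entry, which, per the signature above, needs only a model and an RTNConfig (no calibration data reader). The model path and reliance on RTNConfig() defaults are assumptions.

import onnx

from neural_compressor.onnxrt.quantization.algorithm_entry import rtn_quantize_entry
from neural_compressor.onnxrt.quantization.config import RTNConfig

rtn_config = RTNConfig()  # default weight-only RTN settings (assumption)
q_model = rtn_quantize_entry("model.onnx", rtn_config)  # hypothetical model path
onnx.save(q_model, "model_rtn.onnx")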

neural_compressor.onnxrt.quantization.algorithm_entry.gptq_quantize_entry(model: pathlib.Path | str, quant_config: neural_compressor.onnxrt.quantization.config.GPTQConfig, calibration_data_reader: neural_compressor.onnxrt.quantization.calibrate.CalibrationDataReader, *args, **kwargs) → onnx.ModelProto

The main entry to apply GPTQ quantization.
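A minimal sketch mirroring the SmoothQuant example above; unlike RTN, GPTQ calibrates with real activations, so a data reader is required. The input name, shape, model path, and GPTQConfig() defaults are assumptions for illustration.

import numpy as np
import onnx

from neural_compressor.onnxrt.quantization.algorithm_entry import gptq_quantize_entry
from neural_compressor.onnxrt.quantization.calibrate import CalibrationDataReader
from neural_compressor.onnxrt.quantization.config import GPTQConfig


class TokenCalibrationReader(CalibrationDataReader):
    """Feeds a few batches of placeholder token ids for calibration."""

    def __init__(self, num_batches: int = 4):
        self._batches = [
            {"input_ids": np.random.randint(0, 1000, size=(1, 32), dtype=np.int64)}
            for _ in range(num_batches)
        ]
        self._iter = iter(self._batches)

    def get_next(self):
        # Return the next feed dict, or None when calibration data is exhausted.
        return next(self._iter, None)

    def rewind(self):
        # Restart iteration for another calibration pass.
        self._iter = iter(self._batches)


gptq_config = GPTQConfig()  # library defaults (assumption)
q_model = gptq_quantize_entry("llm.onnx", gptq_config, TokenCalibrationReader())
onnx.save(q_model, "llm_gptq.onnx")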

neural_compressor.onnxrt.quantization.algorithm_entry.awq_quantize_entry(model: pathlib.Path | str, quant_config: neural_compressor.onnxrt.quantization.config.AWQConfig, calibration_data_reader: neural_compressor.onnxrt.quantization.calibrate.CalibrationDataReader, *args, **kwargs) → onnx.ModelProto

The main entry to apply AWQ quantization.
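The AWQ entry follows the same calling pattern as the GPTQ entry: model, config, and a calibration data reader. A compact, self-contained sketch, with the same assumptions about the model path, calibration input names and shapes, and AWQConfig() defaults.

import numpy as np
import onnx

from neural_compressor.onnxrt.quantization.algorithm_entry import awq_quantize_entry
from neural_compressor.onnxrt.quantization.calibrate import CalibrationDataReader
from neural_compressor.onnxrt.quantization.config import AWQConfig


class SmallReader(CalibrationDataReader):
    """Two placeholder batches of token ids; names and shapes are illustrative."""

    def __init__(self):
        self._iter = iter(
            [{"input_ids": np.random.randint(0, 1000, size=(1, 32), dtype=np.int64)}
             for _ in range(2)]
        )

    def get_next(self):
        return next(self._iter, None)

    def rewind(self):
        # A single pass is enough for this sketch.
        pass


awq_config = AWQConfig()  # library defaults (assumption)
q_model = awq_quantize_entry("llm.onnx", awq_config, SmallReader())
onnx.save(q_model, "llm_awq.onnx")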