neural_compressor.onnxrt.algorithms.layer_wise.core
Module Contents
Functions
- layer_wise_quant – Quantize model layer by layer to save memory.
- neural_compressor.onnxrt.algorithms.layer_wise.core.layer_wise_quant(model: onnx.ModelProto | neural_compressor.onnxrt.utils.onnx_model.ONNXModel | pathlib.Path | str, quant_func: Callable, weight_config: dict, data_reader: neural_compressor.onnxrt.quantization.calibrate.CalibrationDataReader = None, *args, **kwargs) → neural_compressor.onnxrt.utils.onnx_model.ONNXModel [source]
Quantize model layer by layer to save memory.
- Parameters:
model (Union[onnx.ModelProto, ONNXModel, Path, str]) – the ONNX model to quantize.
quant_func (Callable) – quantization algorithm function applied to each layer.
weight_config (dict) – quantization configuration.
data_reader (CalibrationDataReader, optional) – data reader for calibration. Defaults to None.
- Returns:
the quantized model.
- Return type:
ONNXModel
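The memory-saving idea behind layer-wise quantization can be sketched in plain Python: rather than holding every full-precision weight tensor in memory at once, each layer is quantized independently and its float weights can be released before the next layer is processed. The helpers below are hypothetical illustrations of this concept only; they are not part of the neural_compressor API, which instead delegates per-layer work to the `quant_func` you pass in.

```python
def quantize_layer(weights):
    """Affine-quantize one layer's float weights to uint8 (hypothetical helper)."""
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / 255.0 or 1.0  # avoid zero scale for constant layers
    zero_point = round(-lo / scale)
    q = [max(0, min(255, round(w / scale) + zero_point)) for w in weights]
    return q, scale, zero_point

def layer_wise_quantize(model_layers):
    """Quantize a model layer by layer.

    Only one layer's full-precision weights need to be resident at a time;
    in a real implementation the float copy would be freed (or never loaded)
    once its quantized form is produced.
    """
    quantized = {}
    for name, weights in model_layers.items():
        quantized[name] = quantize_layer(weights)
        # float weights for `name` can now be released before the next layer loads
    return quantized

# Toy "model": a mapping of layer name -> flat list of float weights.
model = {"fc1": [0.0, 0.5, 1.0], "fc2": [-1.0, 0.0, 1.0]}
result = layer_wise_quantize(model)
```

The actual `layer_wise_quant` operates on an ONNX graph and a user-supplied `quant_func`, but the control flow is analogous: iterate over layers, quantize each one in isolation, and keep peak memory bounded by the largest single layer rather than the whole model.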