neural_compressor.onnxrt.algorithms.layer_wise.core

Module Contents

Functions

layer_wise_quant(...)

Quantize model layer by layer to save memory.

neural_compressor.onnxrt.algorithms.layer_wise.core.layer_wise_quant(model: onnx.ModelProto | neural_compressor.onnxrt.utils.onnx_model.ONNXModel | pathlib.Path | str, quant_func: Callable, weight_config: dict, data_reader: neural_compressor.onnxrt.quantization.calibrate.CalibrationDataReader = None, *args, **kwargs) → neural_compressor.onnxrt.utils.onnx_model.ONNXModel

Quantize model layer by layer to save memory.

Parameters:
  • model (Union[onnx.ModelProto, ONNXModel, Path, str]) – ONNX model, ONNXModel wrapper, or a path to the model file.

  • quant_func (Callable) – quantization algorithm function applied to each layer.

  • weight_config (dict) – weight quantization configuration.

  • data_reader (CalibrationDataReader, optional) – data reader for calibration. Defaults to None.

Returns:

quantized model.

Return type:

ONNXModel
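
A minimal usage sketch follows. The quantization function my_quant_func, the weight_config layout, and the save() call are illustrative assumptions, not part of this module's documented API; only layer_wise_quant's signature above is taken from the source.

    from pathlib import Path

    from neural_compressor.onnxrt.algorithms.layer_wise.core import layer_wise_quant

    def my_quant_func(model, weight_config=None, **kwargs):
        # Hypothetical per-layer quantization algorithm (e.g. an RTN-style
        # weight-only quantizer); presumably invoked on one layer's
        # sub-model at a time so the full model never sits in memory.
        return model  # placeholder: quantize and return the (sub-)model

    weight_config = {}  # per-op config; the expected layout depends on quant_func

    quantized = layer_wise_quant(
        model=Path("model.onnx"),   # onnx.ModelProto, ONNXModel, Path, or str
        quant_func=my_quant_func,
        weight_config=weight_config,
    )
    quantized.save("model_quant.onnx")  # assumed: ONNXModel provides save()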