neural_compressor.onnxrt.quantization.autotune
Module Contents
Functions
autotune — The main entry of auto-tune.
- neural_compressor.onnxrt.quantization.autotune.autotune(model_input: pathlib.Path | str, tune_config: neural_compressor.common.base_tuning.TuningConfig, eval_fn: Callable, eval_args: Tuple[Any] | None = None, calibration_data_reader: neural_compressor.onnxrt.quantization.calibrate.CalibrationDataReader = None) → None | onnx.ModelProto [source]
The main entry of auto-tune.
- Parameters:
model_input (Union[Path, str]) – ONNX model path.
tune_config (TuningConfig) – Tuning config. A TuningConfig is created from algorithm configs; the parameters that support tuning are listed in each config's params_list. Two usages are supported: expand a parameter into a list of candidate values, e.g. TuningConfig(config_set=[RTNConfig(weight_bits=[4, 8])]), or pass a list of configs, e.g. TuningConfig(config_set=[RTNConfig(), GPTQConfig()]).
eval_fn (Callable) – Evaluation function. During evaluation, autotune passes only the model path as the input to this function.
eval_args (Optional[Tuple[Any]]) – Evaluation arguments; positional arguments passed to eval_fn after the model path.
calibration_data_reader (CalibrationDataReader) – Data reader supplying calibration samples for quantization.
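To make the tuning semantics concrete, the following is a minimal, self-contained sketch of the control flow autotune implements: expand list-valued parameters into concrete candidate configs, evaluate each candidate with eval_fn, and keep the best-scoring result. It deliberately does not import neural_compressor; expand_config and autotune_sketch are hypothetical stand-ins written for illustration, not the library's actual API.

```python
from itertools import product
from typing import Any, Callable, Dict, List, Optional, Tuple


def expand_config(config: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Expand list-valued parameters into concrete configs, mirroring
    TuningConfig(config_set=[RTNConfig(weight_bits=[4, 8])]), which
    yields one candidate per value of weight_bits."""
    keys = list(config)
    value_lists = [v if isinstance(v, list) else [v] for v in config.values()]
    return [dict(zip(keys, combo)) for combo in product(*value_lists)]


def autotune_sketch(
    model_input: str,
    config_set: List[Dict[str, Any]],
    eval_fn: Callable[..., float],
    eval_args: Optional[Tuple[Any, ...]] = None,
) -> Optional[Tuple[str, Dict[str, Any]]]:
    """Try every candidate config, score each 'quantized' model with
    eval_fn, and return the best one (None if nothing was evaluated).
    A (model, config) tuple stands in for a quantized onnx.ModelProto."""
    eval_args = eval_args or ()
    best_model, best_score = None, float("-inf")
    for base in config_set:
        for cfg in expand_config(base):
            candidate = (model_input, cfg)  # stand-in for quantization output
            score = eval_fn(candidate, *eval_args)
            if score > best_score:
                best_model, best_score = candidate, score
    return best_model
```

A config such as {"weight_bits": [4, 8]} expands to two candidates, one with 4-bit and one with 8-bit weights, and the candidate whose eval_fn score is highest is returned, which is the same "expand parameters, then search" behavior the tune_config parameter describes.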