neural_compressor.torch.quantization.autotune
Intel Neural Compressor PyTorch quantization AutoTune API.
Functions
- get_rtn_double_quant_config_set — Generate the RTN double-quant config set.
- get_all_config_set — Generate all quant config sets.
- autotune — The main entry of auto-tune.
Module Contents
- neural_compressor.torch.quantization.autotune.get_rtn_double_quant_config_set() → List[neural_compressor.torch.quantization.config.RTNConfig] [source]
Generate the RTN double-quant config set.
- Returns:
A list of quantization configs.
- Return type:
List[RTNConfig]
- neural_compressor.torch.quantization.autotune.get_all_config_set() → neural_compressor.common.base_config.BaseConfig | List[neural_compressor.common.base_config.BaseConfig] [source]
Generate all quant config sets.
- Returns:
A quant config, or a list of quant configs.
- Return type:
Union[BaseConfig, List[BaseConfig]]
- neural_compressor.torch.quantization.autotune.autotune(model: torch.nn.Module, tune_config: neural_compressor.common.base_tuning.TuningConfig, eval_fn: Callable, eval_args=None, run_fn=None, run_args=None, example_inputs=None)[source]
The main entry of auto-tune.
- Parameters:
model (torch.nn.Module) – The model to be quantized.
tune_config (TuningConfig) – Configuration that controls the tuning process, including the quantization config set to search.
eval_fn (Callable) – Evaluation function used to score each quantized model.
eval_args (tuple, optional) – Arguments passed to eval_fn. Defaults to None.
run_fn (Callable, optional) – Calibration function used when quantizing the model. Defaults to None.
run_args (tuple, optional) – Arguments passed to run_fn. Defaults to None.
example_inputs (tensor/tuple/dict, optional) – Example inputs used to trace the torch model. Defaults to None.
- Returns:
The quantized model.
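Conceptually, autotune iterates over a set of candidate quantization configs, quantizes the model with each one, scores the result with eval_fn, and returns the best-scoring quantized model. The sketch below illustrates that search loop in plain Python; it is not the library implementation, and `quantize_with` plus the toy "model" and "configs" here are hypothetical stand-ins.

```python
# Illustrative sketch (NOT the neural_compressor implementation):
# the greedy search autotune performs over a config set.
def autotune_sketch(model, config_set, eval_fn, quantize_with):
    """Return the best-scoring quantized model, or None if config_set is empty."""
    best_model, best_score = None, float("-inf")
    for config in config_set:
        q_model = quantize_with(model, config)  # calibration/quantization step
        score = eval_fn(q_model)                # higher is better
        if score > best_score:
            best_model, best_score = q_model, score
    return best_model

# Toy demonstration: the "model" is a float, each "config" is a rounding
# precision, and eval_fn rewards closeness to the original value.
model = 3.14159
configs = [0, 1, 2, 3]  # hypothetical "precision" settings
best = autotune_sketch(
    model,
    configs,
    eval_fn=lambda m: -abs(m - model),
    quantize_with=lambda m, c: round(m, c),
)
# best is the rounding with the smallest error, i.e. round(3.14159, 3)
```

In the real API, eval_fn plays the scoring role above, run_fn/run_args drive the calibration inside each quantization step, and tune_config supplies the config set (e.g., from get_all_config_set()).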