neural_compressor.torch.quantization.autotune

Intel Neural Compressor PyTorch quantization AutoTune API.

Functions

get_rtn_double_quant_config_set(...)

Generate RTN double quant config set.

get_all_config_set(...)

Generate all quant config set.

autotune(model, tune_config, eval_fn[, eval_args, ...])

The main entry of auto-tune.

Module Contents

neural_compressor.torch.quantization.autotune.get_rtn_double_quant_config_set() → List[neural_compressor.torch.quantization.config.RTNConfig][source]

Generate RTN double quant config set.

Returns:

A list of RTN double quant configs.

Return type:

List[RTNConfig]
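Conceptually, a double-quant config set is the cross product of candidate double-quant parameter values. A minimal, dependency-free sketch of that expansion (the parameter names and candidate values below are illustrative assumptions, not the library's actual defaults):

```python
from itertools import product

# Hypothetical candidate grid for RTN double-quant tuning; the real
# config set is built by the library from RTNConfig fields.
DOUBLE_QUANT_PARAMS = {
    "double_quant_bits": [8],
    "double_quant_use_sym": [True, False],
    "double_quant_group_size": [32, 256],
}

def get_double_quant_config_set(params=DOUBLE_QUANT_PARAMS):
    """Expand the candidate grid into a list of config dicts."""
    keys = list(params)
    return [dict(zip(keys, combo)) for combo in product(*(params[k] for k in keys))]

configs = get_double_quant_config_set()
# 1 * 2 * 2 = 4 parameter combinations
```

Each dict stands in for one RTNConfig instance; the tuner then tries these candidates in order.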

neural_compressor.torch.quantization.autotune.get_all_config_set() → neural_compressor.common.base_config.BaseConfig | List[neural_compressor.common.base_config.BaseConfig][source]

Generate all quant config set.

Returns:

A single quant config or a list of quant configs.

Return type:

Union[BaseConfig, List[BaseConfig]]
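Because the return type is either a single config or a list of configs, callers usually normalize the result to a list before iterating. A sketch of that pattern (the helper name is an assumption, not part of the API):

```python
def as_config_list(config_set):
    """Normalize `BaseConfig | List[BaseConfig]` to a flat list.

    Dicts stand in for BaseConfig instances in this sketch.
    """
    return config_set if isinstance(config_set, list) else [config_set]
```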

neural_compressor.torch.quantization.autotune.autotune(model: torch.nn.Module, tune_config: neural_compressor.common.base_tuning.TuningConfig, eval_fn: Callable, eval_args=None, run_fn=None, run_args=None, example_inputs=None)[source]

The main entry of auto-tune.

Parameters:
  • model (torch.nn.Module) – the source model to be tuned and quantized.

  • tune_config (TuningConfig) – the tuning configuration, including the quantization config set and the accuracy criterion.

  • eval_fn (Callable) – the evaluation function used to score quantized models.

  • eval_args (tuple, optional) – arguments used by eval_fn. Defaults to None.

  • run_fn (Callable, optional) – the calibration function used when quantizing the model. Defaults to None.

  • run_args (tuple, optional) – arguments used by run_fn. Defaults to None.

  • example_inputs (tensor/tuple/dict, optional) – example inputs used to trace the torch model. Defaults to None.

Returns:

The best quantized model found during tuning.
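Conceptually, autotune iterates over the config set, quantizes the model with each config (calling run_fn for calibration when provided), scores each result with eval_fn, and returns the best-scoring model. A dependency-free sketch of that loop, with a hypothetical `quantize` callable standing in for the library's internal quantization entry point:

```python
def autotune_sketch(model, config_set, eval_fn, eval_args=None,
                    run_fn=None, run_args=None, quantize=None):
    """Return the quantized model with the highest eval score.

    Illustrative sketch only: `quantize` is a stand-in for the library's
    internal quantization routine, not a real API.
    """
    best_model, best_score = None, float("-inf")
    for config in config_set:
        q_model = quantize(model, config)
        if run_fn is not None:                       # calibration pass
            run_fn(q_model, *(run_args or ()))
        score = eval_fn(q_model, *(eval_args or ()))
        if score > best_score:
            best_model, best_score = q_model, score
    return best_model

# Toy usage: the "model" is a number, "quantization" rounds it to a
# multiple of the config's step, and eval_fn prefers the value closest
# to the original.
model = 10.3
configs = [{"step": 1}, {"step": 4}]
best = autotune_sketch(
    model, configs,
    eval_fn=lambda m: -abs(m - model),
    quantize=lambda m, cfg: round(m / cfg["step"]) * cfg["step"],
)
# best == 10 (step=1 yields 10, step=4 yields 12; 10 is closer to 10.3)
```

The real autotune additionally handles accuracy criteria from TuningConfig (e.g. stopping once an acceptable loss is reached) rather than always exhausting the config set.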