neural_compressor.torch.quantization.quantize

Intel Neural Compressor PyTorch quantization base API.

Functions

need_apply(configs_mapping, algo_name)

Check whether to apply this algorithm according to configs_mapping.

quantize(model, quant_config[, run_fn, run_args, inplace, example_inputs]) → torch.nn.Module

The main entry point to quantize a model in static mode.

prepare(model, quant_config[, inplace, example_inputs])

Prepare the model for calibration.

convert(model[, quant_config, inplace])

Convert the prepared model to a quantized model.

finalize_calibration(model)

Generate and save calibration info.

Module Contents

neural_compressor.torch.quantization.quantize.need_apply(configs_mapping: Dict[Tuple[str, callable], neural_compressor.common.base_config.BaseConfig], algo_name)[source]

Check whether to apply this algorithm according to configs_mapping.

Parameters:
  • configs_mapping (Dict[Tuple[str, callable], BaseConfig]) – mapping from (operator name, operator type) pairs to their quantization configs

  • algo_name (str) – name of the algorithm

Returns:

True or False.

Return type:

bool
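
For illustration, a minimal sketch of this check. The configs_mapping shape and the "rtn" algorithm name below are assumptions; in practice the mapping is built internally from the user's composed quant_config:

    import torch
    from neural_compressor.torch.quantization import RTNConfig  # assumed importable
    from neural_compressor.torch.quantization.quantize import need_apply

    # Keys pair an operator name with its type; values are the resolved configs.
    configs_mapping = {("fc", torch.nn.Linear): RTNConfig()}
    if need_apply(configs_mapping, "rtn"):  # assumes "rtn" is RTN's registered name
        print("RTN quantization will be applied")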

neural_compressor.torch.quantization.quantize.quantize(model: torch.nn.Module, quant_config: neural_compressor.common.base_config.BaseConfig, run_fn: Callable = None, run_args: Any = None, inplace: bool = True, example_inputs: Any = None) torch.nn.Module[source]

The main entry point to quantize a model in static mode.

Parameters:
  • model – a float model to be quantized.

  • quant_config – a quantization configuration.

  • run_fn – a calibration function that runs representative data through the model. Defaults to None.

  • run_args – positional arguments for run_fn. Defaults to None.

  • inplace (bool, optional) – whether to modify the given model in place. Defaults to True.

  • example_inputs – example inputs used to trace the torch model. Defaults to None.

Returns:

The quantized model.
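
A minimal usage sketch of this one-shot entry point; the StaticQuantConfig default configuration and the toy model below are illustrative assumptions, not part of this module:

    import torch
    from neural_compressor.torch.quantization import StaticQuantConfig, quantize

    model = torch.nn.Sequential(torch.nn.Linear(8, 8), torch.nn.ReLU())
    example_inputs = (torch.randn(2, 8),)

    def run_fn(model):
        # Calibration: feed representative data so observers record statistics.
        for _ in range(8):
            model(torch.randn(2, 8))

    q_model = quantize(model, quant_config=StaticQuantConfig(),
                       run_fn=run_fn, example_inputs=example_inputs)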

neural_compressor.torch.quantization.quantize.prepare(model: torch.nn.Module, quant_config: neural_compressor.common.base_config.BaseConfig, inplace: bool = True, example_inputs: Any = None)[source]

Prepare the model for calibration.

Insert observers into the model so that they can monitor the input and output tensors during calibration.

Parameters:
  • model (torch.nn.Module) – the original float model

  • quant_config (BaseConfig) – the quantization configuration

  • inplace (bool, optional) – whether to modify the given model in place. Defaults to True.

  • example_inputs (tensor/tuple/dict, optional) – example inputs used to trace the torch model. Defaults to None.

Returns:

The prepared model, ready for calibration.
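
A hedged sketch of the prepare-then-calibrate flow; the StaticQuantConfig default configuration and the calibration loop are illustrative assumptions:

    import torch
    from neural_compressor.torch.quantization import StaticQuantConfig, prepare

    model = torch.nn.Sequential(torch.nn.Linear(8, 8), torch.nn.ReLU())
    prepared = prepare(model, StaticQuantConfig(),
                       example_inputs=(torch.randn(2, 8),))

    # Calibrate: run representative data so the inserted observers
    # can record input/output tensor ranges.
    with torch.no_grad():
        for _ in range(8):
            prepared(torch.randn(2, 8))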

neural_compressor.torch.quantization.quantize.convert(model: torch.nn.Module, quant_config: neural_compressor.common.base_config.BaseConfig = None, inplace: bool = True)[source]

Convert the prepared model to a quantized model.

Parameters:
  • model (torch.nn.Module) – the torch model to convert, typically one returned by prepare()

  • quant_config (BaseConfig, optional) – the quantization configuration; only required when the model has not been prepared.

  • inplace (bool, optional) – whether to modify the given model in place. Defaults to True.

Returns:

The quantized model.
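
Continuing the prepare() sketch above, a prepared and calibrated model can be converted without passing quant_config again, since its configuration is already attached:

    from neural_compressor.torch.quantization import convert

    q_model = convert(prepared)          # prepared model: no quant_config needed
    output = q_model(torch.randn(2, 8))  # inference with the quantized model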

neural_compressor.torch.quantization.quantize.finalize_calibration(model)[source]

Generate and save calibration info.
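
A hedged sketch of where this call fits, e.g. when calibration and conversion happen as separate steps or processes; the prepared model is assumed to come from the prepare() example above:

    from neural_compressor.torch.quantization import finalize_calibration

    # After the calibration loop has run on the prepared model,
    # generate and persist the collected calibration statistics.
    finalize_calibration(prepared)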