neural_compressor.torch.quantization.quantize

Module Contents

Functions

quantize(→ torch.nn.Module)

The main entry point to quantize a model in static mode.

prepare(model, quant_config[, inplace, example_inputs])

Prepare the model for calibration.

convert(model[, quant_config, inplace])

Convert the prepared model to a quantized model.

neural_compressor.torch.quantization.quantize.quantize(model: torch.nn.Module, quant_config: neural_compressor.common.base_config.BaseConfig, run_fn: Callable = None, run_args: Any = None, inplace: bool = True, example_inputs: Any = None) torch.nn.Module[source]

The main entry point to quantize a model in static mode.

Parameters:
  • model – a float model to be quantized.

  • quant_config – a quantization configuration.

  • run_fn – a calibration function for calibrating the model. Defaults to None.

  • run_args – positional arguments for run_fn. Defaults to None.

  • inplace – whether to modify the given model in place. Defaults to True.

  • example_inputs – example inputs used to trace the torch model.

Returns:

The quantized model.

neural_compressor.torch.quantization.quantize.prepare(model: torch.nn.Module, quant_config: neural_compressor.common.base_config.BaseConfig, inplace: bool = True, example_inputs: Any = None)[source]

Prepare the model for calibration.

Insert observers into the model so that it can monitor the input and output tensors during calibration.

Parameters:
  • model (torch.nn.Module) – the original float model.

  • quant_config (BaseConfig) – a quantization configuration.

  • inplace (bool) – whether to modify the given model in place. Defaults to True.

  • example_inputs – example inputs used to trace the torch model.

Returns:

The prepared model, ready for calibration.

neural_compressor.torch.quantization.quantize.convert(model: torch.nn.Module, quant_config: neural_compressor.common.base_config.BaseConfig = None, inplace: bool = True)[source]

Convert the prepared model to a quantized model.

Parameters:
  • model (torch.nn.Module) – the prepared model.

  • quant_config (BaseConfig, optional) – a quantization configuration. Defaults to None.

  • inplace (bool, optional) – whether to modify the given model in place. Defaults to True.

Returns:

The quantized model.