neural_compressor.torch.quantization.quantize
Intel Neural Compressor PyTorch quantization base API.
Functions
- need_apply – Check whether to apply this algorithm according to configs_mapping.
- preprocess_quant_config – Preprocess the quantization configuration.
- quantize – The main entry point for quantizing a model in static mode.
- prepare – Prepare the model for calibration.
- prepare_qat – Prepare a copy of the model for quantization calibration or quantization-aware training and convert it to a quantized version.
- convert – Convert the prepared model to a quantized model.
- finalize_calibration – Generate and save calibration info.
Module Contents
- neural_compressor.torch.quantization.quantize.need_apply(configs_mapping: Dict[Tuple[str, callable], neural_compressor.common.base_config.BaseConfig], algo_name)[source]
Check whether to apply this algorithm according to configs_mapping.
- Parameters:
configs_mapping (Dict[Tuple[str, callable], BaseConfig]) – the mapping from (op_name, op_type) pairs to quantization configurations.
algo_name (str) – name of the algorithm to check for.
- Returns:
True if the algorithm should be applied, False otherwise.
- Return type:
bool
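A minimal sketch of the check, assuming each config's name attribute records the algorithm that produced it; the configs_mapping below is hypothetical:

```python
import torch
from neural_compressor.torch.quantization import RTNConfig
from neural_compressor.torch.quantization.quantize import need_apply

# Hypothetical mapping from (op_name, op_type) pairs to per-op configs.
configs_mapping = {("fc1", torch.nn.Linear): RTNConfig()}

need_apply(configs_mapping, "rtn")   # True: an RTN-produced config is present
need_apply(configs_mapping, "gptq")  # False: no GPTQ config in the mapping
```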
- neural_compressor.torch.quantization.quantize.preprocess_quant_config(model, quant_config, mode='prepare', example_inputs=None, run_fn=None)[source]
Preprocess the quantization configuration.
- Parameters:
model – a float model to be quantized.
quant_config – a quantization configuration.
mode (str, optional) – which mode is currently in use. Defaults to “prepare”.
run_fn – a calibration function for calibrating the model. Defaults to None.
example_inputs – example inputs used to trace the torch model.
- Returns:
model – the model to be quantized.
configs_mapping (OrderedDictType[Union[str, str], OrderedDictType[str, BaseConfig]]) – the configuration mapping.
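A hedged sketch of the call shape, assuming the helper returns the (possibly wrapped) model together with the resolved per-op configuration mapping:

```python
import torch
from neural_compressor.torch.quantization import RTNConfig
from neural_compressor.torch.quantization.quantize import preprocess_quant_config

model = torch.nn.Sequential(torch.nn.Linear(64, 64), torch.nn.ReLU())

# Resolve the user-facing config into a per-op mapping before prepare/convert.
# Assumption: the function returns (model, configs_mapping).
model, configs_mapping = preprocess_quant_config(model, RTNConfig(), mode="prepare")
```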
- neural_compressor.torch.quantization.quantize.quantize(model: torch.nn.Module, quant_config: neural_compressor.common.base_config.BaseConfig, run_fn: Callable = None, run_args: Any = None, inplace: bool = True, example_inputs: Any = None) → torch.nn.Module[source]
The main entry point for quantizing a model in static mode.
- Parameters:
model – a float model to be quantized.
quant_config – a quantization configuration.
run_fn – a calibration function for calibrating the model. Defaults to None.
run_args – positional arguments for run_fn. Defaults to None.
example_inputs – example inputs used to trace the torch model.
- Returns:
The quantized model.
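A minimal usage sketch; RTN weight-only quantization is assumed here because it needs no calibration function, whereas static quantization would pass a run_fn that feeds calibration data through the model:

```python
import torch
from neural_compressor.torch.quantization import RTNConfig, quantize

# A toy float model; any torch.nn.Module works.
model = torch.nn.Sequential(torch.nn.Linear(64, 64), torch.nn.ReLU())

# One call covers the whole flow: preprocess, (optional) calibration, convert.
q_model = quantize(model, quant_config=RTNConfig())
```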
- neural_compressor.torch.quantization.quantize.prepare(model: torch.nn.Module, quant_config: neural_compressor.common.base_config.BaseConfig, inplace: bool = True, example_inputs: Any = None)[source]
Prepare the model for calibration.
Insert observers into the model so that the input and output tensors can be monitored during calibration.
- Parameters:
model (torch.nn.Module) – the original float model.
quant_config (BaseConfig) – a quantization configuration.
inplace (bool, optional) – whether to modify the given model in place. Defaults to True.
example_inputs (tensor/tuple/dict, optional) – example inputs used to trace the torch model.
- Returns:
The prepared model, ready for calibration.
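A sketch of the prepare-then-calibrate flow, assuming the default static quantization config from get_default_static_config (static quantization may additionally require the IPEX backend):

```python
import torch
from neural_compressor.torch.quantization import get_default_static_config, prepare

model = torch.nn.Sequential(torch.nn.Linear(8, 8), torch.nn.ReLU())
example_inputs = torch.randn(1, 8)

# Observers are inserted; the returned model records tensor ranges when run.
prepared = prepare(model, get_default_static_config(), example_inputs=example_inputs)

# Calibration: feed representative data through the prepared model.
for _ in range(10):
    prepared(torch.randn(1, 8))
```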
- neural_compressor.torch.quantization.quantize.prepare_qat(model: torch.nn.Module, mapping=None, inplace: bool = True)[source]
Prepare a copy of the model for quantization calibration or quantization-aware training and convert it to a quantized version.
Quantization configuration should be assigned beforehand to individual submodules via their .qconfig attribute.
- Parameters:
model – the input model to be modified in place.
mapping – a dictionary that maps float modules to the quantized modules they should be replaced with.
inplace – carry out the model transformation in place; the original module is mutated.
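A hedged sketch mirroring the eager-mode QAT flow, assuming this function follows torch.ao.quantization.prepare_qat semantics; the qconfig used below is a standard torch default, not something this module provides:

```python
import torch
from neural_compressor.torch.quantization.quantize import prepare_qat

model = torch.nn.Sequential(torch.nn.Linear(8, 8), torch.nn.ReLU())
model.train()  # QAT preparation expects a model in training mode

# Assign the quantization configuration before preparing, as noted above.
model.qconfig = torch.ao.quantization.get_default_qat_qconfig("fbgemm")

qat_model = prepare_qat(model, inplace=False)
# ... run the training loop on qat_model, then convert it ...
```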
- neural_compressor.torch.quantization.quantize.convert(model: torch.nn.Module, quant_config: neural_compressor.common.base_config.BaseConfig = None, inplace: bool = True, **kwargs)[source]
Convert the prepared model to a quantized model.
- Parameters:
model (torch.nn.Module) – the torch model to convert (typically a prepared model).
quant_config (BaseConfig, optional) – a quantization configuration; only required when the model has not been prepared.
inplace (bool, optional) – whether to modify the given model in place. Defaults to True.
- Returns:
The quantized model.
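Continuing the prepare sketch above: once calibration has run, convert folds the observed ranges into quantized modules:

```python
from neural_compressor.torch.quantization import convert

# `prepared` is the calibrated model from the prepare() sketch above.
q_model = convert(prepared)
```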