neural_compressor.torch.quantization.quantize
Intel Neural Compressor PyTorch quantization base API.
Functions
- need_apply – Check whether to apply this algorithm according to configs_mapping.
- preprocess_quant_config – Preprocess the quantization configuration.
- quantize – The main entry point for quantizing a model in static mode.
- prepare – Prepare the model for calibration.
- prepare_qat – Prepare a copy of the model for quantization calibration or quantization-aware training and convert it to a quantized version.
- convert – Convert the prepared model to a quantized model.
- finalize_calibration – Generate and save calibration info.
Module Contents
- neural_compressor.torch.quantization.quantize.need_apply(configs_mapping: Dict[Tuple[str, callable], neural_compressor.common.base_config.BaseConfig], algo_name)[source]
Check whether to apply this algorithm according to configs_mapping.
- Parameters:
configs_mapping (Dict[Tuple[str, callable], BaseConfig]) – the mapping from (op_name, op_type) pairs to quantization configurations.
algo_name (str) – name of the algorithm to check for.
- Returns:
True if the algorithm should be applied, False otherwise.
- Return type:
bool
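A minimal sketch of the check, assuming each config's name attribute records the algorithm that produced it; the configs_mapping below is hypothetical:

```python
import torch
from neural_compressor.torch.quantization import RTNConfig
from neural_compressor.torch.quantization.quantize import need_apply

# Hypothetical mapping from (op_name, op_type) pairs to per-op configs.
configs_mapping = {("fc1", torch.nn.Linear): RTNConfig()}

need_apply(configs_mapping, "rtn")   # True: an RTN-produced config is present
need_apply(configs_mapping, "gptq")  # False: no GPTQ config in the mapping
```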
- neural_compressor.torch.quantization.quantize.preprocess_quant_config(model, quant_config, mode='prepare', example_inputs=None, run_fn=None)[source]
Preprocess the quantization configuration.
- Parameters:
model – a float model to be quantized.
quant_config – a quantization configuration.
mode (str, optional) – which mode is currently in use. Defaults to “prepare”.
run_fn – a calibration function for calibrating the model. Defaults to None.
example_inputs – example inputs used to trace the torch model.
- Returns:
model – the model to be quantized.
configs_mapping (OrderedDictType[Union[str, str], OrderedDictType[str, BaseConfig]]) – the configuration mapping.
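A hedged sketch of the call shape, assuming the helper returns the (possibly wrapped) model together with the resolved per-op configuration mapping:

```python
import torch
from neural_compressor.torch.quantization import RTNConfig
from neural_compressor.torch.quantization.quantize import preprocess_quant_config

model = torch.nn.Sequential(torch.nn.Linear(64, 64), torch.nn.ReLU())

# Resolve the user-facing config into a per-op mapping before prepare/convert.
# Assumption: the function returns (model, configs_mapping).
model, configs_mapping = preprocess_quant_config(model, RTNConfig(), mode="prepare")
```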
- neural_compressor.torch.quantization.quantize.quantize(model: torch.nn.Module, quant_config: neural_compressor.common.base_config.BaseConfig, run_fn: Callable = None, run_args: Any = None, inplace: bool = True, example_inputs: Any = None) → torch.nn.Module[source]
The main entry point for quantizing a model in static mode.
- Parameters:
model – a float model to be quantized.
quant_config – a quantization configuration.
run_fn – a calibration function for calibrating the model. Defaults to None.
run_args – positional arguments for run_fn. Defaults to None.
example_inputs – example inputs used to trace the torch model.
- Returns:
The quantized model.
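A minimal usage sketch; RTN weight-only quantization is assumed here because it needs no calibration function, whereas static quantization would pass a run_fn that feeds calibration data through the model:

```python
import torch
from neural_compressor.torch.quantization import RTNConfig, quantize

# A toy float model; any torch.nn.Module works.
model = torch.nn.Sequential(torch.nn.Linear(64, 64), torch.nn.ReLU())

# One call covers the whole flow: preprocess, (optional) calibration, convert.
q_model = quantize(model, quant_config=RTNConfig())
```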
- neural_compressor.torch.quantization.quantize.prepare(model: torch.nn.Module, quant_config: neural_compressor.common.base_config.BaseConfig, inplace: bool = True, example_inputs: Any = None)[source]
Prepare the model for calibration.
Insert observers into the model so that the input and output tensors can be monitored during calibration.
- Parameters:
model (torch.nn.Module) – the original float model.
quant_config (BaseConfig) – a quantization configuration.
inplace (bool, optional) – whether to modify the given model in place. Defaults to True.
example_inputs (tensor/tuple/dict, optional) – example inputs used to trace the torch model.
- Returns:
The prepared model, ready for calibration.
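A sketch of the prepare-then-calibrate flow, assuming the default static quantization config from get_default_static_config (static quantization may additionally require the IPEX backend):

```python
import torch
from neural_compressor.torch.quantization import get_default_static_config, prepare

model = torch.nn.Sequential(torch.nn.Linear(8, 8), torch.nn.ReLU())
example_inputs = torch.randn(1, 8)

# Observers are inserted; the returned model records tensor ranges when run.
prepared = prepare(model, get_default_static_config(), example_inputs=example_inputs)

# Calibration: feed representative data through the prepared model.
for _ in range(10):
    prepared(torch.randn(1, 8))
```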
- neural_compressor.torch.quantization.quantize.prepare_qat(model: torch.nn.Module, mapping=None, inplace: bool = True)[source]
Prepare a copy of the model for quantization calibration or quantization-aware training and convert it to a quantized version.
Quantization configuration should be assigned beforehand to individual submodules via their .qconfig attribute.
- Parameters:
model – the input model to be modified in place.
mapping – a dictionary that maps float modules to the quantized modules they should be replaced with.
inplace – carry out the model transformation in place; the original module is mutated.
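A hedged sketch mirroring the eager-mode QAT flow, assuming this function follows torch.ao.quantization.prepare_qat semantics; the qconfig used below is a standard torch default, not something this module provides:

```python
import torch
from neural_compressor.torch.quantization.quantize import prepare_qat

model = torch.nn.Sequential(torch.nn.Linear(8, 8), torch.nn.ReLU())
model.train()  # QAT preparation expects a model in training mode

# Assign the quantization configuration before preparing, as noted above.
model.qconfig = torch.ao.quantization.get_default_qat_qconfig("fbgemm")

qat_model = prepare_qat(model, inplace=False)
# ... run the training loop on qat_model, then convert it ...
```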
- neural_compressor.torch.quantization.quantize.convert(model: torch.nn.Module, quant_config: neural_compressor.common.base_config.BaseConfig = None, inplace: bool = True, **kwargs)[source]
Convert the prepared model to a quantized model.
- Parameters:
model (torch.nn.Module) – the torch model to convert (typically a prepared model).
quant_config (BaseConfig, optional) – a quantization configuration; only required when the model has not been prepared.
inplace (bool, optional) – whether to modify the given model in place. Defaults to True.
- Returns:
The quantized model.
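Continuing the prepare sketch above: once calibration has run, convert folds the observed ranges into quantized modules:

```python
from neural_compressor.torch.quantization import convert

# `prepared` is the calibrated model from the prepare() sketch above.
q_model = convert(prepared)
```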