neural_compressor.torch.quantization.algorithm_entry
Intel Neural Compressor PyTorch supported algorithm entries.
Functions
- rtn_entry: The main entry to apply RTN quantization.
- gptq_entry: The main entry to apply GPTQ quantization.
- static_quant_entry: The main entry to apply static quantization, including PT2E quantization and IPEX quantization.
- pt2e_dynamic_quant_entry: The main entry to apply PT2E dynamic quantization.
- pt2e_static_quant_entry: The main entry to apply PT2E static quantization.
- smooth_quant_entry: The main entry to apply smooth quantization.
- awq_quantize_entry: The main entry to apply AWQ quantization.
- teq_quantize_entry: The main entry to apply TEQ quantization.
- autoround_quantize_entry: The main entry to apply AutoRound quantization.
- hqq_entry: The main entry to apply HQQ quantization.
- fp8_entry: The main entry to apply FP8 quantization.
- mx_quant_entry: The main entry to apply MX quantization.
- mixed_precision_entry: The main entry to apply mixed precision.
Module Contents
- neural_compressor.torch.quantization.algorithm_entry.rtn_entry(model: torch.nn.Module, configs_mapping: Dict[Tuple[str, callable], neural_compressor.torch.quantization.RTNConfig], mode: neural_compressor.common.utils.Mode = Mode.QUANTIZE, *args, **kwargs) torch.nn.Module [source]
The main entry to apply RTN (round-to-nearest) quantization.
- Parameters:
model (torch.nn.Module) – raw fp32 model or prepared model.
configs_mapping (Dict[Tuple[str, callable], RTNConfig]) – per-op configuration.
mode (Mode, optional) – select from [PREPARE, CONVERT and QUANTIZE]. Defaults to Mode.QUANTIZE.
- Returns:
prepared model or quantized model.
- Return type:
torch.nn.Module
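In typical use this entry is not called by hand; the framework's quantization flow builds the configs_mapping from an RTNConfig and dispatches here. Still, a direct call is the clearest way to show the mapping shape. A minimal sketch, assuming a toy model with a placeholder layer name "fc" and that a hand-built (op_name, op_type) key is accepted as-is:

    import torch

    from neural_compressor.common.utils import Mode
    from neural_compressor.torch.quantization import RTNConfig
    from neural_compressor.torch.quantization.algorithm_entry import rtn_entry

    # Toy fp32 model; the layer name "fc" is a placeholder chosen for this sketch.
    model = torch.nn.Sequential()
    model.add_module("fc", torch.nn.Linear(64, 64))

    # Keys follow the Dict[Tuple[str, callable], RTNConfig] annotation: (op_name, op_type).
    configs_mapping = {("fc", torch.nn.Linear): RTNConfig(bits=4, group_size=32)}

    # RTN is data-free, so Mode.QUANTIZE prepares and converts in one call.
    q_model = rtn_entry(model, configs_mapping, mode=Mode.QUANTIZE)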
- neural_compressor.torch.quantization.algorithm_entry.gptq_entry(model: torch.nn.Module, configs_mapping: Dict[Tuple[str, callable], neural_compressor.torch.quantization.GPTQConfig], mode: neural_compressor.common.utils.Mode = Mode.QUANTIZE, *args, **kwargs) torch.nn.Module [source]
The main entry to apply GPTQ quantization.
- Parameters:
model (torch.nn.Module) – raw fp32 model or prepared model.
configs_mapping (Dict[Tuple[str, callable], GPTQConfig]) – per-op configuration.
mode (Mode, optional) – select from [PREPARE, CONVERT and QUANTIZE]. Defaults to Mode.QUANTIZE.
- Returns:
prepared model or quantized model.
- Return type:
torch.nn.Module
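GPTQ needs calibration data to collect per-layer statistics, so the entry is normally reached through the top-level quantize API rather than called directly; quantize builds the configs_mapping from a GPTQConfig and drives calibration through its run_fn hook. A hedged sketch using random data as a stand-in for a real calibration set (the model shape and hyperparameters are illustrative):

    import torch

    from neural_compressor.torch.quantization import GPTQConfig, quantize

    model = torch.nn.Sequential(torch.nn.Linear(64, 64), torch.nn.ReLU(), torch.nn.Linear(64, 16))

    def run_fn(m: torch.nn.Module) -> None:
        # Placeholder calibration loop; a real run_fn would iterate a representative dataset.
        for _ in range(8):
            m(torch.randn(4, 64))

    # quantize() maps GPTQConfig onto the matching ops and invokes gptq_entry internally.
    q_model = quantize(model, quant_config=GPTQConfig(bits=4, group_size=128), run_fn=run_fn)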
- neural_compressor.torch.quantization.algorithm_entry.static_quant_entry(model: torch.nn.Module, configs_mapping: Dict[Tuple[str, callable], neural_compressor.torch.quantization.StaticQuantConfig], mode: neural_compressor.common.utils.Mode = Mode.QUANTIZE, *args, **kwargs) torch.nn.Module [source]
The main entry to apply static quantization, including PT2E quantization and IPEX quantization.
- Parameters:
model (torch.nn.Module) – raw fp32 model or prepared model.
configs_mapping (Dict[Tuple[str, callable], StaticQuantConfig]) – per-op configuration.
mode (Mode, optional) – select from [PREPARE, CONVERT and QUANTIZE]. Defaults to Mode.QUANTIZE.
- Returns:
prepared model or quantized model.
- Return type:
torch.nn.Module
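Static quantization is usually driven through the public prepare/convert pair, which generates the per-op configs_mapping from a StaticQuantConfig and then calls this entry with Mode.PREPARE and Mode.CONVERT. A minimal sketch, assuming the IPEX backend is installed and using random data in place of a real calibration set:

    import torch

    from neural_compressor.torch.quantization import StaticQuantConfig, convert, prepare

    model = torch.nn.Sequential(torch.nn.Linear(64, 64), torch.nn.ReLU())
    example_inputs = torch.randn(1, 64)

    # Mode.PREPARE under the hood: observers are inserted around the configured ops.
    prepared = prepare(model, quant_config=StaticQuantConfig(), example_inputs=example_inputs)

    # Calibration: run representative data through the prepared model.
    for _ in range(8):
        prepared(torch.randn(4, 64))

    # Mode.CONVERT under the hood: observed ranges are frozen into quantized ops.
    q_model = convert(prepared)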
- neural_compressor.torch.quantization.algorithm_entry.pt2e_dynamic_quant_entry(model: torch.nn.Module, configs_mapping, mode: neural_compressor.common.utils.Mode, *args, **kwargs) torch.nn.Module [source]
The main entry to apply PT2E dynamic quantization.
- Parameters:
model (torch.nn.Module) – raw fp32 model or prepared model.
configs_mapping – per-op configuration.
mode (Mode, optional) – select from [PREPARE, CONVERT and QUANTIZE]. Defaults to Mode.QUANTIZE.
- Returns:
prepared model or quantized model.
- Return type:
torch.nn.Module
- neural_compressor.torch.quantization.algorithm_entry.pt2e_static_quant_entry(model: torch.nn.Module, configs_mapping, mode: neural_compressor.common.utils.Mode, *args, **kwargs) torch.nn.Module [source]
The main entry to apply PT2E static quantization.
- Parameters:
model (torch.nn.Module) – raw fp32 model or prepared model.
configs_mapping – per-op configuration.
mode (Mode, optional) – select from [PREPARE, CONVERT and QUANTIZE]. Defaults to Mode.QUANTIZE.
- Returns:
prepared model or quantized model.
- Return type:
torch.nn.Module
- neural_compressor.torch.quantization.algorithm_entry.smooth_quant_entry(model: torch.nn.Module, configs_mapping: Dict[Tuple[str, callable], neural_compressor.torch.quantization.SmoothQuantConfig], mode: neural_compressor.common.utils.Mode = Mode.QUANTIZE, *args, **kwargs) torch.nn.Module [source]
The main entry to apply smooth quantization.
- Parameters:
model (torch.nn.Module) – raw fp32 model or prepared model.
configs_mapping (Dict[Tuple[str, callable], SmoothQuantConfig]) – per-op configuration.
mode (Mode, optional) – select from [PREPARE, CONVERT and QUANTIZE]. Defaults to Mode.QUANTIZE.
- Returns:
prepared model or quantized model.
- Return type:
torch.nn.Module
- neural_compressor.torch.quantization.algorithm_entry.awq_quantize_entry(model: torch.nn.Module, configs_mapping: Dict[Tuple[str, callable], neural_compressor.torch.quantization.AWQConfig], mode: neural_compressor.common.utils.Mode = Mode.QUANTIZE, *args, **kwargs) torch.nn.Module [source]
The main entry to apply AWQ quantization.
- Parameters:
model (torch.nn.Module) – raw fp32 model or prepared model.
configs_mapping (Dict[Tuple[str, callable], AWQConfig]) – per-op configuration.
mode (Mode, optional) – select from [PREPARE, CONVERT and QUANTIZE]. Defaults to Mode.QUANTIZE.
- Returns:
prepared model or quantized model.
- Return type:
torch.nn.Module
- neural_compressor.torch.quantization.algorithm_entry.teq_quantize_entry(model: torch.nn.Module, configs_mapping: Dict[Tuple[str, callable], neural_compressor.torch.quantization.TEQConfig], mode: neural_compressor.common.utils.Mode, *args, **kwargs) torch.nn.Module [source]
The main entry to apply TEQ quantization.
- Parameters:
model (torch.nn.Module) – raw fp32 model or prepared model.
configs_mapping (Dict[Tuple[str, callable], TEQConfig]) – per-op configuration.
mode (Mode) – select from [PREPARE, CONVERT and QUANTIZE].
- Returns:
prepared model or quantized model.
- Return type:
torch.nn.Module
- neural_compressor.torch.quantization.algorithm_entry.autoround_quantize_entry(model: torch.nn.Module, configs_mapping: Dict[Tuple[str, callable], neural_compressor.torch.quantization.AutoRoundConfig], mode: neural_compressor.common.utils.Mode = Mode.QUANTIZE, *args, **kwargs) torch.nn.Module [source]
The main entry to apply AutoRound quantization.
- Parameters:
model (torch.nn.Module) – raw fp32 model or prepared model.
configs_mapping (Dict[Tuple[str, callable], AutoRoundConfig]) – per-op configuration.
mode (Mode, optional) – select from [PREPARE, CONVERT and QUANTIZE]. Defaults to Mode.QUANTIZE.
- Returns:
prepared model or quantized model.
- Return type:
torch.nn.Module
- neural_compressor.torch.quantization.algorithm_entry.hqq_entry(model: torch.nn.Module, configs_mapping: Dict[Tuple[str, Callable], neural_compressor.torch.quantization.HQQConfig], mode: neural_compressor.common.utils.Mode = Mode.QUANTIZE, *args, **kwargs) torch.nn.Module [source]
The main entry to apply HQQ quantization.
- Parameters:
model (torch.nn.Module) – raw fp32 model or prepared model.
configs_mapping (Dict[Tuple[str, Callable], HQQConfig]) – per-op configuration.
mode (Mode, optional) – select from [PREPARE, CONVERT and QUANTIZE]. Defaults to Mode.QUANTIZE.
- Returns:
prepared model or quantized model.
- Return type:
torch.nn.Module
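HQQ is data-free weight-only quantization, so no calibration pass is required; the top-level quantize API builds the mapping from an HQQConfig and dispatches to this entry. A minimal sketch (the bits and group_size values are illustrative):

    import torch

    from neural_compressor.torch.quantization import HQQConfig, quantize

    model = torch.nn.Sequential(torch.nn.Linear(128, 128))

    # Data-free: no run_fn is needed for HQQ weight-only quantization.
    q_model = quantize(model, quant_config=HQQConfig(bits=4, group_size=64))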
- neural_compressor.torch.quantization.algorithm_entry.fp8_entry(model: torch.nn.Module, configs_mapping: Dict[Tuple[str], neural_compressor.torch.quantization.FP8Config], mode: neural_compressor.common.utils.Mode = Mode.QUANTIZE, *args, **kwargs) torch.nn.Module [source]
The main entry to apply FP8 quantization.
- Parameters:
model (torch.nn.Module) – raw fp32 model or prepared model.
configs_mapping (Dict[Tuple[str], FP8Config]) – per-op configuration.
mode (Mode, optional) – select from [PREPARE, CONVERT and QUANTIZE]. Defaults to Mode.QUANTIZE.
- Returns:
prepared model or quantized model.
- Return type:
torch.nn.Module
- neural_compressor.torch.quantization.algorithm_entry.mx_quant_entry(model: torch.nn.Module, configs_mapping: Dict[Tuple[str, callable], neural_compressor.torch.quantization.MXQuantConfig], mode: neural_compressor.common.utils.Mode = Mode.QUANTIZE, *args, **kwargs) torch.nn.Module [source]
The main entry to apply MX quantization.
- Parameters:
model (torch.nn.Module) – raw fp32 model or prepared model.
configs_mapping (Dict[Tuple[str, callable], MXQuantConfig]) – per-op configuration.
mode (Mode, optional) – select from [PREPARE, CONVERT and QUANTIZE]. Defaults to Mode.QUANTIZE.
- Returns:
prepared model or quantized model.
- Return type:
torch.nn.Module
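MX quantization casts weights (and optionally activations) to microscaling data types and, like RTN, needs no calibration data. A hedged sketch using the config defaults; whether the top-level quantize API is the intended front door for MX in a given release is an assumption to verify:

    import torch

    from neural_compressor.torch.quantization import MXQuantConfig, quantize

    model = torch.nn.Sequential(torch.nn.Linear(64, 64))

    # Data-free MX quantization with default config values.
    q_model = quantize(model, quant_config=MXQuantConfig())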
- neural_compressor.torch.quantization.algorithm_entry.mixed_precision_entry(model: torch.nn.Module, configs_mapping: Dict[Tuple[str], neural_compressor.torch.quantization.MixedPrecisionConfig], *args, **kwargs) torch.nn.Module [source]
The main entry to apply Mixed Precision.
- Parameters:
model (torch.nn.Module) – raw fp32 model or prepared model.
configs_mapping (Dict[Tuple[str], MixedPrecisionConfig]) – per-op configuration.
- Returns:
prepared model or quantized model.
- Return type:
torch.nn.Module
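Mixed precision rewrites the configured ops to a lower-precision floating-point dtype rather than applying integer quantization, and this entry takes no mode argument. A hedged sketch of a direct call; the single-element (op_name,) key follows the annotation above, and dtype="bf16" is an assumption about the config field name to check against the installed release:

    import torch

    from neural_compressor.torch.quantization import MixedPrecisionConfig
    from neural_compressor.torch.quantization.algorithm_entry import mixed_precision_entry

    model = torch.nn.Sequential()
    model.add_module("fc", torch.nn.Linear(64, 64))  # "fc" is a placeholder layer name

    # Single-element keys per the Dict[Tuple[str], MixedPrecisionConfig] annotation.
    configs_mapping = {("fc",): MixedPrecisionConfig(dtype="bf16")}  # dtype field assumed; see note above
    bf16_model = mixed_precision_entry(model, configs_mapping)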