neural_compressor.torch.quantization.config
===========================================

.. py:module:: neural_compressor.torch.quantization.config

.. autoapi-nested-parse::

   Intel Neural Compressor PyTorch quantization config API.

Classes
-------

.. autoapisummary::

   neural_compressor.torch.quantization.config.OperatorConfig
   neural_compressor.torch.quantization.config.TorchBaseConfig
   neural_compressor.torch.quantization.config.RTNConfig
   neural_compressor.torch.quantization.config.GPTQConfig
   neural_compressor.torch.quantization.config.AWQConfig
   neural_compressor.torch.quantization.config.TEQConfig
   neural_compressor.torch.quantization.config.AutoRoundConfig
   neural_compressor.torch.quantization.config.MXQuantConfig
   neural_compressor.torch.quantization.config.DynamicQuantConfig
   neural_compressor.torch.quantization.config.StaticQuantConfig
   neural_compressor.torch.quantization.config.SmoothQuantConfig
   neural_compressor.torch.quantization.config.HQQConfig
   neural_compressor.torch.quantization.config.FP8Config
   neural_compressor.torch.quantization.config.MixedPrecisionConfig

Functions
---------

.. autoapisummary::

   neural_compressor.torch.quantization.config.get_default_rtn_config
   neural_compressor.torch.quantization.config.get_default_double_quant_config
   neural_compressor.torch.quantization.config.get_default_gptq_config
   neural_compressor.torch.quantization.config.get_default_awq_config
   neural_compressor.torch.quantization.config.get_default_teq_config
   neural_compressor.torch.quantization.config.get_default_AutoRound_config
   neural_compressor.torch.quantization.config.get_default_mx_config
   neural_compressor.torch.quantization.config.get_default_dynamic_config
   neural_compressor.torch.quantization.config.get_default_static_config
   neural_compressor.torch.quantization.config.get_default_sq_config
   neural_compressor.torch.quantization.config.get_default_hqq_config
   neural_compressor.torch.quantization.config.get_default_fp8_config
   neural_compressor.torch.quantization.config.get_default_fp8_config_set
   neural_compressor.torch.quantization.config.get_default_mixed_precision_config
   neural_compressor.torch.quantization.config.get_default_mixed_precision_config_set
   neural_compressor.torch.quantization.config.get_all_registered_configs
   neural_compressor.torch.quantization.config.get_woq_tuning_config

Module Contents
---------------

.. py:class:: OperatorConfig

   OperatorConfig.

.. py:class:: TorchBaseConfig(white_list: Optional[List[neural_compressor.common.utils.OP_NAME_OR_MODULE_TYPE]] = DEFAULT_WHITE_LIST)

   Base config class for the torch backend.

.. py:class:: RTNConfig(dtype: str = 'int', bits: int = 4, use_sym: bool = True, group_size: int = 32, group_dim: int = 1, use_full_range: bool = False, use_mse_search: bool = False, use_layer_wise: bool = False, model_path: str = '', use_double_quant: bool = False, double_quant_dtype: str = 'int', double_quant_bits: int = 8, double_quant_use_sym: bool = False, double_quant_group_size: int = 256, quant_lm_head: bool = False, white_list: Optional[List[neural_compressor.common.utils.OP_NAME_OR_MODULE_TYPE]] = DEFAULT_WHITE_LIST)

   Config class for round-to-nearest (RTN) weight-only quantization.

.. py:function:: get_default_rtn_config(processor_type: Optional[Union[str, neural_compressor.torch.utils.ProcessorType]] = None) -> RTNConfig

   Get the default configuration of RTN.

   :param processor_type: The user-specified processor type. Defaults to None.
   :type processor_type: Optional[Union[str, torch_utils.ProcessorType]], optional
   :returns: RTNConfig
   :rtype: RTNConfig
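As a usage sketch (not part of the generated signatures above): the snippet below applies the default RTN config through the ``prepare``/``convert`` flow exported by ``neural_compressor.torch.quantization``. The toy model is a placeholder, and the ``prepare``/``convert`` entry points are assumed from the package's 3.x PyTorch API rather than documented on this page.

.. code-block:: python

   # Minimal sketch: weight-only RTN quantization with the default config.
   # RTN is data-free, so no calibration run is needed between prepare() and convert().
   import torch

   from neural_compressor.torch.quantization import (
       RTNConfig,
       convert,
       get_default_rtn_config,
       prepare,
   )

   model = torch.nn.Sequential(torch.nn.Linear(64, 64), torch.nn.ReLU())

   quant_config = get_default_rtn_config()  # dtype='int', bits=4, group_size=32, ...
   # Equivalent explicit construction:
   # quant_config = RTNConfig(dtype="int", bits=4, use_sym=True, group_size=32)

   model = prepare(model, quant_config)
   model = convert(model)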
.. py:function:: get_default_double_quant_config(type='BNB_NF4')

   Get the default configuration of double quant.

   :param type: double quant type. Defaults to "BNB_NF4".
   :type type: str, optional
   :returns: double quant config.
   :rtype: dict

.. py:class:: GPTQConfig(dtype: str = 'int', bits: int = 4, use_sym: bool = True, group_size: int = 32, use_mse_search: bool = False, use_layer_wise: bool = False, model_path: str = '', use_double_quant: bool = False, double_quant_dtype: str = 'int', double_quant_bits: int = 8, double_quant_use_sym: bool = False, double_quant_group_size: int = 256, quant_lm_head: bool = False, act_order: bool = False, percdamp: float = 0.01, block_size: int = 2048, static_groups: bool = False, white_list: Optional[List[neural_compressor.common.utils.OP_NAME_OR_MODULE_TYPE]] = DEFAULT_WHITE_LIST)

   Config class for GPTQ.

   GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers.
   https://arxiv.org/abs/2210.17323

.. py:function:: get_default_gptq_config(processor_type: Optional[Union[str, neural_compressor.torch.utils.ProcessorType]] = None) -> GPTQConfig

   Get the default configuration of GPTQ.

   :param processor_type: The user-specified processor type. Defaults to None.
   :type processor_type: Optional[Union[str, torch_utils.ProcessorType]], optional
   :returns: GPTQConfig
   :rtype: GPTQConfig

.. py:class:: AWQConfig(dtype: str = 'int', bits: int = 4, use_sym: bool = True, group_size: int = 32, group_dim: int = 1, use_full_range: bool = False, use_mse_search: bool = False, use_layer_wise: bool = False, model_path: str = '', use_double_quant: bool = False, double_quant_dtype: str = 'int', double_quant_bits: int = 8, double_quant_use_sym: bool = True, double_quant_group_size: int = 256, quant_lm_head: bool = False, use_auto_scale: bool = True, use_auto_clip: bool = True, folding: bool = False, white_list: Optional[List[neural_compressor.common.utils.OP_NAME_OR_MODULE_TYPE]] = DEFAULT_WHITE_LIST, absorb_layer_dict: dict = {})

   Config class for AWQ.

   AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration.
   https://arxiv.org/abs/2306.00978

.. py:function:: get_default_awq_config() -> AWQConfig

   Generate the default AWQ config.

   :returns: the default AWQ config.

.. py:class:: TEQConfig(dtype: str = 'int', bits: int = 4, use_sym: bool = True, group_size: int = 32, group_dim: int = 1, use_full_range: bool = False, use_mse_search: bool = False, use_layer_wise: bool = False, use_double_quant: bool = False, double_quant_dtype: str = 'int', double_quant_bits: int = 8, double_quant_use_sym: bool = True, double_quant_group_size: int = 256, quant_lm_head: bool = False, absorb_to_layer: dict = {}, folding: bool = True, white_list: Optional[List[neural_compressor.common.utils.OP_NAME_OR_MODULE_TYPE]] = DEFAULT_WHITE_LIST)

   Config class for TEQ.

   TEQ: Trainable Equivalent Transformation for Quantization of LLMs.
   https://arxiv.org/abs/2310.10944

.. py:function:: get_default_teq_config() -> TEQConfig

   Generate the default TEQ config.

   :returns: the default TEQ config.
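Unlike RTN, the GPTQ/AWQ/TEQ family consumes calibration data. Below is a hedged sketch of that flow using ``GPTQConfig``; the ``prepare``/``convert`` entry points are assumed from the package's top-level API, and ``run_fn`` with its random tensors is an illustrative placeholder for a real calibration loop.

.. code-block:: python

   # Illustrative GPTQ flow: sample data is run through the prepared model
   # so the algorithm can observe activations before convert().
   import torch

   from neural_compressor.torch.quantization import GPTQConfig, convert, prepare

   model = torch.nn.Sequential(torch.nn.Linear(128, 128))

   quant_config = GPTQConfig(bits=4, group_size=32, act_order=False)

   def run_fn(model):
       # Placeholder calibration loop; replace with representative inputs.
       for _ in range(8):
           model(torch.randn(2, 128))

   model = prepare(model, quant_config)
   run_fn(model)
   model = convert(model)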
.. py:class:: AutoRoundConfig(dtype: str = 'int', bits: int = 4, use_sym: bool = False, group_size: int = 128, act_bits: int = 32, act_group_size: int = None, act_sym: bool = None, act_dynamic: bool = True, enable_full_range: bool = False, batch_size: int = 8, lr_scheduler=None, enable_quanted_input: bool = True, enable_minmax_tuning: bool = True, lr: float = None, minmax_lr: float = None, low_gpu_mem_usage: bool = False, iters: int = 200, seqlen: int = 2048, nsamples: int = 128, sampler: str = 'rand', seed: int = 42, nblocks: int = 1, gradient_accumulate_steps: int = 1, not_use_best_mse: bool = False, dynamic_max_gap: int = -1, scale_dtype: str = 'fp16', use_layer_wise: bool = False, quant_block_list: list = None, white_list: Optional[List[neural_compressor.common.utils.OP_NAME_OR_MODULE_TYPE]] = DEFAULT_WHITE_LIST)

   Config class for AUTOROUND.

   AUTOROUND: Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs.
   https://arxiv.org/abs/2309.05516
   code: https://github.com/intel/auto-round

.. py:function:: get_default_AutoRound_config(processor_type: Optional[Union[str, neural_compressor.torch.utils.ProcessorType]] = None) -> AutoRoundConfig

   Get the default configuration of AutoRound.

   :param processor_type: The user-specified processor type. Defaults to None.
   :type processor_type: Optional[Union[str, torch_utils.ProcessorType]], optional
   :returns: AutoRoundConfig
   :rtype: AutoRoundConfig

.. py:class:: MXQuantConfig(w_dtype: str = 'int8', act_dtype: str = 'int8', out_dtype: str = 'bfloat16', blocksize: int = 32, round_method: str = 'nearest', weight_only: bool = False, white_list: Optional[List[neural_compressor.common.utils.OP_NAME_OR_MODULE_TYPE]] = DEFAULT_WHITE_LIST)

   Config class for MX quantization.

.. py:function:: get_default_mx_config() -> MXQuantConfig

   Generate the default MX config.

   :returns: the default MX config.

.. py:class:: DynamicQuantConfig(w_dtype: str = 'int8', w_sym: bool = True, w_granularity: str = 'per_tensor', w_algo: str = 'minmax', act_dtype: str = 'uint8', act_sym: bool = False, act_granularity: str = 'per_tensor', act_algo: str = 'kl', white_list: Optional[List[neural_compressor.common.utils.OP_NAME_OR_MODULE_TYPE]] = DEFAULT_WHITE_LIST)

   Config class for dynamic quantization.

.. py:function:: get_default_dynamic_config() -> DynamicQuantConfig

   Generate the default dynamic quant config.

   :returns: the default dynamic quant config.

.. py:class:: StaticQuantConfig(w_dtype: str = 'int8', w_sym: bool = True, w_granularity: str = 'per_channel', w_algo: str = 'minmax', act_dtype: str = 'uint8', act_sym: bool = False, act_granularity: str = 'per_tensor', act_algo: str = 'minmax', excluded_precisions: list = [], white_list: Optional[List[neural_compressor.common.utils.OP_NAME_OR_MODULE_TYPE]] = DEFAULT_WHITE_LIST, model_info: Optional[List[Tuple[str, Callable]]] = None)

   Config class for static quantization.

.. py:function:: get_default_static_config() -> StaticQuantConfig

   Generate the default static quant config.

   :returns: the default static quant config.
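A hedged sketch of static post-training quantization with the defaults above: static quantization observes activation ranges, so calibration data must pass through the prepared model. The ``example_inputs`` keyword on ``prepare`` and the calibration loop are assumptions about the surrounding 3.x API, not something this page documents.

.. code-block:: python

   # Hypothetical static-quantization flow with the default config
   # (weights: int8 per-channel minmax; activations: uint8 per-tensor minmax).
   import torch

   from neural_compressor.torch.quantization import (
       convert,
       get_default_static_config,
       prepare,
   )

   model = torch.nn.Sequential(torch.nn.Linear(32, 32))
   example_inputs = (torch.randn(4, 32),)

   quant_config = get_default_static_config()

   # example_inputs support on prepare() is an assumption here.
   model = prepare(model, quant_config, example_inputs=example_inputs)
   for _ in range(10):  # calibration pass with representative data
       model(*example_inputs)
   model = convert(model)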
.. py:class:: SmoothQuantConfig(w_dtype: str = 'int8', w_sym: bool = True, w_granularity: str = 'per_channel', w_algo: str = 'minmax', act_dtype: str = 'uint8', act_sym: bool = False, act_granularity: str = 'per_tensor', act_algo: str = 'minmax', excluded_precisions: list = [], alpha: float = 0.5, folding: bool = False, scale_sharing: bool = False, init_alpha: float = 0.5, alpha_min: float = 0.0, alpha_max: float = 1.0, alpha_step: float = 0.1, shared_criterion: str = 'max', do_blockwise: bool = False, auto_alpha_args: dict = None, white_list: Optional[List[neural_compressor.common.utils.OP_NAME_OR_MODULE_TYPE]] = DEFAULT_WHITE_LIST)

   Config class for smooth quantization.

.. py:function:: get_default_sq_config() -> SmoothQuantConfig

   Generate the default SmoothQuant config.

   :returns: the default SmoothQuant config.

.. py:class:: HQQConfig(dtype: str = 'int', bits: int = 4, group_size: int = 64, quant_zero: bool = True, quant_scale: bool = False, scale_quant_group_size: int = 128, quant_lm_head: bool = False, white_list: Optional[List[neural_compressor.common.utils.OP_NAME_OR_MODULE_TYPE]] = DEFAULT_WHITE_LIST)

   Configuration class for Half-Quadratic Quantization (HQQ).

   HQQ is a quantization algorithm that reduces the precision of weights and activations in neural networks.
   For more details, refer to the blog: https://mobiusml.github.io/hqq_blog/
   and the code: https://github.com/mobiusml/hqq

.. py:function:: get_default_hqq_config() -> HQQConfig

   Generate the default HQQ config.

   :returns: the default HQQ config.

.. py:class:: FP8Config(dump_stats_path: str = './hqt_output/measure', fp8_config: str = 'E4M3', hp_dtype: str = 'bf16', blocklist: dict = {'names': [], 'types': ()}, allowlist: dict = {'names': [], 'types': FP8_WHITE_LIST}, mode: str = 'AUTO', scale_method: str = 'maxabs_hw', scale_params: dict = {}, observer: str = 'maxabs', mod_dict: dict = {}, measure_exclude: str = 'OUTPUT', **kwargs)

   Config class for FP8 quantization.

.. py:function:: get_default_fp8_config() -> FP8Config

   Generate the default FP8 config.

   :returns: the default FP8 config.

.. py:function:: get_default_fp8_config_set() -> FP8Config

   Generate the default FP8 config set.

   :returns: the default FP8 config set.

.. py:class:: MixedPrecisionConfig(dtype: Union[str, List[str]] = 'fp16', white_list: Optional[List[neural_compressor.common.utils.OP_NAME_OR_MODULE_TYPE]] = DEFAULT_WHITE_LIST)

   Config class for mixed-precision.

.. py:function:: get_default_mixed_precision_config() -> MixedPrecisionConfig

   Generate the default mixed-precision config.

   :returns: the default mixed-precision config.

.. py:function:: get_default_mixed_precision_config_set() -> MixedPrecisionConfig

   Generate the default mixed-precision config set.

   :returns: the default mixed-precision config set.

.. py:function:: get_all_registered_configs() -> Dict[str, neural_compressor.common.base_config.BaseConfig]

   Get all registered configs.

.. py:function:: get_woq_tuning_config() -> list

   Generate the config set for WOQ (weight-only quantization) tuning.

   :returns: the list of WOQ quant configs.
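Because ``get_woq_tuning_config`` returns a list of candidate configs rather than a single config, it is meant to be swept by a tuner. A hedged sketch follows; the ``autotune`` entry point and ``TuningConfig`` are assumptions about the package's tuning API, and ``eval_acc`` is a user-supplied placeholder.

.. code-block:: python

   # Hedged sketch of weight-only quantization (WOQ) tuning: the tuner tries
   # each candidate config and keeps the one scoring best under eval_fn.
   import torch

   from neural_compressor.torch.quantization import (
       TuningConfig,  # assumed re-export of the common tuning config
       autotune,      # assumed tuning entry point
       get_woq_tuning_config,
   )

   model = torch.nn.Sequential(torch.nn.Linear(64, 64))

   def eval_acc(model):
       # Placeholder metric; replace with a real accuracy evaluation
       # (higher must mean better for the tuner).
       return float(model(torch.randn(1, 64)).abs().mean())

   tune_config = TuningConfig(config_set=get_woq_tuning_config())
   best_model = autotune(model, tune_config=tune_config, eval_fn=eval_acc)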