neural_compressor.torch.quantization.config
Module Contents
Classes
- RTNConfig: Config class for round-to-nearest weight-only quantization.
- GPTQConfig: Config class for GPTQ.
- HQQConfig: Config class for Half-Quadratic Quantization (HQQ).
Functions
- get_default_rtn_config: Generate the default rtn config.
- get_default_gptq_config: Generate the default gptq config.
- get_default_hqq_config: Generate the default HQQ config.
- class neural_compressor.torch.quantization.config.RTNConfig(dtype: str = 'int', bits: int = 4, use_sym: bool = True, group_size: int = 32, group_dim: int = 1, use_full_range: bool = False, use_mse_search: bool = False, export_compressed_model: bool = False, use_layer_wise: bool = False, model_path: str = '', use_double_quant: bool = False, double_quant_dtype: str = 'int', double_quant_bits: int = 8, double_quant_use_sym: bool = False, double_quant_group_size: int = 256, white_list: List[neural_compressor.common.utils.OP_NAME_OR_MODULE_TYPE] | None = DEFAULT_WHITE_LIST)[source]
Config class for round-to-nearest weight-only quantization.
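To make the defaults above concrete (bits=4, use_sym=True, group_size=32), here is a minimal pure-Python sketch of the math behind round-to-nearest weight-only quantization. It is an illustration, not the library implementation; the helper names `rtn_quantize`/`rtn_dequantize` are hypothetical.

```python
# Illustrative sketch of symmetric, group-wise round-to-nearest (RTN)
# quantization. NOT the neural_compressor implementation.

def rtn_quantize(weights, bits=4, group_size=32):
    """Quantize a flat list of floats group by group with a symmetric scale."""
    qmax = 2 ** (bits - 1) - 1          # e.g. 7 for signed int4
    qweights, scales = [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        # one scale per group, chosen so the max-magnitude weight maps to qmax
        scale = max(abs(w) for w in group) / qmax or 1.0
        scales.append(scale)
        # round to the nearest integer, clamp to the symmetric range
        qweights.extend(max(-qmax, min(qmax, round(w / scale))) for w in group)
    return qweights, scales

def rtn_dequantize(qweights, scales, group_size=32):
    """Recover approximate floats: w ~= q * scale of the owning group."""
    return [q * scales[i // group_size] for i, q in enumerate(qweights)]
```

With `use_double_quant=True`, the per-group scales would themselves be quantized again (see the `double_quant_*` parameters above); that second pass is omitted here.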
- neural_compressor.torch.quantization.config.get_default_rtn_config() RTNConfig [source]
Generate the default rtn config.
- Returns:
the default rtn config.
- class neural_compressor.torch.quantization.config.GPTQConfig(dtype: str = 'int', bits: int = 4, use_sym: bool = True, group_size: int = 32, use_mse_search: bool = False, export_compressed_model: bool = False, use_layer_wise: bool = False, model_path: str = '', use_double_quant: bool = False, double_quant_dtype: str = 'int', double_quant_bits: int = 8, double_quant_use_sym: bool = False, double_quant_group_size: int = 256, act_order: bool = False, percdamp: float = 0.01, block_size: int = 2048, static_groups: bool = False, white_list: List[neural_compressor.common.utils.OP_NAME_OR_MODULE_TYPE] | None = DEFAULT_WHITE_LIST)[source]
Config class for GPTQ.
GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers. https://arxiv.org/abs/2210.17323
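Of the GPTQ-specific parameters above, `percdamp` is the easiest to show in isolation: GPTQ inverts a Hessian of layer inputs, and `percdamp` adds a fraction of the mean diagonal to the diagonal before inversion to keep it numerically stable. The sketch below shows only that damping step, with a hypothetical helper name; it is not the library code.

```python
# Illustrative sketch of GPTQ's Hessian damping step: add
# percdamp * mean(diag(H)) to each diagonal entry of H before inverting.
# NOT the neural_compressor implementation.

def damp_hessian(H, percdamp=0.01):
    """Return a copy of square matrix H with a damped diagonal."""
    n = len(H)
    damp = percdamp * sum(H[i][i] for i in range(n)) / n
    return [
        [H[i][j] + (damp if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]
```

The remaining parameters tune the surrounding algorithm: `act_order` quantizes columns in order of decreasing Hessian diagonal, `block_size` sets how many columns are processed per error-compensation block, and `static_groups` fixes group quantization parameters before reordering.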
- neural_compressor.torch.quantization.config.get_default_gptq_config() GPTQConfig [source]
Generate the default gptq config.
- Returns:
the default gptq config.
- class neural_compressor.torch.quantization.config.HQQConfig(bits: int = 4, group_size: int = 64, quant_zero: bool = True, quant_scale: bool = False, scale_quant_group_size: int = 128, skip_lm_head: bool = True, white_list: List[neural_compressor.common.utils.OP_NAME_OR_MODULE_TYPE] | None = DEFAULT_WHITE_LIST)[source]
Config class for Half-Quadratic Quantization (HQQ).
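HQQ builds on asymmetric group-wise quantization: each group gets a scale and a zero-point, which HQQ then refines with a half-quadratic solver, and `quant_zero`/`quant_scale` control whether those zero-points and scales are themselves stored quantized. The sketch below shows only the underlying asymmetric quantization with the defaults above (bits=4, group_size=64); the solver and the second-level quantization are omitted, and the helper name is hypothetical.

```python
# Illustrative sketch of the asymmetric group-wise quantization that HQQ
# refines. NOT the neural_compressor implementation.

def asym_quantize(weights, bits=4, group_size=64):
    """Quantize floats to unsigned ints with a per-group scale and zero-point."""
    qmax = 2 ** bits - 1                # e.g. 15 for unsigned int4
    qweights, scales, zeros = [], [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        lo, hi = min(group), max(group)
        scale = (hi - lo) / qmax or 1.0
        zero = round(-lo / scale)       # zero-point, in quantized units
        scales.append(scale)
        zeros.append(zero)
        # shift by the zero-point, then clamp into [0, qmax]
        qweights.extend(
            max(0, min(qmax, round(w / scale) + zero)) for w in group
        )
    return qweights, scales, zeros
```

Dequantization is `w ~= (q - zero) * scale` per group; HQQ's contribution is optimizing `zero` (and optionally `scale`) to minimize the resulting weight error rather than taking the min/max values used here.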