neural_compressor.torch.quantization.config

Module Contents

Classes

RTNConfig

Config class for round-to-nearest weight-only quantization.

GPTQConfig

Config class for GPTQ.

HQQConfig

Config class for HQQ (Half-Quadratic Quantization) weight-only quantization.

Functions

get_default_rtn_config(→ RTNConfig)

Generate the default RTN config.

get_default_gptq_config(→ GPTQConfig)

Generate the default GPTQ config.

get_default_hqq_config(→ HQQConfig)

Generate the default HQQ config.

class neural_compressor.torch.quantization.config.RTNConfig(dtype: str = 'int', bits: int = 4, use_sym: bool = True, group_size: int = 32, group_dim: int = 1, use_full_range: bool = False, use_mse_search: bool = False, export_compressed_model: bool = False, use_layer_wise: bool = False, model_path: str = '', use_double_quant: bool = False, double_quant_dtype: str = 'int', double_quant_bits: int = 8, double_quant_use_sym: bool = False, double_quant_group_size: int = 256, white_list: List[neural_compressor.common.utils.OP_NAME_OR_MODULE_TYPE] | None = DEFAULT_WHITE_LIST)

Config class for round-to-nearest weight-only quantization.
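As a rough illustration of what the `bits`, `use_sym` and `group_size` fields control, here is a minimal pure-Python sketch of group-wise round-to-nearest quantization of one weight row. It is not INC's implementation: the helper names are hypothetical, it assumes groups run along the row (the input-channel axis), and it does not model `use_full_range`, `use_mse_search` or double quantization.

```python
from typing import List, Tuple


def rtn_quantize_group(
    values: List[float], bits: int = 4, use_sym: bool = True
) -> Tuple[List[int], float, int]:
    """Round-to-nearest quantization of one weight group (hypothetical helper).

    Returns the integer codes, the group scale and the zero point.
    """
    if use_sym:
        # Symmetric: signed codes in [-2^(b-1), 2^(b-1) - 1]; zero point fixed at 0.
        qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
        max_abs = max(abs(v) for v in values)
        scale = max_abs / qmax if max_abs > 0 else 1.0
        zero = 0
    else:
        # Asymmetric: unsigned codes in [0, 2^b - 1]; the range is widened to include 0.
        qmin, qmax = 0, 2 ** bits - 1
        lo, hi = min(min(values), 0.0), max(max(values), 0.0)
        scale = (hi - lo) / (qmax - qmin) if hi > lo else 1.0
        zero = round(-lo / scale)
    # Python's round() rounds halves to even, as torch.round does.
    codes = [min(max(round(v / scale) + zero, qmin), qmax) for v in values]
    return codes, scale, zero


def rtn_fake_quantize(
    row: List[float], bits: int = 4, use_sym: bool = True, group_size: int = 32
) -> List[float]:
    """Quantize a weight row group by group and return the dequantized values."""
    out: List[float] = []
    for start in range(0, len(row), group_size):
        group = row[start:start + group_size]
        codes, scale, zero = rtn_quantize_group(group, bits, use_sym)
        out.extend((c - zero) * scale for c in codes)
    return out
```

With the signature's defaults (4 bits, symmetric, `group_size=32`), every 32 consecutive weights share one scale, and each weight is reconstructed within half a scale step of its original value.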

neural_compressor.torch.quantization.config.get_default_rtn_config() → RTNConfig

Generate the default RTN config.

Returns:

The default RTN config.
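The RTNConfig signature above also carries double-quantization fields (`use_double_quant`, `double_quant_bits`, `double_quant_group_size`, `double_quant_use_sym`). In QLoRA-style double quantization, the per-group scales produced by the first pass are themselves quantized. The sketch below assumes INC follows that idea and shows an asymmetric 8-bit pass over groups of 256 scales, using the same values as the signature's defaults; the (minimum, step) encoding and the helper names are hypothetical.

```python
from typing import List, Tuple


def double_quantize_scales(
    scales: List[float], bits: int = 8, group_size: int = 256
) -> Tuple[List[int], List[Tuple[float, float]]]:
    """Quantize first-level scales group by group on an asymmetric min/step grid.

    Hypothetical helper: returns one integer code per scale and one
    (minimum, step) pair per group of `group_size` scales.
    """
    qmax = 2 ** bits - 1
    codes: List[int] = []
    params: List[Tuple[float, float]] = []
    for start in range(0, len(scales), group_size):
        group = scales[start:start + group_size]
        lo, hi = min(group), max(group)
        step = (hi - lo) / qmax if hi > lo else 1.0
        params.append((lo, step))
        # Each scale becomes an integer offset from the group minimum.
        codes.extend(min(max(round((s - lo) / step), 0), qmax) for s in group)
    return codes, params


def dequantize_scales(
    codes: List[int], params: List[Tuple[float, float]], group_size: int = 256
) -> List[float]:
    """Rebuild approximate scales from codes and per-group (minimum, step) pairs."""
    return [
        params[i // group_size][0] + c * params[i // group_size][1]
        for i, c in enumerate(codes)
    ]
```

As a back-of-envelope check, assuming fp32 scales: with 4-bit weights and `group_size=32`, the scales add 32/32 = 1 bit per weight (5 bits in total). Quantizing them to 8 bits in groups of 256, with one fp32 (minimum, step) pair per group, cuts that overhead to 8/32 + 64/8192 ≈ 0.26 bits (about 4.26 bits in total).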

class neural_compressor.torch.quantization.config.GPTQConfig(dtype: str = 'int', bits: int = 4, use_sym: bool = True, group_size: int = 32, use_mse_search: bool = False, export_compressed_model: bool = False, use_layer_wise: bool = False, model_path: str = '', use_double_quant: bool = False, double_quant_dtype: str = 'int', double_quant_bits: int = 8, double_quant_use_sym: bool = False, double_quant_group_size: int = 256, act_order: bool = False, percdamp: float = 0.01, block_size: int = 2048, static_groups: bool = False, white_list: List[neural_compressor.common.utils.OP_NAME_OR_MODULE_TYPE] | None = DEFAULT_WHITE_LIST)

Config class for GPTQ.

GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers. https://arxiv.org/abs/2210.17323
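Unlike RTN, GPTQ quantizes weight columns one at a time and pushes each column's rounding error onto the columns not yet quantized, weighted by the inverse Hessian of the layer's squared output error. The pure-Python sketch below shows that update on a tiny layer. It is a simplified illustration of the paper's rule, not INC's implementation: the helper names are hypothetical, it uses one symmetric scale per output row (no `group_size` grouping), and it omits `act_order`, `static_groups` and the block-wise lazy updates that `block_size` presumably controls. It assumes `percdamp` works as in the paper's reference code, i.e. that fraction of the mean Hessian diagonal is added to the diagonal.

```python
from typing import List

Matrix = List[List[float]]


def invert(a: Matrix) -> Matrix:
    """Gauss-Jordan inversion with partial pivoting (adequate for tiny SPD matrices)."""
    n = len(a)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0.0:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def quantize_sym(w: float, scale: float, bits: int) -> float:
    """Symmetric round-to-nearest onto the grid scale * {-2^(b-1), ..., 2^(b-1) - 1}."""
    qmax = 2 ** (bits - 1) - 1
    return min(max(round(w / scale), -qmax - 1), qmax) * scale


def gptq_quantize(W: Matrix, X: Matrix, bits: int = 4, percdamp: float = 0.01) -> Matrix:
    """Simplified GPTQ (hypothetical helper): W is [out][in], X holds input rows of length in."""
    W = [row[:] for row in W]  # copy; updated in place as errors are compensated
    n_in = len(W[0])
    # Hessian of the layer-wise squared error ||WX - QX||^2: H = 2 * sum_x x x^T.
    H = [[2.0 * sum(x[i] * x[j] for x in X) for j in range(n_in)] for i in range(n_in)]
    damp = percdamp * sum(H[i][i] for i in range(n_in)) / n_in
    for i in range(n_in):
        H[i][i] += damp
    Hinv = invert(H)
    qmax = 2 ** (bits - 1) - 1
    # One scale per output row, fixed from the original weights (a simplification).
    scales = [max(abs(v) for v in row) / qmax or 1.0 for row in W]
    Q = [[0.0] * n_in for _ in W]
    for j in range(n_in):
        d = Hinv[j][j]
        for r, row in enumerate(W):
            Q[r][j] = quantize_sym(row[j], scales[r], bits)
            err = (row[j] - Q[r][j]) / d
            # Spread this column's rounding error over the remaining columns.
            for k in range(j + 1, n_in):
                row[k] -= err * Hinv[j][k]
        # Remove column j from the inverse Hessian (Schur complement update).
        for a in range(j + 1, n_in):
            for b in range(j + 1, n_in):
                Hinv[a][b] -= Hinv[a][j] * Hinv[j][b] / d
    return Q
```

When inputs are strongly correlated, later columns absorb earlier rounding errors, so the layer output can stay closer to the original than with plain RTN. When the Hessian is diagonal (uncorrelated inputs), there is nothing to compensate and the result reduces to RTN.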

neural_compressor.torch.quantization.config.get_default_gptq_config() → GPTQConfig

Generate the default GPTQ config.

Returns:

The default GPTQ config.

class neural_compressor.torch.quantization.config.HQQConfig(bits: int = 4, group_size: int = 64, quant_zero: bool = True, quant_scale: bool = False, scale_quant_group_size: int = 128, skip_lm_head: bool = True, white_list: List[neural_compressor.common.utils.OP_NAME_OR_MODULE_TYPE] | None = DEFAULT_WHITE_LIST)

Config class for HQQ (Half-Quadratic Quantization) weight-only quantization.
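HQQ (Half-Quadratic Quantization) is presented by its authors as calibration-free: instead of using activation data, it keeps the quantization scale fixed and refines the zero point with a half-quadratic solver that minimizes an l_p (p < 1) norm of the weight reconstruction error, which is more tolerant of outliers than squared error. The pure-Python sketch below shows that alternating scheme for one weight group (the signature's default `group_size` is 64). It is a simplified illustration, not INC's implementation: the helper names and the hyperparameters `p`, `beta`, `kappa` and `iters` are hypothetical choices, and `quant_zero`, `quant_scale`, `scale_quant_group_size` and `skip_lm_head` are not modeled.

```python
import math
from typing import List, Tuple


def shrink_lp(x: List[float], beta: float, p: float) -> List[float]:
    """Generalized soft-thresholding: the proximal step for an l_p penalty with p < 1."""
    out = []
    for v in x:
        if v == 0.0:
            out.append(0.0)  # avoid 0 ** (p - 1), which is undefined
        else:
            mag = max(abs(v) - abs(v) ** (p - 1) / beta, 0.0)
            out.append(math.copysign(mag, v))
    return out


def fake_quant(
    w: List[float], scale: float, zero: float, qmax: int
) -> Tuple[List[int], List[float]]:
    """Asymmetric quantize/dequantize with a (possibly fractional) zero point."""
    codes = [min(max(round(v / scale + zero), 0), qmax) for v in w]
    return codes, [(c - zero) * scale for c in codes]


def hqq_optimize_zero(
    w: List[float], bits: int = 4, p: float = 0.7, beta: float = 10.0,
    kappa: float = 1.01, iters: int = 20,
) -> Tuple[float, float, float]:
    """Refine the zero point of one group (hypothetical helper); the scale stays at min/max."""
    qmax = 2 ** bits - 1
    lo, hi = min(w), max(w)
    scale = (hi - lo) / qmax if hi > lo else 1.0
    zero = -lo / scale  # min/max initialization

    def mean_abs_err(z: float) -> float:
        _, wr = fake_quant(w, scale, z, qmax)
        return sum(abs(a - b) for a, b in zip(w, wr)) / len(w)

    # Keep the best zero point seen so far, starting from the min/max one.
    best_zero, best_err = zero, mean_abs_err(zero)
    for _ in range(iters):
        codes, wr = fake_quant(w, scale, zero, qmax)
        # Sub-problem 1: sparse residual W_e via the l_p proximal operator.
        we = shrink_lp([a - b for a, b in zip(w, wr)], beta, p)
        # Sub-problem 2: closed-form zero point so that (W_q - z) * s matches W - W_e.
        zero = sum(c - (a - e) / scale for c, a, e in zip(codes, w, we)) / len(w)
        beta *= kappa
        err = mean_abs_err(zero)
        if err < best_err:
            best_zero, best_err = zero, err
    return scale, best_zero, best_err
```

Because the loop only keeps a zero point when it lowers the reconstruction error, the result is never worse than the plain min/max initialization for that group.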

neural_compressor.torch.quantization.config.get_default_hqq_config() → HQQConfig

Generate the default HQQ config.

Returns:

The default HQQ config.