neural_compressor.onnxrt.quantization.config
Module Contents

Classes

- RTNConfig: Config class for round-to-nearest (RTN) weight-only quantization.
- GPTQConfig: Config class for GPTQ weight-only quantization.
- AWQConfig: Config class for AWQ weight-only quantization.
- SmoohQuantConfig: SmoothQuant quantization config.

Functions

- get_default_rtn_config: Generate the default RTN config.
- get_default_gptq_config: Generate the default GPTQ config.
- get_default_awq_config: Generate the default AWQ config.
- get_default_sq_config: Generate the default SmoothQuant config.
- class neural_compressor.onnxrt.quantization.config.RTNConfig(weight_dtype: str = 'int', weight_bits: int = 4, weight_group_size: int = 32, weight_sym: bool = True, act_dtype: str = 'fp32', accuracy_level: int = 0, providers: List[str] = ['CPUExecutionProvider'], layer_wise_quant: bool = False, white_list: List[neural_compressor.common.utils.OP_NAME_OR_MODULE_TYPE] = DEFAULT_WHITE_LIST)[source]
Config class for round-to-nearest (RTN) weight-only quantization.
- neural_compressor.onnxrt.quantization.config.get_default_rtn_config() -> RTNConfig [source]
Generate the default RTN config.
- Returns:
the default RTN config.
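The defaults above (weight_dtype='int', weight_bits=4, weight_group_size=32, weight_sym=True) describe symmetric 4-bit round-to-nearest quantization applied per group of 32 weights. A minimal illustrative sketch of that scheme, not the library's implementation:

```python
def rtn_quantize(weights, bits=4, group_size=32):
    """Symmetric round-to-nearest fake-quantization of a flat weight list.

    Illustrative sketch of the scheme RTNConfig's defaults describe;
    not the neural_compressor implementation.
    """
    qmax = 2 ** (bits - 1) - 1  # 7 for 4-bit symmetric quantization
    out = []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        # one scale per group, chosen so the largest magnitude maps to qmax
        scale = max(abs(w) for w in group) / qmax or 1.0
        for w in group:
            # round to nearest integer, clamp to the signed range, dequantize
            q = max(-qmax - 1, min(qmax, round(w / scale)))
            out.append(q * scale)
    return out
```

Because the group scale is set by the largest magnitude, round-trip error per weight stays within half a quantization step (scale / 2).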
- class neural_compressor.onnxrt.quantization.config.GPTQConfig(weight_dtype: str = 'int', weight_bits: int = 4, weight_group_size: int = 32, weight_sym: bool = True, act_dtype: str = 'fp32', accuracy_level: int = 0, percdamp: float = 0.01, blocksize: int = 128, actorder: bool = False, mse: bool = False, perchannel: bool = True, providers: List[str] = ['CPUExecutionProvider'], layer_wise_quant: bool = False, white_list: List[neural_compressor.common.utils.OP_NAME_OR_MODULE_TYPE] = DEFAULT_WHITE_LIST)[source]
Config class for GPTQ weight-only quantization.
- neural_compressor.onnxrt.quantization.config.get_default_gptq_config() -> GPTQConfig [source]
Generate the default GPTQ config.
- Returns:
the default GPTQ config.
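GPTQConfig's `percdamp` (default 0.01) is the fraction of the mean Hessian diagonal added back onto the diagonal before GPTQ inverts the Hessian, a standard numerical-stabilization step in the GPTQ algorithm. A hedged sketch of just that step (not the library code):

```python
def damp_hessian(h, percdamp=0.01):
    """Return a copy of square matrix `h` (list of lists) with
    percdamp * mean(diag(h)) added to every diagonal entry.

    Sketch of the role of GPTQConfig's `percdamp`; not the library code.
    """
    n = len(h)
    # dampening term: a fraction of the average diagonal magnitude
    damp = percdamp * sum(h[i][i] for i in range(n)) / n
    return [[h[i][j] + (damp if i == j else 0.0) for j in range(n)]
            for i in range(n)]
```

A larger `percdamp` makes the inverse better conditioned but perturbs the Hessian more, so the default keeps it small.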
- class neural_compressor.onnxrt.quantization.config.AWQConfig(weight_dtype: str = 'int', weight_bits: int = 4, weight_group_size: int = 32, weight_sym: bool = True, act_dtype: str = 'fp32', accuracy_level: int = 0, enable_auto_scale: bool = True, enable_mse_search: bool = True, providers: List[str] = ['CPUExecutionProvider'], white_list: List[neural_compressor.common.utils.OP_NAME_OR_MODULE_TYPE] = DEFAULT_WHITE_LIST)[source]
Config class for AWQ weight-only quantization.
- neural_compressor.onnxrt.quantization.config.get_default_awq_config() -> AWQConfig [source]
Generate the default AWQ config.
- Returns:
the default AWQ config.
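AWQConfig's `enable_auto_scale` and `enable_mse_search` flags control a search over per-channel scaling factors: a salient channel is scaled up before quantization and scaled back after, shrinking its round-trip error without disturbing the rest of the group. A toy version of that search (an assumption-laden sketch, not the AWQ algorithm as implemented in the library):

```python
def fake_quant(ws, bits=4):
    """Symmetric round-to-nearest quantize/dequantize over one group."""
    qmax = 2 ** (bits - 1) - 1
    s = max(abs(w) for w in ws) / qmax or 1.0
    return [max(-qmax - 1, min(qmax, round(w / s))) * s for w in ws]

def awq_search_scale(ws, salient, candidates=(1.0, 2.0, 4.0, 8.0)):
    """Grid-search a pre-quantization scale for one salient channel,
    keeping the candidate with the lowest round-trip MSE.

    Toy illustration of AWQ's auto-scale / MSE search; not the library code.
    """
    def mse(s):
        scaled = [w * s if i == salient else w for i, w in enumerate(ws)]
        deq = fake_quant(scaled)
        deq[salient] /= s  # undo the scale after quantization
        return sum((d - w) ** 2 for d, w in zip(deq, ws))
    return min(candidates, key=mse)
```

As long as the scaled channel stays below the group's maximum magnitude, the shared quantization step is unchanged, so scaling a small salient channel up strictly reduces its error, which is the intuition behind AWQ.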
- class neural_compressor.onnxrt.quantization.config.SmoohQuantConfig(alpha: float = 0.5, folding: bool = True, op_types: List[str] = ['Gemm', 'Conv', 'MatMul', 'FusedConv'], calib_iter: int = 100, scales_per_op: bool = True, auto_alpha_args: dict = {'alpha_min': 0.3, 'alpha_max': 0.7, 'alpha_step': 0.05, 'attn_method': 'min'}, providers: List[str] = ['CPUExecutionProvider'], white_list: List[neural_compressor.common.utils.OP_NAME_OR_MODULE_TYPE] = DEFAULT_WHITE_LIST, **kwargs)[source]
SmoothQuant quantization config. (The class name is spelled SmoohQuantConfig in the library itself.)
- neural_compressor.onnxrt.quantization.config.get_default_sq_config() -> SmoohQuantConfig [source]
Generate the default SmoothQuant config.
- Returns:
the default SmoothQuant config.
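The `alpha` parameter (default 0.5) controls how much quantization difficulty migrates from activations to weights: per the SmoothQuant formulation, each channel's smoothing scale is s_j = max|X_j|^alpha / max|W_j|^(1 - alpha), with `auto_alpha_args` sweeping alpha between alpha_min and alpha_max in alpha_step increments. A sketch of that formula (not the library code):

```python
def smooth_scales(act_absmax, wt_absmax, alpha=0.5):
    """Per-channel SmoothQuant smoothing scales.

    Activations are divided by s and weights multiplied by s, moving
    quantization difficulty between them; alpha=0.5 splits it evenly.
    Formula from the SmoothQuant paper; a sketch, not the library code.
    """
    return [a ** alpha / w ** (1.0 - alpha)
            for a, w in zip(act_absmax, wt_absmax)]
```

With alpha near 1.0 nearly all of the difficulty lands on the weights; alpha near 0.0 leaves it on the activations.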