neural_compressor.onnxrt.quantization.config

Module Contents

Classes

RTNConfig

Config class for round-to-nearest (RTN) weight-only quantization.

GPTQConfig

Config class for GPTQ weight-only quantization.

AWQConfig

Config class for AWQ weight-only quantization.

SmoohQuantConfig

Config class for SmoothQuant quantization.

Functions

get_default_rtn_config() → RTNConfig

Generate the default RTN config.

get_default_gptq_config() → GPTQConfig

Generate the default GPTQ config.

get_default_awq_config() → AWQConfig

Generate the default AWQ config.

get_default_sq_config() → SmoohQuantConfig

Generate the default SmoothQuant config.

Attributes

FRAMEWORK_NAME

class neural_compressor.onnxrt.quantization.config.RTNConfig(weight_dtype: str = 'int', weight_bits: int = 4, weight_group_size: int = 32, weight_sym: bool = True, act_dtype: str = 'fp32', accuracy_level: int = 0, providers: List[str] = ['CPUExecutionProvider'], layer_wise_quant: bool = False, white_list: List[neural_compressor.common.utils.OP_NAME_OR_MODULE_TYPE] = DEFAULT_WHITE_LIST)[source]

Config class for round-to-nearest (RTN) weight-only quantization.
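
A minimal construction sketch using only the parameters shown in the signature above; the specific values are illustrative, not recommendations:

    from neural_compressor.onnxrt.quantization.config import RTNConfig

    # 8-bit symmetric weight-only RTN with a larger group size (illustrative values).
    rtn_config = RTNConfig(
        weight_dtype="int",
        weight_bits=8,
        weight_group_size=128,
        weight_sym=True,
        providers=["CPUExecutionProvider"],
    )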

neural_compressor.onnxrt.quantization.config.get_default_rtn_config() → RTNConfig[source]

Generate the default RTN config.

Returns:

the default RTN config.
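
Usage sketch:

    from neural_compressor.onnxrt.quantization.config import get_default_rtn_config

    # Per the signature above: 4-bit symmetric weights, group size 32.
    rtn_config = get_default_rtn_config()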

class neural_compressor.onnxrt.quantization.config.GPTQConfig(weight_dtype: str = 'int', weight_bits: int = 4, weight_group_size: int = 32, weight_sym: bool = True, act_dtype: str = 'fp32', accuracy_level: int = 0, percdamp: float = 0.01, blocksize: int = 128, actorder: bool = False, mse: bool = False, perchannel: bool = True, providers: List[str] = ['CPUExecutionProvider'], layer_wise_quant: bool = False, white_list: List[neural_compressor.common.utils.OP_NAME_OR_MODULE_TYPE] = DEFAULT_WHITE_LIST)[source]

Config class for GPTQ weight-only quantization.
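
A construction sketch overriding the GPTQ-specific knobs from the signature above; the values and comments reflect the usual GPTQ meanings of these parameters and are illustrative:

    from neural_compressor.onnxrt.quantization.config import GPTQConfig

    gptq_config = GPTQConfig(
        weight_bits=4,
        weight_group_size=32,
        percdamp=0.05,   # dampening added to the Hessian diagonal (illustrative value)
        blocksize=128,   # number of weight columns processed per quantization block
        actorder=True,   # reorder columns by activation statistics before quantizing
        perchannel=True,
    )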

neural_compressor.onnxrt.quantization.config.get_default_gptq_config() → GPTQConfig[source]

Generate the default GPTQ config.

Returns:

the default GPTQ config.
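
Usage sketch:

    from neural_compressor.onnxrt.quantization.config import GPTQConfig, get_default_gptq_config

    gptq_config = get_default_gptq_config()
    assert isinstance(gptq_config, GPTQConfig)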

class neural_compressor.onnxrt.quantization.config.AWQConfig(weight_dtype: str = 'int', weight_bits: int = 4, weight_group_size: int = 32, weight_sym: bool = True, act_dtype: str = 'fp32', accuracy_level: int = 0, enable_auto_scale: bool = True, enable_mse_search: bool = True, providers: List[str] = ['CPUExecutionProvider'], white_list: List[neural_compressor.common.utils.OP_NAME_OR_MODULE_TYPE] = DEFAULT_WHITE_LIST)[source]

Config class for AWQ weight-only quantization.
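
A construction sketch using only parameters from the signature above; values are illustrative:

    from neural_compressor.onnxrt.quantization.config import AWQConfig

    awq_config = AWQConfig(
        weight_bits=4,
        enable_auto_scale=True,   # search activation-aware per-channel scales
        enable_mse_search=False,  # skip the MSE-based search (default is True)
    )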

neural_compressor.onnxrt.quantization.config.get_default_awq_config() → AWQConfig[source]

Generate the default AWQ config.

Returns:

the default AWQ config.
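
Usage sketch:

    from neural_compressor.onnxrt.quantization.config import get_default_awq_config

    awq_config = get_default_awq_config()  # AWQConfig with the defaults listed above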

class neural_compressor.onnxrt.quantization.config.SmoohQuantConfig(alpha: float = 0.5, folding: bool = True, op_types: List[str] = ['Gemm', 'Conv', 'MatMul', 'FusedConv'], calib_iter: int = 100, scales_per_op: bool = True, auto_alpha_args: dict = {'alpha_min': 0.3, 'alpha_max': 0.7, 'alpha_step': 0.05, 'attn_method': 'min'}, providers: List[str] = ['CPUExecutionProvider'], white_list: List[neural_compressor.common.utils.OP_NAME_OR_MODULE_TYPE] = DEFAULT_WHITE_LIST, **kwargs)[source]

Config class for SmoothQuant quantization.
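
A construction sketch; note that the class name is spelled SmoohQuantConfig in this module (see the signature above), and the values below are illustrative:

    from neural_compressor.onnxrt.quantization.config import SmoohQuantConfig

    sq_config = SmoohQuantConfig(
        alpha=0.6,                    # smoothing strength between activations and weights
        folding=True,                 # fold smoothing scales into adjacent ops where possible
        op_types=["Gemm", "MatMul"],  # restrict smoothing to these op types
        calib_iter=100,               # calibration iterations
    )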

neural_compressor.onnxrt.quantization.config.get_default_sq_config() → SmoohQuantConfig[source]

Generate the default SmoothQuant config.

Returns:

the default SmoothQuant config.
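
Usage sketch:

    from neural_compressor.onnxrt.quantization.config import get_default_sq_config

    sq_config = get_default_sq_config()  # SmoohQuantConfig with alpha=0.5, folding=True, etc.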