neural_compressor.onnxrt.quantization.config

Module Contents

Classes

RTNConfig

Config class for round-to-nearest (RTN) weight-only quantization.

GPTQConfig

Config class for GPTQ weight-only quantization.

AWQConfig

Config class for AWQ weight-only quantization.

SmoohQuantConfig

Config class for SmoothQuant quantization.

Functions

get_default_rtn_config() → RTNConfig

Generate the default RTN config.

get_default_gptq_config() → GPTQConfig

Generate the default GPTQ config.

get_default_awq_config() → AWQConfig

Generate the default AWQ config.

get_default_sq_config() → SmoohQuantConfig

Generate the default SmoothQuant config.

Attributes

FRAMEWORK_NAME

class neural_compressor.onnxrt.quantization.config.RTNConfig(weight_dtype: str = 'int', weight_bits: int = 4, weight_group_size: int = 32, weight_sym: bool = True, act_dtype: str = 'fp32', accuracy_level: int = 0, providers: List[str] = ['CPUExecutionProvider'], layer_wise_quant: bool = False, white_list: List[neural_compressor.common.utils.OP_NAME_OR_MODULE_TYPE] = DEFAULT_WHITE_LIST)[source]

Config class for round-to-nearest (RTN) weight-only quantization.
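
A minimal construction sketch using only the parameters shown in the signature above; the specific values are illustrative, not recommendations:

    from neural_compressor.onnxrt.quantization.config import RTNConfig

    # 8-bit symmetric weight-only RTN with a larger group size (illustrative values).
    rtn_config = RTNConfig(
        weight_dtype="int",
        weight_bits=8,
        weight_group_size=128,
        weight_sym=True,
        providers=["CPUExecutionProvider"],
    )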

neural_compressor.onnxrt.quantization.config.get_default_rtn_config() → RTNConfig[source]

Generate the default RTN config.

Returns:

the default RTN config.
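
Usage sketch:

    from neural_compressor.onnxrt.quantization.config import get_default_rtn_config

    # Per the signature above: 4-bit symmetric weights, group size 32.
    rtn_config = get_default_rtn_config()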

class neural_compressor.onnxrt.quantization.config.GPTQConfig(weight_dtype: str = 'int', weight_bits: int = 4, weight_group_size: int = 32, weight_sym: bool = True, act_dtype: str = 'fp32', accuracy_level: int = 0, percdamp: float = 0.01, blocksize: int = 128, actorder: bool = False, mse: bool = False, perchannel: bool = True, providers: List[str] = ['CPUExecutionProvider'], layer_wise_quant: bool = False, white_list: List[neural_compressor.common.utils.OP_NAME_OR_MODULE_TYPE] = DEFAULT_WHITE_LIST)[source]

Config class for GPTQ weight-only quantization.
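
A construction sketch overriding the GPTQ-specific knobs from the signature above; the values and comments reflect the usual GPTQ meanings of these parameters and are illustrative:

    from neural_compressor.onnxrt.quantization.config import GPTQConfig

    gptq_config = GPTQConfig(
        weight_bits=4,
        weight_group_size=32,
        percdamp=0.05,   # dampening added to the Hessian diagonal (illustrative value)
        blocksize=128,   # number of weight columns processed per quantization block
        actorder=True,   # reorder columns by activation statistics before quantizing
        perchannel=True,
    )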

neural_compressor.onnxrt.quantization.config.get_default_gptq_config() → GPTQConfig[source]

Generate the default GPTQ config.

Returns:

the default GPTQ config.
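
Usage sketch:

    from neural_compressor.onnxrt.quantization.config import GPTQConfig, get_default_gptq_config

    gptq_config = get_default_gptq_config()
    assert isinstance(gptq_config, GPTQConfig)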

class neural_compressor.onnxrt.quantization.config.AWQConfig(weight_dtype: str = 'int', weight_bits: int = 4, weight_group_size: int = 32, weight_sym: bool = True, act_dtype: str = 'fp32', accuracy_level: int = 0, enable_auto_scale: bool = True, enable_mse_search: bool = True, providers: List[str] = ['CPUExecutionProvider'], white_list: List[neural_compressor.common.utils.OP_NAME_OR_MODULE_TYPE] = DEFAULT_WHITE_LIST)[source]

Config class for AWQ weight-only quantization.
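
A construction sketch using only parameters from the signature above; values are illustrative:

    from neural_compressor.onnxrt.quantization.config import AWQConfig

    awq_config = AWQConfig(
        weight_bits=4,
        enable_auto_scale=True,   # search activation-aware per-channel scales
        enable_mse_search=False,  # skip the MSE-based search (default is True)
    )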

neural_compressor.onnxrt.quantization.config.get_default_awq_config() → AWQConfig[source]

Generate the default AWQ config.

Returns:

the default AWQ config.
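
Usage sketch:

    from neural_compressor.onnxrt.quantization.config import get_default_awq_config

    awq_config = get_default_awq_config()  # AWQConfig with the defaults listed above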

class neural_compressor.onnxrt.quantization.config.SmoohQuantConfig(alpha: float = 0.5, folding: bool = True, op_types: List[str] = ['Gemm', 'Conv', 'MatMul', 'FusedConv'], calib_iter: int = 100, scales_per_op: bool = True, auto_alpha_args: dict = {'alpha_min': 0.3, 'alpha_max': 0.7, 'alpha_step': 0.05, 'attn_method': 'min'}, providers: List[str] = ['CPUExecutionProvider'], white_list: List[neural_compressor.common.utils.OP_NAME_OR_MODULE_TYPE] = DEFAULT_WHITE_LIST, **kwargs)[source]

Config class for SmoothQuant quantization.
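
A construction sketch; note that the class name is spelled SmoohQuantConfig in this module (see the signature above), and the values below are illustrative:

    from neural_compressor.onnxrt.quantization.config import SmoohQuantConfig

    sq_config = SmoohQuantConfig(
        alpha=0.6,                    # smoothing strength between activations and weights
        folding=True,                 # fold smoothing scales into adjacent ops where possible
        op_types=["Gemm", "MatMul"],  # restrict smoothing to these op types
        calib_iter=100,               # calibration iterations
    )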

neural_compressor.onnxrt.quantization.config.get_default_sq_config() → SmoohQuantConfig[source]

Generate the default SmoothQuant config.

Returns:

the default SmoothQuant config.
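
Usage sketch:

    from neural_compressor.onnxrt.quantization.config import get_default_sq_config

    sq_config = get_default_sq_config()  # SmoohQuantConfig with alpha=0.5, folding=True, etc.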