neural_compressor.transformers.utils.quantization_config

Intel Neural Compressor Transformers-like Config.

Classes

QuantizationMethod

Enumeration of the quantization methods supported by the Transformers-like API.

INCQuantizationConfigMixin

Mixin class for quantization config.

RtnConfig

Config class for Round-To-Nearest (RTN) weight-only quantization.

GPTQConfig

Config class for GPTQ weight-only quantization.

AwqConfig

Config class for AWQ (Activation-aware Weight Quantization).

TeqConfig

Config class for TEQ (Trainable Equivalent Transformation) quantization.

AutoRoundConfig

Config class for AutoRound weight-only quantization.

Module Contents

class neural_compressor.transformers.utils.quantization_config.QuantizationMethod[source]

Enumeration of the quantization methods supported by the Transformers-like API.

The class subclasses str, so each member compares equal to its string value and can be serialized or compared like a plain string.
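
Illustrative usage (not part of the generated reference). Only standard Enum behavior is assumed, since this page does not list the member names:

    from neural_compressor.transformers.utils.quantization_config import (
        QuantizationMethod,
    )

    # Enumerate the supported methods; member names and values are defined
    # by the enum itself.
    for method in QuantizationMethod:
        print(method.name, method.value)

    # Because the class subclasses str, each member also behaves as a plain
    # string, e.g. when compared against a value read from a saved config.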

class neural_compressor.transformers.utils.quantization_config.INCQuantizationConfigMixin[source]

Mixin class for quantization config.
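
The reference does not list the mixin's methods. The sketch below assumes it mirrors the serialization helpers of Hugging Face's QuantizationConfigMixin (to_dict/from_dict); verify these names against the installed neural_compressor version:

    from neural_compressor.transformers.utils.quantization_config import RtnConfig

    config = RtnConfig(bits=4, group_size=32)

    # Assumed helpers inherited from INCQuantizationConfigMixin, modeled on
    # transformers' QuantizationConfigMixin (an assumption, not confirmed here).
    as_dict = config.to_dict()
    restored = RtnConfig.from_dict(as_dict)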

class neural_compressor.transformers.utils.quantization_config.RtnConfig(bits: int = 4, group_size: int = 32, compute_dtype: Any = None, scale_dtype: Any = None, sym: bool = True, use_layer_wise: bool = None, quant_lm_head: bool = False, **kwargs)[source]

Config class for Round-To-Nearest (RTN) weight-only quantization.
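
A minimal usage sketch. The constructor arguments come from the signature above; loading through neural_compressor.transformers.AutoModelForCausalLM and the model id are illustrative assumptions:

    from neural_compressor.transformers import AutoModelForCausalLM, RtnConfig

    # 4-bit symmetric round-to-nearest quantization, with one scale per
    # group of 32 weights (the signature defaults).
    woq_config = RtnConfig(bits=4, group_size=32, sym=True)

    # Assumed transformers-like entry point; the model id is a placeholder.
    model = AutoModelForCausalLM.from_pretrained(
        "facebook/opt-125m", quantization_config=woq_config
    )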

class neural_compressor.transformers.utils.quantization_config.GPTQConfig(bits: int = 4, tokenizer: Any = None, dataset: str = 'NeelNanda/pile-10k', batch_size: int = 8, group_size: int = 32, compute_dtype: Any = None, scale_dtype: Any = None, sym: bool = True, blocksize: int = 128, damp_percent: float = 0.1, desc_act: bool = False, n_samples: int = 128, seq_len: int = 2048, static_groups: bool = False, use_mse_search: bool = False, true_sequential: bool = False, use_layer_wise: bool = None, quant_lm_head: bool = False, **kwargs)[source]

Config class for GPTQ weight-only quantization.
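
Unlike RTN, GPTQ is calibration-based: the signature defaults to the "NeelNanda/pile-10k" dataset with n_samples=128 and seq_len=2048, and a tokenizer must be supplied. A hedged sketch, again assuming the AutoModelForCausalLM entry point:

    from transformers import AutoTokenizer
    from neural_compressor.transformers import AutoModelForCausalLM, GPTQConfig

    tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")  # placeholder
    gptq_config = GPTQConfig(
        bits=4,
        group_size=32,
        tokenizer=tokenizer,
        dataset="NeelNanda/pile-10k",  # calibration data (signature default)
        damp_percent=0.1,              # Hessian damping used by GPTQ
        desc_act=False,                # if True, quantize in activation order
    )
    model = AutoModelForCausalLM.from_pretrained(
        "facebook/opt-125m", quantization_config=gptq_config
    )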

class neural_compressor.transformers.utils.quantization_config.AwqConfig(bits: int = 4, tokenizer: Any = None, dataset: str = 'NeelNanda/pile-10k', group_size: int = 32, compute_dtype: Any = None, weight_dtype: Any = None, scale_dtype: Any = None, use_layer_wise: bool = None, n_samples: int = 128, seq_len: int = 2048, auto_scale: bool = True, auto_clip: bool = True, zero_point: bool = True, absorb_layer_dict: dict = {}, quant_lm_head: bool = False, backend: str = None, **kwargs)[source]

Config class for AWQ (Activation-aware Weight Quantization).
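
A hedged sketch; the auto_scale, auto_clip, and zero_point flags correspond to AWQ's scale search, clipping search, and asymmetric quantization, and the loading path is the same assumed entry point as above:

    from transformers import AutoTokenizer
    from neural_compressor.transformers import AutoModelForCausalLM, AwqConfig

    tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")  # placeholder
    awq_config = AwqConfig(
        bits=4,
        group_size=32,
        tokenizer=tokenizer,
        auto_scale=True,   # search per-channel scales protecting salient weights
        auto_clip=True,    # search weight-clipping ranges before quantizing
        zero_point=True,   # asymmetric quantization with a zero point
    )
    model = AutoModelForCausalLM.from_pretrained(
        "facebook/opt-125m", quantization_config=awq_config
    )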

class neural_compressor.transformers.utils.quantization_config.TeqConfig(bits: int = 4, tokenizer: Any = None, dataset: str = 'NeelNanda/pile-10k', group_size: int = 32, compute_dtype: Any = None, weight_dtype: Any = None, scale_dtype: Any = None, use_layer_wise: bool = None, n_samples: int = 128, seq_len: int = 2048, sym: bool = True, absorb_layer_dict: dict = {}, quant_lm_head: bool = False, **kwargs)[source]

Config class for TEQ (Trainable Equivalent Transformation) quantization.
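
A hedged sketch along the same lines; TEQ learns equivalent transformations from calibration data, so it takes the same tokenizer and dataset arguments as GPTQ and AWQ:

    from transformers import AutoTokenizer
    from neural_compressor.transformers import AutoModelForCausalLM, TeqConfig

    tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")  # placeholder
    teq_config = TeqConfig(
        bits=4,
        group_size=32,
        tokenizer=tokenizer,
        sym=True,        # symmetric quantization (signature default)
        n_samples=128,   # calibration samples drawn from the default dataset
        seq_len=2048,
    )
    model = AutoModelForCausalLM.from_pretrained(
        "facebook/opt-125m", quantization_config=teq_config
    )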

class neural_compressor.transformers.utils.quantization_config.AutoRoundConfig(bits: int = 4, tokenizer: Any = None, dataset: str = 'NeelNanda/pile-10k', group_size: int = 128, compute_dtype: Any = None, weight_dtype: Any = None, scale_dtype: Any = None, sym: bool = False, lr: float = None, minmax_lr: float = None, disable_quanted_input: bool = True, n_samples: int = 128, seq_len: int = 2048, iters: int = 200, use_layer_wise: bool = None, quant_lm_head: bool = False, **kwargs)[source]

Config class for AutoRound weight-only quantization.
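
A hedged sketch; note that AutoRound's signature defaults differ from the other configs (group_size=128, sym=False) and that iters controls the rounding-optimization steps:

    from transformers import AutoTokenizer
    from neural_compressor.transformers import AutoModelForCausalLM, AutoRoundConfig

    tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")  # placeholder
    autoround_config = AutoRoundConfig(
        bits=4,
        group_size=128,  # default differs from the 32 used by the other configs
        tokenizer=tokenizer,
        iters=200,       # optimization steps for the learned rounding values
        lr=None,         # None defers to the algorithm's default learning rate
        seq_len=2048,
    )
    model = AutoModelForCausalLM.from_pretrained(
        "facebook/opt-125m", quantization_config=autoround_config
    )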