neural_compressor.transformers.utils.quantization_config
Intel Neural Compressor Transformers-like Config.
Classes
- QuantizationMethod: string-backed identifier for the quantization method (inherits from str).
- INCQuantizationConfigMixin: mixin class for quantization config.
- RtnConfig: configuration for Round-To-Nearest (RTN) weight-only quantization.
- GPTQConfig: configuration for GPTQ weight-only quantization.
- AwqConfig: configuration for AWQ (Activation-aware Weight Quantization).
- TeqConfig: configuration for TEQ (Trainable Equivalent Transformation) weight-only quantization.
- AutoRoundConfig: configuration for AutoRound weight-only quantization.
Module Contents
- class neural_compressor.transformers.utils.quantization_config.QuantizationMethod[source]
str(object='') -> str
str(bytes_or_buffer[, encoding[, errors]]) -> str
Create a new string object from the given object. If encoding or errors is specified, then the object must expose a data buffer that will be decoded using the given encoding and error handler. Otherwise, returns the result of object.__str__() (if defined) or repr(object). encoding defaults to sys.getdefaultencoding(). errors defaults to 'strict'.
(The docstring above is inherited from the built-in str type; QuantizationMethod subclasses str so that each method is identified by a plain string value.)
- class neural_compressor.transformers.utils.quantization_config.INCQuantizationConfigMixin[source]
Mixin class for quantization config.
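The mixin supplies the shared plumbing that the concrete configs below inherit. A minimal sketch, assuming the mixin mirrors transformers' QuantizationConfigMixin and exposes to_dict()/to_json_file() (verify against your installed version):

```python
# Sketch only: to_dict()/to_json_file() are ASSUMED to be provided by
# INCQuantizationConfigMixin, mirroring transformers' QuantizationConfigMixin.
from neural_compressor.transformers import RtnConfig

config = RtnConfig(bits=4, group_size=32)
print(config.to_dict())                   # inspect the serialized fields
config.to_json_file("quant_config.json")  # persist the config for reuse
```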
- class neural_compressor.transformers.utils.quantization_config.RtnConfig(bits: int = 4, group_size: int = 32, compute_dtype: Any = None, scale_dtype: Any = None, sym: bool = True, use_layer_wise: bool = None, quant_lm_head: bool = False, **kwargs)[source]
Configuration for Round-To-Nearest (RTN) weight-only quantization.
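RTN is a data-free rounding scheme, so no tokenizer or calibration dataset is needed. A minimal sketch, assuming the package's transformers-like AutoModelForCausalLM accepts quantization_config and using a hypothetical model id:

```python
from neural_compressor.transformers import AutoModelForCausalLM, RtnConfig

# All arguments below come from the documented signature; RTN needs no
# calibration data, which keeps the setup to a single config object.
config = RtnConfig(
    bits=4,          # 4-bit weights
    group_size=32,   # one scale per 32 weights
    sym=True,        # symmetric quantization
)
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-125m",  # hypothetical model id; any causal LM works
    quantization_config=config,
)
```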
- class neural_compressor.transformers.utils.quantization_config.GPTQConfig(bits: int = 4, tokenizer: Any = None, dataset: str = 'NeelNanda/pile-10k', batch_size: int = 8, group_size: int = 32, compute_dtype: Any = None, scale_dtype: Any = None, sym: bool = True, blocksize: int = 128, damp_percent: float = 0.1, desc_act: bool = False, n_samples: int = 128, seq_len: int = 2048, static_groups: bool = False, use_mse_search: bool = False, true_sequential: bool = False, use_layer_wise: bool = None, quant_lm_head: bool = False, **kwargs)[source]
Configuration for GPTQ weight-only quantization.
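GPTQ calibrates against sample data, so the config takes a tokenizer and a dataset name (defaulting to 'NeelNanda/pile-10k'). A hedged sketch under the same assumptions as above, with the model id again hypothetical:

```python
from transformers import AutoTokenizer
from neural_compressor.transformers import AutoModelForCausalLM, GPTQConfig

model_id = "facebook/opt-125m"      # hypothetical model id
tokenizer = AutoTokenizer.from_pretrained(model_id)
config = GPTQConfig(
    bits=4,
    tokenizer=tokenizer,
    dataset="NeelNanda/pile-10k",   # calibration corpus (the default)
    group_size=32,
    damp_percent=0.1,               # damping factor, per the signature above
    n_samples=128,                  # number of calibration samples
    seq_len=2048,                   # calibration sequence length
)
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=config)
```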
- class neural_compressor.transformers.utils.quantization_config.AwqConfig(bits: int = 4, tokenizer: Any = None, dataset: str = 'NeelNanda/pile-10k', group_size: int = 32, compute_dtype: Any = None, weight_dtype: Any = None, scale_dtype: Any = None, use_layer_wise: bool = None, n_samples: int = 128, seq_len: int = 2048, auto_scale: bool = True, auto_clip: bool = True, zero_point: bool = True, absorb_layer_dict: dict = {}, quant_lm_head: bool = False, backend: str = None, **kwargs)[source]
Configuration for AWQ (Activation-aware Weight Quantization).
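AWQ likewise calibrates on sample data, searching per-channel scaling factors (auto_scale) and weight-clipping ranges (auto_clip). A sketch under the same assumptions:

```python
from transformers import AutoTokenizer
from neural_compressor.transformers import AutoModelForCausalLM, AwqConfig

model_id = "facebook/opt-125m"      # hypothetical model id
tokenizer = AutoTokenizer.from_pretrained(model_id)
config = AwqConfig(
    bits=4,
    tokenizer=tokenizer,
    group_size=32,
    auto_scale=True,    # search activation-aware scaling factors
    auto_clip=True,     # search weight-clipping thresholds
    zero_point=True,    # asymmetric quantization with zero points
)
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=config)
```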
- class neural_compressor.transformers.utils.quantization_config.TeqConfig(bits: int = 4, tokenizer: Any = None, dataset: str = 'NeelNanda/pile-10k', group_size: int = 32, compute_dtype: Any = None, weight_dtype: Any = None, scale_dtype: Any = None, use_layer_wise: bool = None, n_samples: int = 128, seq_len: int = 2048, sym: bool = True, absorb_layer_dict: dict = {}, quant_lm_head: bool = False, **kwargs)[source]
Configuration for TEQ (Trainable Equivalent Transformation) weight-only quantization.
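TEQ learns an equivalent transformation before quantizing, and its config mirrors AwqConfig minus the scale/clip search flags. Same assumptions as the sketches above:

```python
from transformers import AutoTokenizer
from neural_compressor.transformers import AutoModelForCausalLM, TeqConfig

model_id = "facebook/opt-125m"      # hypothetical model id
tokenizer = AutoTokenizer.from_pretrained(model_id)
config = TeqConfig(
    bits=4,
    tokenizer=tokenizer,
    group_size=32,
    sym=True,           # symmetric quantization
)
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=config)
```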
- class neural_compressor.transformers.utils.quantization_config.AutoRoundConfig(bits: int = 4, tokenizer: Any = None, dataset: str = 'NeelNanda/pile-10k', group_size: int = 128, compute_dtype: Any = None, weight_dtype: Any = None, scale_dtype: Any = None, sym: bool = False, lr: float = None, minmax_lr: float = None, disable_quanted_input: bool = True, n_samples: int = 128, seq_len: int = 2048, iters: int = 200, use_layer_wise: bool = None, quant_lm_head: bool = False, **kwargs)[source]
Configuration for AutoRound weight-only quantization.
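AutoRound optimizes the rounding of weights over a number of iterations; note its defaults differ from the other configs (group_size=128, sym=False). A final sketch under the same assumptions:

```python
from transformers import AutoTokenizer
from neural_compressor.transformers import AutoModelForCausalLM, AutoRoundConfig

model_id = "facebook/opt-125m"      # hypothetical model id
tokenizer = AutoTokenizer.from_pretrained(model_id)
config = AutoRoundConfig(
    bits=4,
    tokenizer=tokenizer,
    group_size=128,     # note the larger default group size
    iters=200,          # rounding-optimization steps
    seq_len=2048,       # calibration sequence length
)
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=config)
```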