`neural_compressor.torch.algorithms.weight_only.autoround`

Module Contents

Classes

AutoRoundQuantizer

The base quantizer for all algorithm quantizers.

Functions

get_autoround_default_run_fn(model, tokenizer[, ...])

Perform calibration for quantization.

class neural_compressor.torch.algorithms.weight_only.autoround.AutoRoundQuantizer(weight_config: dict = {}, enable_full_range: bool = False, batch_size: int = 8, amp: bool = True, device=None, lr_scheduler=None, use_quant_input: bool = True, enable_minmax_tuning: bool = True, lr: float = None, minmax_lr: float = None, low_gpu_mem_usage: bool = True, iters: int = 200, seqlen: int = 2048, n_samples: int = 512, sampler: str = 'rand', seed: int = 42, n_blocks: int = 1, gradient_accumulate_steps: int = 1, not_use_best_mse: bool = False, dynamic_max_gap: int = -1, scale_dtype='fp32')

The base quantizer for all algorithm quantizers.

The Quantizer unifies the interfaces across various quantization algorithms, including GPTQ, RTN, etc. Given a float model, the Quantizer applies the quantization algorithm to the model according to the quant_config.

To implement a new quantization algorithm, inherit from Quantizer and implement the following methods:

prepare: prepare a given model for convert.
convert: convert a prepared model to a quantized model.

Note: quantize and execute are optional for new quantization algorithms.
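The prepare/convert contract described above can be illustrated with a minimal, self-contained sketch. It does not import neural_compressor: SketchQuantizer is a hypothetical stand-in for the library's Quantizer base class, ToyRoundQuantizer is an invented toy algorithm, and the dictionary-based "model" and the "bits" config key are assumptions made only so the example runs without PyTorch.

```python
from abc import ABC, abstractmethod


class SketchQuantizer(ABC):
    """Hypothetical stand-in for the library's Quantizer base class."""

    def __init__(self, quant_config=None):
        self.quant_config = quant_config or {}

    @abstractmethod
    def prepare(self, model):
        """Prepare a float model for conversion."""

    @abstractmethod
    def convert(self, model):
        """Convert a prepared model into a quantized model."""

    def quantize(self, model):
        # Optional convenience wrapper: prepare, then convert.
        return self.convert(self.prepare(model))


class ToyRoundQuantizer(SketchQuantizer):
    """Toy algorithm: symmetric round-to-nearest on lists of floats."""

    def prepare(self, model):
        bits = self.quant_config.get("bits", 4)  # "bits" is an assumed key
        qmax = 2 ** (bits - 1) - 1
        # Record one scale per named weight; the weights stay untouched.
        # An all-zero weight would give scale 0.0, so fall back to 1.0.
        scales = {
            name: (max(abs(w) for w in weights) / qmax) or 1.0
            for name, weights in model["weights"].items()
        }
        return {"weights": model["weights"], "scales": scales, "qmax": qmax}

    def convert(self, model):
        qmax = model["qmax"]
        quantized = {}
        for name, weights in model["weights"].items():
            scale = model["scales"][name]
            # Round to the nearest integer level and clamp to the signed range.
            quantized[name] = [
                max(-qmax - 1, min(qmax, round(w / scale))) for w in weights
            ]
        return {"qweights": quantized, "scales": model["scales"]}
```

A caller would run the two steps in order, for example ToyRoundQuantizer({"bits": 4}).quantize({"weights": {"fc": [3.5, -1.0, 0.0]}}). Keeping prepare free of side effects on the weights is a design choice of this sketch, not something the source documents.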

neural_compressor.torch.algorithms.weight_only.autoround.get_autoround_default_run_fn(model, tokenizer, dataset_name='NeelNanda/pile-10k', n_samples=512, seqlen=2048, seed=42, bs=8, dataset_split: str = 'train', dataloader=None)

Perform calibration for quantization.

This function calibrates the model for quantization by processing a specified number of samples from the calibration dataset. It ensures that the data is properly formatted and feeds it to the model. If fewer samples are processed than the specified number, it logs a warning. If no samples are processed, it logs an error and exits.

Parameters: n_samples (int) – The number of samples to use for calibration.
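The warning and error behaviour described above can be sketched as a small standalone loop. This is not the library's implementation: run_calibration, model_fn, and the rule of skipping samples shorter than seqlen and truncating longer ones are assumptions chosen for illustration, and the model is replaced by any callable.

```python
import logging
import sys

logger = logging.getLogger("calibration_sketch")


def run_calibration(model_fn, samples, n_samples=512, seqlen=2048):
    """Feed up to n_samples usable samples to model_fn; return the count."""
    processed = 0
    for tokens in samples:
        if processed >= n_samples:
            break
        # Assumed formatting rule: skip samples shorter than seqlen,
        # truncate longer ones to exactly seqlen tokens.
        if len(tokens) < seqlen:
            continue
        model_fn(tokens[:seqlen])
        processed += 1

    if processed == 0:
        # Nothing reached the model: log an error and exit.
        logger.error("No usable calibration samples; check the dataset and seqlen.")
        sys.exit(1)
    if processed < n_samples:
        # Calibration ran, but on less data than requested.
        logger.warning(
            "Processed %d samples, fewer than the requested %d.",
            processed,
            n_samples,
        )
    return processed
```

For example, calling run_calibration(seen.append, [[1, 2, 3, 4], [1, 2], [5, 6, 7, 8, 9]], n_samples=5, seqlen=4) with seen = [] feeds two samples to the callable and logs a warning, because only two of the three inputs are long enough.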
