:orphan:

:py:mod:`neural_compressor.torch.algorithms.weight_only.autoround`
==================================================================

.. py:module:: neural_compressor.torch.algorithms.weight_only.autoround


Module Contents
---------------

Classes
~~~~~~~

.. autoapisummary::

   neural_compressor.torch.algorithms.weight_only.autoround.AutoRoundQuantizer


Functions
~~~~~~~~~

.. autoapisummary::

   neural_compressor.torch.algorithms.weight_only.autoround.get_autoround_default_run_fn


.. py:class:: AutoRoundQuantizer(quant_config: dict = None, enable_full_range: bool = False, batch_size: int = 8, amp: bool = True, device=None, lr_scheduler=None, use_quant_input: bool = True, enable_minmax_tuning: bool = True, lr: float = None, minmax_lr: float = None, low_gpu_mem_usage: bool = True, iters: int = 200, seqlen: int = 2048, n_samples: int = 512, sampler: str = 'rand', seed: int = 42, n_blocks: int = 1, gradient_accumulate_steps: int = 1, not_use_best_mse: bool = False, dynamic_max_gap: int = -1, scale_dtype='fp32')


   The base quantizer for all algorithm quantizers.

   The `Quantizer` unifies the interfaces across various quantization algorithms, including GPTQ, RTN, etc.
   Given a float model, `Quantizer` applies the quantization algorithm to the model according to the `quant_config`.

   To implement a new quantization algorithm, inherit from `Quantizer` and implement the following methods:

   - `prepare`: prepare a given model for conversion.
   - `convert`: convert a prepared model to a quantized model.

   Note: `quantize` and `execute` are optional for new quantization algorithms.
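
The ``prepare``/``convert`` contract described for the class above can be illustrated with a minimal, library-free sketch. Everything in it is an assumption made for illustration: ``SketchQuantizer`` is a hypothetical stand-in for the ``Quantizer`` base class, the "model" is a plain dict of weight lists, and the toy rounding scheme is not the AutoRound algorithm.

```python
from abc import ABC, abstractmethod


class SketchQuantizer(ABC):
    """Hypothetical stand-in for the `Quantizer` base class (not the real API)."""

    def __init__(self, quant_config=None):
        self.quant_config = quant_config or {}

    @abstractmethod
    def prepare(self, model):
        """Prepare a given model for conversion."""

    @abstractmethod
    def convert(self, model):
        """Convert a prepared model to a quantized model."""

    def quantize(self, model):
        # Optional convenience path: prepare, then convert.
        return self.convert(self.prepare(model))


class ToyRoundingQuantizer(SketchQuantizer):
    """Toy symmetric round-to-nearest scheme, used only to show the contract."""

    def prepare(self, model):
        # Assumed config key: "bits" (signed integer width), defaulting to 4.
        bits = self.quant_config.get("bits", 4)
        qmax = 2 ** (bits - 1) - 1
        self.scales = {}
        for name, weights in model.items():
            max_abs = max(abs(w) for w in weights)
            # Fall back to a scale of 1.0 for all-zero tensors.
            self.scales[name] = max_abs / qmax if max_abs else 1.0
        return model

    def convert(self, model):
        # Map each float weight to its nearest integer level.
        return {
            name: [round(w / self.scales[name]) for w in weights]
            for name, weights in model.items()
        }


float_model = {"linear": [1.75, -0.5, 0.0, 0.75]}
quantized = ToyRoundingQuantizer({"bits": 4}).quantize(float_model)
```

Splitting the work this way lets ``prepare`` collect whatever statistics the algorithm needs (here, one scale per tensor) before ``convert`` rewrites the weights.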


.. py:function:: get_autoround_default_run_fn(model, tokenizer, dataset_name='NeelNanda/pile-10k', n_samples=512, seqlen=2048, seed=42, bs=8, dataset_split: str = 'train', dataloader=None)

   Perform calibration for quantization.

   This function calibrates the model for quantization by processing a specified
   number of samples from the calibration dataset. It ensures that the data is
   properly formatted and feeds it to the model. If fewer samples are processed
   than the specified number, it logs a warning. If no samples are processed,
   it logs an error and exits.

   :param n_samples: The number of samples to use for calibration.
   :type n_samples: int
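
The calibration behaviour described above (feed samples until ``n_samples`` is reached, warn on a shortfall, log an error and exit when nothing was processed) can be sketched without any dependencies. This is an illustrative assumption, not the library's implementation: ``run_calibration``, its arguments, and the list-based batches are hypothetical names chosen for the example.

```python
import logging

logger = logging.getLogger("calibration_sketch")


def run_calibration(model, batches, n_samples):
    """Feed batches to `model` until `n_samples` samples have been processed."""
    seen = 0
    for batch in batches:
        if seen >= n_samples:
            break
        # Trim the batch so the total never exceeds the requested sample count.
        batch = batch[: n_samples - seen]
        if not batch:
            # Skip empty batches rather than feeding them to the model.
            continue
        model(batch)
        seen += len(batch)

    if seen == 0:
        logger.error("No calibration samples were processed.")
        raise SystemExit(1)
    if seen < n_samples:
        logger.warning("Processed %d of %d requested samples.", seen, n_samples)
    return seen


# Hypothetical usage: the "model" just records what it was fed.
calls = []
processed = run_calibration(calls.append, [[1, 2, 3], [4, 5, 6]], n_samples=4)
```

In this sketch the second batch is truncated to a single sample, so the model sees exactly four samples in total.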