:orphan:

:py:mod:`neural_compressor.torch.algorithms.weight_only.utility`
================================================================

.. py:module:: neural_compressor.torch.algorithms.weight_only.utility


Module Contents
---------------

Classes
~~~~~~~

.. autoapisummary::

   neural_compressor.torch.algorithms.weight_only.utility.GraphTrace


Functions
~~~~~~~~~

.. autoapisummary::

   neural_compressor.torch.algorithms.weight_only.utility.quantize_4bit
   neural_compressor.torch.algorithms.weight_only.utility.qdq_weight_asym
   neural_compressor.torch.algorithms.weight_only.utility.qdq_weight_sym
   neural_compressor.torch.algorithms.weight_only.utility.qdq_weight_actor
   neural_compressor.torch.algorithms.weight_only.utility.quant_tensor
   neural_compressor.torch.algorithms.weight_only.utility.search_clip
   neural_compressor.torch.algorithms.weight_only.utility.quant_weight_w_scale
   neural_compressor.torch.algorithms.weight_only.utility.model_forward
   neural_compressor.torch.algorithms.weight_only.utility.forward_wrapper
   neural_compressor.torch.algorithms.weight_only.utility.move_input_to_device
   neural_compressor.torch.algorithms.weight_only.utility.set_module
   neural_compressor.torch.algorithms.weight_only.utility.fetch_module
   neural_compressor.torch.algorithms.weight_only.utility.get_absorb_layers
   neural_compressor.torch.algorithms.weight_only.utility.get_parent
   neural_compressor.torch.algorithms.weight_only.utility.get_module
   neural_compressor.torch.algorithms.weight_only.utility.get_block_prefix
   neural_compressor.torch.algorithms.weight_only.utility.get_example_input
   neural_compressor.torch.algorithms.weight_only.utility.replace_forward
   neural_compressor.torch.algorithms.weight_only.utility.recover_forward
   neural_compressor.torch.algorithms.weight_only.utility.get_module_input_output


Attributes
~~~~~~~~~~

.. autoapisummary::

   neural_compressor.torch.algorithms.weight_only.utility.NF4
   neural_compressor.torch.algorithms.weight_only.utility.FP4_BNB
   neural_compressor.torch.algorithms.weight_only.utility.FP4_E2M1
   neural_compressor.torch.algorithms.weight_only.utility.NF4_BIT
   neural_compressor.torch.algorithms.weight_only.utility.FP4_BNB_BIT
   neural_compressor.torch.algorithms.weight_only.utility.FP4_E2M1_BIT
   neural_compressor.torch.algorithms.weight_only.utility.FLOAT_MAPPING
   neural_compressor.torch.algorithms.weight_only.utility.INT_MAPPING


.. py:function:: quantize_4bit(tensor, quantile=1.0, dtype='nf4', return_int=False, **kwargs)

   Quantize a tensor to the NF4/FP4 data type.

   :param tensor: input tensor
   :param quantile: percentile of clip. Defaults to 1.0.
   :type quantile: float, optional
   :param dtype: data type. Defaults to 'nf4'.
   :type dtype: str, optional
   :param return_int: whether to return int data. Defaults to False.
   :type return_int: bool, optional
   :returns: fake quantized tensor
   :rtype: q_tensor


.. py:function:: qdq_weight_asym(weight, bits=4, quantile=1.0, return_int=False, **kwargs)

   Quantize and dequantize a tensor with the asym scheme.

   :param weight: input weight
   :param bits: number of bits. Defaults to 4.
   :type bits: int, optional
   :param quantile: percentile of clip. Defaults to 1.0.
   :type quantile: float, optional
   :param return_int: whether to return int8/uint8 data instead of fp32. Defaults to False.
   :type return_int: bool, optional
   :returns: qdq weight
   :rtype: output


.. py:function:: qdq_weight_sym(weight, bits=4, quantile=1.0, return_int=False, full_range=False, **kwargs)

   Quantize and dequantize a tensor with the sym scheme.

   :param weight: input weight
   :param bits: number of bits. Defaults to 4.
   :type bits: int, optional
   :param quantile: percentile of clip. Defaults to 1.0.
   :type quantile: float, optional
   :param return_int: whether to return int8/uint8 data instead of fp32. Defaults to False.
   :type return_int: bool, optional
   :param full_range: whether the symmetric range uses -2**(bits-1).
      For example, with 4 bits: scale = amax / 8 if full_range else amax / 7.
      If True, scale = -scale if abs(min) > abs(max) else scale.
      Defaults to False.
   :type full_range: bool, optional
   :returns: qdq weight
   :rtype: output


.. py:function:: qdq_weight_actor(weight, bits, scheme, quantile=1.0, dtype='int', return_int=False, full_range=False, **kwargs)

   Quantize and dequantize a tensor per channel. It is an in-place op.

   :param weight: input weight
   :param bits: number of bits.
   :type bits: int
   :param scheme: 'sym' or 'asym'.
   :type scheme: str
   :param quantile: percentile of clip. Defaults to 1.0.
   :type quantile: float, optional
   :param dtype: select from 'int', 'nf4', 'fp4'. Defaults to 'int'.
   :type dtype: str, optional
   :param return_int: whether to return int8/uint8 data instead of fp32. Defaults to False.
   :type return_int: bool, optional
   :param full_range: whether the symmetric range uses -2**(bits-1).
   :type full_range: bool, optional
   :returns: qdq weight
   :rtype: output


.. py:function:: quant_tensor(weight, bits=4, group_size=-1, scheme='asym', quantile=1.0, dtype='int', return_int=False, full_range=False, **kwargs)

   Quantize and dequantize a tensor with group size. It is an in-place function.

   :param weight: input weight
   :param bits: number of bits. Defaults to 4.
   :type bits: int, optional
   :param group_size: how many elements share one scale/zp. Defaults to -1.
   :type group_size: int, optional
   :param scheme: 'sym' or 'asym'. Defaults to 'asym'.
   :type scheme: str, optional
   :param quantile: percentile of clip. Defaults to 1.0.
   :type quantile: float, optional
   :param dtype: select from 'int', 'nf4', 'fp4'. Defaults to 'int'.
   :type dtype: str, optional
   :param return_int: whether to return int8/uint8 data instead of fp32. Defaults to False.
   :type return_int: bool, optional
   :param full_range: whether the symmetric range uses -2**(bits-1).
   :type full_range: bool, optional
   :returns: qdq weight.
   :rtype: output


.. py:function:: search_clip(m, bits=4, group_size=32, scheme='asym', dtype='int', enable_full_range=False)

   Search the best clip range of each Linear in the current block. It is not an in-place function.
   :param m: torch module.
   :type m: torch.nn.Module
   :param bits: number of bits.
   :type bits: int, optional
   :param group_size: how many elements share one scale/zp.
   :type group_size: int, optional
   :param scheme: 'sym' or 'asym'.
   :type scheme: str, optional
   :param dtype: select from 'int', 'nf4', 'fp4'. Defaults to 'int'.
   :type dtype: str, optional
   :param enable_full_range: whether the symmetric range uses -2**(bits-1).
   :type enable_full_range: bool, optional
   :returns: best percentile of clip
   :rtype: best_clip_ratio (float)


.. py:function:: quant_weight_w_scale(weight, scale, zp=None, group_size=-1, dtype='int')

   Quantize a tensor with a precomputed scale and group size. It is an in-place function.

   :param weight: input weight
   :param scale: scale
   :param zp: zero point
   :param group_size: how many elements share one scale/zp. Defaults to -1.
   :type group_size: int, optional
   :param dtype: data type, for NF4/FP4
   :returns: int weight.
   :rtype: output


.. py:function:: set_module(model, key, new_module)

   Set a new module into the model by key name.

   :param model: original model
   :type model: torch.nn.Module
   :param key: module name to be replaced
   :type key: str
   :param new_module: new module to be inserted
   :type new_module: torch.nn.Module


.. py:function:: fetch_module(model, op_name)

   Get the module with a given op name.

   :param model: the input model.
   :type model: object
   :param op_name: name of op.
   :type op_name: str
   :returns: module (object).


.. py:function:: get_absorb_layers(model, example_inputs, supported_layers=['Linear'], folding=False)

   Get absorb_to_layer and no_absorb_layer.

   :param model: input model
   :type model: torch.nn.Module
   :param example_inputs: example_inputs
   :param supported_layers: supported layers. Defaults to ['Linear'].
   :type supported_layers: list, optional
   :param folding: whether to allow self-absorption. Defaults to False.
   :type folding: bool, optional
   :returns: dict of absorb_to_layer, e.g. {absorb_layer: [absorbed_1, ...]};
             no_absorb_layers: list of no_absorb_layers
   :rtype: absorb_to_layer
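``fetch_module`` and ``set_module`` address submodules by dotted op names such as ``'encoder.fc'``. The sketch below shows the dotted-name traversal such helpers rely on, using plain namespaces in place of ``torch.nn.Module`` objects; the helper names ``fetch_by_name`` and ``set_by_name`` are hypothetical, not the library's code.

```python
from types import SimpleNamespace


def fetch_by_name(model, op_name):
    """Walk a dotted name like 'encoder.fc' one attribute at a time."""
    module = model
    for attr in op_name.split("."):
        module = getattr(module, attr)
    return module


def set_by_name(model, key, new_module):
    """Replace the submodule at a dotted name with a new module."""
    parent_name, _, child_name = key.rpartition(".")
    # An un-dotted key means the parent is the model itself.
    parent = fetch_by_name(model, parent_name) if parent_name else model
    setattr(parent, child_name, new_module)


# Toy "model": nested namespaces standing in for torch modules.
model = SimpleNamespace(encoder=SimpleNamespace(fc=SimpleNamespace(kind="Linear")))
print(fetch_by_name(model, "encoder.fc").kind)   # -> Linear
set_by_name(model, "encoder.fc", SimpleNamespace(kind="WeightOnlyLinear"))
print(fetch_by_name(model, "encoder.fc").kind)   # -> WeightOnlyLinear
```

With real torch modules the same walk is done over named children; swapping a ``Linear`` for a quantized replacement is exactly the ``set_module`` use case.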
.. py:function:: get_module(model, key)

   Get a module from the model by key name.

   :param model: original model
   :type model: torch.nn.Module
   :param key: module name to fetch
   :type key: str


.. py:function:: get_block_prefix(model)

   Get the prefix and number of blocks.

   :param model: input model
   :type model: torch.nn.Module
   :returns: block_list name in model;
             block_num (int): number of blocks in block_list
   :rtype: block_prefix (str)


.. py:function:: get_example_input(dataloader, i=1)

   Get the example input.

   :param dataloader: calibration dataset.
   :type dataloader: object
   :returns: example_inp (object).


.. py:function:: replace_forward(model)

   Replace forward to get the input args and kwargs of the first block for the AWQ algorithm.

   :param model: input model.
   :type model: torch.nn.Module
   :raises ValueError: to avoid inference of the remaining parts of the model.
   :returns: model with replaced forward.
   :rtype: torch.nn.Module


.. py:function:: recover_forward(model)

   Recover model and block forward for the AWQ algorithm.

   :param model: input model.
   :type model: torch.nn.Module
   :returns: model with recovered forward.
   :rtype: torch.nn.Module


.. py:function:: get_module_input_output(model, module_hook_config={}, dataloader=None, iters=-1, calib_func=None, input_func=None, output_func=None)

   A helper function to get the input and output tensors of the modules listed in module_hook_config.

   :param model: torch model.
   :param module_hook_config: required module names and which tensors to record for each. Defaults to {}.
                              For example::

                                  module_hook_config = {
                                      'fc1': ['output'],
                                      'fc2': ['input', 'output'],
                                  }
   :type module_hook_config: dict, optional
   :param dataloader: dataloader for model input.
   :param iters: iterations for inference.
   :param calib_func: a custom inference function to replace dataloader and iters.
   :param input_func: preprocess input for less memory usage.
   :param output_func: preprocess output for less memory usage.
   :returns: recorded input_values and output_values, for example:
             {'fc1': {'input': [], 'output': []}, ...}
   :rtype: total_values
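The quantization helpers documented above (``qdq_weight_asym``, ``quant_tensor``) share one recipe: split the weights into groups of ``group_size``, derive a scale and zero point per group from its min/max, then round, clamp, and dequantize. The following is a pure-Python sketch of that asymmetric scheme on a flat list; it is illustrative only (the library operates in place on torch tensors, and ``qdq_group_asym`` is a hypothetical name).

```python
def qdq_group_asym(weights, bits=4, group_size=2):
    """Asymmetric quantize-dequantize with one scale/zp per group."""
    maxq = (1 << bits) - 1                       # 4 bits -> levels 0..15
    out = []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        lo, hi = min(group), max(group)
        scale = (hi - lo) / maxq or 1.0          # guard all-equal groups
        zp = round(-lo / scale)                  # zero point maps lo -> level 0
        for w in group:
            q = max(0, min(maxq, round(w / scale) + zp))   # quantize + clamp
            out.append((q - zp) * scale)                   # dequantize
    return out


w = [-1.0, 0.5, 2.0, -0.25]
print(qdq_group_asym(w, bits=4, group_size=2))   # values land on each group's 16-level grid
```

A smaller ``group_size`` gives each group a tighter min/max range and hence lower reconstruction error, at the cost of storing more scales and zero points, which is the trade-off the ``group_size`` parameter controls throughout this module.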