:orphan:

:py:mod:`neural_compressor.onnxrt.algorithms.weight_only.utility`
=================================================================

.. py:module:: neural_compressor.onnxrt.algorithms.weight_only.utility


Module Contents
---------------


Functions
~~~~~~~~~

.. autoapisummary::

   neural_compressor.onnxrt.algorithms.weight_only.utility.make_matmul_weight_only_node
   neural_compressor.onnxrt.algorithms.weight_only.utility.prepare_inputs
   neural_compressor.onnxrt.algorithms.weight_only.utility.pad_tensor
   neural_compressor.onnxrt.algorithms.weight_only.utility.quant_tensor
   neural_compressor.onnxrt.algorithms.weight_only.utility.qdq_tensor



.. py:function:: make_matmul_weight_only_node(node: onnx.NodeProto, weight_shape: tuple, num_bits: int, group_size: int, k_blocks: int, q_weight: numpy.array, scale: numpy.array, zero_point: numpy.array, accuracy_level: int = 0)

   Build MatMulFpQ4/MatMulNBits node.

   :param node: original matmul node
   :type node: onnx.NodeProto
   :param weight_shape: original weight shape
   :type weight_shape: tuple
   :param num_bits: number of bits used to represent weights.
   :type num_bits: int
   :param group_size: how many elements share one scale/zp
   :type group_size: int
   :param k_blocks: block number
   :type k_blocks: int
   :param q_weight: quantized weight
   :type q_weight: np.array
   :param scale: scale
   :type scale: np.array
   :param zero_point: zero point
   :type zero_point: np.array
   :param accuracy_level: accuracy level. Supports 0 (unset), 1 (fp32 compute type of jblas kernel),
                          2 (fp16 compute type of jblas kernel), 3 (bf16 compute type of jblas kernel),
                          4 (int8 compute type of jblas kernel). Defaults to 0.
   :type accuracy_level: int, optional

   :returns: MatMulFpQ4 or MatMulNBits node
             new_inits: initializers of the new node
   :rtype: matmul_weight_only_node


.. py:function:: prepare_inputs(model, data_reader, providers)

   Prepare inputs for weight-only quantization.

   :param model: onnx model.
   :type model: ModelProto or ONNXModel
   :param data_reader: a calibration data reader.
   :type data_reader: CalibrationDataReader
   :param providers: providers to use.
   :type providers: list

   :returns: prepared inputs.
             so: session options
   :rtype: inputs


.. py:function:: pad_tensor(weight, group_size, k_blocks)

   Pad tensor rows so that the row dimension is divisible by group_size.

   :param weight: weight
   :type weight: array
   :param group_size: how many elements share one scale/zp
   :type group_size: int
   :param k_blocks: the number of blocks
   :type k_blocks: int

   :returns: padded weight
   :rtype: weight


.. py:function:: quant_tensor(data: numpy.array, num_bits: int = 4, group_size: int = 32, scheme: str = 'asym', dtype: str = 'int', ratio: float = 1.0)

   Quantize tensor per group.

   :param data: input weight
   :type data: np.array
   :param num_bits: number of bits used to represent weights. Defaults to 4.
   :type num_bits: int, optional
   :param group_size: how many elements share one scale/zp. Defaults to 32.
   :type group_size: int, optional
   :param scheme: quantization scheme. Defaults to "asym".
   :type scheme: str, optional
   :param dtype: data type. Defaults to "int".
   :type dtype: str, optional
   :param ratio: percentile of clip. Defaults to 1.0.
   :type ratio: float, optional

   :returns: quantized weight
             scale: scale
             zero_point: zero point
   :rtype: output


.. py:function:: qdq_tensor(data: numpy.array, num_bits: int = 4, group_size: int = 32, scheme: str = 'asym', dtype: str = 'int', ratio: float = 1.0)

   Quantize and de-quantize tensor per group.

   :param data: input weight
   :type data: np.array
   :param num_bits: number of bits used to represent weights. Defaults to 4.
   :type num_bits: int, optional
   :param group_size: how many elements share one scale/zp. Defaults to 32.
   :type group_size: int, optional
   :param scheme: quantization scheme. Defaults to "asym".
   :type scheme: str, optional
   :param dtype: data type. Defaults to "int".
   :type dtype: str, optional
   :param ratio: percentile of clip. Defaults to 1.0.
   :type ratio: float, optional

   :returns: quant-dequant weight
   :rtype: output
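

Example
~~~~~~~

The per-group helpers can be exercised directly on a NumPy weight. The snippet
below is a minimal sketch based only on the signatures documented above; the
weight shape, the derivation of ``k_blocks``, the defensive copies, and the
reshape of the ``qdq_tensor`` output are illustrative assumptions rather than
guarantees of the API.

.. code-block:: python

    import numpy as np

    from neural_compressor.onnxrt.algorithms.weight_only.utility import (
        pad_tensor,
        qdq_tensor,
        quant_tensor,
    )

    group_size = 32

    # Illustrative 2-D weight whose row count (100) is not a multiple of group_size.
    weight = np.random.randn(100, 64).astype(np.float32)

    # Number of group_size-sized blocks needed to cover the rows (assumption for this sketch).
    k_blocks = (weight.shape[0] + group_size - 1) // group_size

    # Pad the rows up to k_blocks * group_size so every group is full.
    padded = pad_tensor(weight, group_size, k_blocks)

    # Per-group asymmetric 4-bit quantization: quantized weight, scale, and zero point.
    # Copies are passed in case the helpers clip the input in place.
    q_weight, scale, zero_point = quant_tensor(
        padded.copy(), num_bits=4, group_size=group_size, scheme="asym"
    )

    # Quantize-dequantize in one step, e.g. to inspect the round-trip error.
    dq_weight = qdq_tensor(padded.copy(), num_bits=4, group_size=group_size, scheme="asym")
    max_err = np.abs(np.reshape(dq_weight, padded.shape) - padded).max()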