:py:mod:`neural_compressor.adaptor.ox_utils.smooth_quant`
==========================================================

.. py:module:: neural_compressor.adaptor.ox_utils.smooth_quant

.. autoapi-nested-parse::

   SmoothQuant for the onnxrt adaptor.


Module Contents
---------------

Classes
~~~~~~~

.. autoapisummary::

   neural_compressor.adaptor.ox_utils.smooth_quant.ORTSmoothQuant


Functions
~~~~~~~~~

.. autoapisummary::

   neural_compressor.adaptor.ox_utils.smooth_quant.get_quant_dequant_output
   neural_compressor.adaptor.ox_utils.smooth_quant.make_sub_graph
   neural_compressor.adaptor.ox_utils.smooth_quant.quant_dequant_data


.. py:function:: get_quant_dequant_output(model, input_data, output_data, reduce_range, backend)

   Get the loss between the fp32 output and the QDQ output.

   :param model: model
   :type model: object
   :param input_data: fp32 input
   :type input_data: numpy.ndarray
   :param output_data: fp32 output
   :type output_data: numpy.ndarray
   :param reduce_range: whether to use the reduced 7-bit quantization range
   :type reduce_range: bool
   :param backend: execution provider
   :type backend: str


.. py:function:: make_sub_graph(node, inits, input_data, output_data, reduce_range, opset, ir_version)

   Build a model containing only the specified node.

   :param node: node
   :type node: object
   :param inits: initializer inputs of this node
   :type inits: list
   :param input_data: fp32 input
   :type input_data: numpy.ndarray
   :param output_data: fp32 output
   :type output_data: numpy.ndarray
   :param reduce_range: whether to use the reduced 7-bit quantization range
   :type reduce_range: bool
   :param opset: opset of the model
   :type opset: object
   :param ir_version: ir_version of the model
   :type ir_version: object


.. py:function:: quant_dequant_data(data, reduce_range=False, qType=3, scheme='sym')

   Quantize and then dequantize data.

   :param data: target data
   :type data: numpy.ndarray
   :param reduce_range: whether to use the reduced 7-bit quantization range
   :type reduce_range: bool
   :param qType: data type
   :type qType: int
   :param scheme: 'sym' or 'asym' quantization scheme
   :type scheme: str


.. py:class:: ORTSmoothQuant(model, dataloader, reduce_range=False, backend='CPUExecutionProvider')

   Fake input channel quantization.

   For more details please refer to:
   [1] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
   [2] SPIQ: Data-Free Per-Channel Static Input Quantization

   Only inplace mode is supported, which means the model weights are changed;
   call the recover function to restore the original weights if needed.
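
   The snippet below is a minimal usage sketch, not part of the generated API
   reference. It assumes ``quant_dequant_data`` returns the dequantized array,
   that ``ORTSmoothQuant`` accepts an ONNX ``ModelProto`` plus an iterable
   calibration dataloader, and that it exposes a ``transform`` method with an
   ``alpha`` smoothing knob; of these, only the ``recover`` function is
   confirmed by the text above.

   .. code-block:: python

      import numpy as np
      import onnx
      from onnx import TensorProto, helper, numpy_helper

      from neural_compressor.adaptor.ox_utils.smooth_quant import (
          ORTSmoothQuant,
          quant_dequant_data,
      )

      # quant_dequant_data: measure the INT8 round-trip error on a tensor.
      # qType=3 matches onnx.TensorProto.INT8 (an assumption inferred from the
      # default); scheme selects symmetric or asymmetric quantization.
      data = np.random.randn(4, 8).astype(np.float32)
      qdq = quant_dequant_data(data, reduce_range=False, qType=3, scheme="sym")
      print("max QDQ error:", np.abs(qdq - data).max())

      # Build a toy single-MatMul model to smooth.
      weight = numpy_helper.from_array(np.random.randn(8, 4).astype(np.float32), "W")
      node = helper.make_node("MatMul", ["X", "W"], ["Y"])
      graph = helper.make_graph(
          [node],
          "toy",
          [helper.make_tensor_value_info("X", TensorProto.FLOAT, [1, 8])],
          [helper.make_tensor_value_info("Y", TensorProto.FLOAT, [1, 4])],
          [weight],
      )
      model = helper.make_model(graph, opset_imports=[helper.make_opsetid("", 13)])


      class CalibDataloader:
          """Yields (input, label) pairs for calibration; the exact protocol
          expected by ORTSmoothQuant is an assumption in this sketch."""

          batch_size = 1

          def __iter__(self):
              for _ in range(4):
                  yield {"X": np.random.randn(1, 8).astype(np.float32)}, None


      sq = ORTSmoothQuant(
          model, CalibDataloader(), reduce_range=False, backend="CPUExecutionProvider"
      )
      # `transform` and its `alpha` parameter are not documented on this page;
      # they are assumed here. The weights of `model` are changed in place.
      smoothed = sq.transform(alpha=0.5)
      sq.recover()  # restore the original weights, as noted above

   Because only inplace mode is supported, ``recover`` is the way to get the
   original weights back, for example when experimenting with different
   smoothing settings on the same model.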