:orphan:

:py:mod:`neural_compressor.torch.algorithms.smooth_quant.utility`
=================================================================

.. py:module:: neural_compressor.torch.algorithms.smooth_quant.utility


Module Contents
---------------

Classes
~~~~~~~

.. autoapisummary::

   neural_compressor.torch.algorithms.smooth_quant.utility.TorchSmoothQuant



Functions
~~~~~~~~~

.. autoapisummary::

   neural_compressor.torch.algorithms.smooth_quant.utility.get_quantizable_ops_recursively
   neural_compressor.torch.algorithms.smooth_quant.utility.check_cfg_and_qconfig
   neural_compressor.torch.algorithms.smooth_quant.utility.get_module
   neural_compressor.torch.algorithms.smooth_quant.utility.set_module
   neural_compressor.torch.algorithms.smooth_quant.utility.update_sq_scale
   neural_compressor.torch.algorithms.smooth_quant.utility.reshape_scale_as_weight
   neural_compressor.torch.algorithms.smooth_quant.utility.reshape_in_channel_to_last
   neural_compressor.torch.algorithms.smooth_quant.utility.reshape_scale_as_input
   neural_compressor.torch.algorithms.smooth_quant.utility.register_autotune



.. py:function:: get_quantizable_ops_recursively(model, example_inputs, alpha, act_algo, inplace=True)

   Get all quantizable ops from the model.

   :param model: input model
   :type model: object
   :param example_inputs: used to trace the torch model.
   :type example_inputs: dict|list|tuple|torch.Tensor
   :param alpha: smoothquant alpha.
   :type alpha: float|str
   :param act_algo: activation algorithm, minmax or kl.
   :type act_algo: str
   :param inplace: whether to carry out model transformations in-place. Defaults to True.
   :type inplace: bool

   :returns: list of tuples of op_name and op_type.
             cfgs (dict): dict of configuration
   :rtype: quantizable_ops (list)


.. py:function:: check_cfg_and_qconfig(tune_cfg, cfgs, op_infos_from_cfgs, output_tensor_ids_op_name, smooth_quant=False)

   Check tuning configs against quantization configs.

   :param tune_cfg: dictionary of quantization configuration.
   :type tune_cfg: dict
   :param cfgs: the input configs.
   :type cfgs: dict
   :param op_infos_from_cfgs: op infos from configs.
   :type op_infos_from_cfgs: dict
   :param output_tensor_ids_op_name: dictionary of output tensor op names.
   :type output_tensor_ids_op_name: dict

   :returns: cfgs (dict).


.. py:function:: get_module(model, key)

   Get a module from the model by its dotted key name.

   :param model: original model
   :type model: torch.nn.Module
   :param key: name of the module to retrieve
   :type key: str


.. py:function:: set_module(model, key, new_module)

   Set a new module into the model by its dotted key name.

   :param model: original model
   :type model: torch.nn.Module
   :param key: name of the module to be replaced
   :type key: str
   :param new_module: new module to be inserted
   :type new_module: torch.nn.Module


.. py:function:: update_sq_scale(ipex_config_path, smoothquant_scale_info)

   Update ipex_config.json with the smoothquant scale info generated by our algorithm.

   :param ipex_config_path: path to the temporary ipex_config.json file.
   :type ipex_config_path: str
   :param smoothquant_scale_info: a dict containing smoothquant scale info.
   :type smoothquant_scale_info: dict


.. py:function:: reshape_scale_as_weight(layer, scale)

   Reshape the scale to match the weight's input channel (output channel for depthwise layers).

   :param layer: torch module
   :param scale: original scale
   :return: reshaped scale.


.. py:function:: reshape_in_channel_to_last(layer_name, model)

   Move the input channel to the last dimension.

   :param layer_name: layer name
   :param model: torch model
   :return: the reshaped weight.


.. py:function:: reshape_scale_as_input(layer, scale)

   Reshape the scale to match the layer input's channel dimension.

   :param layer: torch module
   :param scale: original scale
   :return: reshaped scale.


.. py:function:: register_autotune(name)

   Class decorator to register a smoothquant auto-tune subclass.

   :return: the registered class


.. py:class:: TorchSmoothQuant(model, dataloader=None, example_inputs=None, q_func=None, traced_model=None, scale_sharing=True, record_max_info=False)


   Fake input-channel quantization. For more details, please refer to:

   [1] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models

   [2] SPIQ: Data-Free Per-Channel Static Input Quantization

   Currently we only handle layers whose smooth scale can be absorbed; support for other layers
   will be added later. Only inplace mode is supported, meaning the model weights will be
   changed; you can call the recover function to restore the weights if needed.
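As an illustration of the semantics behind ``get_module`` and ``set_module``, the dotted key is walked attribute-by-attribute through the module tree until the target submodule is reached. The helpers below are a minimal re-implementation sketch of that behavior (not the library code itself), shown on a toy ``nn.Sequential`` model:

```python
import torch

def get_module(model, key):
    """Fetch a submodule by dotted name, e.g. "0" or "block.linear".

    Sketch of the traversal semantics: each dotted component is resolved
    with getattr, which for nn.Module also looks up registered submodules.
    """
    module = model
    for name in key.split("."):
        module = getattr(module, name)
    return module

def set_module(model, key, new_module):
    """Replace the submodule at a dotted name with new_module.

    Walk to the parent of the target, then assign the new module; nn.Module's
    __setattr__ registers it in place of the old child.
    """
    *parents, last = key.split(".")
    parent = model
    for name in parents:
        parent = getattr(parent, name)
    setattr(parent, last, new_module)

model = torch.nn.Sequential(torch.nn.Linear(4, 8), torch.nn.ReLU())
# nn.Sequential exposes its children under the attribute names "0", "1", ...
assert isinstance(get_module(model, "0"), torch.nn.Linear)
set_module(model, "0", torch.nn.Linear(4, 8, bias=False))
assert model[0].bias is None
```

The same dotted-key traversal generalizes to nested models, e.g. ``get_module(model, "encoder.layer.0.linear")`` on a transformer-style module tree (the name here is illustrative, not from the library).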