neural_compressor.torch.algorithms.smooth_quant.utility

Module Contents

Classes

TorchSmoothQuant

Fake input channel quantization; for more details see the SmoothQuant and SPIQ papers.

Functions

check_cfg_and_qconfig(tune_cfg, cfgs, ...[, smooth_quant])

Check configs and quantization configs.

get_module(model, key)

Get module from model by key name.

set_module(model, key, new_module)

Set new module into model by key name.

update_sq_scale(ipex_config_path, smoothquant_scale_info)

Update ipex_config.json with smoothquant scale info generated by our algorithm.

reshape_scale_as_weight(layer, scale)

Reshape the scale for the weight's input channel (output channel for depthwise layers).

reshape_in_channel_to_last(layer_name, model)

Move the input channel to the last dimension.

reshape_scale_as_input(layer, scale)

Reshape the scale for the input feature's channel dimension.

register_autotune(name)

Class decorator to register a smoothquant auto-tune subclass.

neural_compressor.torch.algorithms.smooth_quant.utility.check_cfg_and_qconfig(tune_cfg, cfgs, op_infos_from_cfgs, output_tensor_ids_op_name, smooth_quant=False)[source]

Check configs and quantization configs.

Parameters:
  • tune_cfg (dict) – dictionary of quantization configuration.

  • cfgs (dict) – the input configs.

  • op_infos_from_cfgs (dict) – op infos from configs.

  • output_tensor_ids_op_name (dict) – dictionary of output tensor op names.

  • smooth_quant (bool) – whether smooth quant is enabled. Defaults to False.

Returns:

cfgs (dict).

neural_compressor.torch.algorithms.smooth_quant.utility.get_module(model, key)[source]

Get module from model by key name.

Parameters:
  • model (torch.nn.Module) – original model

  • key (str) – module name to be replaced
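
The lookup follows the standard dotted-name traversal over nested submodules. A minimal sketch of that technique (illustrative, not the library's exact code; the function and class names here are invented):

```python
# Sketch of dotted-name module lookup: walk the attribute path
# "block.linear" one segment at a time with getattr.
def get_module_sketch(model, key):
    """Fetch a nested submodule by its dotted name, e.g. "block.linear"."""
    module = model
    for name in key.split("."):
        # nn.Module exposes registered children as attributes, so plain
        # getattr resolves each path segment.
        module = getattr(module, name)
    return module
```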

neural_compressor.torch.algorithms.smooth_quant.utility.set_module(model, key, new_module)[source]

Set new module into model by key name.

Parameters:
  • model (torch.nn.Module) – original model

  • key (str) – module name to be replaced

  • new_module (torch.nn.Module) – new module to be inserted
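
Replacement works the same way, except the traversal stops at the parent of the target so the last attribute can be rebound. A hedged sketch of the pattern (illustrative names, not the library's exact code):

```python
# Sketch of dotted-name module replacement: walk to the parent of the
# target module, then rebind the final attribute to the new module.
def set_module_sketch(model, key, new_module):
    """Replace the submodule named by a dotted path with new_module."""
    *parent_names, last = key.split(".")
    parent = model
    for name in parent_names:
        parent = getattr(parent, name)
    # On a real nn.Module, setattr also registers new_module as a child.
    setattr(parent, last, new_module)
```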

neural_compressor.torch.algorithms.smooth_quant.utility.update_sq_scale(ipex_config_path, smoothquant_scale_info)[source]

Update ipex_config.json with smoothquant scale info generated by our algorithm.

Parameters:
  • ipex_config_path (str) – a path to temporary ipex_config.json file.

  • smoothquant_scale_info (dict) – a dict contains smoothquant scale info.
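
The keys and layout of ipex_config.json are IPEX-internal, so the sketch below only illustrates the general read-merge-write pattern; the "smoothquant" key and function name are invented for illustration:

```python
import json

# Hypothetical sketch of patching a serialized config file with new
# SmoothQuant scale info. The schema shown is NOT IPEX's real schema.
def update_scales_sketch(config_path, scale_info):
    """Read a JSON config, merge in scale_info, and write it back."""
    with open(config_path) as f:
        cfg = json.load(f)
    cfg.setdefault("smoothquant", {}).update(scale_info)  # invented key
    with open(config_path, "w") as f:
        json.dump(cfg, f, indent=2)
```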

neural_compressor.torch.algorithms.smooth_quant.utility.reshape_scale_as_weight(layer, scale)[source]

Reshape the scale for the weight's input channel (output channel for depthwise layers).

Parameters:
  • layer – torch module

  • scale – original scale

Returns:

reshaped scale.
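
The broadcast shape the scale needs depends on the layer type: a Linear weight has shape (out_features, in_features), so a per-input-channel scale is viewed as (1, C); a Conv2d weight has shape (out_ch, in_ch/groups, kh, kw), so the scale becomes (1, C, 1, 1), except for depthwise convolutions, where the scale lines up with the output-channel axis as (C, 1, 1, 1). A plain-Python sketch of that shape logic (function name invented; no torch dependency):

```python
# Illustrative sketch of the broadcast shapes involved in scaling a weight
# along its input channel (output channel for depthwise convolutions).
def scale_view_shape(weight_shape, num_channels, depthwise=False):
    """Return the shape a 1-D scale should be viewed as for broadcasting."""
    if len(weight_shape) == 2:          # Linear: (out_features, in_features)
        return (1, num_channels)
    if len(weight_shape) == 4:          # Conv2d: (out_ch, in_ch/groups, kh, kw)
        if depthwise:                   # depthwise conv scales the output axis
            return (num_channels, 1, 1, 1)
        return (1, num_channels, 1, 1)
    raise ValueError("unsupported weight rank")
```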

neural_compressor.torch.algorithms.smooth_quant.utility.reshape_in_channel_to_last(layer_name, model)[source]

Move the input channel to the last dimension.

Parameters:
  • layer_name – layer name

  • model – input model

Returns:

the reshaped weight.

neural_compressor.torch.algorithms.smooth_quant.utility.reshape_scale_as_input(layer, scale)[source]

Reshape the scale for the input feature's channel dimension.

Parameters:
  • layer – torch module

  • scale – original scale

Returns:

reshaped scale.

neural_compressor.torch.algorithms.smooth_quant.utility.register_autotune(name)[source]

Class decorator to register a smoothquant auto-tune subclass.

Returns:

the registered class.
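
This follows the common registry-decorator pattern: the decorator records the class under a name and returns it unchanged. A minimal sketch (registry and function names are illustrative):

```python
# Sketch of a class-registration decorator: map a name to the class in a
# module-level registry, then hand the class back untouched.
AUTOTUNE_REGISTRY = {}

def register_autotune_sketch(name):
    """Class decorator: record the class under `name` and return it."""
    def decorator(cls):
        AUTOTUNE_REGISTRY[name] = cls
        return cls
    return decorator

@register_autotune_sketch("grid")
class GridAutoTune:
    """Example subclass picked up by the registry."""
```

Registered classes can then be looked up by name, e.g. `AUTOTUNE_REGISTRY["grid"]`.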

class neural_compressor.torch.algorithms.smooth_quant.utility.TorchSmoothQuant(model, dataloader=None, example_inputs=None, q_func=None, traced_model=None, scale_sharing=True, record_max_info=False)[source]

Fake input channel quantization. For more details, please refer to:
[1] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
[2] SPIQ: Data-Free Per-Channel Static Input Quantization

Currently, only layers whose smooth scale can be absorbed are handled; support for other layers will be added later.

Only in-place mode is supported, meaning the model weights are changed; call the recover function to restore the original weights if needed.
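
At the core of SmoothQuant is a per-channel smoothing scale that migrates quantization difficulty from activations to weights: s_j = max|X_j|^α / max|W_j|^(1-α), where α is the migration strength (typically 0.5). Activations are divided by s and the absorbed weight is multiplied by s. A pure-Python sketch of the scale computation under these assumptions (not the class's actual implementation):

```python
# Sketch of SmoothQuant's per-channel smoothing scales:
#   s_j = max|X_j|**alpha / max|W_j|**(1 - alpha)
# act_maxabs / weight_maxabs hold per-channel absolute maxima observed
# from calibration activations and from the weight, respectively.
def smoothquant_scales(act_maxabs, weight_maxabs, alpha=0.5, eps=1e-5):
    """Compute per-input-channel smoothing scales; eps guards against zeros."""
    return [
        max(a, eps) ** alpha / max(w, eps) ** (1 - alpha)
        for a, w in zip(act_maxabs, weight_maxabs)
    ]
```

A channel with large activation outliers and small weights (e.g. max|X| = 4, max|W| = 1 at α = 0.5) gets a scale of 2, shrinking the activation range while growing the weight range by the same factor.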