neural_compressor.torch.algorithms.smooth_quant.utility

Module Contents

Classes

TorchSmoothQuant

Fake input channel quantization; for more details see the SmoothQuant and SPIQ papers.

Functions

check_cfg_and_qconfig(tune_cfg, cfgs, ...[, smooth_quant])

Check configs and quantization configs.

get_module(model, key)

Get module from model by key name.

set_module(model, key, new_module)

Set new module into model by key name.

update_sq_scale(ipex_config_path, smoothquant_scale_info)

Update ipex_config.json with smoothquant scale info generated by our algorithm.

reshape_scale_as_weight(layer, scale)

Reshape the scale for the weight's input channel (output channel for depthwise layers).

reshape_in_channel_to_last(layer_name, model)

Move the input channel to the last dimension.

reshape_scale_as_input(layer, scale)

Reshape the scale for the input feature's channel dimension.

register_autotune(name)

Class decorator to register a smoothquant auto-tune subclass.

neural_compressor.torch.algorithms.smooth_quant.utility.check_cfg_and_qconfig(tune_cfg, cfgs, op_infos_from_cfgs, output_tensor_ids_op_name, smooth_quant=False)[source]

Check configs and quantization configs.

Parameters:
  • tune_cfg (dict) – dictionary of quantization configuration.

  • cfgs (dict) – the input configs.

  • op_infos_from_cfgs (dict) – op infos from configs.

  • output_tensor_ids_op_name (dict) – dictionary of output tensor op names.

  • smooth_quant (bool) – whether smooth quant is enabled. Defaults to False.

Returns:

cfgs (dict).

neural_compressor.torch.algorithms.smooth_quant.utility.get_module(model, key)[source]

Get module from model by key name.

Parameters:
  • model (torch.nn.Module) – original model

  • key (str) – module name to be replaced
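
The lookup follows the standard dotted-name traversal over nested submodules. A minimal sketch of that technique (illustrative, not the library's exact code; the function and class names here are invented):

```python
# Sketch of dotted-name module lookup: walk the attribute path
# "block.linear" one segment at a time with getattr.
def get_module_sketch(model, key):
    """Fetch a nested submodule by its dotted name, e.g. "block.linear"."""
    module = model
    for name in key.split("."):
        # nn.Module exposes registered children as attributes, so plain
        # getattr resolves each path segment.
        module = getattr(module, name)
    return module
```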

neural_compressor.torch.algorithms.smooth_quant.utility.set_module(model, key, new_module)[source]

Set new module into model by key name.

Parameters:
  • model (torch.nn.Module) – original model

  • key (str) – module name to be replaced

  • new_module (torch.nn.Module) – new module to be inserted
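
Replacement works the same way, except the traversal stops at the parent of the target so the last attribute can be rebound. A hedged sketch of the pattern (illustrative names, not the library's exact code):

```python
# Sketch of dotted-name module replacement: walk to the parent of the
# target module, then rebind the final attribute to the new module.
def set_module_sketch(model, key, new_module):
    """Replace the submodule named by a dotted path with new_module."""
    *parent_names, last = key.split(".")
    parent = model
    for name in parent_names:
        parent = getattr(parent, name)
    # On a real nn.Module, setattr also registers new_module as a child.
    setattr(parent, last, new_module)
```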

neural_compressor.torch.algorithms.smooth_quant.utility.update_sq_scale(ipex_config_path, smoothquant_scale_info)[source]

Update ipex_config.json with smoothquant scale info generated by our algorithm.

Parameters:
  • ipex_config_path (str) – a path to temporary ipex_config.json file.

  • smoothquant_scale_info (dict) – a dict contains smoothquant scale info.
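
The keys and layout of ipex_config.json are IPEX-internal, so the sketch below only illustrates the general read-merge-write pattern; the "smoothquant" key and function name are invented for illustration:

```python
import json

# Hypothetical sketch of patching a serialized config file with new
# SmoothQuant scale info. The schema shown is NOT IPEX's real schema.
def update_scales_sketch(config_path, scale_info):
    """Read a JSON config, merge in scale_info, and write it back."""
    with open(config_path) as f:
        cfg = json.load(f)
    cfg.setdefault("smoothquant", {}).update(scale_info)  # invented key
    with open(config_path, "w") as f:
        json.dump(cfg, f, indent=2)
```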

neural_compressor.torch.algorithms.smooth_quant.utility.reshape_scale_as_weight(layer, scale)[source]

Reshape the scale for the weight's input channel (output channel for depthwise layers).

Parameters:
  • layer – torch module

  • scale – original scale

Returns:

reshaped scale.
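
The broadcast shape the scale needs depends on the layer type: a Linear weight has shape (out_features, in_features), so a per-input-channel scale is viewed as (1, C); a Conv2d weight has shape (out_ch, in_ch/groups, kh, kw), so the scale becomes (1, C, 1, 1), except for depthwise convolutions, where the scale lines up with the output-channel axis as (C, 1, 1, 1). A plain-Python sketch of that shape logic (function name invented; no torch dependency):

```python
# Illustrative sketch of the broadcast shapes involved in scaling a weight
# along its input channel (output channel for depthwise convolutions).
def scale_view_shape(weight_shape, num_channels, depthwise=False):
    """Return the shape a 1-D scale should be viewed as for broadcasting."""
    if len(weight_shape) == 2:          # Linear: (out_features, in_features)
        return (1, num_channels)
    if len(weight_shape) == 4:          # Conv2d: (out_ch, in_ch/groups, kh, kw)
        if depthwise:                   # depthwise conv scales the output axis
            return (num_channels, 1, 1, 1)
        return (1, num_channels, 1, 1)
    raise ValueError("unsupported weight rank")
```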

neural_compressor.torch.algorithms.smooth_quant.utility.reshape_in_channel_to_last(layer_name, model)[source]

Move the input channel to the last dimension.

Parameters:
  • layer_name – layer name

  • model – input model

Returns:

the reshaped weight.

neural_compressor.torch.algorithms.smooth_quant.utility.reshape_scale_as_input(layer, scale)[source]

Reshape the scale for the input feature's channel dimension.

Parameters:
  • layer – torch module

  • scale – original scale

Returns:

reshaped scale.

neural_compressor.torch.algorithms.smooth_quant.utility.register_autotune(name)[source]

Class decorator to register a smoothquant auto-tune subclass.

Returns:

the registered class.
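
This follows the common registry-decorator pattern: the decorator records the class under a name and returns it unchanged. A minimal sketch (registry and function names are illustrative):

```python
# Sketch of a class-registration decorator: map a name to the class in a
# module-level registry, then hand the class back untouched.
AUTOTUNE_REGISTRY = {}

def register_autotune_sketch(name):
    """Class decorator: record the class under `name` and return it."""
    def decorator(cls):
        AUTOTUNE_REGISTRY[name] = cls
        return cls
    return decorator

@register_autotune_sketch("grid")
class GridAutoTune:
    """Example subclass picked up by the registry."""
```

Registered classes can then be looked up by name, e.g. `AUTOTUNE_REGISTRY["grid"]`.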

class neural_compressor.torch.algorithms.smooth_quant.utility.TorchSmoothQuant(model, dataloader=None, example_inputs=None, q_func=None, traced_model=None, scale_sharing=True, record_max_info=False)[source]

Fake input channel quantization. For more details, please refer to:
[1] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
[2] SPIQ: Data-Free Per-Channel Static Input Quantization

Currently, only layers whose smooth scale can be absorbed are handled; support for other layers will be added later.

Only in-place mode is supported, meaning the model weights are changed; call the recover function to restore the original weights if needed.
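
At the core of SmoothQuant is a per-channel smoothing scale that migrates quantization difficulty from activations to weights: s_j = max|X_j|^α / max|W_j|^(1-α), where α is the migration strength (typically 0.5). Activations are divided by s and the absorbed weight is multiplied by s. A pure-Python sketch of the scale computation under these assumptions (not the class's actual implementation):

```python
# Sketch of SmoothQuant's per-channel smoothing scales:
#   s_j = max|X_j|**alpha / max|W_j|**(1 - alpha)
# act_maxabs / weight_maxabs hold per-channel absolute maxima observed
# from calibration activations and from the weight, respectively.
def smoothquant_scales(act_maxabs, weight_maxabs, alpha=0.5, eps=1e-5):
    """Compute per-input-channel smoothing scales; eps guards against zeros."""
    return [
        max(a, eps) ** alpha / max(w, eps) ** (1 - alpha)
        for a, w in zip(act_maxabs, weight_maxabs)
    ]
```

A channel with large activation outliers and small weights (e.g. max|X| = 4, max|W| = 1 at α = 0.5) gets a scale of 2, shrinking the activation range while growing the weight range by the same factor.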