neural_compressor.adaptor.torch_utils.util

Util Class and Functions.

Module Contents

Functions

move_input_device(input[, device])

Automatically map the input to the target device, for all supported input formats.

forward_wrapper(model, input)

Model forward with device auto mapping.

get_embedding_contiguous(model)

This is a helper function for nn.Embedding that makes its input contiguous.

is_fused_module(module)

This is a helper function for _propagate_qconfig_helper to detect if this module is fused.

collate_torch_preds(results)

Fetch collated results.

input2tuple(input)

This is a helper function that converts the values of an input dict, or an input list, to a tuple.

append_attr(fx_model, model[, fx_white_list])

This is a helper method to append attributes for the symbolic traced model.

generate_activation_observer(scheme, algorithm[, ...])

This is a helper method to generate an activation observer.

check_cfg_and_qconfig(tune_cfg, cfgs, ...[, smooth_quant])

Check configs and quantization configs.

paser_cfgs(cfgs)

Parse configs.

get_quantizable_ops_from_cfgs(ops_name, ...)

Get quantizable ops from configs, combine fused ops as one op.

update_sq_scale(ipex_config_path, smoothquant_scale_info)

Update ipex_config.json with smoothquant scale info generated by our algorithm.

auto_copy(module)

Get an IPEX prepared model and return a fp32 model.

fetch_module(model, op_name)

Get module with a given op name.

set_module(model, op_name, new_module)

Set module with a given op name.

simple_inference(model, input)

Record model output tensor.

get_example_input(dataloader[, i])

Get the example input.

get_fallback_order(adaptor, fp32_model, dataloader, ...)

Get the fallback order for the strategy.

get_mse_order_per_fp32(adaptor, model, example_inp, ...)

This is a helper method to check the MSE influence on the last module after QDQ (quant/dequant).

get_mse_order_per_int8(adaptor, fp32_model, ...)

This is a helper method to check the MSE influence on the last module after QDQ (quant/dequant).

get_torch_version()

Get torch version.

match_datatype_pattern(datatype[, pattern])

Check the datatype pattern.

calculate_quant_min_max(unsigned, num_bits)

Calculate the qmin and qmax according to the datatype.

get_depth(d) → int

Query the depth of the dict.

get_dict_at_depth(d, target_depth, result[, depth])

Get all sub-dicts that are at a specified depth in a nested dict.

get_element_under_depth(d, ops_lst)

Get all values in a nested dict.

get_op_type_by_name(op_name, quantizable_ops)

Get op type by op name.

collect_weight_info(model, q_config)

Collect weight info from q_config for dumping into qconfig.json.

get_module_input_output(model[, module_hook_config, ...])

A helper function to get the input and output tensors of modules in module_name_list.

get_absorb_layers(model, example_inputs[, ...])

Get absorb_to_layer and no_absorb_layer.

get_block_prefix(model)

Get prefix and number of blocks.

calibration(model[, dataloader, n_samples, calib_func])

Calibration with dataloader or calib_func.

get_hidden_states(model[, dataloader, n_samples, ...])

Get the input args and kwargs of first block.

neural_compressor.adaptor.torch_utils.util.move_input_device(input, device='cpu')[source]

Automatically map the input to the target device, for all supported input formats.

Parameters:
  • input (torch.tensor) – input data

  • device (str, optional) – target device. Defaults to "cpu".

Returns:

input data on target device

Return type:

input (torch.tensor)
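
The recursive mapping described above can be illustrated without the library. The following is a minimal sketch, not the library's implementation: it assumes that "all kinds of format" covers nested dicts, lists, and tuples whose leaves expose a `.to(device)` method, as `torch.Tensor` does. `FakeTensor` and `move_to_device_sketch` are hypothetical names used only so that the example runs without torch.

```python
class FakeTensor:
    """Stand-in for torch.Tensor: only records which device it lives on."""

    def __init__(self, device="cpu"):
        self.device = device

    def to(self, device):
        # Like torch.Tensor.to, return a new object on the target device.
        return FakeTensor(device)


def move_to_device_sketch(data, device="cpu"):
    """Recursively move every tensor-like leaf in `data` to `device`."""
    if isinstance(data, dict):
        return {key: move_to_device_sketch(value, device) for key, value in data.items()}
    if isinstance(data, (list, tuple)):
        return [move_to_device_sketch(item, device) for item in data]
    if hasattr(data, "to"):
        return data.to(device)
    # Non-tensor leaves (ints, strings, None) are passed through unchanged.
    return data


batch = {"input_ids": FakeTensor(), "extras": [FakeTensor(), 3]}
moved = move_to_device_sketch(batch, device="cuda:0")
```

In this sketch the original `batch` is left untouched and a new container is returned; whether the real function copies or mutates its input is not stated in this page.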

neural_compressor.adaptor.torch_utils.util.forward_wrapper(model, input)[source]

Model forward with device auto mapping.

Parameters:
  • model (torch.nn.Module) – input model

  • input (torch.tensor) – input data

Returns:

output data

Return type:

output

neural_compressor.adaptor.torch_utils.util.get_embedding_contiguous(model)[source]

This is a helper function for nn.Embedding that makes its input contiguous.

Parameters:

model (object) – the input model

Returns:

None

neural_compressor.adaptor.torch_utils.util.is_fused_module(module)[source]

This is a helper function for _propagate_qconfig_helper to detect if this module is fused.

Parameters:

module (object) – the input module

Returns:

is fused or not

Return type:

(bool)

neural_compressor.adaptor.torch_utils.util.collate_torch_preds(results)[source]

Fetch collated results.

Parameters:

results (list) – input results

Returns:

collated results

Return type:

collate_results (list)

neural_compressor.adaptor.torch_utils.util.input2tuple(input)[source]

This is a helper function that converts the values of an input dict, or an input list, to a tuple.

Parameters:

input (list or dict) – the input to convert.

Returns:

A tuple.
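
The conversion can be sketched in a few lines. This is an illustration under assumptions, not the library's code: `input2tuple_sketch` is a hypothetical name, and passing through inputs that are neither a dict nor a list is an assumed behavior that this page does not confirm.

```python
def input2tuple_sketch(data):
    """Turn dict values or a list into a tuple; leave other inputs unchanged."""
    if isinstance(data, dict):
        # Values are taken in the dict's insertion order.
        return tuple(data.values())
    if isinstance(data, (list, tuple)):
        return tuple(data)
    # Assumption: anything else is returned as-is.
    return data


from_dict = input2tuple_sketch({"input_ids": [1, 2], "attention_mask": [1, 1]})
from_list = input2tuple_sketch([10, 20, 30])
```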

neural_compressor.adaptor.torch_utils.util.append_attr(fx_model, model, fx_white_list=[])[source]

This is a helper method to append attributes for the symbolic traced model.

Parameters:
  • fx_model (torch.fx.GraphModule) – The symbolic traced model.

  • model (torch.nn.Module) – The original model.

Returns:

The symbolic traced model with additional attributes.

Return type:

fx_model (dir)

neural_compressor.adaptor.torch_utils.util.generate_activation_observer(scheme, algorithm, smooth_quant=False, smooth_quant_enable=False)[source]

This is a helper method to generate an activation observer.

Parameters:
  • scheme (str) – Quantization scheme to be used.

  • algorithm (str) – The algorithm used to compute the quantization parameters.

Returns:

An observer.

neural_compressor.adaptor.torch_utils.util.check_cfg_and_qconfig(tune_cfg, cfgs, op_infos_from_cfgs, output_tensor_ids_op_name, smooth_quant=False)[source]

Check configs and quantization configs.

Parameters:
  • tune_cfg (dict) – dictionary of quantization configuration.

  • cfgs (dict) – the input configs.

  • op_infos_from_cfgs (dict) – op infos from configs.

  • output_tensor_ids_op_name (dict) – dictionary of output tensor op names.

Returns:

cfgs (dict).

neural_compressor.adaptor.torch_utils.util.paser_cfgs(cfgs)[source]

Parse configs.

Parameters:

cfgs (dict) – the input configs.

Returns:
  • ops_name (list) – list of op names.

  • tune_cfg (dict) – dictionary of quantization configuration.

  • op_infos_from_cfgs (dict) – op infos from configs.

  • output_tensor_ids_op_name (dict) – dictionary of output tensor op names.

neural_compressor.adaptor.torch_utils.util.get_quantizable_ops_from_cfgs(ops_name, op_infos_from_cfgs, input_tensor_ids_op_name)[source]

Get quantizable ops from configs, combine fused ops as one op.

Parameters:
  • ops_name (list) – list of op names.

  • op_infos_from_cfgs (dict) – op infos from configs.

  • input_tensor_ids_op_name (dict) – dictionary of input tensor op names.

Returns:

cfgs (dict).

neural_compressor.adaptor.torch_utils.util.update_sq_scale(ipex_config_path, smoothquant_scale_info)[source]

Update ipex_config.json with smoothquant scale info generated by our algorithm.

Parameters:
  • ipex_config_path (str) – a path to temporary ipex_config.json file.

  • smoothquant_scale_info (dict) – a dict containing smoothquant scale info.

neural_compressor.adaptor.torch_utils.util.auto_copy(module)[source]

Get an IPEX prepared model and return a fp32 model.

Parameters:

module (object) – IPEX prepared model.

Returns:

fp32 model.

neural_compressor.adaptor.torch_utils.util.fetch_module(model, op_name)[source]

Get module with a given op name.

Parameters:
  • model (object) – the input model.

  • op_name (str) – name of op.

Returns:

module (object).

neural_compressor.adaptor.torch_utils.util.set_module(model, op_name, new_module)[source]

Set module with a given op name.

Parameters:
  • model (object) – the input model.

  • op_name (str) – name of op.

  • new_module (object) – the new module to set at op_name.

Returns:

module (object).
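
Looking up and replacing a module by name can be sketched as an attribute walk. The example below is a hedged illustration, not the library's implementation: it assumes that op_name is a dotted attribute path such as 'encoder.fc', and the names `fetch_module_sketch` and `set_module_sketch` are hypothetical. Plain namespaces stand in for torch.nn.Module containers so the example runs without torch.

```python
from types import SimpleNamespace


def fetch_module_sketch(model, op_name):
    """Follow a dotted path such as 'encoder.fc' through nested attributes."""
    module = model
    for name in op_name.split("."):
        module = getattr(module, name)
    return module


def set_module_sketch(model, op_name, new_module):
    """Replace the object at a dotted path with `new_module`."""
    *parent_path, leaf = op_name.split(".")
    parent = model
    for name in parent_path:
        parent = getattr(parent, name)
    setattr(parent, leaf, new_module)


# A toy "model": strings stand in for the fp32 and quantized layers.
model = SimpleNamespace(encoder=SimpleNamespace(fc="fp32_linear"))
before = fetch_module_sketch(model, "encoder.fc")
set_module_sketch(model, "encoder.fc", "int8_linear")
after = fetch_module_sketch(model, "encoder.fc")
```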

neural_compressor.adaptor.torch_utils.util.simple_inference(model, input)[source]

Record model output tensor.

Parameters:
  • model (object) – the input model.

  • input (object) – the input data.

Returns:

output (object).

neural_compressor.adaptor.torch_utils.util.get_example_input(dataloader, i=1)[source]

Get the example input.

Parameters:

dataloader (object) – calibration dataset.

Returns:

example_inp (object).

neural_compressor.adaptor.torch_utils.util.get_fallback_order(adaptor, fp32_model, dataloader, tune_cfg, confidence_batches, fallback=False, requantize_cfgs=None)[source]

Get the fallback order for the strategy.

Parameters:
  • fp32_model (object) – the input model.

  • dataloader (torch.utils.data.DataLoader) – The calibration dataloader.

  • tune_cfg (dict) – dictionary of quantization configuration.

  • confidence_batches (int) – number of confidence batches.

  • fallback (bool) – if the order is fallback.

Returns:

The fallback order for strategy.

Return type:

ordered_ops (dict/list)

neural_compressor.adaptor.torch_utils.util.get_mse_order_per_fp32(adaptor, model, example_inp, tune_cfg)[source]

This is a helper method to check the MSE influence on the last module after QDQ (quant/dequant).

Parameters:
  • model (torch.fx.GraphModule/torch.nn.Module) – A torch model.

  • example_inp (object) – example inputs.

  • tune_cfg (dict) – dictionary of quantization configuration.

Returns:

The fallback order for strategy.

Return type:

fallback_order (dict/list)

neural_compressor.adaptor.torch_utils.util.get_mse_order_per_int8(adaptor, fp32_model, example_input, tune_cfg)[source]

This is a helper method to check the MSE influence on the last module after QDQ (quant/dequant).

Parameters:
  • fp32_model (torch.fx.GraphModule/torch.nn.Module) – A torch model.

  • example_input (object) – example inputs.

  • tune_cfg (dict) – dictionary of quantization configuration.

Returns:

The fallback order for strategy.

Return type:

fallback_order (dict/list)

neural_compressor.adaptor.torch_utils.util.get_torch_version()[source]

Get torch version.

neural_compressor.adaptor.torch_utils.util.match_datatype_pattern(datatype, pattern=None)[source]

Check the datatype pattern.

neural_compressor.adaptor.torch_utils.util.calculate_quant_min_max(unsigned, num_bits)[source]

Calculate the qmin and qmax according to the datatype.
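
As a worked illustration of the ranges involved, the sketch below uses the standard integer conventions: an unsigned n-bit type spans 0 to 2^n - 1, and a signed two's-complement type spans -2^(n-1) to 2^(n-1) - 1. `quant_min_max_sketch` is a hypothetical name; the library function may differ in details such as the numeric type of the values it returns or reduced-range handling.

```python
def quant_min_max_sketch(unsigned, num_bits):
    """Return (qmin, qmax) for an integer type with `num_bits` bits."""
    if unsigned:
        return 0, 2**num_bits - 1
    return -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1


uint8_range = quant_min_max_sketch(unsigned=True, num_bits=8)   # (0, 255)
int8_range = quant_min_max_sketch(unsigned=False, num_bits=8)   # (-128, 127)
int4_range = quant_min_max_sketch(unsigned=False, num_bits=4)   # (-8, 7)
```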

neural_compressor.adaptor.torch_utils.util.get_depth(d) → int[source]

Query the depth of the dict.

neural_compressor.adaptor.torch_utils.util.get_dict_at_depth(d, target_depth, result, depth=0)[source]

Get all sub-dicts that are at a specified depth in a nested dict.
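
The two nested-dict helpers, get_depth and get_dict_at_depth, can be sketched recursively. This is an illustration under assumptions rather than the library's code: `depth_sketch` and `dicts_at_depth_sketch` are hypothetical names, a non-dict value is assumed to have depth 0, and results are assumed to be collected by appending to the caller's `result` list, as the signature suggests.

```python
def depth_sketch(d):
    """Depth of a nested dict; a non-dict value has depth 0."""
    if isinstance(d, dict):
        return 1 + max((depth_sketch(v) for v in d.values()), default=0)
    return 0


def dicts_at_depth_sketch(d, target_depth, result, depth=0):
    """Append whatever sits exactly `target_depth` levels down to `result`."""
    if depth == target_depth:
        result.append(d)
    elif isinstance(d, dict):
        for value in d.values():
            dicts_at_depth_sketch(value, target_depth, result, depth + 1)


cfg = {"conv": {"weight": {"dtype": "int8"}, "activation": {"dtype": "uint8"}}}
found = []
dicts_at_depth_sketch(cfg, 2, found)
```

Here `depth_sketch(cfg)` is 3, and `found` holds the two innermost dicts.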

neural_compressor.adaptor.torch_utils.util.get_element_under_depth(d, ops_lst)[source]

Get all values in a nested dict.
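
A minimal sketch of collecting every value from a nested dict follows. It assumes that "values" means the non-dict leaves and that they are appended to the caller-supplied list; `collect_leaves_sketch` is a hypothetical name, not the library function.

```python
def collect_leaves_sketch(d, ops_lst):
    """Append every non-dict value found anywhere in nested dict `d` to `ops_lst`."""
    if isinstance(d, dict):
        for value in d.values():
            collect_leaves_sketch(value, ops_lst)
    else:
        ops_lst.append(d)


nested = {"block0": {"fc1": "Linear", "act": "ReLU"}, "head": "Linear"}
leaves = []
collect_leaves_sketch(nested, leaves)
```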

neural_compressor.adaptor.torch_utils.util.get_op_type_by_name(op_name, quantizable_ops)[source]

Get op type by op name.
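
The lookup can be pictured as a linear search. The sketch below rests on two assumptions that this page does not confirm: that quantizable_ops is a sequence of (op_name, op_type) pairs, and that a missing name yields None. `op_type_by_name_sketch` is a hypothetical name.

```python
def op_type_by_name_sketch(op_name, quantizable_ops):
    """Return the type paired with `op_name`, or None when it is not listed."""
    for name, op_type in quantizable_ops:
        if name == op_name:
            return op_type
    return None


ops = [("conv1", "Conv2d"), ("fc", "Linear")]
fc_type = op_type_by_name_sketch("fc", ops)
missing = op_type_by_name_sketch("pool", ops)
```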

neural_compressor.adaptor.torch_utils.util.collect_weight_info(model, q_config)[source]

Collect weight info from q_config for dumping into qconfig.json.

qconfig.json example:

    {
        'fc': {
            'bits': 4,
            'group_size': 128,
            'scheme': 'asym',
            'algorithm': 'RTN'
        }
    }

Parameters:

q_config – quantization configuration

neural_compressor.adaptor.torch_utils.util.get_module_input_output(model, module_hook_config={}, dataloader=None, iters=-1, calib_func=None, input_func=None, output_func=None)[source]

A helper function to get the input and output tensors of modules in module_name_list.

Parameters:
  • model – torch model.

  • module_hook_config (dict, optional) –

    required module name for input/output. Defaults to {}. For example:

    module_hook_config = {
        'fc1': ['output'],
        'fc2': ['input', 'output']
    }

  • dataloader – dataloader for model input.

  • iters – iterations for inference.

  • calib_func – a custom inference function to replace dataloader and iters.

  • input_func – preprocess input for less memory usage

  • output_func – preprocess output for less memory usage

Returns:

recorded input_values, output_values. For example:

    {'fc1': {'input': [], 'output': []}}

Return type:

total_values

neural_compressor.adaptor.torch_utils.util.get_absorb_layers(model, example_inputs, supported_layers=['Linear'], folding=False)[source]

Get absorb_to_layer and no_absorb_layer.

Parameters:
  • model (torch.nn.Module) – input model

  • example_inputs – example_inputs

  • supported_layers (list, optional) – supported_layers. Defaults to ['Linear'].

  • folding (bool, optional) – whether to allow self-absorption. Defaults to False.

Returns:
  • absorb_to_layer – dict of absorb_to_layer, e.g. {absorb: [absorbed_1, xx]}.

  • no_absorb_layers – list of no_absorb_layers.

neural_compressor.adaptor.torch_utils.util.get_block_prefix(model)[source]

Get prefix and number of blocks.

Parameters:

model (torch.nn.Module) – input model

Returns:
  • block_prefix (str) – block_list name in model.

  • block_num (int) – number of blocks in block_list.

neural_compressor.adaptor.torch_utils.util.calibration(model, dataloader=None, n_samples=128, calib_func=None)[source]

Calibration with dataloader or calib_func.

Parameters:
  • model (torch.nn.Module) – input model

  • dataloader – dataloader. Defaults to None.

  • n_samples (int, optional) – n_samples. Defaults to 128.

  • calib_func – calib_func. Defaults to None.
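
The choice between a dataloader and a calib_func can be sketched as below. This is a simplified illustration, not the library's implementation: `calibration_sketch` is a hypothetical name, a plain list of batches plus an explicit `batch_size` replaces the dataloader so the example runs without torch, and two behaviors are assumptions: that calib_func takes precedence when it is given, and that n_samples is rounded up to whole batches.

```python
import math


def calibration_sketch(model, batches=None, batch_size=1, n_samples=128, calib_func=None):
    """Run `calib_func(model)` if given, else feed about `n_samples` samples to `model`."""
    if calib_func is not None:
        # Assumption: a custom calibration function replaces the dataloader loop.
        calib_func(model)
        return
    # Enough whole batches to cover n_samples (the last one may overshoot).
    iters = math.ceil(n_samples / batch_size)
    for index, batch in enumerate(batches):
        if index >= iters:
            break
        model(batch)


# Dataloader path: `seen.append` plays the role of the model's forward pass.
seen = []
calibration_sketch(seen.append, batches=[[0, 1], [2, 3], [4, 5]], batch_size=2, n_samples=3)

# calib_func path: the function receives the model and drives inference itself.
custom_calls = []
calibration_sketch("any model", calib_func=custom_calls.append)
```

With n_samples=3 and batch_size=2, the sketch consumes two batches and stops before the third.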

neural_compressor.adaptor.torch_utils.util.get_hidden_states(model, dataloader=None, n_samples=128, calib_func=None)[source]

Get the input args and kwargs of first block.

Parameters:
  • model (torch.nn.Module) – input model

  • dataloader (dataloader, optional) – input dataloader. Defaults to None.

  • n_samples (int, optional) – number of samples from dataloader. Defaults to 128.

  • calib_func (func, optional) – a calib func to replace dataloader. Defaults to None.

Raises:

ValueError – raised on purpose to avoid inference of the remaining parts of the model

Returns:
  • total_block_args (list) – a list of input args of each batch.

  • total_block_kwargs (list) – a list of input kwargs of each batch.
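
The ValueError listed under Raises suggests an early-exit pattern: record the inputs that reach the first block, then raise so the rest of the model never runs. The sketch below shows that pattern on toy classes. It is an assumption about the mechanism, not the library's code: `Block`, `TinyModel`, and `capture_first_block_inputs` are hypothetical, as is the `scale` keyword argument.

```python
class Block:
    """Toy transformer block: adds 1 to its input."""

    def forward(self, x, **kwargs):
        return x + 1


class TinyModel:
    """Toy model that runs its blocks in order and counts completed block calls."""

    def __init__(self):
        self.blocks = [Block(), Block()]
        self.completed_blocks = 0

    def __call__(self, x, **kwargs):
        for block in self.blocks:
            x = block.forward(x, **kwargs)
            self.completed_blocks += 1
        return x


def capture_first_block_inputs(model, batches):
    """Record the args/kwargs reaching the first block, skipping the rest of the model."""
    total_block_args, total_block_kwargs = [], []
    first_block = model.blocks[0]

    def recording_forward(*args, **kwargs):
        total_block_args.append(list(args))
        total_block_kwargs.append(dict(kwargs))
        # Abort the forward pass: nothing after the first block needs to run.
        raise ValueError("captured first-block inputs")

    first_block.forward = recording_forward  # shadow the class method on this instance
    try:
        for batch in batches:
            try:
                model(batch, scale=2)
            except ValueError:
                pass  # expected: raised by recording_forward
    finally:
        del first_block.forward  # restore the original method
    return total_block_args, total_block_kwargs


model = TinyModel()
args, kwargs = capture_first_block_inputs(model, batches=[10, 20])
```

After the call, `model.completed_blocks` is still 0, which shows that no block finished a forward pass while the inputs were being captured.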