neural_compressor.adaptor.torch_utils.util

Util Class and Functions.

Functions

`move_input_device`(input[, device])	Auto mapping input to device for all kinds of format.
`forward_wrapper`(model, input)	Model forward with device auto mapping.
`get_embedding_contiguous`(model)	This is a helper function for nn.Embedding that makes its input contiguous.
`is_fused_module`(module)	This is a helper function for _propagate_qconfig_helper to detect if this module is fused.
`collate_torch_preds`(results)	Fetch collated results.
`input2tuple`(input)	This is a helper function to convert the values of an input dict, or a list, to a tuple.
`append_attr`(fx_model, model[, fx_white_list])	This is a helper method to append attributes for the symbolic traced model.
`generate_activation_observer`(scheme, algorithm[, ...])	This is a helper method to generate an activation observer.
`check_cfg_and_qconfig`(tune_cfg, cfgs, ...[, smooth_quant])	Check configs and quantization configs.
`paser_cfgs`(cfgs)	Parse configs.
`get_quantizable_ops_from_cfgs`(ops_name, ...)	Get quantizable ops from configs, combine fused ops as one op.
`update_sq_scale`(ipex_config_path, smoothquant_scale_info)	Update ipex_config.json with smoothquant scale info generated by our algorithm.
`auto_copy`(module)	Get an IPEX prepared model and return a fp32 model.
`fetch_module`(model, op_name)	Get module with a given op name.
`set_module`(model, op_name, new_module)	Set module with a given op name.
`simple_inference`(model, input)	Record model output tensor.
`get_example_input`(dataloader[, i])	Get the example input.
`get_fallback_order`(adaptor, fp32_model, dataloader, ...)	Get the fall back order for strategy.
`get_mse_order_per_fp32`(adaptor, model, example_inp, ...)	This is a helper method to check the MSE influence on the last module after QDQ (quant/dequant).
`get_mse_order_per_int8`(adaptor, fp32_model, ...)	This is a helper method to check the MSE influence on the last module after QDQ (quant/dequant).
`get_torch_version`()	Get torch version.
`match_datatype_pattern`(datatype[, pattern])	Check the datatype pattern.
`calculate_quant_min_max`(unsigned, num_bits)	Calculate the qmin and qmax according to the datatype.
`get_depth`(d) → int	Query the depth of the dict.
`get_dict_at_depth`(d, target_depth, result[, depth])	Get all sub-dicts that are at a specified depth in a nested dict.
`get_element_under_depth`(d, ops_lst)	Get all values in a nested dict.
`get_op_type_by_name`(op_name, quantizable_ops)	Get op type by op name.
`collect_weight_info`(model, q_config)	Collect weight info from q_config for dumping into qconfig.json.
`get_module_input_output`(model[, module_hook_config, ...])	A helper function to get input and output tensor of modules in module_name_list.
`get_absorb_layers`(model, example_inputs[, ...])	Get absorb_to_layer and no_absorb_layer.
`get_block_prefix`(model)	Get prefix and number of blocks.
`calibration`(model[, dataloader, n_samples, calib_func])	Calibration with dataloader or calib_func.
`get_hidden_states`(model[, dataloader, n_samples, ...])	Get the input args and kwargs of first block.

Module Contents

neural_compressor.adaptor.torch_utils.util.move_input_device(input, device='cpu')[source]

Auto mapping input to device for all kinds of format.

Parameters:

input (torch.tensor) – input data
device (str, optional) – target device. Defaults to “cpu”.

Returns:

input data on target device

Return type:

input (torch.tensor)
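
The recursion implied by "all kinds of format" can be sketched without PyTorch. This is a minimal illustration and not the library's implementation: `FakeTensor` is a hypothetical stand-in for `torch.Tensor`, and `move_to_device` is an illustrative name.

```python
from collections import UserDict
from dataclasses import dataclass


@dataclass
class FakeTensor:
    """Hypothetical stand-in for torch.Tensor; it only models a .to() call."""

    value: float
    device: str = "cpu"

    def to(self, device):
        # Return a copy tagged with the new device, as Tensor.to() would.
        return FakeTensor(self.value, device)


def move_to_device(data, device="cpu"):
    """Recursively map every tensor-like leaf in `data` onto `device`."""
    if isinstance(data, (dict, UserDict)):
        return {k: move_to_device(v, device) for k, v in data.items()}
    if isinstance(data, (list, tuple)):
        return [move_to_device(v, device) for v in data]
    if isinstance(data, FakeTensor):
        return data.to(device)
    # Non-tensor leaves (ints, strings, None) pass through unchanged.
    return data


batch = {"input_ids": FakeTensor(1.0), "extras": [FakeTensor(2.0), "label"]}
moved = move_to_device(batch, device="xpu")
```

In this sketch the original `batch` is left untouched, and the device string is only a label on the stand-in object.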

neural_compressor.adaptor.torch_utils.util.forward_wrapper(model, input)[source]

Model forward with device auto mapping.

Parameters:

model (torch.nn.Module) – input model
input (torch.tensor) – input data

Returns:

output data

Return type:

output

neural_compressor.adaptor.torch_utils.util.get_embedding_contiguous(model)[source]

This is a helper function for nn.Embedding that makes its input contiguous.

Parameters: model (object) – the input model
Returns: None

neural_compressor.adaptor.torch_utils.util.is_fused_module(module)[source]

This is a helper function for _propagate_qconfig_helper to detect if this module is fused.

Parameters: module (object) – the input module
Returns: is fused or not
Return type: (bool)

neural_compressor.adaptor.torch_utils.util.collate_torch_preds(results)[source]

Fetch collated results.

Parameters: results (list) – input results
Returns: collated results
Return type: collate_results (list)

neural_compressor.adaptor.torch_utils.util.input2tuple(input)[source]

This is a helper function to convert the values of an input dict, or a list, to a tuple.

Parameters: input (list or dict)
Returns: A tuple.
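
The conversion described above can be sketched in plain Python. `to_tuple` is a hypothetical name, and passing other input types through unchanged is an assumption, not documented behavior.

```python
from collections import UserDict


def to_tuple(data):
    """Turn dict values or a list into a tuple (illustrative sketch only)."""
    if isinstance(data, (dict, UserDict)):
        # Dicts keep insertion order, so the tuple follows key order.
        return tuple(data.values())
    if isinstance(data, (list, tuple)):
        return tuple(data)
    # Assumption: anything else is returned as-is.
    return data


from_dict = to_tuple({"input_ids": 1, "attention_mask": 2})
from_list = to_tuple([1, 2, 3])
```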

neural_compressor.adaptor.torch_utils.util.append_attr(fx_model, model, fx_white_list=[])[source]

This is a helper method to append attributes for the symbolic traced model.

Parameters:

fx_model (torch.fx.GraphModule) – The symbolic traced model.
model (torch.nn.Module) – The original model.

Returns:

The symbolic traced model with additional attributes.

Return type:

fx_model (torch.fx.GraphModule)

neural_compressor.adaptor.torch_utils.util.generate_activation_observer(scheme, algorithm, smooth_quant=False, smooth_quant_enable=False)[source]

This is a helper method to generate an activation observer.

Parameters:

scheme (str) – Quantization scheme to be used.
algorithm (str) – Algorithm used to compute the quantization parameters.

Returns:

An observer.

neural_compressor.adaptor.torch_utils.util.check_cfg_and_qconfig(tune_cfg, cfgs, op_infos_from_cfgs, output_tensor_ids_op_name, smooth_quant=False)[source]

Check configs and quantization configs.

Parameters:

tune_cfg (dict) – dictionary of quantization configuration.
cfgs (dict) – the input configs.
op_infos_from_cfgs (dict) – op infos from configs.
output_tensor_ids_op_name (dict) – dictionary of output tensor op names.

Returns:

cfgs (dict).

neural_compressor.adaptor.torch_utils.util.paser_cfgs(cfgs)[source]

Parse configs.

Parameters:

cfgs (dict) – the input configs.

Returns:

ops_name (list) – list of op names.
tune_cfg (dict) – dictionary of quantization configuration.
op_infos_from_cfgs (dict) – op infos from configs.
output_tensor_ids_op_name (dict) – dictionary of output tensor op names.

neural_compressor.adaptor.torch_utils.util.get_quantizable_ops_from_cfgs(ops_name, op_infos_from_cfgs, input_tensor_ids_op_name)[source]

Get quantizable ops from configs, combine fused ops as one op.

Parameters:

ops_name (list) – list of op names.
op_infos_from_cfgs (dict) – op infos from configs.
input_tensor_ids_op_name (dict) – dictionary of input tensor op names.

Returns:

cfgs (dict).

neural_compressor.adaptor.torch_utils.util.update_sq_scale(ipex_config_path, smoothquant_scale_info)[source]

Update ipex_config.json with smoothquant scale info generated by our algorithm.

Parameters:

ipex_config_path (str) – a path to temporary ipex_config.json file.
smoothquant_scale_info (dict) – a dict containing smoothquant scale info.

neural_compressor.adaptor.torch_utils.util.auto_copy(module)[source]

Get an IPEX prepared model and return a fp32 model.

Parameters: module (object) – IPEX prepared model.
Returns: fp32 model.

neural_compressor.adaptor.torch_utils.util.fetch_module(model, op_name)[source]

Get module with a given op name.

Parameters:

model (object) – the input model.
op_name (str) – name of op.

Returns:

module (object).

neural_compressor.adaptor.torch_utils.util.set_module(model, op_name, new_module)[source]

Set module with a given op name.

Parameters:

model (object) – the input model.
op_name (str) – name of op.
new_module (object) – the new module to set.

Returns:

module (object).
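
As a rough illustration of looking up and replacing a submodule by op name, the sketch below walks a dotted attribute path. It assumes op names are dotted paths such as `encoder.fc`. `fetch_by_name` and `set_by_name` are hypothetical helpers, and `SimpleNamespace` stands in for a real model.

```python
from types import SimpleNamespace


def fetch_by_name(root, op_name):
    """Follow a dotted name from `root`; return None if any part is missing."""
    current = root
    for part in op_name.split("."):
        current = getattr(current, part, None)
        if current is None:
            return None
    return current


def set_by_name(root, op_name, new_obj):
    """Replace the attribute addressed by a dotted name with `new_obj`."""
    *parents, leaf = op_name.split(".")
    owner = root
    for part in parents:
        owner = getattr(owner, part)
    setattr(owner, leaf, new_obj)


# Stand-in "model": strings play the role of modules here.
model = SimpleNamespace(encoder=SimpleNamespace(fc="fp32_linear"))
before = fetch_by_name(model, "encoder.fc")
set_by_name(model, "encoder.fc", "int8_linear")
after = fetch_by_name(model, "encoder.fc")
```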

neural_compressor.adaptor.torch_utils.util.simple_inference(model, input)[source]

Record model output tensor.

Parameters:

model (object) – the input model.
input (object) – the input data.

Returns:

output (object).

neural_compressor.adaptor.torch_utils.util.get_example_input(dataloader, i=1)[source]

Get the example input.

Parameters: dataloader (object) – calibration dataset.
Returns: example_inp (object).

neural_compressor.adaptor.torch_utils.util.get_fallback_order(adaptor, fp32_model, dataloader, tune_cfg, confidence_batches, fallback=False, requantize_cfgs=None)[source]

Get the fall back order for strategy.

Parameters:

fp32_model (object) – the input model.
dataloader (torch.utils.data.DataLoader) – The calibration dataloader.
tune_cfg (dict) – dictionary of quantization configuration.
confidence_batches (int) – number of confidence batches.
fallback (bool) – if the order is fallback.

Returns:

The fallback order for strategy.

Return type:

ordered_ops (dict/list)

neural_compressor.adaptor.torch_utils.util.get_mse_order_per_fp32(adaptor, model, example_inp, tune_cfg)[source]

This is a helper method to check the MSE influence on the last module after QDQ (quant/dequant).

Parameters:

model (torch.fx.GraphModule/torch.nn.Module) – A torch model.
example_inp (object) – example inputs.
tune_cfg (dict) – dictionary of quantization configuration.

Returns:

The fallback order for strategy.

Return type:

fallback_order (dict/list)

neural_compressor.adaptor.torch_utils.util.get_mse_order_per_int8(adaptor, fp32_model, example_input, tune_cfg)[source]

This is a helper method to check the MSE influence on the last module after QDQ (quant/dequant).

Parameters:

fp32_model (torch.fx.GraphModule/torch.nn.Module) – A torch model.
example_input (object) – example inputs.
tune_cfg (dict) – dictionary of quantization configuration.

Returns:

The fallback order for strategy.

Return type:

fallback_order (dict/list)

neural_compressor.adaptor.torch_utils.util.get_torch_version()[source]: Get torch version.

neural_compressor.adaptor.torch_utils.util.match_datatype_pattern(datatype, pattern=None)[source]: Check the datatype pattern.

neural_compressor.adaptor.torch_utils.util.calculate_quant_min_max(unsigned, num_bits)[source]: Calculate the qmin and qmax according to the datatype.
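
For orientation, the usual integer range convention for a given bit width can be worked out as below. This is a sketch of the common convention under the name `quant_range` (hypothetical); the library function may return floats or differ in detail.

```python
def quant_range(unsigned, num_bits):
    """Return (qmin, qmax) for an integer type with `num_bits` bits."""
    if unsigned:
        # Unsigned: 0 .. 2^n - 1
        return 0, 2 ** num_bits - 1
    # Signed two's complement: -2^(n-1) .. 2^(n-1) - 1
    return -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1


uint8_range = quant_range(True, 8)
int8_range = quant_range(False, 8)
int4_range = quant_range(False, 4)
```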

neural_compressor.adaptor.torch_utils.util.get_depth(d) → int[source]: Query the depth of the dict.
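
One way to compute the depth of a nested dict is a short recursion. `dict_depth` is a hypothetical name, and counting an empty dict as depth 1 is an assumption of this sketch.

```python
def dict_depth(d):
    """Depth of nesting: 0 for a non-dict, 1 for a flat dict, and so on."""
    if not isinstance(d, dict):
        return 0
    # default=0 keeps max() from failing on an empty dict.
    return 1 + max((dict_depth(v) for v in d.values()), default=0)


nested = {"a": {"b": {"c": 1}}, "x": 2}
depth = dict_depth(nested)
```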

neural_compressor.adaptor.torch_utils.util.get_dict_at_depth(d, target_depth, result, depth=0)[source]: Get all sub-dicts that are at a specified depth in a nested dict.
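
The signature suggests an accumulator pattern: the caller passes in a `result` list that the recursion fills. A minimal sketch of that pattern follows. `collect_at_depth` and the sample config are hypothetical.

```python
def collect_at_depth(d, target_depth, result, depth=0):
    """Append every value found exactly `target_depth` levels down to `result`."""
    if depth == target_depth:
        # In this sketch, whatever sits at the target depth is collected.
        result.append(d)
    elif isinstance(d, dict):
        for value in d.values():
            collect_at_depth(value, target_depth, result, depth + 1)


cfg = {
    "conv": {"weight": {"bits": 8}, "activation": {"bits": 8}},
    "fc": {"weight": {"bits": 4}},
}
found = []
collect_at_depth(cfg, 2, found)
```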

neural_compressor.adaptor.torch_utils.util.get_element_under_depth(d, ops_lst)[source]: Get all values in a nested dict.
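
Collecting all leaf values of a nested dict can be sketched with the same accumulator style. `collect_leaves` is a hypothetical name. Treating lists as leaves rather than flattening them is an assumption of this sketch.

```python
def collect_leaves(d, leaves):
    """Append every non-dict value reachable from `d` to `leaves`."""
    if isinstance(d, dict):
        for value in d.values():
            collect_leaves(value, leaves)
    else:
        leaves.append(d)


nested = {
    "int8": {"static": ["conv1", "fc1"], "dynamic": ["fc2"]},
    "fp32": ["softmax"],
}
leaves = []
collect_leaves(nested, leaves)
```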

neural_compressor.adaptor.torch_utils.util.get_op_type_by_name(op_name, quantizable_ops)[source]: Get op type by op name.
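
A lookup of this kind can be sketched as a linear search. The sketch assumes `quantizable_ops` is an iterable of `(op_name, op_type)` pairs, which this page does not confirm. `op_type_by_name` is a hypothetical name.

```python
def op_type_by_name(op_name, quantizable_ops):
    """Return the type paired with `op_name`, or None when it is absent."""
    # Assumption: each entry is a (name, type) pair.
    for name, op_type in quantizable_ops:
        if name == op_name:
            return op_type
    return None


ops = [("conv1", "Conv2d"), ("fc", "Linear")]
fc_type = op_type_by_name("fc", ops)
missing = op_type_by_name("relu", ops)
```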

neural_compressor.adaptor.torch_utils.util.collect_weight_info(model, q_config)[source]

Collect weight info from q_config for dumping into qconfig.json.

qconfig.json example:

{
    'fc': {
        'bits': 4,
        'group_size': 128,
        'scheme': 'asym',
        'algorithm': 'RTN'
    }
}

Parameters:

q_config – quantization configuration.
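
To show what dumping such an entry to JSON looks like, the sketch below serializes a dict with the same keys as the example using the standard `json` module. The `weight_info` variable is hypothetical, and the sketch does not call the library.

```python
import json

# Hypothetical per-layer weight info mirroring the qconfig.json example.
weight_info = {
    "fc": {"bits": 4, "group_size": 128, "scheme": "asym", "algorithm": "RTN"}
}

# Serialize to JSON text, then parse it back to confirm the round trip.
text = json.dumps(weight_info, indent=4)
restored = json.loads(text)
```

Writing `text` to a file named qconfig.json would complete the dump; that step is omitted here to keep the sketch free of side effects.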

neural_compressor.adaptor.torch_utils.util.get_module_input_output(model, module_hook_config={}, dataloader=None, iters=-1, calib_func=None, input_func=None, output_func=None)[source]

A helper function to get input and output tensor of modules in module_name_list.

Parameters:

model – torch model.
module_hook_config (dict, optional) – required module name for input/output. Defaults to {}. For example:

module_hook_config = {
    'fc1': ['output'],
    'fc2': ['input', 'output']
}

dataloader – dataloader for model input.
iters – iterations for inference.
calib_func – a custom inference function to replace dataloader and iters.
input_func – preprocess input for less memory usage
output_func – preprocess output for less memory usage

Returns:

total_values – recorded input_values, output_values. For example:

{
    'fc1': {'input': [], 'output': []},
}

neural_compressor.adaptor.torch_utils.util.get_absorb_layers(model, example_inputs, supported_layers=['Linear'], folding=False)[source]

Get absorb_to_layer and no_absorb_layer.

Parameters:

model (torch.nn.Module) – input model
example_inputs – example_inputs
supported_layers (list, optional) – supported_layers. Defaults to ['Linear'].
folding (bool, optional) – whether allow self-absorption. Defaults to False.

Returns:

absorb_to_layer – dict of absorb_to_layer, e.g. {absorb: [absorbed_1, xx]}.
no_absorb_layers – list of no_absorb_layers.

neural_compressor.adaptor.torch_utils.util.get_block_prefix(model)[source]

Get prefix and number of blocks.

Parameters:

model (torch.nn.Module) – input model

Returns:

block_prefix (str) – block_list name in model.
block_num (int) – number of blocks in block_list.

neural_compressor.adaptor.torch_utils.util.calibration(model, dataloader=None, n_samples=128, calib_func=None)[source]

Calibration with dataloader or calib_func.

Parameters:

model (torch.nn.Module) – input model
dataloader – dataloader. Defaults to None.
n_samples (int, optional) – n_samples. Defaults to 128.
calib_func – calib_func. Defaults to None.

neural_compressor.adaptor.torch_utils.util.get_hidden_states(model, dataloader=None, n_samples=128, calib_func=None)[source]

Get the input args and kwargs of first block.

Parameters:

model (torch.nn.Module) – input model
dataloader (dataloader, optional) – input dataloader. Defaults to None.
n_samples (int, optional) – number of samples from dataloader. Defaults to 128.
calib_func (func, optional) – a calib func to replace dataloader. Defaults to None.

Raises:

ValueError – raised to avoid running inference on the rest of the model.

Returns:

total_block_args (list) – a list of input args of each batch.
total_block_kwargs (list) – a list of input kwargs of each batch.