neural_compressor.adaptor.torch_utils.util
Util Class and Functions.
Functions
- move_input_device: Automatically map input of any supported format to the target device.
- forward_wrapper: Run model forward with automatic device mapping.
- get_embedding_contiguous: Helper for nn.Embedding that makes the embedding input contiguous.
- is_fused_module: Helper for _propagate_qconfig_helper to detect whether a module is fused.
- collate_torch_preds: Fetch collated results.
- input2tuple: Helper to convert input dict values or a list to a tuple.
- append_attr: Helper to append attributes to the symbolically traced model.
- generate_activation_observer: Helper to generate an activation observer.
- check_cfg_and_qconfig: Check configs and quantization configs.
- paser_cfgs: Parse configs.
- get_quantizable_ops_from_cfgs: Get quantizable ops from configs, combining fused ops into one op.
- update_sq_scale: Update ipex_config.json with the SmoothQuant scale info generated by our algorithm.
- auto_copy: Take an IPEX prepared model and return an fp32 model.
- fetch_module: Get a module by a given op name.
- set_module: Set a module by a given op name.
- simple_inference: Record the model output tensor.
- get_example_input: Get an example input.
- get_fallback_order: Get the fallback order for the strategy.
- get_mse_order_per_fp32: Helper to check the MSE influence on the last module after QDQ (quant/dequant).
- get_mse_order_per_int8: Helper to check the MSE influence on the last module after QDQ (quant/dequant).
- get_torch_version: Get the torch version.
- match_datatype_pattern: Check the datatype pattern.
- calculate_quant_min_max: Calculate the qmin and qmax according to the datatype.
- get_depth: Query the depth of a dict.
- get_dict_at_depth: Get all sub-dicts at a specified depth in a nested dict.
- get_element_under_depth: Get all values in a nested dict.
- get_op_type_by_name: Get op type by op name.
- collect_weight_info: Collect weight info from q_config for dumping into qconfig.json.
- get_module_input_output: Helper to get input and output tensors of modules in module_name_list.
- get_absorb_layers: Get absorb_to_layer and no_absorb_layer.
- get_block_prefix: Get the prefix and number of blocks.
- calibration: Run calibration with a dataloader or calib_func.
- get_hidden_states: Get the input args and kwargs of the first block.
Module Contents
- neural_compressor.adaptor.torch_utils.util.move_input_device(input, device='cpu')[source]
Automatically map input of any supported format to the target device.
- Parameters:
input (torch.tensor) – input data
device (str, optional) – target device. Defaults to “cpu”.
- Returns:
input data on target device
- Return type:
input (torch.tensor)
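A minimal usage sketch based on the signature above; the dict-shaped input is a hypothetical example of the formats the helper accepts:

```python
import torch

from neural_compressor.adaptor.torch_utils.util import move_input_device

# A bare tensor and a dict of tensors are both mapped to the target device.
tensor_input = torch.randn(2, 4)
dict_input = {"input_ids": torch.ones(2, 8, dtype=torch.long)}  # hypothetical format

tensor_on_cpu = move_input_device(tensor_input, device="cpu")
dict_on_cpu = move_input_device(dict_input, device="cpu")
```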
- neural_compressor.adaptor.torch_utils.util.forward_wrapper(model, input)[source]
Model forward with device auto mapping.
- Parameters:
model (torch.nn.Module) – input model
input (torch.tensor) – input data
- Returns:
output data
- Return type:
output
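For example, a minimal sketch with a toy model (the Linear layer is only for illustration):

```python
import torch

from neural_compressor.adaptor.torch_utils.util import forward_wrapper

model = torch.nn.Linear(4, 2)
# The input is moved to the model's device before forward is called.
output = forward_wrapper(model, torch.randn(1, 4))
```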
- neural_compressor.adaptor.torch_utils.util.get_embedding_contiguous(model)[source]
This is a helper function for nn.Embedding; it makes the embedding input contiguous.
- Parameters:
model (object) – the input model
- Returns:
None
- neural_compressor.adaptor.torch_utils.util.is_fused_module(module)[source]
This is a helper function for _propagate_qconfig_helper to detect whether a module is fused.
- Parameters:
module (object) – the input module
- Returns:
is fused or not
- Return type:
(bool)
- neural_compressor.adaptor.torch_utils.util.collate_torch_preds(results)[source]
Fetch collated results.
- Parameters:
results (list) – input results
- Returns:
collated results
- Return type:
collate_results (list)
- neural_compressor.adaptor.torch_utils.util.input2tuple(input)[source]
This is a helper function to convert input dict values or a list to a tuple.
- Parameters:
input (list or dict)
- Returns:
A tuple.
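A short sketch of both accepted input kinds (the tensors are arbitrary placeholders):

```python
import torch

from neural_compressor.adaptor.torch_utils.util import input2tuple

# A dict is converted to a tuple of its values; a list is converted directly.
from_dict = input2tuple({"a": torch.ones(1), "b": torch.zeros(1)})
from_list = input2tuple([torch.ones(1), torch.zeros(1)])
assert isinstance(from_dict, tuple) and isinstance(from_list, tuple)
```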
- neural_compressor.adaptor.torch_utils.util.append_attr(fx_model, model, fx_white_list=[])[source]
This is a helper method to append attributes to the symbolically traced model.
- Parameters:
fx_model (torch.fx.GraphModule) – The symbolic traced model.
model (torch.nn.Module) – The original model.
- Returns:
The symbolic traced model with additional attributes.
- Return type:
fx_model (torch.fx.GraphModule)
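A minimal sketch of the intended flow; the custom_flag attribute is hypothetical and stands in for whatever attributes symbolic tracing drops:

```python
import torch

from neural_compressor.adaptor.torch_utils.util import append_attr

model = torch.nn.Linear(4, 2)
model.custom_flag = True  # hypothetical attribute lost during tracing

fx_model = torch.fx.symbolic_trace(model)
# Copy the extra attributes from the eager model onto the traced one.
fx_model = append_attr(fx_model, model)
```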
- neural_compressor.adaptor.torch_utils.util.generate_activation_observer(scheme, algorithm, smooth_quant=False, smooth_quant_enable=False)[source]
This is a helper method to generate an activation observer.
- Parameters:
scheme (str) – Quantization scheme to be used.
algorithm (str) – The algorithm used to compute the quantization parameters.
- Returns:
An observer.
- neural_compressor.adaptor.torch_utils.util.check_cfg_and_qconfig(tune_cfg, cfgs, op_infos_from_cfgs, output_tensor_ids_op_name, smooth_quant=False)[source]
Check configs and quantization configs.
- Parameters:
tune_cfg (dict) – dictionary of quantization configuration.
cfgs (dict) – the input configs.
op_infos_from_cfgs (dict) – op infos from configs.
output_tensor_ids_op_name (dict) – dictionary of output tensor op names.
- Returns:
cfgs (dict).
- neural_compressor.adaptor.torch_utils.util.paser_cfgs(cfgs)[source]
Parse configs.
- Parameters:
cfgs (dict) – the input configs.
- Returns:
ops_name (list): list of op names.
tune_cfg (dict): dictionary of quantization configuration.
op_infos_from_cfgs (dict): op infos from configs.
output_tensor_ids_op_name (dict): dictionary of output tensor op names.
- neural_compressor.adaptor.torch_utils.util.get_quantizable_ops_from_cfgs(ops_name, op_infos_from_cfgs, input_tensor_ids_op_name)[source]
Get quantizable ops from configs, combine fused ops as one op.
- Parameters:
ops_name (list) – list of op names.
op_infos_from_cfgs (dict) – op infos from configs.
input_tensor_ids_op_name (dict) – dictionary of input tensor op names.
- Returns:
cfgs (dict).
- neural_compressor.adaptor.torch_utils.util.update_sq_scale(ipex_config_path, smoothquant_scale_info)[source]
Update ipex_config.json with the SmoothQuant scale info generated by our algorithm.
- Parameters:
ipex_config_path (str) – a path to temporary ipex_config.json file.
smoothquant_scale_info (dict) – a dict containing SmoothQuant scale info.
- neural_compressor.adaptor.torch_utils.util.auto_copy(module)[source]
Take an IPEX prepared model and return an fp32 model.
- Parameters:
module (object) – IPEX prepared model.
- Returns:
fp32 model.
- neural_compressor.adaptor.torch_utils.util.fetch_module(model, op_name)[source]
Get module with a given op name.
- Parameters:
model (object) – the input model.
op_name (str) – name of op.
- Returns:
module (object).
- neural_compressor.adaptor.torch_utils.util.set_module(model, op_name, new_module)[source]
Set module with a given op name.
- Parameters:
model (object) – the input model.
op_name (str) – name of op.
new_module (object) – the new module to set.
- Returns:
module (object).
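A short sketch combining fetch_module and set_module, assuming op names follow the dotted names of named_modules() (here "0", the first child of a Sequential):

```python
import torch

from neural_compressor.adaptor.torch_utils.util import fetch_module, set_module

model = torch.nn.Sequential(torch.nn.Linear(4, 4), torch.nn.ReLU())

# Look up the first Linear by its dotted module name, then swap it out.
linear = fetch_module(model, "0")
set_module(model, "0", torch.nn.Linear(4, 4, bias=False))
```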
- neural_compressor.adaptor.torch_utils.util.simple_inference(model, input)[source]
Record model output tensor.
- Parameters:
model (object) – the input model.
input (object)
- Returns:
output (object).
- neural_compressor.adaptor.torch_utils.util.get_example_input(dataloader, i=1)[source]
Get the example input.
- Parameters:
dataloader (object) – calibration dataset.
- Returns:
example_inp (object).
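For example, with a toy calibration dataloader (the data is a placeholder):

```python
import torch
from torch.utils.data import DataLoader

from neural_compressor.adaptor.torch_utils.util import get_example_input

dataloader = DataLoader(torch.randn(8, 4), batch_size=2)
# Fetch the i-th batch as the example input; i defaults to 1.
example_inp = get_example_input(dataloader, i=1)
```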
- neural_compressor.adaptor.torch_utils.util.get_fallback_order(adaptor, fp32_model, dataloader, tune_cfg, confidence_batches, fallback=False, requantize_cfgs=None)[source]
Get the fallback order for the strategy.
- Parameters:
fp32_model (object) – the input model.
dataloader (torch.utils.data.DataLoader) – The calibration dataloader.
tune_cfg (dict) – dictionary of quantization configuration.
confidence_batches (int) – number of confidence batches.
fallback (bool) – if the order is fallback.
- Returns:
The fallback order for the strategy.
- Return type:
ordered_ops (dict/list)
- neural_compressor.adaptor.torch_utils.util.get_mse_order_per_fp32(adaptor, model, example_inp, tune_cfg)[source]
This is a helper method to check the MSE influence on the last module after QDQ (quant/dequant).
- Parameters:
model (torch.fx.GraphModule/torch.nn.Module) – A torch model.
example_inp (object) – example inputs.
tune_cfg (dict) – dictionary of quantization configuration.
- Returns:
The fallback order for strategy.
- Return type:
fallback_order (dict/list)
- neural_compressor.adaptor.torch_utils.util.get_mse_order_per_int8(adaptor, fp32_model, example_input, tune_cfg)[source]
This is a helper method to check the MSE influence on the last module after QDQ (quant/dequant).
- Parameters:
model (torch.fx.GraphModule/torch.nn.Module) – A torch model.
example_inp (object) – example inputs.
tune_cfg (dict) – dictionary of quantization configuration.
- Returns:
The fallback order for strategy.
- Return type:
fallback_order (dict/list)
- neural_compressor.adaptor.torch_utils.util.match_datatype_pattern(datatype, pattern=None)[source]
Check the datatype pattern.
- neural_compressor.adaptor.torch_utils.util.calculate_quant_min_max(unsigned, num_bits)[source]
Calculate the qmin and qmax according to the datatype.
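For illustration, assuming the helper returns a (qmin, qmax) pair (the return shape is not spelled out above):

```python
from neural_compressor.adaptor.torch_utils.util import calculate_quant_min_max

# Signed 8-bit quantization is expected to give (-128, 127).
qmin, qmax = calculate_quant_min_max(unsigned=False, num_bits=8)
# Unsigned 8-bit quantization is expected to give (0, 255).
uqmin, uqmax = calculate_quant_min_max(unsigned=True, num_bits=8)
```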
- neural_compressor.adaptor.torch_utils.util.get_dict_at_depth(d, target_depth, result, depth=0)[source]
Get all sub-dicts that are at a specified depth in a nested dict.
- neural_compressor.adaptor.torch_utils.util.get_element_under_depth(d, ops_lst)[source]
Get all values in a nested dict.
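A small sketch of both nested-dict helpers, assuming each accumulates into the list passed in, as their signatures suggest:

```python
from neural_compressor.adaptor.torch_utils.util import (
    get_dict_at_depth,
    get_element_under_depth,
)

nested = {"conv1": {"weight": {"bits": 8}}, "fc": {"weight": {"bits": 4}}}

# Collect the sub-dicts two levels down: {"bits": 8} and {"bits": 4}.
sub_dicts = []
get_dict_at_depth(nested, target_depth=2, result=sub_dicts, depth=0)

# Flatten every leaf value of the nested dict into a list: [8, 4].
leaf_values = []
get_element_under_depth(nested, leaf_values)
```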
- neural_compressor.adaptor.torch_utils.util.get_op_type_by_name(op_name, quantizable_ops)[source]
Get op type by op name.
- neural_compressor.adaptor.torch_utils.util.collect_weight_info(model, q_config)[source]
Collect weight info from q_config for dumping into qconfig.json.
qconfig.json example:
```
{
    'fc': {
        'bits': 4,
        'group_size': 128,
        'scheme': 'asym',
        'algorithm': 'RTN'
    }
}
```
- Parameters:
model (object) – the input model.
q_config (dict) – quantization configuration.
- neural_compressor.adaptor.torch_utils.util.get_module_input_output(model, module_hook_config={}, dataloader=None, iters=-1, calib_func=None, input_func=None, output_func=None)[source]
A helper function to get the input and output tensors of modules in module_name_list.
- Parameters:
model – torch model.
module_hook_config (dict, optional) – module names whose input/output should be recorded. Defaults to {}. For example:
module_hook_config = {
    'fc1': ['output'],
    'fc2': ['input', 'output']
}
dataloader – dataloader for model input.
iters – iterations for inference.
calib_func – a custom inference function to replace dataloader and iters.
input_func – preprocess input for less memory usage.
output_func – preprocess output for less memory usage.
- Returns:
recorded input_values and output_values, for example:
{'fc1': {'input': [], 'output': []}}
- Return type:
total_values
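A minimal sketch using a toy two-layer model; the module names "0" and "1" come from Sequential's dotted naming:

```python
import torch
from torch.utils.data import DataLoader

from neural_compressor.adaptor.torch_utils.util import get_module_input_output

model = torch.nn.Sequential(torch.nn.Linear(4, 4), torch.nn.Linear(4, 2))
dataloader = DataLoader(torch.randn(8, 4), batch_size=4)

# Record the output of module "0" and both input and output of module "1".
total_values = get_module_input_output(
    model,
    module_hook_config={"0": ["output"], "1": ["input", "output"]},
    dataloader=dataloader,
    iters=2,
)
```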
- neural_compressor.adaptor.torch_utils.util.get_absorb_layers(model, example_inputs, supported_layers=['Linear'], folding=False)[source]
Get absorb_to_layer and no_absorb_layer.
- Parameters:
model (torch.nn.Module) – input model
example_inputs – example_inputs
supported_layers (list, optional) – supported_layers. Defaults to [‘Linear’].
folding (bool, optional) – whether allow self-absorption. Defaults to False.
- Returns:
absorb_to_layer (dict): mapping from each absorb layer to the layers it absorbs, e.g. {absorb_layer: [absorbed_1, ...]}.
no_absorb_layers (list): layers without an absorb layer.
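A minimal sketch with a toy LayerNorm -> Linear pair, a typical absorbable pattern; whether a given pair is actually absorbable depends on the tracer:

```python
import torch

from neural_compressor.adaptor.torch_utils.util import get_absorb_layers

model = torch.nn.Sequential(torch.nn.LayerNorm(4), torch.nn.Linear(4, 4))
absorb_to_layer, no_absorb_layers = get_absorb_layers(
    model, example_inputs=torch.randn(2, 4), supported_layers=["Linear"]
)
```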
- neural_compressor.adaptor.torch_utils.util.get_block_prefix(model)[source]
Get prefix and number of blocks.
- Parameters:
model (torch.nn.Module) – input model
- Returns:
block_prefix (str): name of the block list in the model.
block_num (int): number of blocks in block_list.
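A sketch assuming the helper locates the model's blocks through an nn.ModuleList attribute; ToyModel below is hypothetical:

```python
import torch

from neural_compressor.adaptor.torch_utils.util import get_block_prefix


class ToyModel(torch.nn.Module):
    """Hypothetical model that keeps its blocks in an nn.ModuleList."""

    def __init__(self):
        super().__init__()
        self.blocks = torch.nn.ModuleList(torch.nn.Linear(4, 4) for _ in range(2))

    def forward(self, x):
        for block in self.blocks:
            x = block(x)
        return x


# Expected along the lines of block_prefix == "blocks", block_num == 2.
block_prefix, block_num = get_block_prefix(ToyModel())
```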
- neural_compressor.adaptor.torch_utils.util.calibration(model, dataloader=None, n_samples=128, calib_func=None)[source]
Calibration with dataloader or calib_func.
- Parameters:
model (torch.nn.Module) – input model
dataloader – dataloader. Defaults to None.
n_samples (int, optional) – n_samples. Defaults to 128.
calib_func – calib_func. Defaults to None.
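For example, a minimal sketch of dataloader-based calibration (the model and data are placeholders):

```python
import torch
from torch.utils.data import DataLoader

from neural_compressor.adaptor.torch_utils.util import calibration

model = torch.nn.Linear(4, 2)
dataloader = DataLoader(torch.randn(256, 4), batch_size=32)

# Runs forward passes over up to n_samples calibration samples.
calibration(model, dataloader=dataloader, n_samples=128)
```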
- neural_compressor.adaptor.torch_utils.util.get_hidden_states(model, dataloader=None, n_samples=128, calib_func=None)[source]
Get the input args and kwargs of the first block.
- Parameters:
model (torch.nn.Module) – input model
dataloader (dataloader, optional) – input dataloader. Defaults to None.
n_samples (int, optional) – number of samples taken from the dataloader. Defaults to 128.
calib_func (func, optional) – a calib func to replace dataloader. Defaults to None.
- Raises:
ValueError – raised intentionally to stop inference early and avoid running the remaining parts of the model.
- Returns:
total_block_args (list): a list of input args for each batch.
total_block_kwargs (list): a list of input kwargs for each batch.