neural_compressor.torch.algorithms.static_quant.utility
Module Contents
Classes
- The statistics printer.
- Detect the attention block and FFN block in transformer-based model.
Functions
- check_cfg_and_qconfig: Check configs and quantization configs.
- generate_activation_observer: A helper to generate a dict containing activation observer info.
- get_quantizable_ops_recursively: Get all quantizable ops from the model.
- simple_inference: Run warm-up inference for IPEX.
- dump_model_op_stats: Dump the model's quantizable ops to the user.
- get_depth: Query the depth of a dict.
- get_dict_at_depth: Get all sub-dicts at a specified depth in a nested dict.
- get_element_under_depth: Get all values in a nested dict.
- paser_cfgs: Parse configs.
- get_quantizable_ops_from_cfgs: Get quantizable ops from configs, combining fused ops into one op.
- neural_compressor.torch.algorithms.static_quant.utility.check_cfg_and_qconfig(user_cfg, cfgs, op_infos_from_cfgs, output_tensor_ids_op_name)[source]
Check configs and quantization configs.
- Parameters:
user_cfg (dict) – quantization configuration for ops.
cfgs (dict) – configs loaded from ipex config path.
op_infos_from_cfgs (dict) – dict containing configs that have been parsed for each op.
output_tensor_ids_op_name (dict) – dict containing op names corresponding to ‘op_infos_from_cfgs’.
- Returns:
updated configs.
- Return type:
cfgs (dict)
- neural_compressor.torch.algorithms.static_quant.utility.generate_activation_observer(scheme, algorithm, smooth_quant=False, smooth_quant_enable=False)[source]
This is a helper method to generate a dict containing activation observer info.
- Parameters:
scheme (str) – Quantization scheme to be used.
algorithm (str) – Algorithm used to compute the quantization parameters.
- Returns:
A dict containing observer info.
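To make the shape of such an observer-info dict concrete, here is a minimal, hypothetical sketch; the function name `make_observer_info` and the exact keys are assumptions for illustration, not the actual keys used by neural_compressor or IPEX.

```python
# Hypothetical sketch of an activation-observer info dict; the real keys
# produced by generate_activation_observer may differ.
def make_observer_info(scheme, algorithm, smooth_quant=False, smooth_quant_enable=False):
    """Build a dict describing how activations should be observed."""
    return {
        # Observer class name chosen from the algorithm (assumed mapping).
        "name": "MinMaxObserver" if algorithm == "minmax" else "HistogramObserver",
        "qscheme": scheme,            # e.g. "sym" or "asym"
        "algorithm": algorithm,       # e.g. "minmax" or "kl"
        "smooth_quant": smooth_quant,
        "smooth_quant_enable": smooth_quant_enable,
    }

obs = make_observer_info("sym", "minmax")
```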
- neural_compressor.torch.algorithms.static_quant.utility.get_quantizable_ops_recursively(model, example_inputs)[source]
Get all quantizable ops from model.
- Parameters:
model (object) – the input model.
example_inputs (dict|list|tuple|torch.Tensor) – used to trace torch model.
- Returns:
list of (op_name, op_type) tuples.
cfgs (dict): dict of configurations.
- Return type:
quantizable_ops (list)
- neural_compressor.torch.algorithms.static_quant.utility.simple_inference(q_model, example_inputs, iterations=1)[source]
Run warm-up inference for IPEX.
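A warm-up loop of this kind simply runs the model a few times so the JIT/IPEX backend can record shapes and finish graph optimization. The sketch below is a plausible reimplementation under that assumption, not the library's actual code; `warm_up` is a hypothetical name.

```python
# Hypothetical warm-up loop in the spirit of simple_inference: call the model
# `iterations` times, unpacking the example inputs by container type.
def warm_up(model, example_inputs, iterations=1):
    for _ in range(iterations):
        if isinstance(example_inputs, (list, tuple)):
            model(*example_inputs)       # positional inputs
        elif isinstance(example_inputs, dict):
            model(**example_inputs)      # keyword inputs
        else:
            model(example_inputs)        # single tensor-like input
```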
- neural_compressor.torch.algorithms.static_quant.utility.dump_model_op_stats(user_cfg)[source]
Dump the model's quantizable ops to the user.
- Parameters:
user_cfg (dict) – quantization config
- Returns:
None
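The kind of summary such a dump produces can be sketched as a count of quantized ops per (op type, dtype) pair. This is an illustrative reimplementation only: the assumed `user_cfg` layout, mapping `(op_name, op_type)` keys to per-op config dicts, and the helper name `summarize_op_stats` are assumptions, not the library's actual structure.

```python
from collections import Counter

# Hypothetical sketch: tally quantized ops per op type and dtype, assuming
# user_cfg maps (op_name, op_type) -> {"weight": {"dtype": ...}, ...}.
def summarize_op_stats(user_cfg):
    stats = Counter()
    for (op_name, op_type), cfg in user_cfg.items():
        dtype = cfg.get("weight", {}).get("dtype", "fp32")
        stats[(op_type, dtype)] += 1
    for (op_type, dtype), count in sorted(stats.items()):
        print(f"{op_type:<12} {dtype:<6} {count}")
    return stats
```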
- neural_compressor.torch.algorithms.static_quant.utility.get_depth(d) int [source]
Query the depth of the dict.
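The idea behind a dict-depth query can be reimplemented in a few lines; `depth_of` below is an illustrative sketch, not the library's implementation: the depth of a nested dict is one more than the maximum depth of its dict values, with non-dicts counting as depth 0.

```python
# Minimal sketch of get_depth: recursive depth of a nested dict.
def depth_of(d):
    if not isinstance(d, dict) or not d:
        return 0                          # leaves and empty dicts add no depth
    return 1 + max(depth_of(v) for v in d.values())
```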
- neural_compressor.torch.algorithms.static_quant.utility.get_dict_at_depth(d, target_depth, result, depth=0)[source]
Get all sub-dicts that are at a specified depth in a nested dict.
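Matching the documented signature `(d, target_depth, result, depth=0)`, a sketch of this traversal appends into a caller-supplied list; `dicts_at_depth` is a hypothetical reimplementation, assuming the root sits at depth 0.

```python
# Sketch of get_dict_at_depth: collect every sub-dict sitting exactly at
# target_depth in a nested dict (root is depth 0).
def dicts_at_depth(d, target_depth, result, depth=0):
    if not isinstance(d, dict):
        return
    if depth == target_depth:
        result.append(d)
        return
    for v in d.values():
        dicts_at_depth(v, target_depth, result, depth + 1)
```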
- neural_compressor.torch.algorithms.static_quant.utility.get_element_under_depth(d, ops_lst)[source]
Get all values in a nested dict.
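Collecting all leaf values of a nested dict is a short recursion; `leaves` below is an illustrative sketch following the documented `(d, ops_lst)` signature, appending non-dict values into the caller's list.

```python
# Sketch of get_element_under_depth: flatten a nested dict into the list
# of its leaf (non-dict) values, in insertion order.
def leaves(d, ops_lst):
    if isinstance(d, dict):
        for v in d.values():
            leaves(v, ops_lst)
    else:
        ops_lst.append(d)
```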
- neural_compressor.torch.algorithms.static_quant.utility.paser_cfgs(cfgs)[source]
Parse configs.
- Parameters:
cfgs (dict) – the input configs.
- Returns:
list of op names.
tune_cfg (dict): dictionary of quantization configuration.
op_infos_from_cfgs (dict): op infos from configs.
output_tensor_ids_op_name (dict): dictionary of output tensor op names.
- Return type:
ops_name (list)
- neural_compressor.torch.algorithms.static_quant.utility.get_quantizable_ops_from_cfgs(ops_name, op_infos_from_cfgs, input_tensor_ids_op_name)[source]
Get quantizable ops from configs, combining fused ops into one op.
- Parameters:
ops_name (list) – list of op names.
op_infos_from_cfgs (dict) – op infos from configs.
input_tensor_ids_op_name (dict) – dictionary of input tensor op names.
- Returns:
cfgs (dict).