neural_compressor.torch.algorithms.static_quant.utility

Module Contents

Classes

Statistics

The statistics printer.

TransformerBasedModelBlockPatternDetector

Detect the attention block and FFN block in a transformer-based model.

Functions

check_cfg_and_qconfig(user_cfg, cfgs, ...)

Check configs and quantization configs.

generate_activation_observer(scheme, algorithm[, ...])

This is a helper method to generate a dict containing activation observer info.

get_quantizable_ops_recursively(model, example_inputs)

Get all quantizable ops from model.

simple_inference(q_model, example_inputs[, iterations])

The function is used for ipex warm-up inference.

dump_model_op_stats(user_cfg)

Dump the quantizable ops of the model to the user.

get_depth(d) → int

Query the depth of the dict.

get_dict_at_depth(d, target_depth, result[, depth])

Get all sub-dicts that are at a specified depth in a nested dict.

get_element_under_depth(d, ops_lst)

Get all values in a nested dict.

paser_cfgs(cfgs)

Parse configs.

get_quantizable_ops_from_cfgs(ops_name, ...)

Get quantizable ops from configs, combine fused ops as one op.

neural_compressor.torch.algorithms.static_quant.utility.check_cfg_and_qconfig(user_cfg, cfgs, op_infos_from_cfgs, output_tensor_ids_op_name)[source]

Check configs and quantization configs.

Parameters:
  • user_cfg (dict) – quantization configuration for ops.

  • cfgs (dict) – configs loaded from ipex config path.

  • op_infos_from_cfgs (dict) – dict containing configs that have been parsed for each op.

  • output_tensor_ids_op_name (dict) – dict containing op names corresponding to ‘op_infos_from_cfgs’.

Returns:

cfgs (dict) – updated configs.

neural_compressor.torch.algorithms.static_quant.utility.generate_activation_observer(scheme, algorithm, smooth_quant=False, smooth_quant_enable=False)[source]

This is a helper method to generate a dict containing activation observer info.

Parameters:
  • scheme (str) – Quantization scheme to be used.

  • algorithm (str) – Algorithm used to compute the quantization parameters.

Returns:

A dict containing observer info.

neural_compressor.torch.algorithms.static_quant.utility.get_quantizable_ops_recursively(model, example_inputs)[source]

Get all quantizable ops from model.

Parameters:
  • model (object) – input model

  • example_inputs (dict|list|tuple|torch.Tensor) – used to trace torch model.

Returns:
  • quantizable_ops (list) – list of tuples of op_name and op_type.

  • cfgs (dict) – dict of configuration.

neural_compressor.torch.algorithms.static_quant.utility.simple_inference(q_model, example_inputs, iterations=1)[source]

The function is used for ipex warm-up inference.
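
As a rough illustration of what a warm-up pass looks like, the sketch below calls a model a fixed number of times and discards the outputs. The name warm_up is hypothetical, and the dispatch on the input type (dict, list/tuple, or a single object) is an assumption based on the example_inputs types listed for get_quantizable_ops_recursively; the library function may behave differently. A plain callable stands in for the quantized model so the sketch runs without torch.

```python
def warm_up(model, example_inputs, iterations=1):
    """Call `model` on `example_inputs` `iterations` times, ignoring outputs.

    Hypothetical helper: the unpacking rules below are an assumption,
    not the library's documented behavior.
    """
    for _ in range(iterations):
        if isinstance(example_inputs, dict):
            model(**example_inputs)   # keyword inputs
        elif isinstance(example_inputs, (list, tuple)):
            model(*example_inputs)    # positional inputs
        else:
            model(example_inputs)     # single input, e.g. one tensor


# Stand-in for a quantized model: records how it was called.
calls = []

def fake_model(*args, **kwargs):
    calls.append((args, kwargs))


warm_up(fake_model, {"x": 1}, iterations=2)
warm_up(fake_model, (1, 2))
warm_up(fake_model, 3)
```

With a real torch module, such a loop would normally run under torch.no_grad(), since warm-up only needs forward passes.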

neural_compressor.torch.algorithms.static_quant.utility.dump_model_op_stats(user_cfg)[source]

Dump the quantizable ops of the model to the user.

Parameters:

user_cfg (dict) – quantization config

Returns:

None

neural_compressor.torch.algorithms.static_quant.utility.get_depth(d) → int[source]

Query the depth of the dict.
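
To make "depth" concrete, here is a minimal sketch of one common definition: a non-dict value has depth 0, and a dict has depth one more than its deepest value. The name dict_depth and the handling of empty dicts are assumptions for illustration, not the library's implementation.

```python
def dict_depth(d):
    """Return the number of nested dict levels in `d` (0 for non-dicts)."""
    if not isinstance(d, dict):
        return 0
    # default=0 keeps max() from failing on an empty dict,
    # so an empty dict counts as one level.
    return 1 + max((dict_depth(v) for v in d.values()), default=0)


# Hypothetical nested config, used only for illustration.
cfg = {"conv1": {"weight": {"dtype": "int8"}, "activation": {"dtype": "uint8"}}}
depth = dict_depth(cfg)
```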

neural_compressor.torch.algorithms.static_quant.utility.get_dict_at_depth(d, target_depth, result, depth=0)[source]

Get all sub-dicts that are at a specified depth in a nested dict.
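
The signature suggests a recursive walk that appends matches to a caller-supplied result list instead of returning them. The sketch below follows that pattern under stated assumptions: the name collect_at_depth is hypothetical, the root dict is treated as depth 0, and only dict values are collected.

```python
def collect_at_depth(d, target_depth, result, depth=0):
    """Append every dict found exactly `target_depth` levels below the root."""
    if depth == target_depth:
        if isinstance(d, dict):
            result.append(d)
        return
    if isinstance(d, dict):
        for value in d.values():
            collect_at_depth(value, target_depth, result, depth + 1)


# Hypothetical nested dict, used only for illustration.
nested = {"a": {"x": {"k": 1}, "y": {"k": 2}}, "b": {"z": {"k": 3}}}
found = []
collect_at_depth(nested, 2, found)
```

Passing the accumulator in as an argument avoids building and merging intermediate lists at each recursion level.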

neural_compressor.torch.algorithms.static_quant.utility.get_element_under_depth(d, ops_lst)[source]

Get all values in a nested dict.
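
One way to read "all values in a nested dict" is as the leaves of the tree: every value that is not itself a dict. The sketch below collects those leaves into a caller-supplied list, mirroring the ops_lst parameter. The name collect_leaves is hypothetical and the leaf definition is an assumption.

```python
def collect_leaves(d, out):
    """Append every non-dict value reachable from `d` to `out`."""
    if isinstance(d, dict):
        for value in d.values():
            collect_leaves(value, out)
    else:
        # Lists and other containers are treated as single leaves.
        out.append(d)


leaves = []
collect_leaves({"a": {"x": 1, "y": [2, 3]}, "b": "op"}, leaves)
```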

neural_compressor.torch.algorithms.static_quant.utility.paser_cfgs(cfgs)[source]

Parse configs.

Parameters:

cfgs (dict) – the input configs.

Returns:
  • ops_name (list) – list of op names.

  • tune_cfg (dict) – dictionary of quantization configuration.

  • op_infos_from_cfgs (dict) – op infos from configs.

  • output_tensor_ids_op_name (dict) – dictionary of output tensor op names.

neural_compressor.torch.algorithms.static_quant.utility.get_quantizable_ops_from_cfgs(ops_name, op_infos_from_cfgs, input_tensor_ids_op_name)[source]

Get quantizable ops from configs, combine fused ops as one op.

Parameters:
  • ops_name (list) – list of op names.

  • op_infos_from_cfgs (dict) – op infos from configs.

  • input_tensor_ids_op_name (dict) – dictionary of input tensor op names.

Returns:

cfgs (dict).

class neural_compressor.torch.algorithms.static_quant.utility.Statistics(data, header, field_names, output_handle=logger.info)[source]

The statistics printer.
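
The constructor arguments (data, header, field_names, output_handle) suggest a table printer that sends formatted lines to a logging callable. The sketch below shows that idea with plain string formatting. The class name TablePrinter, its emit method, and the table layout are all hypothetical; the real class may format its output differently.

```python
class TablePrinter:
    """Hypothetical stand-in that prints rows as an ASCII table."""

    def __init__(self, data, header, field_names, output_handle=print):
        self.data = data                    # list of rows
        self.header = header                # title line above the table
        self.field_names = field_names      # column names
        self.output_handle = output_handle  # callable taking one string

    def render(self):
        rows = [[str(cell) for cell in row] for row in self.data]
        names = [str(name) for name in self.field_names]
        # Each column is as wide as its longest cell or its name.
        widths = [max(len(r[i]) for r in [names] + rows) for i in range(len(names))]

        def fmt(cells):
            return "| " + " | ".join(c.ljust(w) for c, w in zip(cells, widths)) + " |"

        rule = "+" + "+".join("-" * (w + 2) for w in widths) + "+"
        return [self.header, rule, fmt(names), rule] + [fmt(r) for r in rows] + [rule]

    def emit(self):
        for line in self.render():
            self.output_handle(line)


# Collect lines in a list instead of logging them; the data is made up.
captured = []
TablePrinter(
    data=[["Linear", 4, 3], ["Conv2d", 2, 2]],
    header="Mixed Precision Statistics",
    field_names=["Op Type", "Total", "INT8"],
    output_handle=captured.append,
).emit()
```

Taking the output callable as a parameter lets the same table go to a logger, to stdout, or to a list in tests.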

class neural_compressor.torch.algorithms.static_quant.utility.TransformerBasedModelBlockPatternDetector(model: torch.nn.Module, pattern_lst: List[List[str | int]] = BLOCK_PATTERNS)[source]

Detect the attention block and FFN block in a transformer-based model.