neural_compressor.adaptor.pytorch

Module Contents

Classes

TemplateAdaptor

Template adaptor of the PyTorch framework.

PyTorchAdaptor

Adaptor of the PyTorch framework; all PyTorch APIs are in this class.

PyTorch_IPEXAdaptor

Adaptor of the PyTorch framework with Intel PyTorch Extension; all IPEX APIs are in this class.

PyTorch_FXAdaptor

Adaptor of the PyTorch framework with FX graph mode; all PyTorch APIs are in this class.

PyTorchQuery

Base class that defines Query Interface.

Functions

get_ops_recursively(model, prefix[, ops])

This is a helper function for graph_info; it gets all ops from the model.

neural_compressor.adaptor.pytorch.get_ops_recursively(model, prefix, ops={})
This is a helper function for graph_info; it gets all ops from the model.

Parameters:
  • model (object) – input model

  • prefix (string) – prefix of op name

  • ops (dict) – dict of ops from model {op name: type}.

Returns:

None
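A minimal sketch of how such a recursive walk might look. The `Module` class below is a hypothetical stand-in that only mimics `named_children()` from `torch.nn.Module`, so the sketch stays self-contained; the library's actual implementation may differ.

```python
class Module:
    """Minimal stand-in for torch.nn.Module exposing named_children()."""
    def __init__(self, **children):
        self._children = children

    def named_children(self):
        return self._children.items()


def get_ops_recursively(model, prefix, ops):
    """Fill `ops` with {op name: type} for every submodule of `model`."""
    for name, child in model.named_children():
        op_name = prefix + "." + name if prefix else name
        ops[op_name] = type(child).__name__
        get_ops_recursively(child, op_name, ops)


model = Module(features=Module(conv=Module(), relu=Module()),
               classifier=Module())
ops = {}
get_ops_recursively(model, "", ops)
# ops == {'features': 'Module', 'features.conv': 'Module',
#         'features.relu': 'Module', 'classifier': 'Module'}
```

The dotted prefix convention ("features.conv") matches how PyTorch itself qualifies submodule names.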

class neural_compressor.adaptor.pytorch.TemplateAdaptor(framework_specific_info)

Bases: neural_compressor.adaptor.adaptor.Adaptor

Template adaptor of the PyTorch framework.

Parameters:

framework_specific_info (dict) – dictionary of tuning configuration from the yaml file.

is_fused_module(module)
This is a helper function for _propagate_qconfig_helper to detect whether this module is fused.

Parameters:

module (object) – input module

Returns:

whether the module is fused

Return type:

(bool)
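For intuition, such a check can often be reduced to inspecting where the module's class is defined, since eager-mode fused modules (e.g. ConvReLU2d) live under an `intrinsic` namespace in PyTorch. The sketch below uses hypothetical stand-in classes and is not the library's actual implementation.

```python
def is_fused_module(module):
    """Heuristic: fused classes originate from an `intrinsic` namespace,
    e.g. torch.ao.nn.intrinsic.ConvReLU2d."""
    return "intrinsic" in type(module).__module__


# Hypothetical stand-ins that mimic where the real classes are defined.
class FakeConvReLU2d:
    pass

class FakeConv2d:
    pass

FakeConvReLU2d.__module__ = "torch.ao.nn.intrinsic.modules.fused"
FakeConv2d.__module__ = "torch.nn.modules.conv"

fused = is_fused_module(FakeConvReLU2d())   # True
plain = is_fused_module(FakeConv2d())       # False
```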

calculate_hessian_trace(fp32_model, dataloader, q_model, criterion, enable_act=False)

Calculate hessian trace.

Parameters:
  • fp32_model – The original fp32 model.

  • criterion – The loss function for calculating the hessian trace. # loss = criterion(output, target)

  • dataloader – The dataloader for calculating the gradient.

  • q_model – The INT8 AMAP model.

  • enable_act – Enabling quantization error or not.

Returns:

hessian trace per op; key: (op_name, op_type), value: hessian trace.

Return type:

hessian_trace (Dict[Tuple, float])
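The Hessian trace is commonly estimated with Hutchinson's method, trace(H) ≈ E[vᵀHv] over random Rademacher vectors v; in practice the product Hv comes from a double backward pass through the loss. The sketch below demonstrates the estimator on an explicit matrix and is not the adaptor's actual code.

```python
import numpy as np

def hutchinson_trace(hvp, dim, n_samples=200, seed=0):
    """Estimate trace(H) from a Hessian-vector-product callable `hvp`."""
    rng = np.random.default_rng(seed)
    estimates = []
    for _ in range(n_samples):
        v = rng.choice([-1.0, 1.0], size=dim)   # Rademacher probe vector
        estimates.append(v @ hvp(v))            # v^T H v
    return float(np.mean(estimates))

H = np.diag([1.0, 2.0, 3.0])                    # known Hessian, trace = 6
est = hutchinson_trace(lambda v: H @ v, dim=3)
# est == 6.0 (exact here, since v_i^2 == 1 for Rademacher probes)
```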

class neural_compressor.adaptor.pytorch.PyTorchAdaptor(framework_specific_info)

Bases: TemplateAdaptor

Adaptor of the PyTorch framework; all PyTorch APIs are in this class.

Parameters:

framework_specific_info (dict) – dictionary of tuning configuration from the yaml file.

quantize(tune_cfg, model, dataloader, q_func=None)

Execute the quantize process on the specified model.

Parameters:
  • tune_cfg (dict) – quantization config.

  • model (object) – model to be quantized.

  • dataloader (object) – calibration dataset.

  • q_func (object, optional) – training function for quantization-aware training mode.

Returns:

quantized model

Return type:

(object)
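To make the role of the calibration dataloader concrete, here is a toy sketch of post-training static quantization: observe activation ranges over calibration batches, derive a scale and zero point, then map floats to uint8. This is a simplified illustration, not the adaptor's implementation.

```python
import numpy as np

class MinMaxObserver:
    """Tracks the running min/max of tensors seen during calibration."""
    def __init__(self):
        self.min, self.max = float("inf"), float("-inf")

    def observe(self, x):
        self.min = min(self.min, float(x.min()))
        self.max = max(self.max, float(x.max()))

    def qparams(self):
        # asymmetric uint8 quantization over the observed range
        scale = (self.max - self.min) / 255.0
        zero_point = int(round(-self.min / scale))
        return scale, zero_point

def quantize(x, scale, zero_point):
    q = np.round(x / scale) + zero_point
    return np.clip(q, 0, 255).astype(np.uint8)

obs = MinMaxObserver()
for batch in [np.array([-1.0, 0.5]), np.array([0.0, 3.0])]:  # "calibration"
    obs.observe(batch)
scale, zp = obs.qparams()
q = quantize(np.array([-1.0, 0.0, 3.0]), scale, zp)
# zp == 64, q.tolist() == [0, 64, 255]
```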

evaluate(model, dataloader, postprocess=None, metrics=None, measurer=None, iteration=-1, tensorboard=False, fp32_baseline=False)

Execute the evaluate process on the specified model.

Parameters:
  • model (object) – model to run evaluation.

  • dataloader (object) – evaluation dataset.

  • postprocess (object, optional) – process function after evaluation.

  • metrics (list, optional) – list of metric functions.

  • measurer (object, optional) – measurer function.

  • iteration (int, optional) – number of iterations to evaluate.

  • tensorboard (bool, optional) – dump output tensor to tensorboard summary files.

  • fp32_baseline (boolean, optional) – only for the compare_label=False pipeline.

Returns:

accuracy

Return type:

(object)

train(model, dataloader, optimizer_tuple, criterion_tuple, hooks, **kwargs)

Execute the train process on the specified model.

Parameters:
  • model (object) – model to train.

  • dataloader (object) – training dataset.

  • optimizer_tuple (tuple) – a tuple of (cls, parameters) for the optimizer.

  • criterion_tuple (tuple) – a tuple of (cls, parameters) for the criterion.

  • kwargs (dict, optional) – other parameters.

Returns:

None
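The (cls, parameters) tuple convention above can be consumed by instantiating the class with its keyword arguments; in the real adaptor the optimizer would also receive the model's parameters. `SGD` and `MSELoss` below are illustrative stand-ins, not the actual classes.

```python
class SGD:
    """Illustrative stand-in for an optimizer class."""
    def __init__(self, lr=0.01):
        self.lr = lr

class MSELoss:
    """Illustrative stand-in for a criterion class."""
    def __init__(self, reduction="mean"):
        self.reduction = reduction

def build(component_tuple):
    """Instantiate a (cls, parameters) tuple as train() expects."""
    cls, params = component_tuple
    return cls(**params)

optimizer = build((SGD, {"lr": 0.1}))
criterion = build((MSELoss, {"reduction": "sum"}))
# optimizer.lr == 0.1, criterion.reduction == "sum"
```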

is_fused_child(op_name)

This is a helper function for _post_eval_hook.

Parameters:

op_name (string) – op name

Returns:

whether this op is fused

Return type:

(bool)

is_fused_op(op_name)

This is a helper function for _post_eval_hook.

Parameters:

op_name (string) – op name

Returns:

whether this op is fused

Return type:

(bool)

is_last_fused_child(op_name)

This is a helper function for _post_eval_hook.

Parameters:

op_name (string) – op name

Returns:

whether this op is the last fused child

Return type:

(bool)

save(model, path=None)

The function is used by tune strategy class for saving model.

Parameters:
  • model (object) – The model to be saved.

  • path (string) – The path where to save.

inspect_tensor(model, dataloader, op_list=None, iteration_list=None, inspect_type='activation', save_to_disk=False)

The function is used by tune strategy class for dumping tensor info.

Parameters:
  • model (object) – The model to inspect.

  • dataloader (object) – The dataloader used to feed data into the model.

  • op_list (list) – The op names in the fp32 model to dump.

  • iteration_list (list) – The iteration list containing iterations to dump.

  • inspect_type (str) – The valid values are ‘weight’, ‘activation’, and ‘all’.

  • save_to_disk (bool) – Save to disk or memory.

Returns:

Numpy Array Dict:

{
    'weight': {
        'node0_name': {'weight0_name': numpy.array, 'bias0_name': numpy.array, ...},
        'node1_name': {'weight1_name': numpy.array, 'bias1_name': numpy.array, ...},
        ...
    },
    'activation': [
        # iter 0
        {
            'node0_name': {'output0_name': numpy.array, 'output1_name': numpy.array, ...},
            'node1_name': {'output0_name': numpy.array, 'output1_name': numpy.array, ...},
            ...
        },
        # iter 1
        ...
    ]
}

set_tensor(model, tensor_dict)

The function is used by tune strategy class for setting tensor back to model.

Parameters:
  • model (object) – The model to set tensor. Usually it is quantized model.

  • tensor_dict (dict) –

    The tensor dict to set. Note that the numpy arrays contain float values; the adaptor layer is responsible for quantizing them to int8 or int32 before setting them into the quantized model, if needed. The dict format is:

    {
        'weight0_name': numpy.array,
        'bias0_name': numpy.array,
        ...
    }
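As the note above says, the adaptor must quantize the float arrays before writing them into the quantized model. A common symmetric int8 scheme for weights looks roughly like this (a sketch, not the library's exact code):

```python
import numpy as np

def quantize_weight_symmetric(w):
    """Map a float weight array to int8 with a single symmetric scale."""
    scale = float(np.abs(w).max()) / 127.0
    q = np.clip(np.round(w / scale), -128, 127).astype(np.int8)
    return q, scale

tensor_dict = {"weight0_name": np.array([0.5, -1.27, 1.27])}
q, scale = quantize_weight_symmetric(tensor_dict["weight0_name"])
# q.tolist() == [50, -127, 127], scale is approximately 0.01
```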

query_fw_capability(model)

This is a helper function to get all quantizable ops from model.

Parameters:

model (object) – input model, which is a Neural Compressor model

Returns:

tuning capability for each op from model.

Return type:

q_capability (dictionary)

get_non_quant_modules(model_kwargs)

This is a helper function to get all non_quant_modules from the user configuration and the defaults.

Parameters:

model_kwargs (dictionary) – keyword args from Neural Compressor model

Returns:

non_quant_modules for model.

Return type:

custom_non_quant_dict (dictionary)

class neural_compressor.adaptor.pytorch.PyTorch_IPEXAdaptor(framework_specific_info)

Bases: TemplateAdaptor

Adaptor of the PyTorch framework with Intel PyTorch Extension; all PyTorch IPEX APIs are in this class.

Parameters:

framework_specific_info (dict) – dictionary of tuning configuration from the yaml file.

quantize(tune_cfg, model, dataloader, q_func=None)

Execute the quantize process on the specified model.

Parameters:
  • tune_cfg (dict) – quantization config.

  • model (object) – model to be quantized; it is a Neural Compressor model.

  • dataloader (object) – calibration dataset.

  • q_func (object, optional) – training function for quantization-aware training mode.

Returns:

quantized model

Return type:

(dict)

evaluate(model, dataloader, postprocess=None, metrics=None, measurer=None, iteration=-1, tensorboard=False, fp32_baseline=False)

Execute the evaluate process on the specified model.

Parameters:
  • model (object) – Neural Compressor model to run evaluation.

  • dataloader (object) – evaluation dataset.

  • postprocess (object, optional) – process function after evaluation.

  • metrics (list, optional) – list of metric functions.

  • measurer (object, optional) – measurer function.

  • iteration (int, optional) – number of iterations to evaluate.

  • tensorboard (bool, optional) – dump output tensor to tensorboard summary files (not supported by IPEX).

  • fp32_baseline (boolean, optional) – only for the compare_label=False pipeline.

Returns:

accuracy

Return type:

(object)

query_fw_capability(model)

This is a helper function to get all quantizable ops from model.

Parameters:

model (object) – input model which is Neural Compressor model

Returns:

tuning capability for each op from model.

Return type:

q_capability (dictionary)

save(model, path=None)

The function is used by the tune strategy class to save the best configuration into the Neural Compressor model.

Parameters:
  • model (object) – The Neural Compressor model with the best results.

  • path (string) – Not used.

Returns:

None

inspect_tensor(model, dataloader, op_list=None, iteration_list=None, inspect_type='activation', save_to_disk=False)

The function is used by tune strategy class for dumping tensor info.

Parameters:
  • model (object) – The model to inspect.

  • dataloader (object) – The dataloader used to feed data into the model.

  • op_list (list) – The op names in the fp32 model to dump.

  • iteration_list (list) – The iteration list containing iterations to dump.

  • inspect_type (str) – The valid values are ‘weight’, ‘activation’, and ‘all’.

  • save_to_disk (bool) – Save to disk or memory.

Returns:

Numpy Array Dict:

{
    'weight': {
        'node0_name': {'weight0_name': numpy.array, 'bias0_name': numpy.array, ...},
        'node1_name': {'weight1_name': numpy.array, 'bias1_name': numpy.array, ...},
        ...
    },
    'activation': [
        # iter 0
        {
            'node0_name': {'output0_name': numpy.array, 'output1_name': numpy.array, ...},
            'node1_name': {'output0_name': numpy.array, 'output1_name': numpy.array, ...},
            ...
        },
        # iter 1
        ...
    ]
}

class neural_compressor.adaptor.pytorch.PyTorch_FXAdaptor(framework_specific_info)

Bases: TemplateAdaptor

Adaptor of the PyTorch framework with FX graph mode; all PyTorch APIs are in this class.

Parameters:

framework_specific_info (dict) – dictionary of tuning configuration from the yaml file.

quantize(tune_cfg, model, dataloader, q_func=None)

Execute the quantize process on the specified model.

Parameters:
  • tune_cfg (dict) – quantization config.

  • model (object) – model to be quantized.

  • dataloader (object) – calibration dataset.

  • q_func (object, optional) – training function for quantization-aware training mode.

Returns:

quantized model

Return type:

(object)

evaluate(model, dataloader, postprocess=None, metrics=None, measurer=None, iteration=-1, tensorboard=False, fp32_baseline=False)

Execute the evaluate process on the specified model.

Parameters:
  • model (object) – model to run evaluation.

  • dataloader (object) – evaluation dataset.

  • postprocess (object, optional) – process function after evaluation.

  • metrics (list, optional) – list of metric functions.

  • measurer (object, optional) – measurer function.

  • iteration (int, optional) – number of iterations to evaluate.

  • tensorboard (bool, optional) – dump output tensor to tensorboard summary files.

  • fp32_baseline (boolean, optional) – only for the compare_label=False pipeline.

Returns:

accuracy

Return type:

(object)

train(model, dataloader, optimizer_tuple, criterion_tuple, hooks, **kwargs)

Execute the train process on the specified model.

Parameters:
  • model (object) – model to train.

  • dataloader (object) – training dataset.

  • optimizer_tuple (tuple) – a tuple of (cls, parameters) for the optimizer.

  • criterion_tuple (tuple) – a tuple of (cls, parameters) for the criterion.

  • kwargs (dict, optional) – other parameters.

Returns:

None

static prepare_sub_graph(sub_module_list, fx_op_cfgs, model, prefix, is_qat=False, example_inputs=None)

Static method to prepare sub modules recursively.

Parameters:
  • sub_module_list (list) – contains the names of traceable sub-modules

  • fx_op_cfgs (dict, QConfigMapping) – the configuration for prepare_fx quantization.

  • model (object) – input model, which is a PyTorch model.

  • prefix (string) – prefix of op name

  • is_qat (bool) – whether quantization-aware training (QAT) is used

Returns:

output model which is a prepared PyTorch model.

Return type:

model (object)

static convert_sub_graph(sub_module_list, model, prefix)

Static method to convert sub modules recursively.

Parameters:
  • sub_module_list (list) – contains the names of traceable sub-modules

  • model (object) – input model, which is a prepared PyTorch model.

  • prefix (string) – prefix of op name

Returns:

output model which is a converted PyTorch int8 model.

Return type:

model (object)
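The two static methods above share a pattern: walk the model recursively and swap each submodule whose qualified name appears in sub_module_list for its prepared or converted counterpart. A framework-agnostic sketch with a hypothetical `Node` stand-in (the real methods operate on torch.nn.Module trees):

```python
class Node:
    """Hypothetical module stand-in exposing named_children()."""
    def __init__(self, **children):
        self.__dict__.update(children)

    def named_children(self):
        return [(k, v) for k, v in self.__dict__.items() if isinstance(v, Node)]

class Prepared(Node):
    """Marker for a submodule that has been swapped by `transform`."""

def transform_listed_modules(model, sub_module_list, prefix, transform):
    """Recursively replace submodules whose qualified name is listed."""
    for name, child in model.named_children():
        op_name = prefix + "." + name if prefix else name
        if op_name in sub_module_list:
            setattr(model, name, transform(child))
        else:
            transform_listed_modules(child, sub_module_list, op_name, transform)
    return model

model = Node(backbone=Node(block=Node()), head=Node())
transform_listed_modules(model, ["backbone.block"], "", lambda m: Prepared())
# model.backbone.block is now Prepared; model.head is untouched
```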

query_fw_capability(model)

This is a helper function to get all quantizable ops from model.

Parameters:

model (object) – input model, which is a Neural Compressor model

Returns:

tuning capability for each op from model.

Return type:

q_capability (dictionary)

fuse_fx_model(model, is_qat)

This is a helper function to get fused fx model for PyTorch_FXAdaptor.

Parameters:
  • model (object) – input model, which is a Neural Compressor model.

  • is_qat (bool) – whether the quantization approach is QAT.

Returns:

fused GraphModule model from torch.fx.

Return type:

fused_model (GraphModule)

calculate_op_sensitivity(model, dataloader, tune_cfg, output_op_names, confidence_batches, fallback=True, requantize_cfgs=None)
This is a helper function that computes each quantizable op's sensitivity and returns the ops sorted by it.

Parameters:
  • model (object) – INC model containing the fp32 model.

  • dataloader (object) – dataloader containing real data.

  • tune_cfg (dict) – dictionary of tune configure for each op.

  • fallback (bool) – whether in the fallback stage (True) or the re-quantize stage (False).

Returns:

sorted op list by sensitivity

Return type:

ops_lst (list)
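One plausible way to realize the documented behavior: measure the accuracy effect of toggling each op alone (falling it back to fp32, or re-quantizing it), then sort ops by that delta. Everything below is a hypothetical sketch with a toy accuracy model, not the library's implementation.

```python
def rank_ops_by_sensitivity(ops, evaluate, fallback=True):
    """Return ops sorted (descending) by accuracy delta when toggled alone."""
    baseline = evaluate(set())          # all ops in their current mode
    deltas = {}
    for op in ops:
        acc = evaluate({op})            # toggle just this op
        deltas[op] = (acc - baseline) if fallback else (baseline - acc)
    return sorted(ops, key=lambda op: deltas[op], reverse=True)

# Toy accuracy model: falling 'conv1' back to fp32 helps the most.
gains = {"conv1": 0.03, "fc": 0.01, "conv2": 0.002}
evaluate = lambda toggled: 0.70 + sum(gains[o] for o in toggled)
ranked = rank_ops_by_sensitivity(list(gains), evaluate)
# ranked == ['conv1', 'fc', 'conv2']
```

The `confidence_batches` parameter in the documented signature suggests each such evaluation runs over only a few batches to keep the search cheap.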

class neural_compressor.adaptor.pytorch.PyTorchQuery(local_config_file=None)

Bases: neural_compressor.adaptor.query.QueryBackendCapability

Base class that defines the Query Interface. Each adaptor layer should implement the inherited class for its specific backend.

get_quantization_capability()

Get the supported op types’ quantization capability.

Returns:

A list of dictionaries, in which the key is the precision and the value is a dict that describes all op types’ quantization capability.

Return type:

[dictionary list]

get_op_types()

Get the supported op types for all precisions.

Returns:

A list of dictionaries, in which the key is the precision and the value is the op types.

Return type:

[dictionary list]

get_op_types_by_precision(precision)

Get op types per precision.

Parameters:

precision (string) – precision name

Returns:

A list of op types.

Return type:

[string list]