neural_compressor.adaptor.tensorflow

Tensorflow Adaptor Classes.

Module Contents

Classes

TensorFlowAdaptor

Adaptor layer for stock TensorFlow and spr-base.

Tensorflow_ITEXAdaptor

Tensorflow ITEX Adaptor Class.

TensorflowQuery

Tensorflow Query Capability Class.

class neural_compressor.adaptor.tensorflow.TensorFlowAdaptor(framework_specific_info)

Bases: neural_compressor.adaptor.adaptor.Adaptor

Adaptor layer for stock TensorFlow and spr-base.

train(model, dataloader, optimizer_tuple, criterion_tuple, hooks, postprocess, **kwargs)

Model training API.

Parameters:
  • model ([Graph, GraphDef or Path String]) – The model can be a graph, a graph_def object, a frozen pb file, or a ckpt/SavedModel folder path.

  • dataloader (generator) – generates the data and labels.

  • optimizer_tuple (tuple) – optimizers for model training.

  • criterion_tuple (tuple) – loss criteria for model training.

  • hooks (callback) – training hooks, e.g. on_epoch_begin and on_epoch_end.

  • postprocess (object) – processes the result from the model.

Returns:

None.
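A minimal usage sketch follows; the framework_specific_info keys, the tuple layouts, and the hooks mapping are illustrative assumptions rather than a guaranteed schema, and later sketches on this page reuse the adaptor and dataloader defined here.

    import numpy as np
    from neural_compressor.adaptor.tensorflow import TensorFlowAdaptor

    # framework_specific_info keys below are assumed for illustration.
    adaptor = TensorFlowAdaptor({"device": "cpu", "workspace_path": "./nc_workspace"})

    def dataloader():
        # A generator yielding (data, labels) batches, as the API expects.
        for _ in range(10):
            yield np.random.rand(8, 224, 224, 3).astype(np.float32), np.zeros(8, dtype=np.int64)

    # Assumed hooks mapping: hook name -> callable.
    hooks = {"on_epoch_begin": lambda epoch: None, "on_epoch_end": lambda epoch: None}

    # Assumed (name, kwargs) layout; the real tuple layout is defined by the library.
    optimizer_tuple = ("SGD", {"learning_rate": 0.01})
    criterion_tuple = ("SparseCategoricalCrossentropy", {})

    adaptor.train("./model_fp32.pb",  # hypothetical frozen pb path
                  dataloader(), optimizer_tuple, criterion_tuple, hooks, postprocess=None)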

evaluate(model, dataloader, postprocess=None, metrics=None, measurer=None, iteration=-1, tensorboard=False, fp32_baseline=False)

Evaluate the model for specified metric on validation dataset.

Parameters:
  • model ([Graph, GraphDef or Path String]) – The model can be a graph, a graph_def object, a frozen pb file, or a ckpt/SavedModel folder path.

  • dataloader (generator) – generates the data and labels.

  • postprocess (object, optional) – processes the result from the model.

  • metrics (list, optional) – depends on the model category. Defaults to None.

  • measurer (object, optional) – for precise benchmark measurement.

  • iteration (int, optional) – controls the number of mini-batch steps.

  • tensorboard (boolean, optional) – whether to dump tensors for TensorBoard inspection.

  • fp32_baseline (boolean, optional) – only used in the compare_label=False pipeline.

Returns:

evaluation result; larger is better.

Return type:

[float]
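A hedged sketch, reusing the adaptor and dataloader from the train example above; top1_metric is a hypothetical metric object whose interface is assumed.

    acc = adaptor.evaluate(
        model="./model_fp32.pb",  # hypothetical frozen pb path
        dataloader=dataloader(),
        metrics=[top1_metric],    # hypothetical metric object
        iteration=100,            # evaluate at most 100 mini-batches
    )
    print(f"evaluation result: {acc:.4f}")  # a float; larger is better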

quantize(tune_cfg, model, data_loader, q_func=None)

Execute the quantize process on the specified model.

Parameters:
  • tune_cfg (dict) – quantization configuration

  • model (tf.compat.v1.GraphDef) – fp32 model

  • data_loader (generator) – generates the data and labels.

  • q_func (optional) – training function for quantization-aware training mode, which is not enabled for TensorFlow yet.

Returns:

the quantized model

Return type:

tf.compat.v1.GraphDef
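A sketch of the call shape, reusing the adaptor and dataloader from above; the tune_cfg contents are normally produced by the tuning strategy, so an empty dict stands in here.

    import tensorflow as tf

    # Load the fp32 frozen graph (hypothetical path).
    graph_def = tf.compat.v1.GraphDef()
    with open("./model_fp32.pb", "rb") as f:
        graph_def.ParseFromString(f.read())

    tune_cfg = {}  # placeholder; produced by the tuning strategy in practice
    q_graph_def = adaptor.quantize(tune_cfg, graph_def, dataloader())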

query_fw_capability(model)

Collect the model-wise and op-wise configuration for quantization.

Parameters:

model (tf.compat.v1.GraphDef) – model definition.

Returns:

model-wise & op-wise configuration for quantization.

Return type:

[dict]
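A short sketch, reusing the graph_def from the quantize example; only the documented call is shown, since the exact layout of the returned dict is framework-defined.

    capability = adaptor.query_fw_capability(graph_def)
    for key, value in capability.items():
        print(key, type(value))  # model-wise and op-wise sub-configurations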

set_tensor(model, tensor_dict)

Quantize the bias and weight tensors in tensor_dict.

inspect_weight_and_bias(node_list, graph_def, graph_info, graph_node_name_mapping)

Inspect the weights and biases.

fused_node_mapping(node_list, pattern_mapping, graph_info, graph_node_name_mapping)

Create the mapping between the first node and the last node in a fused sequence.

Parameters:
  • node_list – node name list

  • pattern_mapping – key: node name, val: node pattern mapping

  • graph_info – key: node name, val: node details

  • graph_node_name_mapping – key: node name, val: node

Returns:

  • fused_mapping – key: first node name in the fused sequence, val: last node in the fused sequence.

  • fused_mapping_reverse – key: last node in the fused sequence, val: first node name in the fused sequence.
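For a Conv2D+BiasAdd+Relu fusion, the two mappings would look like this (node names are illustrative):

    fused_mapping = {"conv1/Conv2D": "conv1/Relu"}          # first node -> last node
    fused_mapping_reverse = {"conv1/Relu": "conv1/Conv2D"}  # last node -> first node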

inspect_activation(node_list, graph_def, graph_node_name_mapping, quantization_cfg, dataloader, iteration_list, graph_info)

Inspect the activation.

inspect_tensor(model, dataloader=None, op_list=[], iteration_list=[], inspect_type='activation', save_to_disk=False, save_path=None, quantization_cfg=None)

Dump the weights and activations (outputs) to local disk.

  1. create the correspondence between the query node name and the actual output node name in graph_def

  2. get the weight and bias for the given node

  3. get the activation for the given node

  4. save the tensors to disk

Parameters:
  • model – int8/fp32 graph_def/TensorflowBaseModel

  • dataloader – dataloader used during inspect activation

  • op_list – op list to inspect

  • iteration_list – iteration list to inspect, starting from 1

  • inspect_type – activation/weight/all

  • save_to_disk – whether to dump to disk

  • save_path – the dump path for the inspected tensors

  • quantization_cfg – quantization configuration for the fused fp32 model and the quantized model

Returns:

Dict
{
    'weight': {
        'node0_name': {'weight0_name': numpy.array, 'bias0_name': numpy.array, ...},
        'node1_name': {'weight1_name': numpy.array, 'bias1_name': numpy.array, ...},
        ...
    },
    'activation': [
        # iter 1:
        {
            'node0_name': {'output0_name': numpy.array, 'output1_name': numpy.array, ...},
            'node1_name': {'output0_name': numpy.array, 'output1_name': numpy.array, ...},
            ...
        },
        # iter 2:
        {
            ...
        }
    ]
}
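A hedged usage sketch; the op names and paths are hypothetical, and the adaptor and dataloader come from the train example above.

    dump = adaptor.inspect_tensor(
        model="./model_fp32.pb",
        dataloader=dataloader(),
        op_list=["conv1/Conv2D", "dense/MatMul"],
        iteration_list=[1, 2],        # iterations start from 1
        inspect_type="all",           # weights and activations
        save_to_disk=True,
        save_path="./inspect_result",
    )
    weights = dump["weight"]          # node name -> {tensor name: numpy.array}
    activations = dump["activation"]  # one dict per inspected iteration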

quantize_input(model)

Quantize the model to be able to take quantized input.

Remove the graph's QuantizeV2 op, move its input tensor to QuantizedConv2D, and calculate the min-max scale.

Parameters:

model (tf.compat.v1.GraphDef) – The model to quantize input

Returns:

  • model (tf.compat.v1.GraphDef) – the model that takes quantized input.

  • scale (float) – the scale for the dataloader to generate quantized input.
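The method returns a pair; a short unpacking sketch, reusing the quantized graph from the quantize example:

    input_quantized_model, scale = adaptor.quantize_input(q_graph_def)
    # `scale` is the float the dataloader should apply when producing
    # quantized inputs for `input_quantized_model`.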

get_optype_wise_ability()

Get the op-type-wise capability by generating the union value over the ops of each type.

Returns:

the key is the op type and the value is the detailed configuration of activation and weight for that op type.

Return type:

[string dict]
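An illustrative shape of the returned dict; the field names inside each entry (dtype, scheme, granularity) are assumptions showing typical quantization-capability fields:

    optype_wise = {
        "Conv2D": {
            "activation": {"dtype": ["int8"], "scheme": ["sym"], "granularity": ["per_tensor"]},
            "weight": {"dtype": ["int8"], "scheme": ["sym"], "granularity": ["per_channel"]},
        },
        "MatMul": {
            "activation": {"dtype": ["int8"]},
            "weight": {"dtype": ["int8"]},
        },
    }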

save(model, path)

Save model to the path.

convert(model, source, destination)

Convert a model from a source format to a destination format.

Parameters:
  • model (neural_compressor.model) – base model to be converted.

  • source (string) – The source model format.

  • destination (string) – The destination model format.

qat_convert(model, quantize_recipe=None)

Convert an fp32 tf.keras model to an int8 one using the quantization-aware training implementation.

Parameters:
  • model (tf.keras.Model) – The model to be quantized, expected to be a Keras Functional or Sequential model.

  • quantize_recipe (dict) – A dict that decides whether given layers should be quantized.

Returns:

Quantized model with fake quant nodes inserted.

Return type:

converted_model (tf.keras.Model)
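A hedged sketch on a small Sequential model, reusing the adaptor from the train example; the call relies on the default quantize_recipe, since its schema is decided by the library.

    import tensorflow as tf

    fp32_model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(8, 3, activation="relu", input_shape=(28, 28, 1)),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(10),
    ])

    # Insert fake-quant nodes for quantization-aware training.
    qat_model = adaptor.qat_convert(fp32_model)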

recover_tuned_model(model, q_config)

Execute the recover process on the specified model.

Parameters:
  • model (tf.compat.v1.GraphDef) – fp32 model

  • q_config (dict) – recover configuration

Returns:

the quantized model

Return type:

tf.compat.v1.GraphDef

diagnosis_helper(fp32_model, quan_model, tune_cfg, save_path)

Tensorflow diagnosis helper function.

get_output_op_names(qmodel)

Get the output OPs' names.

calculate_op_sensitivity(model, dataloader, tune_cfg, output_op_names, confidence_batches, fallback=True, requantize_cfgs=None)

Compute the op sensitivity.

The sensitivity metric is the MSE between the output of the last quantized op of the quantized model and the output of its corresponding op in the fp32 model.

  1. Back up the tuning config.

  2. If fallback == True, fall back each int8 op and compute its MSE; otherwise, re-quantize each fp32 op (fallen back in the previous stage) and compute its MSE.

  3. Sort the op name list by MSE.

Parameters:
  • model – The fp32 model.

  • dataloader – the dataloader with the full dataset.

  • tune_cfg – tuning config.

  • output_op_names – the names of the model's output ops, as returned by get_output_op_names.

  • confidence_batches – the number of batches used when computing sensitivity.

  • fallback – denotes the fallback stage or the re-quantize stage.

  • requantize_cfgs – the dict of tuning configs for all re-quantizable ops.

Returns:

A list of op names, sorted by MSE sensitivity.
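To make the sensitivity metric concrete, a standalone sketch of the per-op MSE comparison; the synthetic arrays and op names stand in for real op outputs.

    import numpy as np

    def mse_sensitivity(fp32_out: np.ndarray, int8_out: np.ndarray) -> float:
        # MSE between corresponding op outputs; a larger value means the op
        # is more sensitive to quantization.
        return float(np.mean((fp32_out - int8_out) ** 2))

    # Ops with larger MSE are the better fallback candidates.
    sensitivities = {"conv1": 0.02, "dense1": 0.41}
    print(sorted(sensitivities, key=sensitivities.get, reverse=True))  # ['dense1', 'conv1']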

class neural_compressor.adaptor.tensorflow.Tensorflow_ITEXAdaptor(framework_specific_info)

Bases: TensorFlowAdaptor

Tensorflow ITEX Adaptor Class.

quantize(tune_cfg, model, data_loader, q_func=None)

Execute the quantize process on the specified model.

Parameters:
  • tune_cfg (dict) – quantization configuration

  • model (tf.compat.v1.GraphDef) – fp32 model

  • data_loader (generator) – generates the data and labels.

  • q_func (optional) – training function for quantization-aware training mode, which is not enabled for TensorFlow yet.

Returns:

the quantized model

Return type:

tf.compat.v1.GraphDef

class neural_compressor.adaptor.tensorflow.TensorflowQuery(local_config_file=None, performance_only=False)

Bases: neural_compressor.adaptor.query.QueryBackendCapability

Tensorflow Query Capability Class.

get_version()

Get the current backend version information.

Returns:

version string.

Return type:

[string]

get_precisions()

Get supported precisions for current backend.

Returns:

the names of the supported precisions.

Return type:

[string list]
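A short sketch of constructing the query object and reading back the version and precisions, assuming the library's bundled config resolves when no local_config_file is given.

    from neural_compressor.adaptor.tensorflow import TensorflowQuery

    query = TensorflowQuery()
    print(query.get_version())     # version string of the current backend
    print(query.get_precisions())  # names of the supported precisions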

get_op_types()

Get the supported op types for all precisions.

Returns:

A list of dictionaries in which the key is the precision and the value is the op types.

Return type:

[dictionary list]

get_fuse_patterns()

Get the fusion patterns supported by low precisions.

Returns:

A list of dictionaries in which the key is the precision and the value is the supported patterns.

Return type:

[dictionary list]

get_quantization_capability()

Get the supported op types’ quantization capability.

Returns:

A list of dictionaries in which the key is the precision and the value is a dict describing the quantization capability of all op types.

Return type:

[dictionary list]

get_op_types_by_precision(precision)

Get op types per precision.

Parameters:

precision (string) – precision name

Returns:

A list of op types.

Return type:

[string list]

get_mixed_precision_combination()

Get the valid mixed precisions.

Returns:

valid precision list.

Return type:

[string list]

get_grappler_optimization_cfg()

Get grappler optimization configuration.

get_bf16_patterns()

Get BF16 pattern list.

Returns:

BF16 pattern list.

Return type:

[List]

get_eightbit_patterns(qdq_enabled=False)

Get the eight-bit op-wise sequence information.

Returns:

the key is the op type and the value is the list of sequences starting with that op type.

Return type:

[dictionary]
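An illustrative shape of the returned dictionary; the sequences shown are common TensorFlow fusions, not an exhaustive list:

    eightbit_patterns = {
        "Conv2D": [["Conv2D", "BiasAdd", "Relu"], ["Conv2D", "BiasAdd"]],
        "MatMul": [["MatMul", "BiasAdd"]],
    }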

generate_internal_patterns()

Translate the patterns defined in the YAML to internal pattern expressions.