neural_compressor.adaptor.tf_utils.graph_converter

Graph Converter Class.

Module Contents

Classes

GraphConverter

Graph Converter Class is used to generate the quantization graph.

class neural_compressor.adaptor.tf_utils.graph_converter.GraphConverter(model, qt_config={}, recipes={}, int8_sequences={}, fp32_ops=[], bf16_ops=[], data_loader=None, fake_quant=False, itex_mode=False, qdq_enabled=False, new_api=False, performance_only=False, use_bf16=False)

Graph Converter Class is used to generate the quantization graph.

convert()

Do conversion.

Including:
  1. optimize fp32_frozen_graph,

  2. quantize graph,

  3. calibration,

  4. fuse RequantizeOp with fused quantized conv, and so on,

  5. bf16 conversion if self.bf16_ops is not empty.

Returns:
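The five stages above run as a sequence of graph passes. The following toy sketch (pure Python, all names illustrative; the real GraphConverter operates on TensorFlow GraphDef protobufs, not on this simplified list-of-ops representation) shows the control flow only:

```python
# Toy sketch of the convert() pass pipeline. All helper names here are
# hypothetical; the real implementation rewrites GraphDef protobufs.

def optimize_fp32(graph):
    # stage 1: fp32 graph optimization (e.g. constant folding)
    return graph

def quantize_graph(graph):
    # stage 2: rewrite quantizable ops into their int8 form
    return [("Quantized" + op if op == "Conv2D" else op) for op in graph]

def calibrate(graph, data_loader):
    # stage 3: run sample data to collect min/max ranges (no-op here)
    return graph

def fuse_requantize(graph):
    # stage 4: fuse RequantizeOp into the preceding quantized conv
    return [op.replace("QuantizedConv2D", "QuantizedConv2DAndRequantize")
            for op in graph]

def bf16_convert(graph, bf16_ops):
    # stage 5: only runs when bf16_ops is non-empty
    if not bf16_ops:
        return graph
    return [op + "_bf16" if op in bf16_ops else op for op in graph]

def convert(graph, data_loader=None, bf16_ops=()):
    graph = optimize_fp32(graph)
    graph = quantize_graph(graph)
    graph = calibrate(graph, data_loader)
    graph = fuse_requantize(graph)
    return bf16_convert(graph, bf16_ops)

print(convert(["Conv2D", "Relu"]))
# -> ['QuantizedConv2DAndRequantize', 'Relu']
```

Note that stage 5 is skipped entirely when `bf16_ops` is empty, matching the condition in the step list above.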

quantize()

Quantize graph only (without optimizing fp32 graph).

Including:
  1. quantize graph,

  2. calibration,

  3. fuse RequantizeOp with fused quantized conv, and so on.

Returns:

bf16_convert()

Convert fp32 nodes listed in bf16_node to bf16 dtype, based on the FP32 + INT8 mixed-precision graph.
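The effect of this pass can be sketched as a dtype rewrite: fp32 nodes named in the bf16 list are switched to bf16, while nodes already quantized to int8 are left untouched. The node/dtype representation below is a hypothetical simplification, not the real GraphDef structure:

```python
# Toy sketch of bf16 conversion on a mixed-precision graph. Only fp32
# nodes whose names appear in bf16_ops are converted; int8 (quantized)
# nodes are left alone. The (name, dtype) tuples are illustrative only.

def bf16_convert(nodes, bf16_ops):
    converted = []
    for name, dtype in nodes:
        if dtype == "fp32" and name in bf16_ops:
            converted.append((name, "bf16"))
        else:
            converted.append((name, dtype))
    return converted

mixed = [("QuantizedConv2D", "int8"), ("Relu", "fp32"), ("MatMul", "fp32")]
print(bf16_convert(mixed, bf16_ops={"MatMul"}))
# -> [('QuantizedConv2D', 'int8'), ('Relu', 'fp32'), ('MatMul', 'bf16')]
```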

quantize_with_qdq_pattern()

Quantize model by inserting QDQ.

step 1: insert QDQ pairs and update node info

step 2: convert Q-DQ-node-Q-DQ to Q-newAPI node-DQ
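The two steps can be sketched on a toy string-based graph: step 1 surrounds each quantizable node with quantize/dequantize ("Q"/"DQ") pairs, and step 2 folds the resulting Q-DQ-node-Q-DQ pattern into a Q-newAPI node-DQ triple. All names and the graph representation below are illustrative assumptions, not the library's actual data structures:

```python
# Toy sketch of QDQ-pattern quantization. The flat op-name list stands in
# for a real TensorFlow graph; "_Quantized<op>" stands in for the newAPI
# fused quantized op.

def insert_qdq(ops, quantizable):
    # step 1: surround each quantizable op with Q/DQ pairs
    out = []
    for op in ops:
        if op in quantizable:
            out += ["Q", "DQ", op, "Q", "DQ"]
        else:
            out.append(op)
    return out

def fold_to_new_api(ops):
    # step 2: Q DQ <op> Q DQ  ->  Q _Quantized<op> DQ
    out, i = [], 0
    while i < len(ops):
        if (i + 4 < len(ops) and ops[i] == "Q" and ops[i + 1] == "DQ"
                and ops[i + 3] == "Q" and ops[i + 4] == "DQ"):
            out += ["Q", "_Quantized" + ops[i + 2], "DQ"]
            i += 5
        else:
            out.append(ops[i])
            i += 1
    return out

qdq = insert_qdq(["Conv2D", "Relu"], quantizable={"Conv2D"})
print(qdq)                  # -> ['Q', 'DQ', 'Conv2D', 'Q', 'DQ', 'Relu']
print(fold_to_new_api(qdq)) # -> ['Q', '_QuantizedConv2D', 'DQ', 'Relu']
```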