Framework YAML Configuration Files

  1. Introduction

  2. Supported Feature Matrix

  3. Get Started with Framework YAML Files

Introduction

Intel® Neural Compressor uses YAML files for quick and user-friendly configurations. There are two types of YAML files - user YAML files and framework YAML files, which are used in running user cases and setting up framework capabilities, respectively.

Here, we introduce the framework YAML file, which describes the behavior of a specific framework. There is a corresponding framework YAML file for each framework supported by Intel® Neural Compressor - TensorFlow , Intel® Extension for TensorFlow*, PyTorch, Intel® Extension for PyTorch*, ONNX Runtime, and MXNet.

Note: Before diving to the details, we recommend that the end users do NOT make modifications unless they have clear requirements that can only be met by modifying the attributes.

Supported Feature Matrix

Framework YAML Configuration Files
TensorFlow
PyTorch
ONNX
MXNet

Get started with Framework YAML Files

For the purpose of framework setup, let’s take a look at a tensorflow framework YAML file; other framework YAML files follow same syntax. A framework YAML file specifies following information and capabilities for current runtime framework. Let’s go through them one by one:

  • version: This specifies the supported versions.

  version:
    name: ['2.1.0', '2.2.0', '2.3.0', '2.4.0', '2.5.0', '2.6.0', '2.6.1', '2.6.2', '2.7.0', '2.8.0', '1.15.0-up1', '1.15.0-up2']
  • precisions: This defines the supported precisions of specific versions.

  precisions: 
    names: int8, uint8, bf16, fp32
    valid_mixed_precisions: []
  • op: This defines a list of valid OP types for each precision.

  ops: 
    int8: ['Conv2D', 'MatMul', 'ConcatV2', 'MaxPool', 'AvgPool']
    uint8: ['Conv2D', 'DepthwiseConv2dNative', 'MatMul', 'ConcatV2', 'MaxPool', 'AvgPool']
    bf16: ['Conv2D']  
    fp32: ['*'] # '*' means all op types
  • capabilities: This defines the quantization ability of specific ops, such as granularity, scheme, and algorithm. The activation assumes that input and output activations share the same data type by default, which is based on op semantics defined by frameworks.

  capabilities: 
    int8: {
          'Conv2D': {
            'weight': {
                        'dtype': ['int8', 'fp32'],
                        'scheme': ['sym'],
                        'granularity': ['per_channel','per_tensor'],
                        'algorithm': ['minmax']
                        },
            'activation': {
                        'dtype': ['int8', 'fp32'],
                        'scheme': ['sym'],
                        'granularity': ['per_tensor'],
                        'algorithm': ['minmax', 'kl']
                        }
                    },
          'MatMul': {
            'weight': {
                        'dtype': ['int8', 'fp32'],
                        'scheme': ['sym'],
                        'granularity': ['per_tensor'],
                        'algorithm': ['minmax']
                        },
            'activation': {
                        'dtype': ['int8', 'fp32'],
                        'scheme': ['asym', 'sym'],
                        'granularity': ['per_tensor'],
                        'algorithm': ['minmax']
                        }
                    },
          'default': {
            'activation': {
                        'dtype': ['uint8', 'fp32'],
                        'algorithm': ['minmax'],
                        'scheme': ['sym'],
                        'granularity': ['per_tensor']
                        }
                    },
          }

    uint8: {
          'Conv2D': {
            'weight': {
                        'dtype': ['int8', 'fp32'],
                        'scheme': ['sym'],
                        'granularity': ['per_channel','per_tensor'],
                        'algorithm': ['minmax']
                        },
            'activation': {
                        'dtype': ['uint8', 'fp32'],
                        'scheme': ['sym'],
                        'granularity': ['per_tensor'],
                        'algorithm': ['minmax', 'kl']
                        }
                    },
          'MatMul': {
            'weight': {
                        'dtype': ['int8', 'fp32'],
                        'scheme': ['sym'],
                        'granularity': ['per_tensor'],
                        'algorithm': ['minmax']
                        },
            'activation': {
                        'dtype': ['uint8', 'fp32'],
                        'scheme': ['asym', 'sym'],
                        'granularity': ['per_tensor'],
                        'algorithm': ['minmax']
                        }
                    },
          'default': {
            'activation': {
                        'dtype': ['uint8', 'fp32'],
                        'algorithm': ['minmax'],
                        'scheme': ['sym'],
                        'granularity': ['per_tensor']
                        }
                    },
          }
  • patterns: This defines the supported fusion sequence for each op.

  patterns: 
    fp32: [ 
        'Conv2D + Add + Relu',
        'Conv2D + Add + Relu6',
        'Conv2D + Relu',
        'Conv2D + Relu6',
        'Conv2D + BiasAdd'
        ]
    int8: [
        'Conv2D + BiasAdd',
        'Conv2D + BiasAdd + Relu',
        'Conv2D + BiasAdd + Relu6'
        ]
    uint8: [
        'Conv2D + BiasAdd + AddN + Relu',
        'Conv2D + BiasAdd + AddN + Relu6',
        'Conv2D + BiasAdd + AddV2 + Relu',
        'Conv2D + BiasAdd + AddV2 + Relu6',
        'Conv2D + BiasAdd + Add + Relu',
        'Conv2D + BiasAdd + Add + Relu6',
        'Conv2D + BiasAdd + Relu',
        'Conv2D + BiasAdd + Relu6',
        'Conv2D + Add + Relu',
        'Conv2D + Add + Relu6',
        'Conv2D + Relu',
        'Conv2D + Relu6',
        'Conv2D + BiasAdd',
        'DepthwiseConv2dNative + BiasAdd + Relu6',
        'DepthwiseConv2dNative + BiasAdd + Relu',
        'DepthwiseConv2dNative + Add + Relu6',
        'DepthwiseConv2dNative + BiasAdd',
        'MatMul + BiasAdd + Relu',
        'MatMul + BiasAdd',
  ]
  • grappler_optimization: This defines the grappler optimization.

  grappler_optimization: 
    pruning: True                                    # optional. grappler pruning optimizer,default value is True.
    shape: True                                      # optional. grappler shape optimizer,default value is True.
    constfold: False                                 # optional. grappler constant folding optimizer, default value is True.
    arithmetic: False                                # optional. grappler arithmetic optimizer,default value is False.
    dependency: True                                 # optional. grappler dependency optimizer,default value is True.
    debug_stripper: True                             # optional. grappler debug_stripper optimizer,default value is True.
    loop: True                                       # optional. grappler loop optimizer,default value is True.