Tune Advanced Auto Mixed Precision

Background

Numeric Stability

Using FP16 or BF16 can impact model accuracy and lead to numeric stability issues.

Some operations are numerically safe for FP16/BF16: running them in FP16/BF16 causes no obvious accuracy loss compared to FP32.

Other operations are numerically dangerous for FP16/BF16: running them in FP16/BF16 causes obvious accuracy loss compared to FP32.
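As a concrete illustration (using NumPy rather than TensorFlow), an exponential is numerically dangerous in FP16 because FP16 overflows above 65504, while a plain addition of well-scaled values is safe:

```python
import numpy as np

# exp() overflows FP16 for a modest input: exp(12) ~ 162755 > 65504 (FP16 max)
fp16_exp = np.exp(np.float16(12.0))    # becomes inf in FP16
fp32_exp = np.exp(np.float32(12.0))    # stays finite in FP32

# A plain add of well-scaled values is numerically safe in FP16
fp16_add = np.float16(1.5) + np.float16(2.25)

print(fp16_exp, fp32_exp, fp16_add)
```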

Configuration List

In order to achieve faster performance with strong numeric stability, Advanced Auto Mixed Precision (AMP) maintains four lists: ALLOWLIST, DENYLIST, INFERLIST, and CLEARLIST that let you manually configure a balance of performance and accuracy with FP16/BF16 if the default configuration doesn’t provide the expected performance.

Set the lists according to whether the operations are numerically safe or numerically dangerous. The lists contain TensorFlow operation types, and fused operations are supported.

| List Name | Description |
| --- | --- |
| ALLOWLIST | A set of operations that are always considered numerically safe and performance-critical for FP16/BF16. Operations in ALLOWLIST are always converted to FP16/BF16. |
| DENYLIST | A set of operations that are considered numerically dangerous to execute in FP16/BF16. They also affect their downstream nodes, making them numerically dangerous too. For example, in the graph `Exp -> Add`, the `Add` is numerically dangerous because of the `Exp`. |
| INFERLIST | A set of operations that are considered numerically safe for FP16/BF16, but become numerically dangerous when affected by an upstream node in the DENYLIST. |
| CLEARLIST | A set of operations that have no numerically significant effect in FP16/BF16 and can run in FP16/BF16. Depending on whether their upstream/downstream nodes are numerically safe, they can be set to FP16/BF16 if desired. They reduce conversions between FP16/BF16 and FP32 in the graph to improve performance. |

Example of Mixed Precision by List

Here is an example to explain the principle.

| List | Node Index |
| --- | --- |
| ALLOWLIST | 6, 9 |
| DENYLIST | 1, 11 |
| INFERLIST | 2, 4, 7, 10 |
| CLEARLIST | 3, 5, 8 |

amp_list.png

Steps:

I. Set every node’s property according to the list configuration (including default and custom settings).

II. Add nodes whose type is in ALLOWLIST to the allow set: Node 6, 9.

III. Add nodes to the deny set.

  • The nodes in DENYLIST are added to the deny set: Node 1, 11.

  • The nodes in INFERLIST whose upstream nodes are in the deny set (ignoring upstream nodes in CLEARLIST): Node 2, 4.

  • The nodes in CLEARLIST whose upstream and downstream nodes are in the deny set: Node 3.

IV. Add nodes to the allow set.

  • The nodes in INFERLIST whose upstream nodes are in the allow set: Node 7, 10.

  • The nodes in CLEARLIST whose upstream or downstream nodes are in the allow set: Node 5, 8.

V. Change node data types and insert Cast nodes. Cast nodes are inserted between the deny set (FP32) and the allow set (FP16/BF16) to convert data between FP32 and FP16/BF16.
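The steps above can be sketched in plain Python. This is a simplified illustration, not the real ITEX pass: the tiny graph and its list assignments below are hypothetical (not the nodes from the figure), and it only checks upstream neighbors for CLEARLIST nodes.

```python
# A minimal sketch of the list-based classification described in steps II-IV.
# The graph and op-to-list assignment here are hypothetical.

ALLOW, DENY, INFER, CLEAR = "allow", "deny", "infer", "clear"

# node name -> (list membership, upstream node names)
graph = {
    "Exp":    (DENY,  []),
    "Add":    (INFER, ["Exp"]),     # INFERLIST node fed by a deny node
    "MatMul": (ALLOW, []),
    "Relu":   (CLEAR, ["MatMul"]),  # CLEARLIST node next to the allow set
    "Mean":   (INFER, ["Relu"]),    # INFERLIST node reached from the allow set
}

# Step II: ALLOWLIST nodes seed the allow set; DENYLIST nodes seed the deny set.
allow_set = {n for n, (lst, _) in graph.items() if lst == ALLOW}
deny_set = {n for n, (lst, _) in graph.items() if lst == DENY}

# Step III (simplified): INFERLIST nodes with a deny upstream become deny.
for n, (lst, ups) in graph.items():
    if lst == INFER and any(u in deny_set for u in ups):
        deny_set.add(n)

# Step IV: INFERLIST/CLEARLIST nodes touching the allow set join it;
# iterate to a fixed point so membership propagates along chains.
changed = True
while changed:
    changed = False
    for n, (lst, ups) in graph.items():
        if n in allow_set or n in deny_set:
            continue
        if lst in (INFER, CLEAR) and any(u in allow_set for u in ups):
            allow_set.add(n)
            changed = True

print(sorted(allow_set), sorted(deny_set))
```

In step V, Cast nodes would then be inserted on every edge that crosses between the two sets.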

Rule to Improve Performance by the Configuration List

  • Adding more nodes to the allow set increases performance.

  • Reducing the number of Cast nodes between different allow sets increases performance.

Usage

You can set these lists manually to tune Advanced AMP by using the Python API, or by setting environment variables to override the default settings. Settings are prioritized in this order: Python API > environment variable > default setting.

Python API

Create the options object:

```python
import intel_extension_for_tensorflow as itex
auto_mixed_precision_options = itex.AutoMixedPrecisionOptions()
```

| Python API | Definition |
| --- | --- |
| itex.AutoMixedPrecisionOptions | Use both 16-bit and 32-bit floating-point types during training, which makes models run faster and use less memory. GPU supports both FP16 and BF16; CPU only supports BF16. |

Python API Attribute & Environment Variable

| Attribute | Environment Variable Name | Description |
| --- | --- | --- |
| data_type | ITEX_AUTO_MIXED_PRECISION_DATA_TYPE | Low-precision data type used in Advanced AMP. Three options: DEFAULT_DATA_TYPE, FLOAT16, BFLOAT16. DEFAULT_DATA_TYPE is BF16 on both CPU and GPU. CPU only supports BF16; GPU supports both FP16 and BF16. |
| unsafe_force_all | ITEX_AUTO_MIXED_PRECISION_UNSAFE_FORCE_ALL | Convert all FP32 operations to FP16/BF16 operations. Only supports the Float16 data type. |
| allowlist_add | ITEX_AUTO_MIXED_PRECISION_ALLOWLIST_ADD | String. Operation types to add to ALLOWLIST. Use "," to separate multiple operation types. |
| denylist_add | ITEX_AUTO_MIXED_PRECISION_DENYLIST_ADD | String. Operation types to add to DENYLIST. Use "," to separate multiple operation types. |
| inferlist_add | ITEX_AUTO_MIXED_PRECISION_INFERLIST_ADD | String. Operation types to add to INFERLIST. Use "," to separate multiple operation types. |
| clearlist_add | ITEX_AUTO_MIXED_PRECISION_CLEARLIST_ADD | String. Operation types to add to CLEARLIST. Use "," to separate multiple operation types. |
| allowlist_remove | ITEX_AUTO_MIXED_PRECISION_ALLOWLIST_REMOVE | String. Operation types to remove from ALLOWLIST. Use "," to separate multiple operation types. |
| denylist_remove | ITEX_AUTO_MIXED_PRECISION_DENYLIST_REMOVE | String. Operation types to remove from DENYLIST. Use "," to separate multiple operation types. |
| inferlist_remove | ITEX_AUTO_MIXED_PRECISION_INFERLIST_REMOVE | String. Operation types to remove from INFERLIST. Use "," to separate multiple operation types. |
| clearlist_remove | ITEX_AUTO_MIXED_PRECISION_CLEARLIST_REMOVE | String. Operation types to remove from CLEARLIST. Use "," to separate multiple operation types. |

Notes:

Before adding an operation type to a list, remove it from its original list.

For example, AvgPool is in INFERLIST by default. To move it to ALLOWLIST, remove it from INFERLIST and then add it to ALLOWLIST.

Environment Variable Difference with Stock TensorFlow

Advanced AMP has extra options. For example, ITEX_AUTO_MIXED_PRECISION_DATA_TYPE lets you choose which low-precision data type (FP16 or BF16) is used to speed up the model.

The following table shows the corresponding relationship between Advanced AMP and TensorFlow AMP environment variable names.

| Advanced AMP Environment Variable Name | TensorFlow AMP Environment Variable Name |
| --- | --- |
| ITEX_AUTO_MIXED_PRECISION_DATA_TYPE | N/A |
| ITEX_AUTO_MIXED_PRECISION_LOG_PATH | TF_AUTO_MIXED_PRECISION_GRAPH_REWRITE_LOG_PATH |
| ITEX_AUTO_MIXED_PRECISION_UNSAFE_FORCE_ALL | TF_AUTO_MIXED_PRECISION_GRAPH_REWRITE_LEVEL="UNSAFE_FORCE_ALL" |
| ITEX_AUTO_MIXED_PRECISION_ALLOWLIST_ADD | TF_AUTO_MIXED_PRECISION_GRAPH_REWRITE_ALLOWLIST_ADD |
| ITEX_AUTO_MIXED_PRECISION_DENYLIST_ADD | TF_AUTO_MIXED_PRECISION_GRAPH_REWRITE_DENYLIST_ADD |
| ITEX_AUTO_MIXED_PRECISION_INFERLIST_ADD | TF_AUTO_MIXED_PRECISION_GRAPH_REWRITE_INFERLIST_ADD |
| ITEX_AUTO_MIXED_PRECISION_CLEARLIST_ADD | TF_AUTO_MIXED_PRECISION_GRAPH_REWRITE_CLEARLIST_ADD |
| ITEX_AUTO_MIXED_PRECISION_ALLOWLIST_REMOVE | TF_AUTO_MIXED_PRECISION_GRAPH_REWRITE_ALLOWLIST_REMOVE |
| ITEX_AUTO_MIXED_PRECISION_DENYLIST_REMOVE | TF_AUTO_MIXED_PRECISION_GRAPH_REWRITE_DENYLIST_REMOVE |
| ITEX_AUTO_MIXED_PRECISION_INFERLIST_REMOVE | TF_AUTO_MIXED_PRECISION_GRAPH_REWRITE_INFERLIST_REMOVE |
| ITEX_AUTO_MIXED_PRECISION_CLEARLIST_REMOVE | TF_AUTO_MIXED_PRECISION_GRAPH_REWRITE_CLEARLIST_REMOVE |

Usage

Steps:

I. Install Intel® Extension for TensorFlow* in your running environment.

After installing Intel® Extension for TensorFlow*, it automatically activates as a plugin of stock TensorFlow.

Refer to the installation instructions for more details.

II. Enable Advanced AMP.

With the default configuration, Advanced AMP balances accuracy and performance in most cases.

Basic (default configuration)

Python API:

```python
import intel_extension_for_tensorflow as itex

auto_mixed_precision_options = itex.AutoMixedPrecisionOptions()
auto_mixed_precision_options.data_type = itex.BFLOAT16  # or itex.FLOAT16

graph_options = itex.GraphOptions()
graph_options.auto_mixed_precision_options = auto_mixed_precision_options
graph_options.auto_mixed_precision = itex.ON

config = itex.ConfigProto(graph_options=graph_options)
itex.set_config(config)
```

Environment variables:

```shell
export ITEX_AUTO_MIXED_PRECISION=1
export ITEX_AUTO_MIXED_PRECISION_DATA_TYPE="BFLOAT16"  # or "FLOAT16"
```

III. Use the Python API or environment variables to manually tune Advanced AMP for better performance, accuracy, or both.

Advanced configuration

Python API:

```python
auto_mixed_precision_options.allowlist_add = "AvgPool3D,AvgPool"
auto_mixed_precision_options.inferlist_remove = "AvgPool3D,AvgPool"
```

Environment variables:

```shell
export ITEX_AUTO_MIXED_PRECISION_ALLOWLIST_ADD="AvgPool3D,AvgPool"
export ITEX_AUTO_MIXED_PRECISION_INFERLIST_REMOVE="AvgPool3D,AvgPool"
```

Example

End-to-end Example

This example trains a CNN model with Advanced AMP on GPU and shows the performance improvement. The following guide shows how to tune AMP manually: Speed up Inference of Inception v4 by Advanced Automatic Mixed Precision on Intel CPU and GPU.

The first epoch may be slower because TensorFlow optimizes the model during the first run. In subsequent epochs, the run time will stabilize.

Tuning Performance Example on MobileNet

Advanced AMP already applies aggressive sub-graph fusion in many models. To achieve better performance, you can manually tune the Advanced AMP configuration lists so that more operations are converted to lower precision; note, however, that this usually does not create additional fusion opportunities.

Here is an example using MobileNet to tune manually.

I. Export the optimized graph by Advanced AMP with default configuration.

Set environment variables:

export ITEX_AUTO_MIXED_PRECISION=1
export ITEX_AUTO_MIXED_PRECISION_LOG_PATH=/my/path/

After running the model inference with Intel® Extension for TensorFlow*, five files will appear in that path:

| Log File | Explanation |
| --- | --- |
| .graphdef_AutoMixedPrecision_1657011814330.pb | post-optimization graph in binary format |
| .graphdef_AutoMixedPrecision_1657011814330.pb.txt | post-optimization graph in text format |
| .graphdef_preop_1657011815538.pb | pre-optimization graph in binary format |
| .graphdef_preop_1657011815538.pb.txt | pre-optimization graph in text format |
| .paintbuckets_AutoMixedPrecision_1657011814330.txt | detailed contents of ALLOWLIST, DENYLIST, INFERLIST, CLEARLIST |

II. Check the operation data type.

Use a tool (such as Netron) to open the graph file and check the operations’ data type.

In MobileNet, only AvgPool and Softmax operations are not converted to BF16.

itex-amp-example1.png
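If you prefer a scripted check to a GUI, the dumped GraphDef can also be inspected with TensorFlow itself. This is a minimal sketch: `fp32_op_types` is a helper name introduced here, and the path in the usage comment is illustrative; substitute the `.graphdef_AutoMixedPrecision_*.pb` file that appeared under your log path.

```python
import tensorflow as tf
from collections import Counter

def fp32_op_types(pb_path):
    """Count operation types in a dumped GraphDef whose T attribute is still FP32."""
    graph_def = tf.compat.v1.GraphDef()
    with open(pb_path, "rb") as f:
        graph_def.ParseFromString(f.read())
    return Counter(
        node.op for node in graph_def.node
        if node.attr["T"].type == tf.float32.as_datatype_enum
    )

# Usage (illustrative path):
# print(fp32_op_types("/my/path/.graphdef_AutoMixedPrecision_1657011814330.pb"))
```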

III. Convert operations to BFloat16 manually.

Move AvgPool and Softmax to the ALLOWLIST:

Alternative 1: set by environment variables:

```shell
export ITEX_AUTO_MIXED_PRECISION_INFERLIST_REMOVE=Softmax,AvgPool
export ITEX_AUTO_MIXED_PRECISION_ALLOWLIST_ADD=Softmax,AvgPool
```

Alternative 2: set by Python API:

```python
auto_mixed_precision_options.inferlist_remove = "Softmax,AvgPool"
auto_mixed_precision_options.allowlist_add = "Softmax,AvgPool"
```

IV. Execute Advanced AMP with the updated configuration.

Run the model inference with Intel® Extension for TensorFlow* using the configuration list tuned above. Performance increases slightly without an accuracy drop, because only two operations are converted to BF16, and they account for a small share of the total runtime.

itex-amp-example2.png

V. Continue tuning the Advanced AMP configuration list

Repeat the above steps to tune Advanced AMP until you reach peak performance with the desired accuracy.