Welcome to Intel® Extension for TensorFlow* documentation

Documentation

Overview: Infrastructure, Quick example, Examples, Releases, Performance data, Frequently asked questions, Contributing guidelines
Installation guide: Install for CPU, Install for XPU, Install by source build, Install Conda for GPU distributed
Features: Environment variables, Python API, Next Pluggable Device, CPU Thread Pool, Graph optimization, Custom operator, Advanced auto mixed precision, Operator override, INT8 quantization, XPUAutoShard, GPU profiler, CPU launcher, Weight prepack
Advanced topics: CPU practice guide, GPU practice guide, C++ API support, OpenXLA, Keras 3
Developer Guide: Extension design, Directory structure, Optimizations design, Custom Op

Highlights

Environment variables & Python API

The default configuration of Intel® Extension for TensorFlow* generally provides good performance without any code changes. For advanced users, Intel® Extension for TensorFlow* also provides simple frontend Python APIs and utilities that deliver further optimization for different application scenarios with only minor code changes. Typically, you only need to add two or three statements to the original code.
Next Pluggable Device (NPD)

The Next Pluggable Device (NPD) is an advanced generation of the TensorFlow plugin mechanism. It lets new accelerator plugins register devices with TensorFlow without requiring modifications to the TensorFlow codebase, and it also serves as a conduit to OpenXLA via its PJRT plugin. This streamlines the process of extending TensorFlow with new hardware accelerators.
Advanced auto mixed precision (AMP)

The low-precision data types bfloat16 and float16 are natively supported by the 3rd Generation Xeon® Scalable Processors (codenamed Cooper Lake) with the AVX512 instruction set and by the Intel® Data Center GPU. Using these data types boosts performance and reduces memory use. The lower-precision data types supported by Advanced Auto Mixed Precision (AMP) are fully enabled in Intel® Extension for TensorFlow*.
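To make the memory saving concrete, here is a minimal standard-library sketch of what the two 16-bit formats look like next to float32. It is only an illustration of the data types, not how Intel® Extension for TensorFlow* converts tensors: the helper names are hypothetical, and the bfloat16 conversion shown uses simple truncation of the float32 encoding, whereas real implementations may round.

```python
import struct


def float32_to_bfloat16_bits(x: float) -> int:
    """Return the 16-bit bfloat16 pattern for x (hypothetical helper).

    bfloat16 keeps the sign, the 8 exponent bits, and the top 7 mantissa
    bits of the float32 encoding. This sketch truncates; it does not round.
    """
    (bits32,) = struct.unpack("<I", struct.pack("<f", x))
    return bits32 >> 16


def bfloat16_bits_to_float(bits16: int) -> float:
    """Expand a bfloat16 bit pattern back to a Python float."""
    (value,) = struct.unpack("<f", struct.pack("<I", bits16 << 16))
    return value


def float16_round_trip(x: float) -> float:
    """Store x as IEEE-754 half precision (struct format 'e') and read it back."""
    (value,) = struct.unpack("<e", struct.pack("<e", x))
    return value


# Both 16-bit formats need half the storage of float32.
BYTES_FLOAT32 = struct.calcsize("<f")
BYTES_FLOAT16 = struct.calcsize("<e")

if __name__ == "__main__":
    x = 3.14159
    print("bfloat16:", bfloat16_bits_to_float(float32_to_bfloat16_bits(x)))
    print("float16: ", float16_round_trip(x))
    print("bytes per value:", BYTES_FLOAT32, "vs", BYTES_FLOAT16)
```

bfloat16 keeps the full float32 exponent range at the cost of mantissa precision, while float16 keeps more mantissa bits but has a narrower range.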
Graph optimization

Intel® Extension for TensorFlow* provides graph optimization that fuses specific operator patterns, such as Conv2D+ReLU or Linear+ReLU, into a single new operator for better performance. The fusions are applied transparently, so users benefit without changing their code.
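The idea of pattern fusion can be sketched on a toy model in which a graph is just a linear list of operator names. This is not the extension's optimizer: the fused operator names and the function `fuse_linear_chain` are hypothetical, and a real graph pass works on a dataflow graph and must check data types, shapes, and fan-out before fusing.

```python
from typing import Dict, List, Tuple

# Hypothetical pattern table: (producer, consumer) -> fused operator name.
# The two patterns are the examples named in the text above.
FUSION_PATTERNS: Dict[Tuple[str, ...], str] = {
    ("Conv2D", "ReLU"): "FusedConv2DReLU",
    ("Linear", "ReLU"): "FusedLinearReLU",
}


def fuse_linear_chain(ops: List[str]) -> List[str]:
    """Replace each adjacent pair that matches a pattern with one fused op."""
    fused: List[str] = []
    i = 0
    while i < len(ops):
        pair = tuple(ops[i:i + 2])
        if pair in FUSION_PATTERNS:
            fused.append(FUSION_PATTERNS[pair])
            i += 2  # both operators are consumed by the fused one
        else:
            fused.append(ops[i])
            i += 1
    return fused


if __name__ == "__main__":
    print(fuse_linear_chain(["Conv2D", "ReLU", "MaxPool", "Linear", "ReLU"]))
```

Fusing removes the intermediate tensor between the two operators, which is where the performance benefit typically comes from.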
CPU Thread Pool

Intel® Extension for TensorFlow* uses the OpenMP (OMP) thread pool by default because it performs and scales better in most cases. For workloads with large inter-op concurrency, you can switch to the Eigen thread pool (the default in TensorFlow) by setting the environment variable ITEX_OMP_THREADPOOL=0.
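If you prefer to select the thread pool from a Python script rather than the shell, a minimal sketch looks like the following. The helper `select_cpu_thread_pool` is hypothetical, and the sketch assumes the variable is read when TensorFlow loads the extension, so it is set before `tensorflow` is imported.

```python
import os


def select_cpu_thread_pool(use_omp: bool, env=os.environ) -> str:
    """Set ITEX_OMP_THREADPOOL and return the value written (hypothetical helper).

    "0" selects the Eigen thread pool, as described above; "1" is assumed
    here to keep the default OMP thread pool.
    """
    value = "1" if use_omp else "0"
    env["ITEX_OMP_THREADPOOL"] = value
    return value


if __name__ == "__main__":
    # Assumption: the setting must be in place before TensorFlow is imported.
    select_cpu_thread_pool(use_omp=False)
    # import tensorflow as tf  # import only after the variable is set
    print("ITEX_OMP_THREADPOOL =", os.environ["ITEX_OMP_THREADPOOL"])
```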
Operator optimization

Intel® Extension for TensorFlow* also optimizes existing operators and implements several customized operators for a performance boost. The itex.ops namespace extends the implementation of TensorFlow public APIs for better performance.
GPU profiler

Intel® Extension for TensorFlow* provides support for TensorFlow Profiler. To enable the profiler, define three environment variables: export ZE_ENABLE_TRACING_LAYER=1, export UseCyclesPerSecondTimer=1, and export ENABLE_TF_PROFILER=1.
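The same three variables can be set from Python when a shell export is inconvenient. The sketch below is an illustration only: the helper `enable_gpu_profiler_env` is hypothetical, and it assumes the variables need to be in place before TensorFlow is imported.

```python
import os

# The three variables named above, each set to "1".
PROFILER_ENV = {
    "ZE_ENABLE_TRACING_LAYER": "1",
    "UseCyclesPerSecondTimer": "1",
    "ENABLE_TF_PROFILER": "1",
}


def enable_gpu_profiler_env(env=os.environ) -> dict:
    """Write the profiler variables into env and return what was set."""
    env.update(PROFILER_ENV)
    return {name: env[name] for name in PROFILER_ENV}


if __name__ == "__main__":
    # Assumption: call this before importing TensorFlow.
    for name, value in enable_gpu_profiler_env().items():
        print(f"{name}={value}")
    # import tensorflow as tf  # import only after the variables are set
```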
INT8 quantization

Intel® Extension for TensorFlow* works with Intel® Neural Compressor to provide a compatible TensorFlow INT8 quantization solution with an equivalent user experience.
XPUAutoShard on GPU [Experimental]

Intel® Extension for TensorFlow* provides the XPUAutoShard feature, which automatically shards the input data and the TensorFlow graph and places the resulting data and graph shards on GPU devices to maximize hardware usage.
OpenXLA

Intel® Extension for TensorFlow* adopts PJRT, a uniform device API, as its supported device plugin mechanism and uses it to implement the Intel GPU backend for OpenXLA support on the TensorFlow frontend.
Keras 3

Keras 3 with TensorFlow comes with a significant change: Just-In-Time (JIT) compilation is enabled by default. This feature uses the XLA (Accelerated Linear Algebra) compiler to optimize TensorFlow computations. See the Keras 3 guide to avoid possible performance issues and errors.