# Welcome to Intel® Extension for TensorFlow* documentation

## Documentation

* Overview
  * Infrastructure
  * Quick example
  * Examples
  * Releases
  * Performance data
  * Frequently asked questions
  * Contributing guidelines
* Installation guide
  * Install for CPU
  * Install for XPU
  * Install by source build
  * Install Conda for GPU distributed
* Features
  * Environment variables
  * Python API
  * Next Pluggable Device
  * CPU Thread Pool
  * Graph optimization
  * Custom operator
  * Advanced auto mixed precision
  * Operator override
  * INT8 quantization
  * XPUAutoShard
  * GPU profiler
  * CPU launcher
  * Weight prepack
* Advanced topics
  * CPU practice guide
  * GPU practice guide
  * C++ API support
  * OpenXLA
  * Keras 3
* Developer Guide
  * Extension design
  * Directory structure
  * Optimizations design
  * Custom Op
## Highlights

* Environment variables & Python API

  Generally, the default configuration of Intel® Extension for TensorFlow\* provides good performance without any code changes. Intel® Extension for TensorFlow\* also provides simple frontend Python APIs and utilities so advanced users can get more optimized performance with only minor code changes for different application scenarios. Typically, you only need to add two or three clauses to the original code (see the configuration sketch after this list).

* Next Pluggable Device (NPD)

  The Next Pluggable Device (NPD) is the latest generation of the TensorFlow plugin mechanism. It allows new accelerator plugins to register devices with TensorFlow without modifying the TensorFlow codebase, and it also serves as a conduit to OpenXLA via its PJRT plugin. This approach significantly streamlines extending TensorFlow with new hardware accelerators, improving both efficiency and flexibility.

* Advanced auto mixed precision (AMP)

  The low-precision data types `bfloat16` and `float16` are natively supported by 3rd Generation Intel® Xeon® Scalable Processors (codenamed [Cooper Lake](https://ark.intel.com/content/www/us/en/ark/products/series/204098/3rd-generation-intel-xeon-scalable-processors.html)) with the `AVX512` instruction set, and by the Intel® Data Center GPU, which further boosts performance and uses less memory. The lower-precision data types supported by Advanced Auto Mixed Precision (AMP) are fully enabled in Intel® Extension for TensorFlow\* (the configuration sketch after this list shows one way to turn it on).

* Graph optimization

  Intel® Extension for TensorFlow\* provides graph optimization that fuses specific operator patterns, such as `Conv2D+ReLU` or `Linear+ReLU`, into a single new operator for better performance. The benefits of these fusions are delivered to users transparently.

* CPU Thread Pool

  Intel® Extension for TensorFlow\* uses the OMP thread pool by default, since it has better performance and scaling in most cases. For workloads with large inter-op concurrency, you can switch to the Eigen thread pool (the default in TensorFlow) by setting the environment variable `ITEX_OMP_THREADPOOL=0` (see the thread-pool sketch after this list).

* Operator optimization

  Intel® Extension for TensorFlow\* also optimizes operators and implements several customized operators for a performance boost. The `itex.ops` namespace extends the TensorFlow public API implementation for better performance (see the operator sketch after this list).

* GPU profiler

  Intel® Extension for TensorFlow\* provides support for the TensorFlow [Profiler](https://www.tensorflow.org/guide/profiler). To enable the profiler, define three environment variables: `export ZE_ENABLE_TRACING_LAYER=1`, `export UseCyclesPerSecondTimer=1`, and `export ENABLE_TF_PROFILER=1` (see the profiler sketch after this list).

* INT8 quantization

  Intel® Extension for TensorFlow\* works with [Intel® Neural Compressor](https://github.com/intel/neural-compressor) to provide a compatible TensorFlow INT8 quantization solution with an equivalent user experience (see the quantization sketch after this list).

* XPUAutoShard on GPU [Experimental]

  Intel® Extension for TensorFlow\* provides the XPUAutoShard feature to automatically shard the input data and the TensorFlow graph, placing these data/graph shards on GPU devices to maximize hardware usage.

* OpenXLA

  Intel® Extension for TensorFlow\* adopts PJRT, a uniform device API, as the supported device plugin mechanism to implement the Intel GPU backend for OpenXLA support on the TensorFlow frontend.

* Keras 3

  Keras 3 with TensorFlow comes with a significant enhancement: Just-In-Time (JIT) compilation is enabled by default. This feature leverages the XLA (Accelerated Linear Algebra) compiler to optimize TensorFlow computations. See the Keras 3 guide to avoid possible performance issues and errors.
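
The highlights above note that only two or three added clauses are usually needed to use the frontend Python API, for example to turn on Advanced AMP. The sketch below is a minimal illustration, assuming the `itex.AutoMixedPrecisionOptions`, `itex.GraphOptions`, `itex.ConfigProto`, and `itex.set_config` names described on the Python API page; check that page for the exact interface.

```python
# Minimal sketch: enabling Advanced AMP through the frontend Python API.
# The itex.* configuration names used here are assumptions taken from the
# Python API documentation and may differ between releases.
import tensorflow as tf
import intel_extension_for_tensorflow as itex

amp_options = itex.AutoMixedPrecisionOptions()
amp_options.data_type = itex.BFLOAT16        # bfloat16 on CPU/GPU; float16 is GPU-oriented

graph_options = itex.GraphOptions(auto_mixed_precision_options=amp_options)
graph_options.auto_mixed_precision = itex.ON

itex.set_config(itex.ConfigProto(graph_options=graph_options))

# The rest of the model code is unchanged; eligible ops now run in lower precision.
model = tf.keras.applications.ResNet50(weights=None)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```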
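
For the CPU Thread Pool highlight, the `ITEX_OMP_THREADPOOL=0` variable quoted above can be exported in the shell or, as sketched below, set from Python before TensorFlow is imported. That the variable is read when the extension loads is an assumption worth verifying against the Environment variables page.

```python
# Sketch: selecting the Eigen thread pool instead of the default OMP thread pool.
# ITEX_OMP_THREADPOOL=0 comes from the "CPU Thread Pool" highlight; setting it
# before importing TensorFlow assumes the variable is read at extension load time.
import os

os.environ["ITEX_OMP_THREADPOOL"] = "0"   # 0 = Eigen thread pool (TensorFlow default)

import tensorflow as tf  # noqa: E402  -- imported after the variable is set
```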
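
Customized operators from the `itex.ops` namespace mentioned in the Operator optimization highlight are intended as drop-in replacements for stock TensorFlow/Keras layers. The operator names below (`itex.ops.LayerNormalization`, `itex.ops.gelu`) are assumptions for illustration; the Custom operator page lists the operators actually provided.

```python
# Hedged sketch: swapping stock layers for customized itex.ops implementations.
# itex.ops.LayerNormalization and itex.ops.gelu are assumed names for illustration.
import tensorflow as tf
import intel_extension_for_tensorflow as itex

inputs = tf.keras.Input(shape=(128, 768))
x = itex.ops.LayerNormalization()(inputs)   # instead of tf.keras.layers.LayerNormalization
x = itex.ops.gelu(x)                        # instead of tf.nn.gelu
model = tf.keras.Model(inputs, x)
```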
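
For the GPU profiler highlight, the three environment variables quoted above can be exported before launching the workload; the sketch below sets them from Python before importing TensorFlow and then drives the standard TensorFlow Profiler API.

```python
# Sketch: enabling the GPU profiler. The three environment variables are the ones
# listed in the "GPU profiler" highlight; tf.profiler.experimental.start/stop is
# the standard TensorFlow Profiler API.
import os

os.environ["ZE_ENABLE_TRACING_LAYER"] = "1"
os.environ["UseCyclesPerSecondTimer"] = "1"
os.environ["ENABLE_TF_PROFILER"] = "1"

import tensorflow as tf  # noqa: E402

tf.profiler.experimental.start("./logdir")
# ... run the training or inference steps to be profiled ...
tf.profiler.experimental.stop()
# View the collected trace with: tensorboard --logdir ./logdir
```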
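
The INT8 quantization highlight is delivered through Intel® Neural Compressor rather than through Intel® Extension for TensorFlow\* itself. The following sketch assumes the `quantization.fit` / `PostTrainingQuantConfig` interface of Neural Compressor 2.x; the exact API differs between Neural Compressor releases, so treat this as illustrative only and see the INT8 quantization page for the supported flow.

```python
# Hedged sketch: post-training INT8 quantization of a TensorFlow SavedModel with
# Intel® Neural Compressor. The quantization.fit / PostTrainingQuantConfig names
# are assumed from Neural Compressor 2.x and may change between releases.
from neural_compressor import quantization
from neural_compressor.config import PostTrainingQuantConfig


def quantize_to_int8(saved_model_dir, calib_dataloader, output_dir="./int8_saved_model"):
    """Quantize a SavedModel to INT8 using a user-provided calibration dataloader."""
    config = PostTrainingQuantConfig()          # default static post-training quantization
    q_model = quantization.fit(
        model=saved_model_dir,                  # path to the FP32 SavedModel
        conf=config,
        calib_dataloader=calib_dataloader,      # yields calibration batches
    )
    q_model.save(output_dir)
    return q_model
```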