# Welcome to Intel® Extension for TensorFlow* documentation

## Documentation

* Overview
  * Infrastructure
  * Quick example
  * Examples
  * Releases
  * Performance data
  * Frequently asked questions
  * Contributing guidelines
* Installation guide
  * Install for CPU
  * Install for XPU
  * Install by source build
  * Install Conda for GPU distributed
* Features
  * Environment variables
  * Python API
  * Next Pluggable Device
  * CPU Thread Pool
  * Graph optimization
  * Custom operator
  * Advanced auto mixed precision
  * Operator override
  * INT8 quantization
  * XPUAutoShard
  * GPU profiler
  * CPU launcher
  * Weight prepack
* Advanced topics
  * CPU practice guide
  * GPU practice guide
  * C++ API support
  * OpenXLA
  * Keras 3
* Developer Guide
  * Extension design
  * Directory structure
  * Optimizations design
  * Custom Op
## Highlights

* Environment variables & Python API

  Generally, the default configuration of Intel® Extension for TensorFlow\* provides good performance without any code changes. Intel® Extension for TensorFlow\* also provides simple frontend Python APIs and utilities so advanced users can get more optimized performance with only minor code changes for different application scenarios. Typically, you only need to add two or three clauses to the original code (see the configuration sketch after this list).

* Next Pluggable Device (NPD)

  The Next Pluggable Device (NPD) is the latest generation of the TensorFlow plugin mechanism. It allows new accelerator plugins to register devices with TensorFlow without modifying the TensorFlow codebase, and it also serves as a conduit to OpenXLA via its PJRT plugin. This approach significantly streamlines extending TensorFlow with new hardware accelerators, improving both efficiency and flexibility.

* Advanced auto mixed precision (AMP)

  The low-precision data types `bfloat16` and `float16` are natively supported by 3rd Generation Intel® Xeon® Scalable Processors (codenamed [Cooper Lake](https://ark.intel.com/content/www/us/en/ark/products/series/204098/3rd-generation-intel-xeon-scalable-processors.html)) with the `AVX512` instruction set, and by the Intel® Data Center GPU, which further boosts performance and uses less memory. The lower-precision data types supported by Advanced Auto Mixed Precision (AMP) are fully enabled in Intel® Extension for TensorFlow\* (the configuration sketch after this list shows one way to turn it on).

* Graph optimization

  Intel® Extension for TensorFlow\* provides graph optimization that fuses specific operator patterns, such as `Conv2D+ReLU` or `Linear+ReLU`, into a single new operator for better performance. The benefits of these fusions are delivered to users transparently.

* CPU Thread Pool

  Intel® Extension for TensorFlow\* uses the OMP thread pool by default, since it has better performance and scaling in most cases. For workloads with large inter-op concurrency, you can switch to the Eigen thread pool (the default in TensorFlow) by setting the environment variable `ITEX_OMP_THREADPOOL=0` (see the thread-pool sketch after this list).

* Operator optimization

  Intel® Extension for TensorFlow\* also optimizes operators and implements several customized operators for a performance boost. The `itex.ops` namespace extends the TensorFlow public API implementation for better performance (see the operator sketch after this list).

* GPU profiler

  Intel® Extension for TensorFlow\* provides support for the TensorFlow [Profiler](https://www.tensorflow.org/guide/profiler). To enable the profiler, define three environment variables: `export ZE_ENABLE_TRACING_LAYER=1`, `export UseCyclesPerSecondTimer=1`, and `export ENABLE_TF_PROFILER=1` (see the profiler sketch after this list).

* INT8 quantization

  Intel® Extension for TensorFlow\* works with [Intel® Neural Compressor](https://github.com/intel/neural-compressor) to provide a compatible TensorFlow INT8 quantization solution with an equivalent user experience (see the quantization sketch after this list).

* XPUAutoShard on GPU [Experimental]

  Intel® Extension for TensorFlow\* provides the XPUAutoShard feature to automatically shard the input data and the TensorFlow graph, placing these data/graph shards on GPU devices to maximize hardware usage.

* OpenXLA

  Intel® Extension for TensorFlow\* adopts PJRT, a uniform device API, as the supported device plugin mechanism to implement the Intel GPU backend for OpenXLA support on the TensorFlow frontend.

* Keras 3

  Keras 3 with TensorFlow comes with a significant enhancement: Just-In-Time (JIT) compilation is enabled by default. This feature leverages the XLA (Accelerated Linear Algebra) compiler to optimize TensorFlow computations. See the Keras 3 guide to avoid possible performance issues and errors.
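
The highlights above note that only two or three added clauses are usually needed to use the frontend Python API, for example to turn on Advanced AMP. The sketch below is a minimal illustration, assuming the `itex.AutoMixedPrecisionOptions`, `itex.GraphOptions`, `itex.ConfigProto`, and `itex.set_config` names described on the Python API page; check that page for the exact interface.

```python
# Minimal sketch: enabling Advanced AMP through the frontend Python API.
# The itex.* configuration names used here are assumptions taken from the
# Python API documentation and may differ between releases.
import tensorflow as tf
import intel_extension_for_tensorflow as itex

amp_options = itex.AutoMixedPrecisionOptions()
amp_options.data_type = itex.BFLOAT16        # bfloat16 on CPU/GPU; float16 is GPU-oriented

graph_options = itex.GraphOptions(auto_mixed_precision_options=amp_options)
graph_options.auto_mixed_precision = itex.ON

itex.set_config(itex.ConfigProto(graph_options=graph_options))

# The rest of the model code is unchanged; eligible ops now run in lower precision.
model = tf.keras.applications.ResNet50(weights=None)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```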
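
For the CPU Thread Pool highlight, the `ITEX_OMP_THREADPOOL=0` variable quoted above can be exported in the shell or, as sketched below, set from Python before TensorFlow is imported. That the variable is read when the extension loads is an assumption worth verifying against the Environment variables page.

```python
# Sketch: selecting the Eigen thread pool instead of the default OMP thread pool.
# ITEX_OMP_THREADPOOL=0 comes from the "CPU Thread Pool" highlight; setting it
# before importing TensorFlow assumes the variable is read at extension load time.
import os

os.environ["ITEX_OMP_THREADPOOL"] = "0"   # 0 = Eigen thread pool (TensorFlow default)

import tensorflow as tf  # noqa: E402  -- imported after the variable is set
```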
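
Customized operators from the `itex.ops` namespace mentioned in the Operator optimization highlight are intended as drop-in replacements for stock TensorFlow/Keras layers. The operator names below (`itex.ops.LayerNormalization`, `itex.ops.gelu`) are assumptions for illustration; the Custom operator page lists the operators actually provided.

```python
# Hedged sketch: swapping stock layers for customized itex.ops implementations.
# itex.ops.LayerNormalization and itex.ops.gelu are assumed names for illustration.
import tensorflow as tf
import intel_extension_for_tensorflow as itex

inputs = tf.keras.Input(shape=(128, 768))
x = itex.ops.LayerNormalization()(inputs)   # instead of tf.keras.layers.LayerNormalization
x = itex.ops.gelu(x)                        # instead of tf.nn.gelu
model = tf.keras.Model(inputs, x)
```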
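
For the GPU profiler highlight, the three environment variables quoted above can be exported before launching the workload; the sketch below sets them from Python before importing TensorFlow and then drives the standard TensorFlow Profiler API.

```python
# Sketch: enabling the GPU profiler. The three environment variables are the ones
# listed in the "GPU profiler" highlight; tf.profiler.experimental.start/stop is
# the standard TensorFlow Profiler API.
import os

os.environ["ZE_ENABLE_TRACING_LAYER"] = "1"
os.environ["UseCyclesPerSecondTimer"] = "1"
os.environ["ENABLE_TF_PROFILER"] = "1"

import tensorflow as tf  # noqa: E402

tf.profiler.experimental.start("./logdir")
# ... run the training or inference steps to be profiled ...
tf.profiler.experimental.stop()
# View the collected trace with: tensorboard --logdir ./logdir
```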
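
The INT8 quantization highlight is delivered through Intel® Neural Compressor rather than through Intel® Extension for TensorFlow\* itself. The following sketch assumes the `quantization.fit` / `PostTrainingQuantConfig` interface of Neural Compressor 2.x; the exact API differs between Neural Compressor releases, so treat this as illustrative only and see the INT8 quantization page for the supported flow.

```python
# Hedged sketch: post-training INT8 quantization of a TensorFlow SavedModel with
# Intel® Neural Compressor. The quantization.fit / PostTrainingQuantConfig names
# are assumed from Neural Compressor 2.x and may change between releases.
from neural_compressor import quantization
from neural_compressor.config import PostTrainingQuantConfig


def quantize_to_int8(saved_model_dir, calib_dataloader, output_dir="./int8_saved_model"):
    """Quantize a SavedModel to INT8 using a user-provided calibration dataloader."""
    config = PostTrainingQuantConfig()          # default static post-training quantization
    q_model = quantization.fit(
        model=saved_model_dir,                  # path to the FP32 SavedModel
        conf=config,
        calib_dataloader=calib_dataloader,      # yields calibration batches
    )
    q_model.save(output_dir)
    return q_model
```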