Intel® Extension for TensorFlow* for C++

This guide shows how to build the Intel® Extension for TensorFlow* CC library from source and how to use tensorflow_cc to build C/C++ language bindings on Ubuntu.

Requirements

Hardware Requirements

Verified Hardware Platforms:

Common Requirements

Install Bazel

To build Intel® Extension for TensorFlow*, install Bazel 5.3.0. Refer to install Bazel.

Here are the recommended commands:

$ wget https://github.com/bazelbuild/bazel/releases/download/5.3.0/bazel-5.3.0-installer-linux-x86_64.sh
$ bash bazel-5.3.0-installer-linux-x86_64.sh --user

Check that Bazel was installed successfully and is version 5.3.0:

$ bazel --version
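If you script your environment setup, this check can be automated. Here is a minimal sketch; it assumes `bazel` is already on your `PATH` after running the installer above:

```shell
# Compare the reported Bazel version against the required one (5.3.0).
# `bazel --version` prints a line like "bazel 5.3.0"; the second field is the version.
required="5.3.0"
found=$(bazel --version 2>/dev/null | awk '{print $2}')
if [ "$found" = "$required" ]; then
    echo "Bazel $required detected"
else
    echo "Expected Bazel $required but found: ${found:-none}" >&2
fi
```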

Download Source Code

$ git clone https://github.com/intel/intel-extension-for-tensorflow.git intel-extension-for-tensorflow
$ cd intel-extension-for-tensorflow/

Create a Conda Environment

  1. Install Conda.

  2. Create Virtual Running Environment

$ conda create -n itex_build python=3.10
$ conda activate itex_build

Note: Python versions 3.8 through 3.11 are supported.

Install TensorFlow

Install TensorFlow 2.15.0. Refer to Install TensorFlow for details.

$ pip install tensorflow==2.15.0

Check that TensorFlow was installed successfully and is version 2.15.0:

$ python -c "import tensorflow as tf;print(tf.__version__)"

Extra Requirements for XPU/GPU Build Only

Install Intel GPU Driver

Install the Intel GPU driver on the build server; it is needed to build with GPU support and AOT (Ahead-of-time compilation).

Refer to Install Intel GPU driver for details.

Note:

  1. Make sure to install developer runtime packages before building Intel® Extension for TensorFlow*.

  2. AOT (Ahead-of-time compilation)

    AOT is a compile-time option that generates binary code for a specified hardware platform during the build, reducing GPU kernel initialization time at startup. AOT makes the installation package larger but improves startup performance.

    Without AOT, Intel® Extension for TensorFlow* is translated to binary code for the local hardware platform at startup, which can prolong startup time to several minutes or more when using a GPU.

    For more information, refer to Use AOT for Integrated Graphics (Intel GPU).

Install oneAPI Base Toolkit

We recommend installing the oneAPI base toolkit with sudo (or as the root user) to the system directory /opt/intel/oneapi.

The following commands assume the oneAPI base toolkit is installed in /opt/intel/oneapi. If you installed it in a different folder, update the oneAPI path accordingly.

Refer to Install oneAPI Base Toolkit Packages

The oneAPI base toolkit provides the compiler and libraries needed by Intel® Extension for TensorFlow*.

Enable oneAPI components:

$ source /opt/intel/oneapi/compiler/latest/env/vars.sh
$ source /opt/intel/oneapi/mkl/latest/env/vars.sh

Build Intel® Extension for TensorFlow* CC library

Configure

Configure For CPU

Configure the system build by running the ./configure command at the root of your cloned Intel® Extension for TensorFlow* source tree.

$ ./configure

Choose n to build for CPU only. Refer to Configure Example.

Configure For GPU

Configure the system build by running the ./configure command at the root of your cloned Intel® Extension for TensorFlow* source tree. This script prompts you for the location of Intel® Extension for TensorFlow* dependencies and asks for additional build configuration options (path to DPC++ compiler, for example).

$ ./configure

  • Choose Y for Intel GPU support. Refer to Configure Example.

  • Specify the Location of Compiler (DPC++).

    The default is /opt/intel/oneapi/compiler/latest/linux/, the default installation path. Press Enter to accept it.

    If the compiler (DPC++) is installed elsewhere, enter the correct path.

  • Specify the Ahead of Time (AOT) Compilation Platforms.

    The default is ‘’ (empty), which disables AOT.

    Enter one or more device type strings for specific hardware platforms, such as ats-m150 or acm-g11.

    Here is the list of GPUs we’ve verified:

GPU                                       device type
Intel® Data Center GPU Flex Series 170    ats-m150
Intel® Data Center GPU Flex Series 140    ats-m75
Intel® Data Center GPU Max Series         pvc
Intel® Arc™ A730M                         acm-g10
Intel® Arc™ A380                          acm-g11

Please refer to the Available GPU Platforms section at the end of the Ahead of Time Compilation document for more device types, or create an issue to ask for support.

To get the full list of supported device types, use the OpenCL™ Offline Compiler (OCLOC) tool, which is installed as part of the GPU driver. Run the following command and look for the -device <device_type> field in the output:

ocloc compile --help

  • Choose to Build with oneMKL Support.

    We recommend choosing y.

    The default is /opt/intel/oneapi/mkl/latest, the default installation path. Press Enter to accept it.

    If oneMKL is installed elsewhere, enter the correct path.

Build Source Code

For GPU support

$ bazel build -c opt --config=gpu //itex:libitex_gpu_cc.so

CC library location: <Path to intel-extension-for-tensorflow>/bazel-bin/itex/libitex_gpu_cc.so

NOTE: libitex_gpu_cc.so depends on libitex_gpu_xetla.so, so libitex_gpu_xetla.so should be copied to the same directory as libitex_gpu_cc.so:

$ cd <Path to intel-extension-for-tensorflow>
$ cp bazel-out/k8-opt-ST-*/bin/itex/core/kernels/gpu/libitex_gpu_xetla.so bazel-bin/itex/

For CPU support

$ bazel build -c opt --config=cpu //itex:libitex_cpu_cc.so

If you want to build with threadpool support, add the build option --define=build_with_threadpool=true and set the environment variable ITEX_OMP_THREADPOOL=0:

$ bazel build -c opt --config=cpu --define=build_with_threadpool=true //itex:libitex_cpu_cc.so

CC library location: <Path to intel-extension-for-tensorflow>/bazel-bin/itex/libitex_cpu_cc.so

NOTE: libitex_cpu_cc.so depends on libiomp5.so, so libiomp5.so should be copied to the same directory as libitex_cpu_cc.so:

$ cd <Path to intel-extension-for-tensorflow>
$ cp bazel-out/k8-opt-ST-*/bin/external/llvm_openmp/libiomp5.so bazel-bin/itex/

Prepare TensorFlow* CC library and header files

Option 2: Build from TensorFlow* source code

a. Prepare TensorFlow* source code

$ git clone https://github.com/tensorflow/tensorflow.git
$ cd tensorflow
$ git checkout origin/r2.14 -b r2.14

b. Build libtensorflow_cc.so

$ ./configure
$ bazel build --jobs 96 --config=opt //tensorflow:libtensorflow_cc.so
$ ls ./bazel-bin/tensorflow/libtensorflow_cc.so

libtensorflow_cc.so location: <Path to tensorflow>/bazel-bin/tensorflow/libtensorflow_cc.so

c. Create symbolic link for libtensorflow_framework.so

$ cd ./bazel-bin/tensorflow/
$ ln -s libtensorflow_framework.so.2 libtensorflow_framework.so

libtensorflow_framework.so location: <Path to tensorflow>/bazel-bin/tensorflow/libtensorflow_framework.so
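The symlink is needed because the linker flag -ltensorflow_framework looks for the unversioned name libtensorflow_framework.so, which the TensorFlow build does not emit by itself. The mechanics can be sketched in a throwaway directory (illustrative only; the file here is an empty stand-in, not a real library):

```shell
# Illustrative sketch: the unversioned name is just a symlink to the versioned library
demo_dir=$(mktemp -d)
cd "$demo_dir"
touch libtensorflow_framework.so.2                 # stand-in for the built library
ln -s libtensorflow_framework.so.2 libtensorflow_framework.so
readlink libtensorflow_framework.so                # resolves to the versioned name
```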

d. Build TensorFlow header files

$ bazel build --config=opt tensorflow:install_headers
$ ls ./bazel-bin/tensorflow/include

TensorFlow header file location: <Path to tensorflow>/bazel-bin/tensorflow/include

Integrate the CC library

Linker

Configure the linker environment variables with the Intel® Extension for TensorFlow* CC library (libitex_gpu_cc.so or libitex_cpu_cc.so) path:

$ export LIBRARY_PATH=$LIBRARY_PATH:<Path to intel-extension-for-tensorflow>/bazel-bin/itex/
$ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:<Path to intel-extension-for-tensorflow>/bazel-bin/itex/
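Note that when LIBRARY_PATH or LD_LIBRARY_PATH was previously unset, the pattern above leaves a leading ':' in the value, and the dynamic loader treats an empty path entry as the current directory. A slightly more defensive sketch (the clone path is a placeholder you would adjust):

```shell
# Placeholder path - point this at your actual <Path to intel-extension-for-tensorflow>
ITEX_CC_DIR="$HOME/intel-extension-for-tensorflow/bazel-bin/itex"
# Prepend the ITEX directory; the ${VAR:+...} expansion only adds the separator
# when the variable already has a value, so no empty entry is created.
export LIBRARY_PATH="${ITEX_CC_DIR}${LIBRARY_PATH:+:$LIBRARY_PATH}"
export LD_LIBRARY_PATH="${ITEX_CC_DIR}${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
echo "$LD_LIBRARY_PATH"
```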

Load

TensorFlow* provides the C API TF_LoadPluggableDeviceLibrary to support pluggable device libraries. To load the Intel® Extension for TensorFlow* CC library, modify the original C++ code as follows:

a. Add the header file: "tensorflow/c/c_api_experimental.h".

#include "tensorflow/c/c_api_experimental.h"

b. Load libitex_gpu_cc.so or libitex_cpu_cc.so with TF_LoadPluggableDeviceLibrary.

TF_Status* status = TF_NewStatus();
TF_LoadPluggableDeviceLibrary(<lib_path>, status);

Example

The original simple example using the TensorFlow* C++ API:

// example.cc
#include "tensorflow/cc/client/client_session.h"
#include "tensorflow/cc/ops/standard_ops.h"
#include "tensorflow/core/framework/tensor.h"

int main() {
  using namespace tensorflow;
  using namespace tensorflow::ops;

  Scope root = Scope::NewRootScope();
  auto X = Variable(root, {5, 2}, DataType::DT_FLOAT);
  auto assign_x = Assign(root, X, RandomNormal(root, {5, 2}, DataType::DT_FLOAT));
  auto Y = Variable(root, {2, 3}, DataType::DT_FLOAT);
  auto assign_y = Assign(root, Y, RandomNormal(root, {2, 3}, DataType::DT_FLOAT));
  auto Z = Const(root, 2.f, {5, 3});
  auto V = MatMul(root, assign_x, assign_y);  
  auto VZ = Add(root, V, Z);

  std::vector<Tensor> outputs;
  ClientSession session(root);
  // Run and fetch VZ
  TF_CHECK_OK(session.Run({VZ}, &outputs));
  LOG(INFO) << "Output:\n" << outputs[0].matrix<float>();
  return 0;
}

The updated example with Intel® Extension for TensorFlow* enabled (it loads libitex_gpu_cc.so; substitute libitex_cpu_cc.so for a CPU-only build):

// example.cc
#include "tensorflow/cc/client/client_session.h"
#include "tensorflow/cc/ops/standard_ops.h"
#include "tensorflow/core/framework/tensor.h"
+ #include "tensorflow/c/c_api_experimental.h"

int main() {
  using namespace tensorflow;
  using namespace tensorflow::ops;

+  TF_Status* status = TF_NewStatus();
+  string xpu_lib_path = "libitex_gpu_cc.so";
+  TF_LoadPluggableDeviceLibrary(xpu_lib_path.c_str(), status);
+  TF_Code code = TF_GetCode(status);
+  if ( code == TF_OK ) {
+      LOG(INFO) << "intel-extension-for-tensorflow load successfully!";
+  } else {
+      string status_msg(TF_Message(status));
+      LOG(WARNING) << "Could not load intel-extension-for-tensorflow, please check! " << status_msg;
+  }

  Scope root = Scope::NewRootScope();
  auto X = Variable(root, {5, 2}, DataType::DT_FLOAT);
  auto assign_x = Assign(root, X, RandomNormal(root, {5, 2}, DataType::DT_FLOAT));
  auto Y = Variable(root, {2, 3}, DataType::DT_FLOAT);
  auto assign_y = Assign(root, Y, RandomNormal(root, {2, 3}, DataType::DT_FLOAT));
  auto Z = Const(root, 2.f, {5, 3});
  auto V = MatMul(root, assign_x, assign_y);  
  auto VZ = Add(root, V, Z);

  std::vector<Tensor> outputs;
  ClientSession session(root);
  // Run and fetch VZ
  TF_CHECK_OK(session.Run({VZ}, &outputs));
  LOG(INFO) << "Output:\n" << outputs[0].matrix<float>();
  return 0;
}

Build and run

Place a Makefile in the same directory as example.cc with the following contents:

  • Replace <TF_INCLUDE_PATH> with local Tensorflow* header file path. e.g. <Path to tensorflow_src>/tensorflow/include

  • Replace <TFCC_PATH> with local Tensorflow* CC library path. e.g. <Path to tensorflow_src>/tensorflow/

# Makefile
target = example_test
cc = g++
TF_INCLUDE_PATH = <TF_INCLUDE_PATH>
TFCC_PATH = <TFCC_PATH>
include = -I $(TF_INCLUDE_PATH)
lib = -L $(TFCC_PATH) -ltensorflow_framework -ltensorflow_cc
flag = -Wl,-rpath=$(TFCC_PATH) -std=c++17
source = ./example.cc
$(target): $(source)
	$(cc) $(source) -o $(target) $(include) $(lib) $(flag)
clean:
	rm $(target)
run:
	./$(target)

Go to the directory containing example.cc and the Makefile, then build and run the example:

$ make
$ ./example_test

NOTE: For GPU support, set up the oneAPI environment variables before running the example:

$ source /opt/intel/oneapi/compiler/latest/env/vars.sh
$ source /opt/intel/oneapi/mkl/latest/env/vars.sh