Intel® Extension for TensorFlow* for C++
This guide shows how to build an Intel® Extension for TensorFlow* CC library from source and how to work with tensorflow_cc to build bindings for C/C++ languages on Ubuntu.
Requirements
Hardware Requirements
Verified Hardware Platforms:
Intel® CPU (Xeon, Core)
Intel® Arc™ Graphics (experimental)
Common Requirements
Install Bazel
To build Intel® Extension for TensorFlow*, install Bazel 5.3.0. Refer to install Bazel.
Here are the recommended commands:
$ wget https://github.com/bazelbuild/bazel/releases/download/5.3.0/bazel-5.3.0-installer-linux-x86_64.sh
$ bash bazel-5.3.0-installer-linux-x86_64.sh --user
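The `--user` installer places the `bazel` binary under `$HOME/bin`, which may not be on your PATH yet. A minimal sketch to make it visible in the current shell session (the install location is the installer's documented `--user` default):

```shell
# The bazel installer's --user mode installs to $HOME/bin; add that
# directory to PATH for the current shell session.
export PATH="$PATH:$HOME/bin"
```

Add the same line to your `~/.bashrc` to make it persistent.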
Check Bazel is installed successfully and is version 5.3.0:
$ bazel --version
Download Source Code
$ git clone https://github.com/intel/intel-extension-for-tensorflow.git intel-extension-for-tensorflow
$ cd intel-extension-for-tensorflow/
Create a Conda Environment
Install Conda.
Create Virtual Running Environment
$ conda create -n itex_build python=3.10
$ conda activate itex_build
Note: Python versions 3.8 through 3.11 are supported.
Install TensorFlow
Install TensorFlow 2.14.0, and refer to Install TensorFlow for details.
$ pip install tensorflow==2.14.0
Check TensorFlow was installed successfully and is version 2.14.0:
$ python -c "import tensorflow as tf;print(tf.__version__)"
Extra Requirements for XPU/GPU Build Only
Install Intel GPU Driver
Install the Intel GPU Driver on the build server; it is needed to build with GPU support and AOT (Ahead-of-Time compilation).
Refer to Install Intel GPU driver for details.
Note:
Make sure to install developer runtime packages before building Intel® Extension for TensorFlow*.
AOT (Ahead-of-time compilation)
AOT is a compile option that reduces the initialization time of GPU kernels at startup by generating the binary code for a specified hardware platform during the build. AOT makes the installation package larger but improves startup performance.
Without AOT, Intel® Extension for TensorFlow* is translated to binary code for the local hardware platform during startup, which can prolong startup time to several minutes or more when using a GPU.
For more information, refer to Use AOT for Integrated Graphics (Intel GPU).
Install oneAPI Base Toolkit
We recommend you install the oneAPI base toolkit using sudo (or as the root user) to the system directory /opt/intel/oneapi.
The following commands assume the oneAPI base toolkit is installed in /opt/intel/oneapi. If you installed it in another folder, update the oneAPI path as appropriate.
Refer to Install oneAPI Base Toolkit Packages
The oneAPI base toolkit provides compiler and libraries needed by Intel® Extension for TensorFlow*.
Enable oneAPI components:
$ source /opt/intel/oneapi/compiler/latest/env/vars.sh
$ source /opt/intel/oneapi/mkl/latest/env/vars.sh
Build Intel® Extension for TensorFlow* CC library
Configure
Configure For CPU
Configure the system build by running the ./configure command at the root of your cloned Intel® Extension for TensorFlow* source tree.
$ ./configure
Choose n to build for CPU only. Refer to Configure Example.
Configure For GPU
Configure the system build by running the ./configure command at the root of your cloned Intel® Extension for TensorFlow* source tree. This script prompts you for the location of Intel® Extension for TensorFlow* dependencies and asks for additional build configuration options (for example, the path to the DPC++ compiler).
$ ./configure
Choose Y for Intel GPU support. Refer to Configure Example.
Specify the location of the compiler (DPC++).
The default is /opt/intel/oneapi/compiler/latest/linux/, which is the default installation path. Press Enter to confirm the default location. If it's different, confirm the compiler (DPC++) installation path and enter the correct path.
Specify the Ahead-of-Time (AOT) compilation platforms.
The default is '', which means no AOT.
Enter one or more device type strings for the target hardware platforms, such as ats-m150 or acm-g11. Here is the list of GPUs we've verified:
GPU | device type |
---|---|
Intel® Data Center GPU Flex Series 170 | ats-m150 |
Intel® Data Center GPU Flex Series 140 | ats-m75 |
Intel® Data Center GPU Max Series | pvc |
Intel® Arc™ A730M | acm-g10 |
Intel® Arc™ A380 | acm-g11 |
To learn how to get the device type, refer to Use AOT for Integrated Graphics (Intel GPU) or create an issue to ask for support.
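When targeting more than one platform, the configure prompt accepts a comma-separated list of device type strings. A small sketch of composing that answer from the table above (the variable name is just for illustration):

```shell
# Combine device types from the verified-GPU table into the single
# comma-separated string that ./configure expects for AOT targets.
AOT_DEVICES="ats-m150,acm-g11"
echo "$AOT_DEVICES"
```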
Choose whether to build with oneMKL support.
We recommend choosing y.
The default is /opt/intel/oneapi/mkl/latest, which is the default installation path. Press Enter to confirm the default location. If it's wrong, confirm the oneMKL installation path and enter the correct path.
Build Source Code
For GPU support
$ bazel build -c opt --config=gpu //itex:libitex_gpu_cc.so
CC library location: <Path to intel-extension-for-tensorflow>/bazel-bin/itex/libitex_gpu_cc.so
NOTE: libitex_gpu_cc.so depends on libitex_gpu_xetla.so, so libitex_gpu_xetla.so should be copied to the same directory as libitex_gpu_cc.so:
$ cd <Path to intel-extension-for-tensorflow>
$ cp bazel-out/k8-opt-ST-*/bin/itex/core/kernels/gpu/libitex_gpu_xetla.so bazel-bin/itex/
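A quick sanity check before linking can save a confusing runtime failure. This sketch (a hypothetical helper, not part of the project) verifies that the libraries ended up side by side:

```shell
# check_colocated DIR LIB...: report any library missing from DIR.
# Returns non-zero if any required file is absent.
check_colocated() {
  dir="$1"; shift
  missing=0
  for lib in "$@"; do
    if [ ! -f "$dir/$lib" ]; then
      echo "missing: $dir/$lib" >&2
      missing=1
    fi
  done
  return $missing
}

# Usage (paths follow the bazel output layout shown above):
# check_colocated bazel-bin/itex libitex_gpu_cc.so libitex_gpu_xetla.so
```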
For CPU support
$ bazel build -c opt --config=cpu //itex:libitex_cpu_cc.so
To build with the threadpool, add the build option --define=build_with_threadpool=true and set the environment variable ITEX_OMP_THREADPOOL=0:
$ bazel build -c opt --config=cpu --define=build_with_threadpool=true //itex:libitex_cpu_cc.so
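At run time, the threadpool build expects the matching environment variable to be set in the process environment before the binary launches (an assumption based on the environment variable named above):

```shell
# Match the --define=build_with_threadpool=true build at run time by
# exporting the corresponding environment variable before launching.
export ITEX_OMP_THREADPOOL=0
```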
CC library location: <Path to intel-extension-for-tensorflow>/bazel-bin/itex/libitex_cpu_cc.so
NOTE: libitex_cpu_cc.so depends on libiomp5.so, so libiomp5.so should be copied to the same directory as libitex_cpu_cc.so:
$ cd <Path to intel-extension-for-tensorflow>
$ cp bazel-out/k8-opt-ST-*/bin/external/llvm_openmp/libiomp5.so bazel-bin/itex/
Prepare Tensorflow* CC library and header files
Option 1: Extract from Tensorflow* python package (Recommended)
a. Download Tensorflow* 2.14.0 python package
$ wget https://files.pythonhosted.org/packages/09/63/25e76075081ea98ec48f23929cefee58be0b42212e38074a9ec5c19e838c/tensorflow-2.14.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
b. Unzip Tensorflow* python package
$ unzip tensorflow-2.14.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl -d tensorflow_src
c. Create symbolic link
$ cd ./tensorflow_src/tensorflow/
$ ln -s libtensorflow_cc.so.2 libtensorflow_cc.so
$ ln -s libtensorflow_framework.so.2 libtensorflow_framework.so
libtensorflow_cc.so location: <Path to tensorflow_src>/tensorflow/libtensorflow_cc.so
libtensorflow_framework.so location: <Path to tensorflow_src>/tensorflow/libtensorflow_framework.so
Tensorflow header file location: <Path to tensorflow_src>/tensorflow/include
Option 2: Build from TensorFlow* source code
a. Prepare TensorFlow* source code
$ git clone https://github.com/tensorflow/tensorflow.git
$ cd tensorflow
$ git checkout origin/r2.14 -b r2.14
b. Build libtensorflow_cc.so
$ ./configure
$ bazel build --jobs 96 --config=opt //tensorflow:libtensorflow_cc.so
$ ls ./bazel-bin/tensorflow/libtensorflow_cc.so
libtensorflow_cc.so location: <Path to tensorflow>/bazel-bin/tensorflow/libtensorflow_cc.so
c. Create symbolic link for libtensorflow_framework.so
$ cd ./bazel-bin/tensorflow/
$ ln -s libtensorflow_framework.so.2 libtensorflow_framework.so
libtensorflow_framework.so location: <Path to tensorflow>/bazel-bin/tensorflow/libtensorflow_framework.so
d. Build Tensorflow header files
$ bazel build --config=opt tensorflow:install_headers
$ ls ./bazel-bin/tensorflow/include
Tensorflow header file location: <Path to tensorflow>/bazel-bin/tensorflow/include
Integrate the CC library
Linker
Configure the linker environmental variables with Intel® Extension for TensorFlow* CC library (libitex_gpu_cc.so or libitex_cpu_cc.so) path:
$ export LIBRARY_PATH=$LIBRARY_PATH:<Path to intel-extension-for-tensorflow>/bazel-bin/itex/
$ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:<Path to intel-extension-for-tensorflow>/bazel-bin/itex/
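Appending the same directory every time a setup script is sourced can bloat these variables. A tiny hypothetical helper (not part of the project) that adds a path only if it is not already present:

```shell
# append_ld_path DIR: append DIR to LD_LIBRARY_PATH unless already listed.
append_ld_path() {
  case ":$LD_LIBRARY_PATH:" in
    *":$1:"*) ;;  # already present, do nothing
    *) LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:}$1"
       export LD_LIBRARY_PATH ;;
  esac
}

# Usage:
# append_ld_path "<Path to intel-extension-for-tensorflow>/bazel-bin/itex"
```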
Load
TensorFlow* provides the C API TF_LoadPluggableDeviceLibrary to support pluggable device libraries.
To load the Intel® Extension for TensorFlow* CC library, modify the original C++ code as follows:
a. Add the header file "tensorflow/c/c_api_experimental.h".
#include "tensorflow/c/c_api_experimental.h"
b. Load libitex_gpu_cc.so or libitex_cpu_cc.so with TF_LoadPluggableDeviceLibrary.
TF_Status* status = TF_NewStatus();
TF_LoadPluggableDeviceLibrary(<lib_path>, status);
Example
A simple original example using the TensorFlow* C++ API:
// example.cc
#include "tensorflow/cc/client/client_session.h"
#include "tensorflow/cc/ops/standard_ops.h"
#include "tensorflow/core/framework/tensor.h"
int main() {
using namespace tensorflow;
using namespace tensorflow::ops;
Scope root = Scope::NewRootScope();
auto X = Variable(root, {5, 2}, DataType::DT_FLOAT);
auto assign_x = Assign(root, X, RandomNormal(root, {5, 2}, DataType::DT_FLOAT));
auto Y = Variable(root, {2, 3}, DataType::DT_FLOAT);
auto assign_y = Assign(root, Y, RandomNormal(root, {2, 3}, DataType::DT_FLOAT));
auto Z = Const(root, 2.f, {5, 3});
auto V = MatMul(root, assign_x, assign_y);
auto VZ = Add(root, V, Z);
std::vector<Tensor> outputs;
ClientSession session(root);
// Run and fetch VZ
TF_CHECK_OK(session.Run({VZ}, &outputs));
LOG(INFO) << "Output:\n" << outputs[0].matrix<float>();
return 0;
}
The updated example with Intel® Extension for TensorFlow* enabled
// example.cc
#include "tensorflow/cc/client/client_session.h"
#include "tensorflow/cc/ops/standard_ops.h"
#include "tensorflow/core/framework/tensor.h"
+ #include "tensorflow/c/c_api_experimental.h"
int main() {
using namespace tensorflow;
using namespace tensorflow::ops;
+ TF_Status* status = TF_NewStatus();
+ string xpu_lib_path = "libitex_gpu_cc.so";
+ TF_LoadPluggableDeviceLibrary(xpu_lib_path.c_str(), status);
+ TF_Code code = TF_GetCode(status);
+ if ( code == TF_OK ) {
+ LOG(INFO) << "intel-extension-for-tensorflow load successfully!";
+ } else {
+ string status_msg(TF_Message(status));
+ LOG(WARNING) << "Could not load intel-extension-for-tensorflow, please check! " << status_msg;
+ }
Scope root = Scope::NewRootScope();
auto X = Variable(root, {5, 2}, DataType::DT_FLOAT);
auto assign_x = Assign(root, X, RandomNormal(root, {5, 2}, DataType::DT_FLOAT));
auto Y = Variable(root, {2, 3}, DataType::DT_FLOAT);
auto assign_y = Assign(root, Y, RandomNormal(root, {2, 3}, DataType::DT_FLOAT));
auto Z = Const(root, 2.f, {5, 3});
auto V = MatMul(root, assign_x, assign_y);
auto VZ = Add(root, V, Z);
std::vector<Tensor> outputs;
ClientSession session(root);
// Run and fetch VZ
TF_CHECK_OK(session.Run({VZ}, &outputs));
LOG(INFO) << "Output:\n" << outputs[0].matrix<float>();
return 0;
}
Build and run
Place a Makefile in the same directory as example.cc with the following contents:
Replace <TF_INCLUDE_PATH> with the local TensorFlow* header file path, e.g. <Path to tensorflow_src>/tensorflow/include
Replace <TFCC_PATH> with the local TensorFlow* CC library path, e.g. <Path to tensorflow_src>/tensorflow/
// Makefile
target = example_test
cc = g++
TF_INCLUDE_PATH = <TF_INCLUDE_PATH>
TFCC_PATH = <TFCC_PATH>
include = -I $(TF_INCLUDE_PATH)
lib = -L $(TFCC_PATH) -ltensorflow_framework -ltensorflow_cc
flag = -Wl,-rpath=$(TFCC_PATH) -std=c++17
source = ./example.cc
$(target): $(source)
$(cc) $(source) -o $(target) $(include) $(lib) $(flag)
clean:
rm $(target)
run:
./$(target)
Go to the directory containing example.cc and the Makefile, then build and run the example.
$ make
$ ./example_test
NOTE: For GPU support, set up the oneAPI environment variables before running the example.
$ source /opt/intel/oneapi/compiler/latest/env/vars.sh
$ source /opt/intel/oneapi/mkl/latest/env/vars.sh