Distributed Training Example with Intel® Optimization for Horovod* on Intel® GPU

Dependency

Setup Running Environment

Create Virtual Environment

python -m venv env_itex
source env_itex/bin/activate

Install

pip install intel-extension-for-tensorflow[xpu]
pip install intel-optimization-for-horovod
pip install gin gin-config tensorflow-addons tensorflow-model-optimization tensorflow-datasets
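
Optionally, you can verify the installation with a short Python snippet like the one below (a minimal sketch; hvd.init() works in a single process here, and discovering XPU devices additionally requires the oneAPI environment described under Execution). The printed versions are illustrative and will differ on your system.

# verify_install.py - optional sanity check that the packages import correctly.
import tensorflow as tf
import intel_extension_for_tensorflow as itex
import horovod.tensorflow.keras as hvd

print("TensorFlow:", tf.__version__)
print("Intel Extension for TensorFlow:", itex.__version__)
hvd.init()
print("Horovod rank/size:", hvd.rank(), hvd.size())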

Prepare Example Code

Clone Horovod Repo

git clone https://github.com/horovod/horovod.git
cd horovod/examples/tensorflow2

Download Patch

wget https://github.com/intel/intel-extension-for-tensorflow/raw/main/examples/train_horovod/mnist/tensorflow2_keras_mnist.patch

Apply Patch for Intel GPU

git apply tensorflow2_keras_mnist.patch

Note: Refer to tensorflow2_keras_mnist.py for the other changes required to enable Horovod; an illustrative sketch of those steps is shown below.
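
The sketch below illustrates the usual Horovod/Keras enablement steps (it is not the patched file itself): initialize Horovod, pin each worker to one XPU, scale the learning rate by the number of workers, wrap the optimizer, and broadcast initial variables from rank 0. The small model and hyperparameters are placeholders for the ones in the example.

# Illustrative sketch of typical Horovod enablement; see the patched
# tensorflow2_keras_mnist.py for the actual changes.
import tensorflow as tf
import horovod.tensorflow.keras as hvd

# Initialize Horovod; each mpirun process becomes one worker.
hvd.init()

# Pin each worker to a single XPU device based on its local rank.
xpus = tf.config.list_physical_devices('XPU')
if xpus:
    tf.config.set_visible_devices(xpus[hvd.local_rank()], 'XPU')

# A small Keras model stands in for the MNIST model in the example.
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax'),
])

# Scale the learning rate by the number of workers and wrap the optimizer
# so gradients are averaged across workers each step.
opt = tf.keras.optimizers.Adam(0.001 * hvd.size())
opt = hvd.DistributedOptimizer(opt)

model.compile(optimizer=opt,
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

callbacks = [
    # Broadcast initial variable states from rank 0 to all other workers.
    hvd.callbacks.BroadcastGlobalVariablesCallback(0),
]

# Write checkpoints only on rank 0 so workers do not overwrite each other.
if hvd.rank() == 0:
    callbacks.append(tf.keras.callbacks.ModelCheckpoint('./checkpoint-{epoch}.h5'))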

Execution

Enable oneAPI

Run:

source /opt/intel/oneapi/setvars.sh

Note: To install the oneAPI Base Toolkit, refer to Intel GPU Software Installation.

Check Device Count (Optional)

Run:

mpirun -np 1 -prepend-rank -ppn 1 python tensorflow2_keras_mnist.py

Check how many devices (XPUs) are available on the local machine according to the output of the above command, for example:

...
XPU count is 2
XPU: PhysicalDevice(name='/physical_device:XPU:0', device_type='XPU')
XPU: PhysicalDevice(name='/physical_device:XPU:1', device_type='XPU')

According to the above log, there are 2 XPUs.

Note: Some Intel GPUs (such as the Intel® Data Center GPU Max Series) have more than one tile per GPU card. Each tile is counted as one XPU device, so there are 4 XPUs in total if there are 2 GPU cards with 2 tiles per card.
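
If you prefer to query the device count without running the full training script, the same information can be printed with a short standalone snippet (a sketch using TensorFlow's standard device-listing API; run it after sourcing the oneAPI environment):

# list_xpus.py - print the XPU devices visible to TensorFlow.
import tensorflow as tf

xpus = tf.config.list_physical_devices('XPU')
print("XPU count is", len(xpus))
for dev in xpus:
    print("XPU:", dev)

Choose the -np and -ppn values in the running command below to match the count it reports.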

Running Command

For 2 XPUs:

mpirun -np 2 -prepend-rank -ppn 2 python ./tensorflow2_keras_mnist.py

For 4 XPUs:

mpirun -np 4 -prepend-rank -ppn 4 python ./tensorflow2_keras_mnist.py

Output

...
[0] Horovod size 2
[0] XPU count is 2
[0] XPU: PhysicalDevice(name='/physical_device:XPU:0', device_type='XPU')
[0] XPU: PhysicalDevice(name='/physical_device:XPU:1', device_type='XPU')
...
[1] 2023-05-18 13:54:12.006950: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type XPU is enabled.
[0] 2023-05-18 13:54:12.163161: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type XPU is enabled.
[1] 2023-05-18 13:54:13.940695: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type XPU is enabled.
[0] 2023-05-18 13:54:14.107809: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type XPU is enabled.
[0] 2023-05-18 13:54:14.163517: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type XPU is enabled.
[1] 2023-05-18 13:54:14.163517: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type XPU is enabled.
...
[0] Epoch 1/24
250/250 [==============================] - Xs YYms/step - loss: X.XXXX - accuracy: Y.YYYY - lr: Z.ZZZZ
[0] Epoch 2/24
250/250 [==============================] - Xs YYms/step - loss: X.XXXX - accuracy: Y.YYYY - lr: Z.ZZZZ

...