Accelerate Mask R-CNN Training on Intel GPU
Introduction
Intel® Extension for TensorFlow* is compatible with stock TensorFlow*. This example shows Mask R-CNN Training. It contains single-tile training scripts and multi-tile training scripts with horovod.
Install the Intel® Extension for TensorFlow* in legacy running environment, Tensorflow will execute the Training on Intel GPU.
Hardware Requirements
Verified Hardware Platforms:
Intel® Data Center GPU Max Series
Prerequisites
Model Code change
To get better performance, instead of installing the official repository, you can apply the patch and install it as shown here:
git clone https://github.com/NVIDIA/DeepLearningExamples.git
cd DeepLearningExamples/TensorFlow2/Segmentation/MaskRCNN
git checkout c481324031ecf0f70f8939516c02e16cac60446d
git apply patch # When applying this patch, please move it to the above MaskRCNN dir first.
Prepare for GPU
Refer to Prepare.
Setup Running Environment
You can use ./pip_set_env.sh
to setup for GPU. It contains the following two steps: creating virtual environment and installing python packages.
Create Virtual Environment
python -m venv env_itex
source env_itex/bin/activate
Install
pip install --upgrade pip
pip install --upgrade intel-extension-for-tensorflow[xpu]
pip install opencv-python-headless pybind11
pip install pycocotools
pip install -e "git+https://github.com/NVIDIA/dllogger#egg=dllogger"
Enable Running Environment
Enable oneAPI running environment (only for GPU) and virtual running environment.
For GPU, refer to Running
Prepare Dataset
Assume current_dir is examples/train_maskrcnn/DeepLearningExamples/TensorFlow2/Segmentation/MaskRCNN
. So as the following parts.
Download and preprocess the COCO 2017 dataset.
cd dataset
bash download_and_preprocess_coco.sh ./data
Execute the Example
Here we provide single-tile training scripts and multi-tile training scripts with horovod. The datatype can be float32 or bfloat16.
DATASET_DIR=./data
OUTPUT_DIR=/the/path/to/output_dir
Single tile with fp32
python main.py train \
--data_dir $DATASET_DIR \
--model_dir=$OUTPUT_DIR \
--train_batch_size 4 \
--seed=0 --use_synthetic_data \
--epochs 1 --steps_per_epoch 20 --log_every=1 --log_warmup_steps=1
Single tile with bf16, it requires
--amp
flag.
python main.py train \
--data_dir $DATASET_DIR \
--model_dir=$OUTPUT_DIR \
--train_batch_size 4 \
--amp --seed=0 --use_synthetic_data \
--epochs 1 --steps_per_epoch 20 --log_every=1 --log_warmup_steps=1
Multi-tile with horovod.
Install intel-optimization-for-horovod
.
pip install intel-optimization-for-horovod
Default datatype is fp32. You can use --amp
flag for bf16.
mpirun -np 2 -prepend-rank -ppn 2 \
python main.py train \
--data_dir $DATASET_DIR \
--model_dir=$OUTPUT_DIR \
--train_batch_size 4 \
--seed=0 --use_synthetic_data \
--epochs 1 --steps_per_epoch 20 --log_every=1 --log_warmup_steps=1
Note: Only distributed workload needs intel-optimization-for-horovod
. Please uninstall it if you want to run single tile workload.
FAQ
If you get the following error log, refer to Enable Running Environment to Enable oneAPI running environment.
tensorflow.python.framework.errors_impl.NotFoundError: libmkl_sycl.so.2: cannot open shared object file: No such file or directory