# Accelerate Mask R-CNN Training on Intel GPU

## Introduction

Intel® Extension for TensorFlow* is compatible with stock TensorFlow*.
This example shows Mask R-CNN training. It provides single-tile training scripts and multi-tile training scripts that use Horovod.

Once Intel® Extension for TensorFlow* is installed in an existing TensorFlow environment, TensorFlow executes the training on the Intel GPU.

## Hardware Requirements

Verified Hardware Platforms:

 - Intel® Data Center GPU Max Series

## Prerequisites

### Model Code Change

For better performance, apply the provided patch to the official repository instead of using it unmodified:

```
git clone https://github.com/NVIDIA/DeepLearningExamples.git
cd DeepLearningExamples/TensorFlow2/Segmentation/MaskRCNN
git checkout c481324031ecf0f70f8939516c02e16cac60446d
git apply patch  # Copy the patch file into this MaskRCNN directory before applying it.
```

### Prepare for GPU

Refer to [Prepare](../common_guide_running.html#prepare).

### Setup Running Environment

You can use `./pip_set_env.sh` to set up the environment for GPU. It performs the following two steps: creating a virtual environment and installing the Python packages.

+ Create Virtual Environment

```
python -m venv env_itex
source env_itex/bin/activate
```

+ Install

```
pip install --upgrade pip
pip install --upgrade "intel-extension-for-tensorflow[xpu]"
pip install opencv-python-headless pybind11
pip install pycocotools
pip install -e "git+https://github.com/NVIDIA/dllogger#egg=dllogger"
```

### Enable Running Environment

Enable the oneAPI running environment (GPU only) and activate the virtual environment.

   * For GPU, refer to [Running](../common_guide_running.html#running)
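
Once both environments are enabled, a quick sanity check can confirm that the extension imports and that TensorFlow sees the GPU. The snippet below is a minimal sketch: it assumes `python` resolves to the interpreter of the `env_itex` virtual environment, and the `ITEX_CHECK` variable is only an illustrative name used by this snippet.

```shell
# Sketch: report the installed extension version and the XPU devices TensorFlow can see.
# Assumes the virtual environment and the oneAPI environment are already enabled.
if python -c "import intel_extension_for_tensorflow" >/dev/null 2>&1; then
    python -c "import intel_extension_for_tensorflow as itex; print(itex.__version__)"
    python -c "import tensorflow as tf; print(tf.config.list_physical_devices('XPU'))"
    ITEX_CHECK=ok        # illustrative status variable, not used by the example scripts
else
    echo "intel_extension_for_tensorflow could not be imported; check the virtual and oneAPI environments"
    ITEX_CHECK=missing
fi
```

An empty device list suggests that the GPU driver or the oneAPI environment is not set up correctly.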

### Prepare Dataset

The following steps assume that the current directory is `examples/train_maskrcnn/DeepLearningExamples/TensorFlow2/Segmentation/MaskRCNN`.

+ Download and preprocess the [COCO 2017 dataset](http://cocodataset.org/#download).

```
cd dataset
bash download_and_preprocess_coco.sh ./data
```

## Execute the Example

This example provides single-tile training scripts and multi-tile training scripts that use Horovod. The data type can be float32 or bfloat16. Set the dataset and output directories first:

```
DATASET_DIR=./data
OUTPUT_DIR=/the/path/to/output_dir
```

+ Single tile with fp32

```
python main.py train \
--data_dir $DATASET_DIR \
--model_dir=$OUTPUT_DIR \
--train_batch_size 4 \
--seed=0 --use_synthetic_data \
--epochs 1 --steps_per_epoch 20 --log_every=1 --log_warmup_steps=1
```

+ Single tile with bf16 (requires the `--amp` flag)

```
python main.py train \
--data_dir $DATASET_DIR \
--model_dir=$OUTPUT_DIR \
--train_batch_size 4 \
--amp --seed=0 --use_synthetic_data \
--epochs 1 --steps_per_epoch 20 --log_every=1 --log_warmup_steps=1
```

+ Multi-tile with Horovod

Install `intel-optimization-for-horovod`.
```
pip install intel-optimization-for-horovod
```
The default data type is fp32. Add the `--amp` flag for bf16.

```
mpirun -np 2 -prepend-rank -ppn 2 \
python main.py train \
--data_dir $DATASET_DIR \
--model_dir=$OUTPUT_DIR \
--train_batch_size 4 \
--seed=0 --use_synthetic_data \
--epochs 1 --steps_per_epoch 20 --log_every=1 --log_warmup_steps=1
```

**Note:** Only distributed workloads need `intel-optimization-for-horovod`. Uninstall it before running a single-tile workload.
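
The commands above differ only in the `mpirun` prefix and the `--amp` flag. As a sketch, a small helper can assemble the command for a given tile count and precision. `build_train_cmd` is a hypothetical name, not part of the example scripts. It only prints the command (a dry run), assumes all tiles are on one node so that `-ppn` equals `-np`, and uses `./output` as an illustrative fallback when `OUTPUT_DIR` is unset.

```shell
# Hypothetical helper: print the training command for a tile count and precision.
# The body runs in a subshell "( ... )" so its variables do not leak into the caller.
build_train_cmd() (
    tiles="${1:-1}"          # 1 = single tile, >1 = multi-tile with Horovod
    precision="${2:-fp32}"   # fp32 or bf16
    launcher=""
    amp_flag=""
    if [ "$tiles" -gt 1 ]; then
        # Assumption: single node, so processes per node equals total processes.
        launcher="mpirun -np $tiles -prepend-rank -ppn $tiles "
    fi
    if [ "$precision" = "bf16" ]; then
        amp_flag="--amp "
    fi
    echo "${launcher}python main.py train" \
        "--data_dir ${DATASET_DIR:-./data}" \
        "--model_dir=${OUTPUT_DIR:-./output}" \
        "--train_batch_size 4" \
        "${amp_flag}--seed=0 --use_synthetic_data" \
        "--epochs 1 --steps_per_epoch 20 --log_every=1 --log_warmup_steps=1"
)

# Example: show the two-tile bf16 command without running it.
build_train_cmd 2 bf16
```

Printing the command instead of executing it makes it easy to review the flags before launching a run.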

## FAQ

1. If you get the following error log, refer to [Enable Running Environment](#enable-running-environment) to enable the oneAPI running environment.

``` 
tensorflow.python.framework.errors_impl.NotFoundError: libmkl_sycl.so.2: cannot open shared object file: No such file or directory
```
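
To check whether the library is visible before starting a run, the directories in `LD_LIBRARY_PATH` can be searched for it. This is a minimal sketch that assumes the oneAPI environment exposes its libraries through `LD_LIBRARY_PATH`. `find_lib_in_ld_path` is a hypothetical helper name, not a oneAPI tool.

```shell
# Sketch: look for a library file in the directories listed in LD_LIBRARY_PATH.
# The body runs in a subshell "( ... )" so the IFS change does not leak.
find_lib_in_ld_path() (
    name="$1"
    IFS=':'
    for dir in ${LD_LIBRARY_PATH:-}; do
        [ -n "$dir" ] || continue
        for f in "$dir"/"$name"*; do
            if [ -e "$f" ]; then
                echo "$dir"     # print the first directory that contains a match
                exit 0
            fi
        done
    done
    exit 1
)

if lib_dir=$(find_lib_in_ld_path libmkl_sycl.so); then
    echo "libmkl_sycl found in $lib_dir"
else
    echo "libmkl_sycl not found in LD_LIBRARY_PATH; enable the oneAPI environment first"
fi
```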