# Accelerate AlexNet by Quantization with Intel® Extension for TensorFlow*

## Background

Low-precision inference can significantly speed up inference by converting the FP32 model to an INT8 or BF16 model. Intel provides hardware technology to accelerate low-precision models on Intel CPUs and GPUs:

1. Intel® Deep Learning Boost, available in 2nd Generation Intel® Xeon® Scalable Processors and newer Xeon® processors, accelerates INT8 and BF16 models in hardware.
2. Intel GPUs that support INT8.

Intel® Neural Compressor simplifies the process of converting an FP32 model to INT8. At the same time, it tunes the quantization method to reduce the accuracy loss, which is a major blocker for low-precision inference.

Intel® Neural Compressor is released in the Intel® AI Analytics Toolkit and works with Intel® Optimization for TensorFlow*. Please refer to the official website for detailed information and news: [https://github.com/intel/neural-compressor](https://github.com/intel/neural-compressor)

## Introduction

With Intel® Extension for TensorFlow*, it is easy to quantize an FP32 model to an INT8 model and accelerate it on Intel CPUs and GPUs.

This example reuses the existing end-to-end example [Intel® Neural Compressor Sample for TensorFlow](https://github.com/intel/neural-compressor/tree/master/examples/notebook/tensorflow/alexnet_mnist) provided by Intel® Neural Compressor. It shows a pipeline that builds a CNN model to recognize handwritten digits and speeds up the AI model with quantization by Intel® Neural Compressor.

The original example is designed to run on Intel CPUs. After installing Intel® Extension for TensorFlow*, it can run on Intel CPUs and GPUs. All steps follow the existing example. **No code changes are required.**

Please read the example guide for detailed information.

We will learn how AI inference is accelerated by Intel AI technology:

1. Intel® Deep Learning Boost on CPU
2. Intel GPUs that support INT8
3. Intel® Neural Compressor
4. Intel® Extension for TensorFlow*

## Hardware Environment

The example can run on Intel CPUs and GPUs via Intel® Extension for TensorFlow*.

### CPU

This demo is recommended to run on 2nd Generation Intel® Xeon® Scalable Processors or newer, which include:

1. Intel® AVX-512 instructions to speed up training and inference of AI models.
2. Intel® Deep Learning Boost: Vector Neural Network Instructions (VNNI) to accelerate AI/DL inference with INT8/BF16 models.

With Intel® Deep Learning Boost, the performance increase is significant; without it, the INT8 model may only reach about 1.x times the FP32 performance.

#### Intel® DevCloud

If you have no CPU that supports Intel® Deep Learning Boost, you can register for Intel® DevCloud and try this example on a newer Xeon® with Intel® Deep Learning Boost for free. To learn more about working with Intel® DevCloud, please refer to [Intel® DevCloud](https://www.intel.com/content/www/us/en/developer/tools/devcloud/overview.html).

### GPU

Supported: Intel® Data Center GPU Flex Series.

For a local server, please install the GPU driver and oneAPI packages by referring to [Intel GPU Software Installation](/docs/install/install_for_gpu.md).

For Intel® DevCloud, the GPU driver and oneAPI packages are already installed.

## Running Environment

### Set up Base Running Environment

Please refer to the example [Intel® Neural Compressor Sample for TensorFlow](https://github.com/intel/neural-compressor/tree/master/examples/notebook/tensorflow/alexnet_mnist) to set up the running environment.

There are additional requirements:

1. Python 3.9 or a newer version.
2. TensorFlow 2.10.0 or a newer version.

### Set up Intel® Extension for TensorFlow*

Please install Intel® Extension for TensorFlow* in the running environment:

1. CPU

```
python -m pip install --upgrade intel-extension-for-tensorflow[cpu]
```

2. GPU

```
python -m pip install --upgrade intel-extension-for-tensorflow[gpu]
```
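After installation, you can run a quick sanity check to confirm that Intel® Extension for TensorFlow* is loaded and, for the GPU package, that the Intel GPU is visible to TensorFlow. This is a minimal sketch; it assumes the Python module name `intel_extension_for_tensorflow` and the `XPU` device type registered by the GPU package.

```
import tensorflow as tf
import intel_extension_for_tensorflow as itex

print("TensorFlow version:", tf.__version__)
print("Intel Extension for TensorFlow version:", itex.__version__)

# With the GPU package, Intel GPUs are exposed to TensorFlow as "XPU" devices.
# On a CPU-only installation this list is expected to be empty.
print("XPU devices:", tf.config.list_physical_devices("XPU"))
```

If the XPU list is empty on a GPU system, double-check the GPU driver and oneAPI installation described in the Hardware Environment section.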
## Execute

Please refer to the example [Intel® Neural Compressor Sample for TensorFlow](https://github.com/intel/neural-compressor/tree/master/examples/notebook/tensorflow/alexnet_mnist) to execute the sample code and check the result.
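For reference, the quantization step performed in the linked example corresponds roughly to the following post-training quantization flow. This is a minimal sketch, assuming the Intel® Neural Compressor 2.x Python API (`quantization.fit` with `PostTrainingQuantConfig`); the FP32 model path `./fp32_frozen.pb`, the output path `./alexnet_int8`, and the synthetic calibration data are placeholders, and the notebook itself may use a different (e.g., YAML-based) configuration.

```
from neural_compressor.config import PostTrainingQuantConfig
from neural_compressor.data import DataLoader, Datasets
from neural_compressor.quantization import fit

# Calibration data: a small, representative sample of the model inputs.
# A built-in synthetic "dummy" dataset is used here as a placeholder;
# the real example calibrates with MNIST images (28x28x1).
dataset = Datasets("tensorflow")["dummy"](shape=(100, 28, 28, 1))
calib_dataloader = DataLoader(framework="tensorflow", dataset=dataset)

# Post-training static quantization: FP32 model -> INT8 model.
q_model = fit(
    model="./fp32_frozen.pb",        # placeholder path to the FP32 model
    conf=PostTrainingQuantConfig(),
    calib_dataloader=calib_dataloader,
)
q_model.save("./alexnet_int8")       # INT8 model, runnable on Intel CPU/GPU via ITEX
```

The resulting INT8 model can then be benchmarked against the FP32 model, as the example does, to observe the speedup from Intel® Deep Learning Boost on CPU or INT8 support on Intel GPU.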