Performance
===========

## Overview

This page shows the performance boost from Intel® Extension for PyTorch\* on several popular topologies.

## Performance Data for Intel® AI Data Center Products

Find the latest performance data for 4th gen Intel® Xeon® Scalable processors and 3rd gen Intel® Xeon® processors, including detailed hardware and software configurations, in the [Intel® Developer Zone article](https://www.intel.com/content/www/us/en/developer/topic-technology/artificial-intelligence/performance.html).

## INT8 with v1.11

### Performance Numbers

| Hardware | Workload<sup>1</sup> | Precision | Throughput Inference<sup>2</sup>: Batch Size | Throughput Inference<sup>2</sup>: Boost Ratio | Realtime Inference<sup>3</sup>: Batch Size | Realtime Inference<sup>3</sup>: Boost Ratio | Model Type | Dataset | Input Data Shape | Tunable Parameters |
|---|---|---|---|---|---|---|---|---|---|---|
| Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz | ResNet50 | INT8 | 80 | 1.83x | 1 | 1.44x | Computer Vision | ImageNet | Input shape [3, 224, 224] | Default memory allocator; Intel(R) OpenMP; inference scripts |
| Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz | SSD-ResNet34 | INT8 | 80 | 2.16x | 1 | 1.83x | Computer Vision | COCO | Input shape [3, 1200, 1200] | Default memory allocator; Intel(R) OpenMP; inference scripts |
| Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz | ResNext 32x16d | INT8 | 80 | 1.81x | 1 | 1.21x | Computer Vision | ImageNet | Input shape [3, 224, 224] | Default memory allocator; Intel(R) OpenMP; inference scripts |
| Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz | VGG-11 | INT8 | 80 | 1.75x | 1 | 1.19x | Computer Vision | ImageNet | Input shape [3, 224, 224] | Default memory allocator; Intel(R) OpenMP; inference scripts |
| Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz | ShuffleNetv2_x1.0 | INT8 | 80 | 2.07x | 1 | 1.47x | Computer Vision | ImageNet | Input shape [3, 224, 224] | Default memory allocator; Intel(R) OpenMP |
| Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz | BERT-Large | INT8 | 80 | 2.78x | 1 | 2.04x | NLP | Squad | max_seq_len=384; Task: Question Answering | Jemalloc; Intel(R) OpenMP; inference scripts |
| Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz | Bert-Base | INT8 | 80 | 2.05x | 1 | 1.96x | NLP | MRPC | max_seq_len=128; Task: Text Classification | Jemalloc; Intel(R) OpenMP; inference scripts |
| Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz | DistilBERT-Base | INT8 | 80 | 2.12x | 1 | 1.57x | NLP | Squad | max_seq_len=384; Task: Question Answering | Jemalloc; Intel(R) OpenMP; inference scripts |

| Workload | Metric | FP32 | INT8 | INT8/FP32 |
|---|---|---|---|---|
| BERT-base_text_classification | f1 | 0.81 | 0.81 | 99.79% |
| BERT-Large | f1 | 93.16 | 93.02 | 99.85% |
| Distilbert-base | f1 | 86.84 | 86.13 | 99.19% |
| ResNet50 | Top1 | 76.15 | 75.98 | 99.78% |
| ResNext 32x16d | Top1 | 84.17 | 84.05 | 99.86% |
| SSD-ResNet34 | mAP | 0.200 | 0.199 | 99.48% |
| VGG11 | Top1 | 69.04 | 67.96 | 98.44% |
| Shufflenetv2_x1.0 | Top1 | 69.36 | 67.92 | 97.93%<sup>1</sup> |
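
The INT8/FP32 column in the table above is the INT8 score expressed as a percentage of the FP32 score. The following is a minimal sketch of that arithmetic, using a few rows copied from the table. The helper name `accuracy_retention` is hypothetical and not part of Intel® Extension for PyTorch\*. Because the table shows rounded scores, recomputing from them may differ from the published ratio in the last digit for some rows.

```python
# FP32 and INT8 scores copied from the accuracy table above: (FP32, INT8).
ACCURACY = {
    "ResNet50": (76.15, 75.98),
    "ResNext 32x16d": (84.17, 84.05),
    "VGG11": (69.04, 67.96),
    "BERT-Large": (93.16, 93.02),
}


def accuracy_retention(fp32: float, int8: float) -> float:
    """Return the INT8 score as a percentage of the FP32 score."""
    return 100.0 * int8 / fp32


if __name__ == "__main__":
    for name, (fp32, int8) in ACCURACY.items():
        print(f"{name}: {accuracy_retention(fp32, int8):.2f}%")
```
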

| Hardware | Workload<sup>1</sup> | Precision | Throughput Inference<sup>2</sup>: Batch Size | Throughput Inference<sup>2</sup>: Boost Ratio | Real-time Inference<sup>3</sup>: Batch Size | Real-time Inference<sup>3</sup>: Boost Ratio | Model Type | Dataset | Input Data Shape | Tunable Parameters |
|---|---|---|---|---|---|---|---|---|---|---|
| AWS EC2 C6i.2xlarge | ResNet50 | Float32 | 64 | 1.24x | 1 | 1.31x | Computer Vision | ImageNet | Input shape [3, 224, 224] | Default memory allocator; Intel(R) OpenMP; inference scripts |
| AWS EC2 C6i.2xlarge | ResNext 32x16d | Float32 | 64 | 1.07x | 1 | 1.05x | Computer Vision | ImageNet | Input shape [3, 224, 224] | Default memory allocator; Intel(R) OpenMP; inference scripts |
| AWS EC2 C6i.2xlarge | VGG-11 | Float32 | 64 | 1.15x | 1 | 1.21x | Computer Vision | ImageNet | Input shape [3, 224, 224] | Default memory allocator; Intel(R) OpenMP; inference scripts |
| AWS EC2 C6i.2xlarge | ShuffleNetv2_x1.0 | Float32 | 64 | 1.12x | 1 | 1.30x | Computer Vision | ImageNet | Input shape [3, 224, 224] | Default memory allocator; Intel(R) OpenMP |
| AWS EC2 C6i.2xlarge | MobileNet v2 | Float32 | 64 | 1.08x | 1 | 1.12x | Computer Vision | ImageNet | Input shape [3, 224, 224] | Default memory allocator; Intel(R) OpenMP |
| AWS EC2 C6i.2xlarge | BERT-Large | Float32 | 64 | 1.05x | 1 | 1.03x | NLP | Squad | max_seq_len=384; Task: Question Answering | Default memory allocator; Intel(R) OpenMP; inference scripts; setting auto_kernel_selection to ON is recommended when seq_len exceeds 64 |
| AWS EC2 C6i.2xlarge | Bert-Base | Float32 | 64 | 1.08x | 1 | 1.09x | NLP | MRPC | max_seq_len=128; Task: Text Classification | Jemalloc; Intel(R) OpenMP; inference scripts; setting auto_kernel_selection to ON is recommended when seq_len exceeds 128 |


| Hardware | Workload<sup>1</sup> | Precision | Throughput Inference<sup>2</sup>: Batch Size | Throughput Inference<sup>2</sup>: Boost Ratio | Real-time Inference<sup>3</sup>: Batch Size | Real-time Inference<sup>3</sup>: Boost Ratio | Model Type | Dataset | Input Data Shape | Tunable Parameters |
|---|---|---|---|---|---|---|---|---|---|---|
| Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz | ResNet50 | Float32 | 80 | 1.39x | 1 | 1.35x | Computer Vision | ImageNet | Input shape [3, 224, 224] | Default memory allocator; Intel(R) OpenMP; inference scripts |
| Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz | SSD-ResNet34 | Float32 | 160 | 1.55x | 1 | 1.06x | Computer Vision | COCO | Input shape [3, 1200, 1200] | Default memory allocator; Intel(R) OpenMP; inference scripts |
| Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz | ResNext 32x16d | Float32 | 80 | 1.08x | 1 | 1.08x | Computer Vision | ImageNet | Input shape [3, 224, 224] | Default memory allocator; Intel(R) OpenMP; inference scripts |
| Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz | Faster R-CNN ResNet50 FPN | Float32 | 80 | 1.71x | 1 | 1.07x | Computer Vision | COCO | Input shape [3, 1200, 1200] | Default memory allocator; Intel(R) OpenMP; inference scripts |
| Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz | VGG-11 | Float32 | 160 | 1.20x | 1 | 1.13x | Computer Vision | ImageNet | Input shape [3, 224, 224] | Default memory allocator; Intel(R) OpenMP; inference scripts |
| Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz | ShuffleNetv2_x1.0 | Float32 | 160 | 1.32x | 1 | 1.20x | Computer Vision | ImageNet | Input shape [3, 224, 224] | Default memory allocator; Intel(R) OpenMP |
| Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz | MobileNet v2 | Float32 | 160 | 1.48x | 1 | 1.12x | Computer Vision | ImageNet | Input shape [3, 224, 224] | Default memory allocator; Intel(R) OpenMP |
| Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz | DLRM | Float32 | 80 | 1.11x | 1 | - | Recommendation | Terabyte | - | Default memory allocator; Intel(R) OpenMP; inference scripts |
| Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz | BERT-Large | Float32 | 80 | 1.14x | 1 | 1.02x | NLP | Squad | max_seq_len=384; Task: Question Answering | Default memory allocator; Intel(R) OpenMP; inference scripts; setting auto_kernel_selection to ON is recommended when seq_len exceeds 64 |
| Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz | Bert-Base | Float32 | 160 | 1.10x | 1 | 1.33x | NLP | MRPC | max_seq_len=128; Task: Text Classification | Jemalloc; Intel(R) OpenMP; inference scripts; setting auto_kernel_selection to ON is recommended when seq_len exceeds 128 |
| Intel(R) Xeon(R) Platinum 8380H CPU @ 2.90GHz | BERT-Large | BFloat16 | 56 | 1.67x | 1 | 1.45x | NLP | Squad | max_seq_len=384; Task: Question Answering | Jemalloc; Intel(R) OpenMP; inference scripts |
| Intel(R) Xeon(R) Platinum 8380H CPU @ 2.90GHz | Bert-Base | BFloat16 | 112 | 1.77x | 1 | 1.18x | NLP | MRPC | max_seq_len=128; Task: Text Classification | Jemalloc; Intel(R) OpenMP; inference scripts |
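
The Tunable Parameters column recommends turning `auto_kernel_selection` on once the sequence length exceeds a per-model threshold (64 for BERT-Large, 128 for Bert-Base in the Float32 rows). The sketch below shows one way that rule might be applied. The helper names are hypothetical, and it assumes the installed version of Intel® Extension for PyTorch\* accepts an `auto_kernel_selection` keyword in `ipex.optimize`; check the API documentation for your release before relying on it.

```python
def should_enable_auto_kernel_selection(seq_len: int, threshold: int) -> bool:
    """Apply the table's rule: enable only when seq_len exceeds the threshold."""
    return seq_len > threshold


def optimize_for_inference(model, seq_len: int, threshold: int):
    """Hypothetical wrapper: optimize an eval-mode model with the flag set per the rule.

    Assumes `ipex.optimize` accepts `auto_kernel_selection` in the installed version.
    """
    # Imported lazily so the decision helper above works without the extension installed.
    import intel_extension_for_pytorch as ipex

    model.eval()
    enable = should_enable_auto_kernel_selection(seq_len, threshold)
    return ipex.optimize(model, auto_kernel_selection=enable)


if __name__ == "__main__":
    # BERT-Large row: max_seq_len=384 with a threshold of 64.
    print(should_enable_auto_kernel_selection(384, 64))
    # Bert-Base row: max_seq_len=128 does not exceed the threshold of 128.
    print(should_enable_auto_kernel_selection(128, 128))
```

With the sequence lengths listed in the tables, the rule enables the option for BERT-Large (384 exceeds 64) but not for Bert-Base (128 does not exceed 128).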