Performance

Overview

This page shows the performance boost that Intel® Extension for PyTorch* delivers over stock PyTorch on several popular topologies.

Performance Numbers

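| Hardware | Workload¹ | Precision | Throughput Inference² Batch Size | Throughput Inference² Boost Ratio | Realtime Inference³ Batch Size | Realtime Inference³ Boost Ratio | Model Type | Dataset | Misc. |
|---|---|---|---|---|---|---|---|---|---|
| Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz | ResNet50 | Float32 | 80 | 1.39x | 1 | 1.35x | Computer Vision | ImageNet | Input shape [3, 224, 224] |
| Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz | SSD-ResNet34 | Float32 | 160 | 1.55x | 1 | 1.06x | Computer Vision | COCO | Input shape [3, 1200, 1200] |
| Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz | ResNext 32x16d | Float32 | 80 | 1.08x | 1 | 1.08x | Computer Vision | ImageNet | Input shape [3, 224, 224] |
| Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz | Faster R-CNN ResNet50 FPN | Float32 | 80 | 1.71x | 1 | 1.07x | Computer Vision | COCO | Input shape [3, 1200, 1200] |
| Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz | VGG-11 | Float32 | 160 | 1.20x | 1 | 1.13x | Computer Vision | ImageNet | Input shape [3, 224, 224] |
| Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz | ShuffleNetv2_x1.0 | Float32 | 160 | 1.32x | 1 | 1.20x | Computer Vision | ImageNet | Input shape [3, 224, 224] |
| Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz | MobileNet v2 | Float32 | 160 | 1.48x | 1 | 1.12x | Computer Vision | ImageNet | Input shape [3, 224, 224] |
| Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz | DLRM | Float32 | 80 | 1.11x | 1 | - | Recommendation | Terabyte | - |
| Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz | BERT-Large | Float32 | 80 | 1.14x | 1 | 1.02x | NLP | Squad | max_seq_len=384; Task: Question Answering |
| Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz | Bert-Base | Float32 | 160 | 1.10x | 1 | 1.33x | NLP | MRPC | max_seq_len=128; Task: Text Classification |
| Intel(R) Xeon(R) Platinum 8380H CPU @ 2.90GHz | BERT-Large | BFloat16 | 56 | 1.67x | 1 | 1.45x | NLP | Squad | max_seq_len=384; Task: Question Answering |
| Intel(R) Xeon(R) Platinum 8380H CPU @ 2.90GHz | Bert-Base | BFloat16 | 112 | 1.77x | 1 | 1.18x | NLP | MRPC | max_seq_len=128; Task: Text Classification |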

1. Workloads are taken from Model Zoo for Intel® Architecture.
2. Throughput inference runs a single instance per socket.
3. Realtime inference runs multiple instances with 4 cores each.
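The two run modes in the footnotes differ mainly in how a node's cores are divided among inference processes. The sketch below computes that split, assuming physical cores are numbered consecutively socket by socket and that Hyper-Threading siblings are left idle. `plan_instances` and `Instance` are hypothetical names for illustration, not part of Intel® Extension for PyTorch*.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Instance:
    """One inference process pinned to a contiguous range of physical cores (hypothetical)."""
    index: int
    first_core: int
    last_core: int  # inclusive

    @property
    def num_cores(self) -> int:
        return self.last_core - self.first_core + 1


def plan_instances(sockets: int, cores_per_socket: int, mode: str,
                   cores_per_instance: int = 4) -> List[Instance]:
    """Split a node's physical cores into inference instances.

    mode="throughput": one instance per socket, using all of that socket's cores.
    mode="realtime": as many instances as fit, each with `cores_per_instance` cores.
    Assumption: cores are numbered 0..N-1 socket by socket; HT siblings are unused.
    """
    if mode == "throughput":
        size = cores_per_socket
    elif mode == "realtime":
        size = cores_per_instance
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    # Refuse layouts where one instance would span two sockets.
    if cores_per_socket % size != 0:
        raise ValueError("instances would straddle a socket boundary")
    total = sockets * cores_per_socket
    return [Instance(i, start, start + size - 1)
            for i, start in enumerate(range(0, total, size))]


# Usage with the Platinum 8380 node (2 sockets x 40 cores).
for inst in plan_instances(sockets=2, cores_per_socket=40, mode="throughput"):
    print(inst.index, inst.first_core, inst.last_core)
print(len(plan_instances(sockets=2, cores_per_socket=40, mode="realtime")))
```

On the Platinum 8380 node this gives 2 throughput instances of 40 cores (cores 0-39 and 40-79) and 20 realtime instances of 4 cores. Each range could then be pinned with a tool such as `numactl -C`; the launcher actually used for these measurements is not stated on this page.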

Note: Boost ratios are relative to stock PyTorch, whose performance numbers were measured with its most performant configuration.
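As a minimal sketch of how a boost ratio can be read, assume it is the throughput with the extension divided by the best stock PyTorch throughput for the same workload and batch size; the page does not spell out the formula. If realtime runs are compared by latency instead, the ratio inverts. The function names and the input numbers in the usage lines are hypothetical and are not taken from the table.

```python
def boost_ratio(ipex_samples_per_sec: float, stock_samples_per_sec: float) -> float:
    """Throughput speedup over the stock PyTorch baseline (higher is better)."""
    if stock_samples_per_sec <= 0:
        raise ValueError("baseline throughput must be positive")
    return ipex_samples_per_sec / stock_samples_per_sec


def latency_boost_ratio(stock_latency_ms: float, ipex_latency_ms: float) -> float:
    """Same idea expressed with latencies: lower latency means a larger ratio."""
    if ipex_latency_ms <= 0:
        raise ValueError("latency must be positive")
    return stock_latency_ms / ipex_latency_ms


def format_ratio(ratio: float) -> str:
    """Render a ratio the way the table does, with two decimals and a trailing 'x'."""
    return f"{ratio:.2f}x"


# Hypothetical measurements for illustration only.
print(format_ratio(boost_ratio(139.0, 100.0)))        # 1.39x
print(format_ratio(latency_boost_ratio(10.0, 8.0)))   # 1.25x
```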

Configuration

Software Version

| Software | Version |
|---|---|
| PyTorch | v1.10.1 |
| Intel® Extension for PyTorch* | v1.10.100 |
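When reproducing these numbers, it can help to confirm that the installed packages match the versions listed. The check below uses only the standard library (`importlib.metadata`, Python 3.8+). It assumes the distributions are registered under the names `torch` and `intel_extension_for_pytorch`, which may differ depending on how they were installed; `EXPECTED`, `installed_versions`, and `mismatches` are hypothetical helpers.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional

# Versions from the software table (without the leading "v").
EXPECTED = {
    "torch": "1.10.1",
    "intel_extension_for_pytorch": "1.10.100",
}


def installed_versions(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Look up installed distribution versions; None marks a missing package."""
    found: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def mismatches(expected: Dict[str, str],
               installed: Dict[str, Optional[str]]) -> Dict[str, Optional[str]]:
    """Return packages whose installed version differs from the expected one.

    A local build tag is ignored, so a CPU wheel reporting "1.10.1+cpu"
    still matches "1.10.1".
    """
    bad: Dict[str, Optional[str]] = {}
    for name, want in expected.items():
        have = installed.get(name)
        if have is None or have.split("+", 1)[0] != want:
            bad[name] = have
    return bad


# An empty dict means the environment matches the table.
print(mismatches(EXPECTED, installed_versions(EXPECTED)))
```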

Hardware Configuration

| | 3rd Generation Intel® Xeon® Scalable Processors | Products formerly Cooper Lake |
|---|---|---|
| CPU | Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz | Intel(R) Xeon(R) Platinum 8380H CPU @ 2.90GHz |
| Number of nodes | 1 | 1 |
| Number of sockets | 2 | 2 |
| Cores/Socket | 40 | 28 |
| Threads/Core | 2 | 2 |
| uCode | 0xd0002a0 | 0x700001c |
| Hyper-Threading | ON | ON |
| TurboBoost | ON | ON |
| BIOS version | 04.12.02 | WLYDCRB1.SYS.0016.P29.2006080250 |
| Number of DDR Memory slots | 16 | 12 |
| Capacity of DDR memory per slot | 16GB | 64GB |
| DDR frequency (MT/s) | 3200 | 3200 |
| Total Memory/Node (DDR+DCPMM) | 256GB | 768GB |
| Host OS | CentOS Linux release 8.4.2105 | Ubuntu 18.04.4 LTS |
| Host Kernel | 4.18.0-305.10.2.el8_4.x86_64 | 4.15.0-76-generic |
| Docker OS | Ubuntu 18.04.5 LTS | Ubuntu 18.04.5 LTS |
| Spectre-Meltdown Mitigation | Mitigated | Mitigated |
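The per-node totals in the hardware configuration follow from simple arithmetic. For example, the DDR DIMMs alone account for the listed memory (16 × 16GB = 256GB and 12 × 64GB = 768GB). The sketch below encodes both columns as a hypothetical `NodeConfig` record, an illustration rather than any official schema, and derives physical cores, logical CPUs, and DDR capacity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeConfig:
    """Hypothetical record of one column of the hardware configuration."""
    cpu: str
    sockets: int
    cores_per_socket: int
    threads_per_core: int  # 2 when Hyper-Threading is ON
    dimm_slots: int
    gb_per_dimm: int

    @property
    def physical_cores(self) -> int:
        return self.sockets * self.cores_per_socket

    @property
    def logical_cpus(self) -> int:
        return self.physical_cores * self.threads_per_core

    @property
    def ddr_memory_gb(self) -> int:
        return self.dimm_slots * self.gb_per_dimm


PLATINUM_8380 = NodeConfig("Platinum 8380", sockets=2, cores_per_socket=40,
                           threads_per_core=2, dimm_slots=16, gb_per_dimm=16)
PLATINUM_8380H = NodeConfig("Platinum 8380H", sockets=2, cores_per_socket=28,
                            threads_per_core=2, dimm_slots=12, gb_per_dimm=64)

for node in (PLATINUM_8380, PLATINUM_8380H):
    print(node.cpu, node.physical_cores, node.logical_cpus, node.ddr_memory_gb)
# Platinum 8380 80 160 256
# Platinum 8380H 56 112 768
```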