Validated Models
Intel® Neural Compressor provides validated examples for multiple compression techniques. Links to typical examples can be found in the example tables, and the performance/accuracy results are available here.
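1. Validated Quantization Examples
1.1. TensorFlow Models with TensorFlow 2.16.1
1.2. Keras Models with keras 2.15.1
1.3. PyTorch Models with Torch 2.3.0+cpu in PTQ Mode
1.4. PyTorch Models with Torch 2.3.0+cpu in QAT Mode
1.5. PyTorch Models with Torch 2.3.0+cpu in IPEX Mode
1.6. ONNX Models with ONNX Runtime 1.18.1
2. Validated Pruning Examples
3. Validated Knowledge Distillation Examples
4. Validated ONNX QDQ INT8 Models on Multiple Hardware through ONNX Runtime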
Validated Quantization Examples
System summary: Tested by Intel on 7/22/2024. 1-node, 1x Intel(R) Xeon(R) Platinum 8480+ @3.8GHz, 56 cores/socket, HT On, Turbo On, Total Memory 512GB (16x32GB DDR5 4800 MT/s [4800 MT/s]), BIOS EGSDCRB1.SYS.0081.D18.2205301336, microcode 0x2b000590,
Ubuntu 24.04 LTS, gcc (GCC) 13.2.0 (Ubuntu 13.2.0-23ubuntu4), DL Models, Frameworks: TensorFlow/ONNXRT/PyTorch, Datatype: FP32/INT8/BF16.
Most models were benchmarked on 1 socket with 4 cores per instance, 14 instances and batch size 1 (the "1s4c14ins1bs" setup in the tables below).
Performance varies by use, configuration and other factors.
For more complete information about performance and benchmark results, visit www.intel.com/benchmarks
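The throughput columns label this configuration "1s4c14ins1bs". The Python sketch below decodes that label and confirms that 14 instances of 4 cores each occupy all 56 cores of the socket. `BenchmarkSetup` and `parse_setup` are hypothetical helpers, not Neural Compressor APIs. The label grammar is inferred from this single example.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkSetup:
    sockets: int
    cores_per_instance: int
    instances: int
    batch_size: int

    @property
    def total_cores(self) -> int:
        # Cores in use across all concurrently running instances.
        return self.cores_per_instance * self.instances


def parse_setup(label: str) -> BenchmarkSetup:
    """Decode a label of the assumed form <sockets>s<cores>c<instances>ins<batch>bs."""
    match = re.fullmatch(r"(\d+)s(\d+)c(\d+)ins(\d+)bs", label)
    if match is None:
        raise ValueError(f"unrecognized benchmark label: {label!r}")
    return BenchmarkSetup(*map(int, match.groups()))


setup = parse_setup("1s4c14ins1bs")
print(setup)
print(setup.total_cores)  # 56, i.e. every core of one 56-core socket
```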
TensorFlow Models with TensorFlow 2.16.1
Model | Example | INT8 Accuracy | FP32 Accuracy | Accuracy Ratio [(INT8-FP32)/FP32] | INT8 Throughput 1s4c14ins1bs (samples/sec) | FP32 Throughput 1s4c14ins1bs (samples/sec) | Performance Ratio [INT8/FP32] |
---|---|---|---|---|---|---|---|
ResNet50 v1.0 | pb | 74.11% | 74.27% | -0.22% | 1732.92 | 578.88 | 2.99x |
ResNet50 v1.5 | pb | 76.25% | 76.46% | -0.28% | 1535.20 | 530.00 | 2.90x |
ResNet101 | pb | 77.52% | 76.45% | +1.41% | 1048.36 | 384.02 | 2.73x |
Inception V1 | pb | 70.45% | 69.74% | +1.03% | 2079.24 | 927.82 | 2.24x |
Inception V2 | pb | 74.33% | 73.97% | +0.49% | 1644.36 | 840.53 | 1.96x |
Inception V3 | pb | 76.72% | 76.75% | -0.03% | 1076.10 | 401.89 | 2.68x |
Inception V4 | pb | 80.13% | 80.27% | -0.18% | 704.96 | 199.28 | 3.54x |
Inception ResNet V2 | pb | 80.25% | 80.40% | -0.18% | 313.97 | 178.27 | 1.76x |
DenseNet-161 | pb | 76.29% | 76.29% | +0.00% | 279.20 | 214.03 | 1.30x |
MobileNet V1 | pb | 71.79% | 70.96% | +1.18% | 4199.13 | 1506.68 | 2.79x |
MobileNet V2 | pb | 72.48% | 71.76% | +1.01% | 2170.39 | 1445.05 | 1.50x |
VGG16 | pb | 72.69% | 70.89% | +2.55% | 1388.62 | 203.39 | 6.83x |
VGG19 | pb | 72.67% | 71.01% | +2.33% | 1236.12 | 169.74 | 7.28x |
ResNet50 | pb | 69.09% | 69.03% | +0.09% | 411.79 | 284.53 | 1.45x |
ResNetV2 50 | pb | 70.37% | 69.64% | +1.05% | 779.42 | 539.54 | 1.44x |
ResNetV2 101 | pb | 72.64% | 71.87% | +1.08% | 492.00 | 295.77 | 1.66x |
ResNetV2 152 | pb | 73.12% | 72.37% | +1.04% | 348.39 | 205.72 | 1.69x |
ViT | pb | 81.39% | 81.92% | -0.64% | 230.53 | 132.66 | 1.74x |
SSD ResNet50 V1 | pb | 37.91% | 38.00% | -0.24% | 135.71 | 28.75 | 4.72x |
SSD MobileNet V1 | pb | 23.00% | 23.13% | -0.57% | 1237.70 | 719.30 | 1.72x |
SSD ResNet50 v1 | ckpt | 37.88% | 38.00% | -0.31% | 130.54 | 22.05 | 5.92x |
SSD MobileNet v1 | ckpt | 22.96% | 23.13% | -0.71% | 1234.56 | 529.34 | 2.33x |
Faster R-CNN ResNet101 | pb | 30.32% | 30.39% | -0.22% | 144.21 | 22.64 | 6.37x |
Faster R-CNN ResNet50 | pb | 26.61% | 26.59% | +0.09% | 164.55 | 28.38 | 5.80x |
YOLOv3 | pb | 83.28% | 82.35% | +1.12% | 247.56 | 81.45 | 3.04x |
BERT large SQuAD | pb | 92.44% | 92.99% | -0.58% | 49.17 | 17.52 | 2.81x |
BERT large SQuAD (ONNX Model Zoo) | pb | 92.36% | 92.98% | -0.67% | 45.06 | 17.55 | 2.57x |
Transformer LT | pb | 25.82% | 25.86% | -0.15% | 28.99 | 15.77 | 1.84x |
Transformer LT MLPerf | pb | 27.13% | 27.17% | -0.13% | 10.27 | 5.08 | 2.02x |
Mask R-CNN Inception V2 | pb | 28.46% | 28.73% | -0.91% | 195.68 | 50.72 | 3.86x |
Mask R-CNN Inception V2 | ckpt | 28.46% | 28.73% | -0.91% | 206.14 | 47.04 | 4.38x |
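The two ratio columns follow the formulas in the table header. The sketch below recomputes both for the ResNet50 v1.0 row. The function names are illustrative and are not part of Neural Compressor.

```python
def accuracy_ratio(int8_acc: float, fp32_acc: float) -> float:
    """Relative accuracy change of the INT8 model: (INT8 - FP32) / FP32."""
    return (int8_acc - fp32_acc) / fp32_acc


def performance_ratio(int8_tput: float, fp32_tput: float) -> float:
    """Throughput speedup of the INT8 model: INT8 / FP32."""
    return int8_tput / fp32_tput


# ResNet50 v1.0 row (accuracy in %, throughput in samples/sec).
acc = accuracy_ratio(74.11, 74.27)
perf = performance_ratio(1732.92, 578.88)
print(f"{acc:+.2%}")   # -0.22%
print(f"{perf:.2f}x")  # 2.99x
```

The tables show rounded accuracies, so a ratio recomputed from them can differ from the listed value in the last digit. For example, Inception V1 gives +1.02% against the listed +1.03%, presumably because the listed ratios were computed from unrounded results.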
Keras Models with keras 2.15.1
Model | Example | INT8 Accuracy | FP32 Accuracy | Accuracy Ratio [(INT8-FP32)/FP32] | INT8 Throughput 1s4c14ins1bs (samples/sec) | FP32 Throughput 1s4c14ins1bs (samples/sec) | Performance Ratio [INT8/FP32] |
---|---|---|---|---|---|---|---|
Inception ResNet V2 | pb | 80.25% | 80.40% | -0.18% | 313.97 | 178.27 | 1.76x |
Inception V3 | pb | 76.72% | 76.75% | -0.03% | 1076.10 | 401.89 | 2.68x |
MobileNet V2 | pb | 71.49% | 71.76% | -0.37% | 947.44 | 779.51 | 1.22x |
ResNet101 | pb | 77.52% | 76.45% | +1.41% | 1048.36 | 384.02 | 2.73x |
ResNet50 | pb | 69.09% | 69.03% | +0.09% | 411.79 | 284.53 | 1.45x |
ResNet50 | pb | 78.07% | 78.12% | -0.06% | 680.56 | 498.08 | 1.37x |
ResNetV2 101 | pb | 72.64% | 71.87% | +1.08% | 492.00 | 295.77 | 1.66x |
ResNetV2 50 | pb | 70.37% | 69.64% | +1.05% | 779.42 | 539.54 | 1.44x |
VGG16 | pb | 72.69% | 70.89% | +2.55% | 1388.62 | 203.39 | 6.83x |
VGG19 | pb | 72.67% | 71.01% | +2.33% | 1236.12 | 169.74 | 7.28x |
PyTorch Models with Torch 2.3.0+cpu in PTQ Mode
Model | Example | INT8 Accuracy | FP32 Accuracy | Accuracy Ratio [(INT8-FP32)/FP32] | INT8 Throughput 1s4c14ins1bs (samples/sec) | FP32 Throughput 1s4c14ins1bs (samples/sec) | Performance Ratio [INT8/FP32] |
---|---|---|---|---|---|---|---|
ResNet18 | static | 69.59% | 69.76% | -0.24% | 1707.52 | 602.47 | 2.83x |
EfficientNet-B3 | static | 77.78% | 78.54% | -0.98% | 513.82 | 360.02 | 1.43x |
PeleeNet | static | 71.83% | 72.10% | -0.37% | 837.83 | 541.66 | 1.55x |
ResNet50 | static | 75.98% | 76.15% | -0.21% | 1135.22 | 311.47 | 3.64x |
Inception V3 | static | 69.46% | 69.52% | -0.09% | 948.03 | 322.55 | 2.94x |
ResNeSt50 | static | 80.76% | 81.04% | -0.35% | 406.11 | 39.66 | 10.24x |
ResNeXt101_32x8d | static | 78.92% | 79.31% | -0.49% | 582.22 | 106.73 | 5.45x |
YOLO V3 | static | 55.10% | 54.93% | +0.31% | 156.29 | 60.30 | 2.59x |
Roberta base MRPC | static | 93.14% | 93.59% | -0.48% | 396.85 | 176.80 | 2.24x |
CamemBERT base MRPC | static | 88.58% | 89.28% | -0.78% | 405.37 | 182.87 | 2.22x |
DistilBERT base MRPC | static | 90.64% | 90.27% | +0.41% | 799.05 | 346.50 | 2.31x |
DistilBERT base MRPC | dynamic | 90.02% | 90.27% | -0.28% | 705.91 | 348.16 | 2.03x |
ALBERT base MRPC | static | 92.28% | 92.28% | 0.00% | 350.78 | 164.32 | 2.13x |
Xlm Roberta MRPC | static | 87.80% | 88.62% | -0.93% | 396.06 | 175.96 | 2.25x |
Xlm Roberta MRPC | dynamic | 88.54% | 88.24% | +0.35% | 381.19 | 175.96 | 2.17x |
BERT base MRPC | static | 89.59% | 90.42% | -0.91% | 402.42 | 177.73 | 2.26x |
BERT base COLA | static | 53.47% | 53.39% | +0.16% | 395.25 | 177.02 | 2.23x |
BERT base STSB | static | 87.61% | 88.05% | -0.49% | 397.62 | 177.23 | 2.24x |
BERT base SST-2 | static | 91.97% | 92.32% | -0.37% | 407.66 | 182.93 | 2.23x |
BERT large COLA | static | 63.39% | 63.35% | +0.06% | 147.86 | 56.01 | 2.64x |
BERT base RTE | static | 71.84% | 72.56% | -1.00% | 397.83 | 177.40 | 2.24x |
BERT large MRPC | static | 90.07% | 90.38% | -0.34% | 146.84 | 52.97 | 2.77x |
BERT large QNLI | static | 91.12% | 91.54% | -0.46% | 394.51 | 176.92 | 2.23x |
BERT large RTE | static | 73.65% | 74.01% | -0.49% | 148.84 | 55.83 | 2.67x |
Funnel MRPC | | 91.94% | 92.25% | -0.34% | 294.76 | 187.41 | 1.57x |
BERT large SQuAD | static | 92.34% | 93.16% | -0.88% | 50.21 | 18.69 | 2.69x |
lvwerra/pegasus-samsum | static | 42.32% | 42.67% | -0.82% | 102.73 | 37.99 | 2.70x |
ResNet18 PT2E | static | 69.49% | 69.76% | -0.39% | 1873.51 | 1106.97 | 1.69x |
OPT-125M PT2E | static | 37.07% | 37.90% | -2.20% | 42.09 | 29.68 | 1.42x |
PyTorch Models with Torch 2.3.0+cpu in QAT Mode
Model | Example | INT8 Accuracy | FP32 Accuracy | Accuracy Ratio [(INT8-FP32)/FP32] | INT8 Throughput 1s4c14ins1bs (samples/sec) | FP32 Throughput 1s4c14ins1bs (samples/sec) | Performance Ratio [INT8/FP32] |
---|---|---|---|---|---|---|---|
ResNet18 | static | 69.74% | 69.76% | -0.03% | 1717.59 | 602.65 | 2.85x |
ResNet50 | static | 76.03% | 76.15% | -0.15% | 1091.62 | 305.83 | 3.57x |
ResNeXt101_32x8d | static | 79.31% | 79.31% | 0.00% | 584.54 | 107.38 | 5.44x |
PyTorch Models with Torch 2.3.0+cpu in IPEX Mode
Model | Example | INT8 Accuracy | FP32 Accuracy | Accuracy Ratio [(INT8-FP32)/FP32] | INT8 Throughput 1s4c14ins1bs (samples/sec) | FP32 Throughput 1s4c14ins1bs (samples/sec) | Performance Ratio [INT8/FP32] |
---|---|---|---|---|---|---|---|
bert-large-uncased-whole-word-masking-finetuned-squad | static | 93.01% | 93.16% | -0.16% | 150.05 | 22.42 | 6.69x |
distilbert-base-uncased-distilled-squad | static | 86.10% | 86.84% | -0.85% | 1034.60 | 151.13 | 6.85x |
ONNX Models with ONNX Runtime 1.18.1
Model | Example | INT8 Accuracy | FP32 Accuracy | Accuracy Ratio [(INT8-FP32)/FP32] | INT8 Throughput 1s4c14ins1bs (samples/sec) | FP32 Throughput 1s4c14ins1bs (samples/sec) | Performance Ratio [INT8/FP32] |
---|---|---|---|---|---|---|---|
ResNet50 V1.5 | qlinearops | 72.18% | 72.29% | -0.16% | 1495.72 | 715.94 | 2.09x |
ResNet50 V1.5 | qdq | 72.13% | 72.29% | -0.23% | 1547.30 | 717.03 | 2.16x |
ResNet50 V1.5 MLPerf | qlinearops | 76.15% | 76.46% | -0.41% | 1365.56 | 718.55 | 1.90x |
ResNet50 V1.5 MLPerf | qdq | 76.13% | 76.46% | -0.44% | 1445.75 | 718.96 | 2.01x |
ResNet50 V1.5 (ONNX Model Zoo) | qlinearops | 74.77% | 74.99% | -0.29% | 1574.38 | 749.36 | 2.10x |
ResNet50 V1.5 (ONNX Model Zoo) | qdq | 74.78% | 74.99% | -0.27% | 1564.15 | 755.58 | 2.07x |
VGG16 | qlinearops | 66.55% | 66.69% | -0.20% | 526.57 | 162.64 | 3.24x |
VGG16 | qdq | 66.62% | 66.69% | -0.11% | 520.09 | 172.42 | 3.02x |
VGG16 (ONNX Model Zoo) | qlinearops | 72.37% | 72.40% | -0.04% | 558.81 | 162.87 | 3.43x |
VGG16 (ONNX Model Zoo) | qdq | 72.36% | 72.40% | -0.04% | 556.58 | 176.92 | 3.15x |
MobileNet V3 MLPerf | qlinearops | 75.51% | 75.74% | -0.30% | 5421.72 | 2578.08 | 2.10x |
MobileNet V3 MLPerf | qdq | 75.51% | 75.74% | -0.30% | 5382.87 | 2567.48 | 2.10x |
ShuffleNet V2 (ONNX Model Zoo) | qlinearops | 66.13% | 66.36% | -0.36% | 6426.22 | 3725.69 | 1.72x |
ShuffleNet V2 (ONNX Model Zoo) | qdq | 66.22% | 66.36% | -0.22% | 6534.24 | 3707.74 | 1.76x |
GoogleNet (ONNX Model Zoo) | qlinearops | 67.69% | 67.79% | -0.14% | 1842.90 | 1137.58 | 1.62x |
GoogleNet (ONNX Model Zoo) | qdq | 67.71% | 67.79% | -0.11% | 1818.99 | 1136.37 | 1.60x |
SqueezeNet (ONNX Model Zoo) | qlinearops | 56.49% | 56.87% | -0.67% | 9521.99 | 5530.36 | 1.72x |
SqueezeNet (ONNX Model Zoo) | qdq | 56.49% | 56.87% | -0.67% | 9391.07 | 5519.79 | 1.70x |
CaffeNet (ONNX Model Zoo) | qlinearops | 56.26% | 56.30% | -0.07% | 2949.36 | 893.77 | 3.30x |
CaffeNet (ONNX Model Zoo) | qdq | 56.26% | 56.30% | -0.08% | 2847.24 | 901.15 | 3.16x |
AlexNet (ONNX Model Zoo) | qlinearops | 54.73% | 54.79% | -0.10% | 2070.17 | 816.71 | 2.53x |
AlexNet (ONNX Model Zoo) | qdq | 54.71% | 54.79% | -0.14% | 2059.13 | 844.97 | 2.44x |
ZFNet (ONNX Model Zoo) | qlinearops | 55.83% | 55.96% | -0.24% | 858.76 | 461.25 | 1.86x |
ZFNet (ONNX Model Zoo) | qdq | 55.87% | 55.96% | -0.16% | 853.77 | 457.91 | 1.86x |
Inception V1 (ONNX Model Zoo) | qlinearops | 67.23% | 67.24% | -0.02% | 1891.36 | 1205.95 | 1.57x |
Inception V1 (ONNX Model Zoo) | qdq | 67.23% | 67.24% | -0.02% | 1879.27 | 1202.19 | 1.56x |
BEiT (ONNX Model Zoo) | qlinearops | 85.07% | 85.28% | -0.25% | 205.15 | 126.59 | 1.62x |
EfficientNet (ONNX Model Zoo) | qlinearops | 77.02% | 77.11% | -0.12% | 2428.32 | 1344.03 | 1.81x |
EfficientNet (ONNX Model Zoo) | qdq | 76.99% | 77.11% | -0.16% | 2286.73 | 1307.18 | 1.75x |
DenseNet (ONNX Model Zoo) | qlinearops | 60.53% | 60.96% | -0.71% | 626.26 | 499.76 | 1.25x |
SSD MobileNet V1 (ONNX Model Zoo) | qlinearops | 22.96% | 23.02% | -0.27% | 1121.43 | 841.32 | 1.33x |
SSD MobileNet V1 (ONNX Model Zoo) | qdq | 22.96% | 23.02% | -0.27% | 1048.50 | 798.22 | 1.31x |
DUC (ONNX Model Zoo) | qlinearops | 81.62% | 81.92% | -0.37% | 9.26 | 4.99 | 1.86x |
Ultra Face (ONNX Model Zoo) | qlinearops | 83.33% | 83.65% | -0.38% | 8993.58 | 1988.46 | 4.52x |
Emotion FERPlus (ONNX Model Zoo) | qlinearops | 7.94% | 8.00% | -0.70% | 6113.74 | 3087.50 | 1.98x |
ArcFace (ONNX Model Zoo) | qlinearops | 99.82% | 99.80% | +0.02% | 442.85 | 230.75 | 1.92x |
BERT base MRPC | qlinearops | 85.54% | 86.03% | -0.57% | 483.81 | 219.45 | 2.20x |
BERT base MRPC | qdq | 85.54% | 86.03% | -0.57% | 485.08 | 218.33 | 2.22x |
BERT base MRPC | integerops | 85.29% | 86.03% | -0.85% | 684.46 | 218.86 | 3.13x |
DistilBERT base MRPC | qdq | 84.07% | 84.56% | -0.58% | 633.28 | 399.31 | 1.59x |
DistilBERT base MRPC | integerops | 85.54% | 84.56% | +1.16% | 1388.44 | 401.08 | 3.46x |
Mobile bert MRPC | qdq | 85.54% | 86.28% | -0.85% | 505.62 | 387.43 | 1.31x |
Mobile bert MRPC | integerops | 85.54% | 86.28% | -0.85% | 565.46 | 386.39 | 1.46x |
Roberta base MRPC | integerops | 90.93% | 89.95% | +1.09% | 702.17 | 219.50 | 3.20x |
BERT SQuAD (ONNX Model Zoo) | integerops | 80.29% | 80.67% | -0.47% | 242.58 | 97.71 | 2.48x |
MobileBERT SQuAD MLPerf (ONNX Model Zoo) | integerops | 89.87% | 90.03% | -0.17% | 151.69 | 125.35 | 1.21x |
GPT2 lm head WikiText (ONNX Model Zoo) | integerops | 31.98% | 29.00% | +10.31% | 17.96 | 10.21 | 1.76x |
BERT base uncased MRPC (HuggingFace) | qlinearops | 90.21% | 90.42% | -0.23% | 434.65 | 210.58 | 2.06x |
BERT base uncased MRPC (HuggingFace) | integerops | 89.58% | 90.42% | -0.93% | 708.66 | 210.74 | 3.36x |
Roberta base MRPC (HuggingFace) | qlinearops | 91.00% | 91.38% | -0.41% | 431.37 | 211.03 | 2.04x |
Roberta base MRPC (HuggingFace) | integerops | 90.85% | 91.38% | -0.58% | 711.11 | 210.71 | 3.37x |
XLM Roberta base MRPC (HuggingFace) | qlinearops | 89.37% | 90.10% | -0.81% | 334.88 | 211.56 | 1.58x |
XLM Roberta base MRPC (HuggingFace) | integerops | 89.66% | 90.10% | -0.50% | 401.99 | 211.43 | 1.90x |
Camembert base MRPC (HuggingFace) | qlinearops | 89.28% | 89.28% | 0.00% | 282.30 | 213.33 | 1.32x |
Camembert base MRPC (HuggingFace) | integerops | 89.19% | 89.28% | -0.10% | 707.22 | 214.23 | 3.30x |
MiniLM L12 H384 uncased MRPC (HuggingFace) | qlinearops | 90.13% | 90.97% | -0.93% | 1188.05 | 578.35 | 2.05x |
MiniLM L12 H384 uncased MRPC (HuggingFace) | integerops | 91.07% | 90.97% | +0.10% | 1285.13 | 576.04 | 2.23x |
DistilBERT base uncased SST-2 (HuggingFace) | qlinearops | 90.71% | 91.06% | -0.38% | 1259.69 | 396.60 | 3.18x |
DistilBERT base uncased SST-2 (HuggingFace) | integerops | 90.25% | 91.06% | -0.88% | 914.63 | 395.09 | 2.32x |
Albert base v2 SST-2 (HuggingFace) | qlinearops | 92.09% | 92.32% | -0.25% | 284.62 | 210.52 | 1.35x |
Albert base v2 SST-2 (HuggingFace) | integerops | 91.74% | 92.32% | -0.62% | 284.69 | 210.00 | 1.36x |
MiniLM L6 H384 uncased SST-2 (HuggingFace) | qlinearops | 89.45% | 90.14% | -0.76% | 2172.98 | 1121.66 | 1.94x |
MiniLM L6 H384 uncased SST-2 (HuggingFace) | integerops | 89.91% | 90.14% | -0.26% | 2326.27 | 1114.57 | 2.09x |
BERT base cased MRPC (HuggingFace) | qlinearops | 87.70% | 88.29% | -0.67% | 494.96 | 210.80 | 2.35x |
BERT base cased MRPC (HuggingFace) | integerops | 88.19% | 88.29% | -0.12% | 714.61 | 210.99 | 3.39x |
Electra small discriminator MRPC (HuggingFace) | qlinearops | 89.92% | 89.83% | +0.09% | 1998.71 | 1115.18 | 1.79x |
Electra small discriminator MRPC (HuggingFace) | integerops | 89.27% | 89.83% | -0.63% | 2202.81 | 1121.41 | 1.96x |
BERT mini MRPC (HuggingFace) | qlinearops | 86.21% | 86.52% | -0.35% | 5767.23 | 3254.79 | 1.77x |
BERT mini MRPC (HuggingFace) | integerops | 86.16% | 86.52% | -0.41% | 6354.66 | 3424.42 | 1.86x |
Xlnet base cased MRPC (HuggingFace) | qlinearops | 90.05% | 89.86% | +0.21% | 121.24 | 95.56 | 1.27x |
Xlnet base cased MRPC (HuggingFace) | integerops | 89.58% | 89.86% | -0.31% | 123.06 | 95.60 | 1.29x |
BART large MRPC (HuggingFace) | integerops | 92.36% | 91.20% | +1.28% | 126.14 | 51.06 | 2.47x |
DeBERTa v3 base MRPC (HuggingFace) | integerops | 92.39% | 92.23% | +0.17% | 193.16 | 153.16 | 1.26x |
Spanbert SQuAD (HuggingFace) | qlinearops | 91.14% | 91.98% | -0.91% | 81.96 | 43.36 | 1.89x |
Spanbert SQuAD (HuggingFace) | integerops | 91.40% | 91.98% | -0.63% | 101.71 | 43.37 | 2.35x |
Bert base multilingual cased SQuAD (HuggingFace) | qlinearops | 88.42% | 89.13% | -0.79% | 86.33 | 43.27 | 2.00x |
Bert base multilingual cased SQuAD (HuggingFace) | integerops | 88.70% | 89.13% | -0.48% | 101.78 | 43.24 | 2.35x |
DistilBert base uncased SQuAD (HuggingFace) | qlinearops | 86.33% | 86.86% | -0.62% | 120.71 | 69.72 | 1.73x |
DistilBert base uncased SQuAD (HuggingFace) | integerops | 86.05% | 86.86% | -0.94% | 203.71 | 69.68 | 2.92x |
BERT large uncased whole word masking SQuAD (HuggingFace) | qlinearops | 92.34% | 93.16% | -0.88% | 31.81 | 12.94 | 2.46x |
BERT large uncased whole word masking SQuAD (HuggingFace) | integerops | 92.99% | 93.16% | -0.18% | 35.83 | 12.94 | 2.77x |
Roberta large SQuAD v2 (HuggingFace) | qlinearops | 89.03% | 89.02% | +0.02% | 17.61 | 13.27 | 1.33x |
Roberta large SQuAD v2 (HuggingFace) | integerops | 89.04% | 89.02% | +0.02% | 35.85 | 13.26 | 2.70x |
GPT2 WikiText (HuggingFace) | qlinearops | 30.25% | 29.00% | +4.33% | 13.85 | 10.17 | 1.36x |
GPT2 WikiText (HuggingFace) | integerops | 29.68% | 29.00% | +2.36% | 14.64 | 10.09 | 1.45x |
DistilGPT2 WikiText (HuggingFace) | qlinearops | 44.93% | 43.43% | +3.46% | 21.80 | 17.13 | 1.27x |
DistilGPT2 WikiText (HuggingFace) | integerops | 44.62% | 43.43% | +2.74% | 23.02 | 17.09 | 1.35x |
LayoutLMv3 FUNSD (HuggingFace) | integerops | 90.07% | 90.49% | -0.46% | 39.50 | 28.00 | 1.41x |
CodeBert (HuggingFace) | qlinearops | 64.97% | 65.41% | -0.67% | 75.69 | 45.10 | 1.68x |
CodeBert (HuggingFace) | integerops | 64.93% | 65.41% | -0.73% | 94.47 | 45.10 | 2.09x |
FCN (ONNX Model Zoo) | qlinearops | 64.54% | 64.98% | -0.67% | 25.83 | 12.90 | 2.00x |
FCN (ONNX Model Zoo) | qdq | 64.54% | 64.98% | -0.67% | 25.97 | 12.99 | 2.00x |
Validated Pruning Examples
Model | Task | Dataset | Dense Accuracy | Sparse Accuracy | Relative Drop | Sparsity Ratio | Sparsity Pattern | Comments | Balanced or Unbalanced Ratio |
---|---|---|---|---|---|---|---|---|---|
Bert-Mini | question answering | SQuAD-v1.1 | f1=76.87 | f1=76.2 | -0.80% | 80% | structured 4x1 | snip momentum | unbalanced |
Bert-Mini | question answering | SQuAD-v1.1 | f1=76.87 | f1=76.2 | -0.80% | 80% | structured 4x1 | snip momentum | unbalanced |
Bert-Mini | question answering | SQuAD-v1.1 | f1=76.87 | f1=77.62 | +0.98% | 50% | structured 2:4 | snip momentum | balanced |
Distilbert-base-uncased | question answering | SQuAD-v1.1 | f1=86.90 | f1=86.15 | -0.86% | 80% | structured 4x1 | snip momentum | unbalanced |
Distilbert-base-uncased | question answering | SQuAD-v1.1 | f1=86.90 | f1=87.50 | +0.69% | 50% | structured 2:4 | snip momentum | balanced |
Bert-base-uncased | question answering | SQuAD-v1.1 | f1=88.59 | f1=87.78 | -0.92% | 80% | structured 4x1 | snip momentum | unbalanced |
Bert-base-uncased | question answering | SQuAD-v1.1 | f1=88.59 | f1=89.40 | +0.91% | 50% | structured 2:4 | snip momentum | balanced |
Bert-large | question answering | SQuAD-v1.1 | f1=91.23 | f1=90.91 | -0.35% | 80% | structured 4x1 | snip momentum | unbalanced |
Bert-large | question answering | SQuAD-v1.1 | f1=91.23 | f1=91.67 | +0.48% | 50% | structured 2:4 | snip momentum | balanced |
Bert-Mini | text classification | MRPC | f1=87.52 | f1=87.22 | -0.34% | 90% | structured 4x1 | snip momentum | unbalanced |
Bert-Mini | text classification | MRPC | f1=87.52 | f1=87.33 | -0.22% | 90% | structured 4x1 | snip momentum | balanced |
Bert-Mini | text classification | MRPC | f1=87.52 | f1=86.89 | -0.72% | 50% | structured 2:4 | snip momentum | balanced |
Bert-Mini | text classification | MRPC | f1=87.52 | f1=86.8 | -0.83% | 60% | structured per channel | snip momentum | unbalanced |
Distilbert-base-uncased | text classification | MRPC | f1=90.26 | f1=89.85 | -0.46% | 90% | structured 4x1 | snip momentum | unbalanced |
Distilbert-base-uncased | text classification | MRPC | f1=90.26 | f1=90.88 | +0.69% | 50% | structured 2:4 | snip momentum | balanced |
Bert-Mini | text classification | SST-2 | accuracy=87.61 | accuracy=86.92 | -0.79% | 90% | structured 4x1 | snip momentum | unbalanced |
Bert-Mini | text classification | SST-2 | accuracy=87.61 | accuracy=87.73 | +0.14% | 50% | structured 2:4 | snip momentum | balanced |
Bert-Mini | text classification | SST-2 | accuracy=87.61 | accuracy=86.92 | -0.79% | 50% | structured per channel | snip momentum | unbalanced |
ResNet50 | image recognition | ImageNet | top1 acc = 78.95 | top1 acc = 80.10 | -1.43% | 75% | structured 2x1 | snip momentum | unbalanced |
YOLO-v5s6 | object detection | COCO | AP0.50:0.95/AP0.50=0.404/0.6 | AP0.50:0.95/AP0.50=0.393/0.584 | -2.72% | 80% | unstructured | snip momentum | unbalanced |
Bert-Large | question answering | SQuAD-v1.1 | f1=91.34 | f1=90.7 | -0.07% | 80% | structured 2x1 | group lasso | unbalanced |
Bert-Base | text classification | MNLI | [m, mm] = [84.57, 84.79] | [m, mm] = [82.45, 83.27] | [-2.51%, -1.80%] | 70% | unstructured | Prune once for all | balanced |
Bert-Base | text classification | MNLI | [m, mm] = [84.57, 84.79] | [m, mm] = [83.20, 84.11] | [-1.62%, -0.80%] | 50% | structured 1:2 | Prune once for all | balanced |
Bert-Base | text classification | SST-2 | accuracy = 92.32 | accuracy = 91.51 | -0.88% | 70% | unstructured | Prune once for all | balanced |
Bert-Base | text classification | SST-2 | accuracy = 92.32 | accuracy = 92.20 | -0.13% | 50% | structured 1:2 | Prune once for all | balanced |
Bert-Base | text classification | SST-2 | accuracy = 92.32 | accuracy = 91.97 | -0.38% | 20% | unstructured | gradient sensitivity | balanced |
Bert-Base | text classification | QQP | [accuracy, f1] = [91.10, 88.05] | [accuracy, f1] = [90.48, 87.06] | [-0.68%, -1.12%] | 70% | unstructured | Prune once for all | balanced |
Bert-Base | text classification | QQP | [accuracy, f1] = [91.10, 88.05] | [accuracy, f1] = [90.92, 87.78] | [-0.20%, -0.31%] | 50% | structured 1:2 | Prune once for all | balanced |
Bert-Base | text classification | QNLI | accuracy = 91.54 | accuracy = 90.39 | -1.26% | 70% | unstructured | Prune once for all | balanced |
Bert-Base | text classification | QNLI | accuracy = 91.54 | accuracy = 90.87 | -0.73% | 50% | structured 1:2 | Prune once for all | balanced |
Bert-Base | question answering | | [em, f1] = [79.34, 87.10] | [em, f1] = [77.27, 85.75] | [-2.61%, -1.54%] | 70% | unstructured | Prune once for all | balanced |
Bert-Base | question answering | | [em, f1] = [79.34, 87.10] | [em, f1] = [78.03, 86.50] | [-1.65%, -0.69%] | 50% | structured 1:2 | Prune once for all | balanced |
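In an N:M sparsity pattern, such as the 2:4 and 1:2 entries above, only N of every M consecutive weights are kept. That fixes the sparsity at 1 - N/M, which is 50% for both patterns. The sketch below builds such a mask over a flat list of weights, ranking them by plain magnitude. Magnitude ranking is an assumption made for illustration; it is not the snip momentum, group lasso or Prune Once For All criterion named in the table. `nm_mask` is a hypothetical helper, not a Neural Compressor API.

```python
from typing import List, Sequence


def nm_mask(weights: Sequence[float], n: int = 2, m: int = 4) -> List[int]:
    """Return a 0/1 mask that keeps the n largest-magnitude weights in each group of m."""
    if len(weights) % m != 0:
        raise ValueError("number of weights must be a multiple of m")
    mask = [0] * len(weights)
    for start in range(0, len(weights), m):
        group = range(start, start + m)
        # Rank positions in this group by |weight| and keep the top n.
        keep = sorted(group, key=lambda i: abs(weights[i]), reverse=True)[:n]
        for i in keep:
            mask[i] = 1
    return mask


weights = [0.9, -0.1, 0.05, -0.7, 0.2, 0.3, -0.8, 0.01]
mask = nm_mask(weights)  # 2:4 pattern
pruned = [w if k else 0.0 for w, k in zip(weights, mask)]
sparsity = 1 - sum(mask) / len(mask)
print(mask)      # [1, 0, 0, 1, 0, 1, 1, 0]
print(sparsity)  # 0.5
```

Calling `nm_mask(weights, n=1, m=2)` gives the 1:2 pattern, which is also 50% sparse.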
Validated Knowledge Distillation Examples
Example Name | Dataset | Student (Metrics) | Teacher (Metrics) | Student With Distillation (Metrics Improvement) | Student With Distributed Distillation (Metrics Improvement) |
---|---|---|---|---|---|
MobileNet example | CIFAR-10 | MobileNetV2-0.35 (0.7965 ACC) | WideResNet40-2 (0.9522 ACC) | 0.8178 ACC (0.0213 ACC) | 0.8235 ACC (0.027 ACC) |
CNN example | CIFAR-100 | CNN-2 (0.5494 ACC) | CNN-10 (0.7153 ACC) | 0.5540 ACC (0.0046 ACC) | 0.5523 ACC (0.0029 ACC) |
VGG example | CIFAR-100 | VGG-8-BN (0.7022 ACC) | VGG-13-BN (0.7415 ACC) | 0.7025 ACC (0.0003 ACC) | NA |
ResNet example | ImageNet | ResNet18 (0.6739 ACC) | ResNet50 (0.7399 ACC) | 0.6845 ACC (0.0106 ACC) | NA |
BlendCnn example | MRPC | BlendCnn (0.7034 ACC) | BERT-Base (0.8382 ACC) | 0.7034 ACC (0 ACC) | NA |
BiLSTM example | SST-2 | BiLSTM (0.8314 ACC) | RoBERTa-Base (0.9403 ACC) | 0.9048 ACC (0.0734 ACC) | NA |
DistilBERT example | SQuAD | DistilBERT (0.7323/0.8256 EM/F1) | BERT-Base (0.8084/0.8814 EM/F1) | 0.7442/0.8371 EM/F1 (0.0119/0.0115 EM/F1) | NA |
TinyBERT example | MNLI | TinyBERT (0.8018/0.8044 m/mm) | BERT-Base (0.8363/0.8411 m/mm) | 0.8025/0.8074 m/mm (0.0007/0.0030 m/mm) | NA |
BERT-3 example | QQP | BERT-3 (0.8626/0.8213 ACC/F1) | BERT-Base (0.9091/0.8782 ACC/F1) | 0.8684/0.8259 ACC/F1 (0.0058/0.0046 ACC/F1) | NA |
DistilRoBERTa example | COLA | DistilRoBERTa (0.6057 ACC) | RoBERTa-Large (0.6455 ACC) | 0.6187 ACC (0.0130 ACC) | NA |
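In this table the value in parentheses is an absolute gain over the student trained without distillation. For MobileNet, 0.8178 - 0.7965 = 0.0213. This differs from the relative Accuracy Ratio used in the quantization tables. The sketch below reproduces the gains for a single-metric row and a two-metric row. The helper names are hypothetical.

```python
from typing import Tuple


def improvement(distilled: float, baseline: float, places: int = 4) -> float:
    """Absolute metric gain of the distilled student over the student trained alone."""
    return round(distilled - baseline, places)


def paired_improvement(
    distilled: Tuple[float, float], baseline: Tuple[float, float]
) -> Tuple[float, float]:
    """Same gain applied element-wise to two-metric results such as EM/F1 or m/mm."""
    return (improvement(distilled[0], baseline[0]), improvement(distilled[1], baseline[1]))


# MobileNet example (CIFAR-10): student 0.7965 ACC, distilled student 0.8178 ACC.
print(improvement(0.8178, 0.7965))  # 0.0213
# TinyBERT example (MNLI, m/mm): student 0.8018/0.8044, distilled 0.8025/0.8074.
print(paired_improvement((0.8025, 0.8074), (0.8018, 0.8044)))  # (0.0007, 0.003)
```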
Validated ONNX QDQ INT8 Models on Multiple Hardware through ONNX Runtime
Model (ONNX QDQ) | AWS c6i.2xlarge (Intel) CPU Execution Provider | AWS c6a.2xlarge (AMD) CPU Execution Provider | AWS c6g.2xlarge (ARM) CPU Execution Provider | NVIDIA A100 CUDA Execution Provider |
---|---|---|---|---|
ResNet50 | 74.76% | 68.95% | 74.76% | 74.75% |
BERT-base | 85.54% | 84.56% | 85.54% | 84.31% |
ResNet50 V1.5 | 72.20% | 67.70% | 72.20% | 72.29% |
MobileNet V2 | 65.82% | 58.56% | 65.83% | 65.63% |
SSD MobileNet V1 | 22.45% | 16.53% | 22.45% | 22.35% |
DistilBERT base MRPC | 84.56% | 83.82% | 84.56% | 84.56% |
SqueezeNet | 56.54% | 53.52% | 56.54% | 56.55% |
SSD | 18.63% | 18.54% | 18.63% | 18.61% |
AlexNet | 54.71% | 47.06% | 54.71% | 54.79% |
CaffeNet | 56.25% | 52.35% | 56.27% | 56.24% |
GoogleNet | 67.73% | 63.56% | 67.72% | 67.76% |
ZFNet | 55.86% | 45.09% | 55.86% | 55.89% |
Inception V1 | 67.21% | 63.03% | 67.20% | 67.21% |
SSD MobileNet V1 (ONNX Model Zoo) | 22.86% | 16.94% | 22.80% | 22.87% |
Mobile bert MRPC | 85.54% | 84.56% | 85.54% | 85.54% |
Roberta base MRPC | 89.46% | 90.44% | 89.71% | 89.71% |
ResNet50 V1.5 MLPerf | 76.14% | 72.80% | 76.14% | 76.17% |
VGG16 | 66.69% | 64.25% | 66.69% | 66.64% |
VGG16 (ONNX Model Zoo) | 72.31% | 69.35% | 72.32% | 72.34% |
MobileNet V3 MLPerf | 75.57% | 70.78% | 75.56% | 75.52% |
EfficientNet | 77.61% | 76.52% | 77.56% | 77.60% |
MobileNet V2 (ONNX Model Zoo) | 68.51% | 62.48% | 68.58% | 68.48% |
ShuffleNet V2 | 66.12% | 58.41% | 66.11% | 66.11% |
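The table compares the same QDQ models across ONNX Runtime execution providers. When reproducing such a comparison, pin each session to the intended provider so that a missing GPU build cannot silently fall back to CPU. The sketch below does this. `PROVIDERS`, `pick_providers`, `make_session` and any model path are hypothetical. `onnxruntime.get_available_providers()` and the `providers` argument of `onnxruntime.InferenceSession` are ONNX Runtime APIs.

```python
from typing import Dict, List, Sequence

# Provider lists per target; for CUDA, the CPU provider is listed second so nodes
# the CUDA provider cannot run are placed on the CPU.
PROVIDERS: Dict[str, List[str]] = {
    "cpu": ["CPUExecutionProvider"],
    "cuda": ["CUDAExecutionProvider", "CPUExecutionProvider"],
}


def pick_providers(target: str, available: Sequence[str]) -> List[str]:
    """Return the provider list for `target`, failing loudly if its primary provider is missing."""
    wanted = PROVIDERS[target]
    if wanted[0] not in available:
        raise RuntimeError(f"{wanted[0]} is not available in this onnxruntime build")
    return [p for p in wanted if p in available]


def make_session(model_path: str, target: str = "cpu"):
    """Create an ONNX Runtime inference session for a QDQ model on the requested target."""
    import onnxruntime as ort  # imported lazily so pick_providers works without onnxruntime

    providers = pick_providers(target, ort.get_available_providers())
    return ort.InferenceSession(model_path, providers=providers)
```

For example, `make_session("model_qdq.onnx", target="cuda")` would raise on a CPU-only onnxruntime build instead of quietly measuring CPU accuracy. The file name is a placeholder. The same `"cpu"` target covers the Intel, AMD and ARM instances, since all three use the CPU Execution Provider.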