Validated Models
Intel® Neural Compressor provides validated examples covering multiple compression techniques. Links to the typical examples can be found in the example tables below, and the corresponding performance/accuracy results are available here.
Validated Quantization Examples
1.1. TensorFlow Models with Intel TensorFlow 2.12.0
1.2. TensorFlow Models with Intel® Extension for TensorFlow* 1.2.0
1.3. PyTorch Models with Torch 2.0.1+cpu in PTQ Mode
1.4. PyTorch Models with Torch 2.0.1+cpu in QAT Mode
1.5. PyTorch Models with Intel® Extension for PyTorch* 2.0.1+cpu
1.6. PyTorch Models with Torch 2.0.1+cpu in WOQ Mode
Validated ONNX QDQ INT8 Models on Multiple Hardware through ONNX Runtime
Validated Quantization Examples
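The INT8 rows in the tables below were produced with Intel® Neural Compressor's post-training quantization flow. As a minimal, illustrative sketch (not the exact recipe behind any individual row; the torchvision ResNet18 and the dummy calibration data are placeholders), a model can be quantized with the 2.x API roughly like this:

```python
# Minimal post-training static quantization sketch with the INC 2.x API.
# The model and calibration data below are illustrative placeholders only.
import torchvision
from neural_compressor import PostTrainingQuantConfig, quantization
from neural_compressor.data import DataLoader, Datasets

model = torchvision.models.resnet18()                         # FP32 model to quantize
dataset = Datasets("pytorch")["dummy"](shape=(16, 3, 224, 224))
calib_loader = DataLoader(framework="pytorch", dataset=dataset)

q_model = quantization.fit(
    model=model,
    conf=PostTrainingQuantConfig(approach="static"),          # static INT8 PTQ
    calib_dataloader=calib_loader,                            # calibration samples
)
q_model.save("./saved_int8_resnet18")
```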
System summary: Tested by Intel on 06/19/2023. 1-node, 1x Intel(R) Xeon(R) Platinum 8480+ @ 3.8 GHz, 56 cores/socket, HT on, Turbo on, total memory 256 GB (16x 16 GB DDR5 4800 MT/s), BIOS 3A14.TEL2P1, microcode 0x2b0001b0,
CentOS Stream 8, gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-10). DL models; frameworks: TensorFlow/ONNXRT/PyTorch; datatypes: FP32/INT8/BF16.
Most models were benchmarked with 1 socket, 4 cores/instance, 14 instances, and batch size 1.
The performance of some large models was measured with 1 socket, 56 cores/instance, 1 instance, and batch size 1.
Performance varies by use, configuration, and other factors.
For more complete information about performance and benchmark results, visit www.intel.com/benchmarks.
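The throughput columns in the tables below follow the multi-instance setup described above. A hedged sketch of reproducing that setup with the Neural Compressor 2.x benchmark API (the saved model path and the dummy dataloader are placeholders):

```python
# Illustrative multi-instance benchmark: 1 socket, 4 cores/instance, 14 instances,
# batch size 1, matching the 1s4c14ins1bs configuration described above.
from neural_compressor.benchmark import fit
from neural_compressor.config import BenchmarkConfig
from neural_compressor.data import DataLoader, Datasets

dataset = Datasets("pytorch")["dummy"](shape=(1, 3, 224, 224))   # placeholder data
b_dataloader = DataLoader(framework="pytorch", dataset=dataset, batch_size=1)

conf = BenchmarkConfig(
    warmup=10,               # iterations discarded before timing
    iteration=100,           # timed iterations per instance
    cores_per_instance=4,
    num_of_instance=14,
)
fit(model="./saved_int8_resnet18", conf=conf, b_dataloader=b_dataloader)  # placeholder path
```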
TensorFlow Models with Intel TensorFlow 2.12.0

Throughput is reported in samples/sec, measured with 1 socket, 4 cores/instance, 14 instances, and batch size 1 (1s4c14ins1bs).

| Model | Example | INT8 Accuracy | FP32 Accuracy | Accuracy Ratio [(INT8-FP32)/FP32] | INT8 Throughput | FP32 Throughput | Performance Ratio [INT8/FP32] |
|---|---|---|---|---|---|---|---|
| ResNet50 v1.0 | pb | 74.12% | 74.27% | -0.21% | 2721.21 | 638.25 | 4.26x |
| ResNet50 v1.5 | pb | 76.23% | 76.46% | -0.31% | 2123.70 | 552.94 | 3.84x |
| ResNet101 | pb | 77.50% | 76.45% | 1.37% | 1477.29 | 432.29 | 3.42x |
| Inception V1 | pb | 70.44% | 69.74% | 1.01% | 3267.92 | 1266.03 | 2.58x |
| Inception V2 | pb | 74.38% | 73.97% | 0.57% | 2399.76 | 1098.67 | 2.18x |
| Inception V3 | pb | 76.71% | 76.75% | -0.05% | 1593.59 | 508.58 | 3.13x |
| Inception V4 | pb | 80.18% | 80.27% | -0.11% | 1032.10 | 249.39 | 4.14x |
| Inception ResNet V2 | pb | 80.34% | 80.40% | -0.07% | 427.28 | 185.60 | 2.30x |
| MobileNet V1 | pb | 71.78% | 70.96% | 1.16% | 5503.87 | 1791.62 | 3.07x |
| MobileNet V2 | pb | 72.52% | 71.76% | 1.07% | 3639.83 | 1864.72 | 1.95x |
| VGG16 | pb | 72.64% | 70.89% | 2.47% | 1538.21 | 236.22 | 6.51x |
| VGG19 | pb | 72.69% | 71.01% | 2.37% | 1368.21 | 196.94 | 6.95x |
| ResNetV2 50 | pb | 70.44% | 69.64% | 1.15% | 1105.19 | 657.45 | 1.68x |
| ResNetV2 101 | pb | 72.65% | 71.87% | 1.08% | 716.49 | 369.95 | 1.94x |
| ResNetV2 152 | pb | 73.07% | 72.37% | 0.97% | 508.60 | 269.31 | 1.89x |
| Densenet 121 | pb | 73.59% | 72.89% | 0.97% | 617.94 | 498.43 | 1.24x |
| Densenet 161 | pb | 76.35% | 76.29% | 0.08% | 372.04 | 242.05 | 1.54x |
| Densenet 169 | pb | 74.34% | 74.65% | -0.41% | 496.41 | 411.94 | 1.21x |
| EfficientNet B0 | ckpt | 76.14% | 76.76% | -0.81% | 748.42 | 709.43 | 1.05x |
| SSD ResNet50 V1 | pb | 37.88% | 38.00% | -0.31% | 134.81 | 31.06 | 4.34x |
| SSD MobileNet V1 | pb | 22.98% | 23.13% | -0.64% | 1273.79 | 671.84 | 1.90x |
| SSD ResNet50 v1 | ckpt | 37.89% | 38.00% | -0.30% | 136.53 | 27.88 | 4.90x |
| SSD MobileNet v1 | ckpt | 22.96% | 23.13% | -0.72% | 1235.03 | 477.83 | 2.58x |
| SSD ResNet34 | pb | 21.70% | 22.09% | -1.76% | 179.37 | 13.96 | 12.85x |
| Faster R-CNN Inception ResNet V2 | pb | 37.47% | 38.31% | -2.18% | 5.39 | 3.01 | 1.79x |
| Faster R-CNN Inception ResNet V2 | SavedModel | 37.79% | 38.31% | -1.34% | 5.35 | 1.89 | 2.83x |
| Faster R-CNN ResNet101 | pb | 30.32% | 30.39% | -0.23% | 156.71 | 23.50 | 6.67x |
| Faster R-CNN ResNet101 | SavedModel | 30.33% | 30.39% | -0.20% | 152.21 | 18.50 | 8.23x |
| Faster R-CNN ResNet50 | pb | 26.64% | 26.59% | 0.21% | 173.07 | 28.83 | 6.00x |
| YOLOv3 | pb | 82.13% | 82.35% | -0.28% | 211.67 | 87.89 | 2.41x |
| BERT large SQuAD | pb | 92.47 | 92.99 | -0.56% | 46.87 | 16.65 | 2.82x |
| BERT large SQuAD (ONNX Model Zoo) | pb | 92.42 | 92.98 | -0.61% | 42.35 | 17.03 | 2.49x |
| BERT base MRPC | ckpt | 86.03% | 86.52% | -0.57% | 424.94 | 174.10 | 2.44x |
| Transformer LT | pb | 25.77 | 25.86 | -0.34% | 42.11 | 22.11 | 1.90x |
| Transformer lt MLPerf | pb | 27.10 | 27.17 | -0.25% | 9.82 | 4.29 | 2.29x |
| Wide Deep large DS | pb | 77.75% | 77.67% | 0.10% | 55612.97 | 43479.53 | 1.28x |

Throughput is reported in samples/sec, measured with 1 socket, 56 cores/instance, 1 instance, and batch size 1 (1s56c1ins1bs).

| Model | Example | INT8 Accuracy | FP32 Accuracy | Accuracy Ratio [(INT8-FP32)/FP32] | INT8 Throughput | FP32 Throughput | Performance Ratio [INT8/FP32] |
|---|---|---|---|---|---|---|---|
| Mask R-CNN Inception V2 | pb | 28.60% | 28.73% | -0.44% | 39.35 | 23.84 | 1.65x |
| Mask R-CNN Inception V2 | ckpt | 28.60% | 28.73% | -0.44% | 40.21 | 23.90 | 1.68x |
| GPT2 | pb | 66.89% | 67.57% | -1.00% | 9.67 | 7.22 | 1.34x |
TensorFlow Models with Intel® Extension for TensorFlow* 1.2.0

Throughput is reported in samples/sec, measured with 1 socket, 4 cores/instance, 14 instances, and batch size 1 (1s4c14ins1bs).

| Model | Example | INT8 Accuracy | FP32 Accuracy | Accuracy Ratio [(INT8-FP32)/FP32] | INT8 Throughput | FP32 Throughput | Performance Ratio [INT8/FP32] |
|---|---|---|---|---|---|---|---|
| ResNet50 v1.0 | pb | 74.16% | 74.27% | -0.15% | 2716.04 | 569.18 | 4.77x |
| ResNet50 v1.5 | pb | 76.27% | 76.46% | -0.26% | 2683.90 | 476.14 | 5.64x |
| Inception V1 | pb | 69.59% | 69.74% | -0.22% | 2349.32 | 1035.63 | 2.27x |
| Inception V2 | pb | 73.75% | 73.97% | -0.30% | 2399.93 | 930.62 | 2.58x |
| Inception V4 | pb | 80.03% | 80.27% | -0.31% | 763.85 | 262.22 | 2.91x |
| MobileNet V1 | pb | 70.61% | 70.96% | -0.48% | 4003.12 | 1677.22 | 2.39x |
| MobileNet V2 | pb | 71.15% | 71.76% | -0.85% | 2766.36 | 2643.21 | 1.05x |
| VGG16 | pb | 70.84% | 70.89% | -0.07% | 1495.88 | 238.52 | 6.27x |
| VGG19 | pb | 71.03% | 71.01% | 0.03% | 1372.91 | 199.52 | 6.88x |
| ResNetV2 50 | pb | 69.43% | 69.64% | -0.30% | 1457.53 | 630.41 | 2.31x |
| ResNetV2 101 | pb | 71.84% | 71.87% | -0.05% | 842.53 | 338.44 | 2.49x |
| ResNetV2 152 | pb | 72.26% | 72.37% | -0.15% | 645.86 | 231.63 | 2.79x |
| EfficientNet B0 | ckpt | 76.76% | 76.76% | 0.00% | 938.82 | 707.22 | 1.33x |
| EfficientNet V2 B0 | SavedModel | 78.63% | 78.62% | 0.01% | 1533.95 | 1258.45 | 1.22x |
| SSD MobileNet V1 | pb | 22.90% | 23.13% | -0.99% | 981.29 | 647.07 | 1.52x |
| SSD MobileNet v1 | ckpt | 22.92% | 23.13% | -0.89% | 850.31 | 444.12 | 1.91x |
| Faster R-CNN Inception ResNet V2 | pb | 38.02% | 38.31% | -0.74% | 7.08 | 2.93 | 2.42x |
| Faster R-CNN Inception ResNet V2 | SavedModel | 38.18% | 38.31% | -0.32% | 6.61 | 2.79 | 2.37x |
| YOLOv3 | pb | 80.27% | 82.35% | -2.53% | 543.50 | 80.59 | 6.74x |
| BERT large SQuAD | pb | 92.67 | 92.97 | -0.33% | 72.27 | 18.39 | 3.93x |
| BERT base MRPC | ckpt | 86.28% | 86.28% | 0.00% | 947.96 | 233.07 | 4.07x |
| DistilBERT base | pb | 90.48% | 91.06% | -0.64% | 788.64 | 462.35 | 1.71x |
| Transformer LT | pb | 25.73 | 25.86 | -0.47% | 42.07 | 29.21 | 1.44x |
| Transformer lt MLPerf | pb | 27.13 | 27.17 | -0.14% | 10.43 | 4.84 | 2.15x |
| Wide Deep large DS | pb | 77.66% | 77.67% | -0.02% | 51958.00 | 39974.56 | 1.30x |
PyTorch Models with Torch 2.0.1+cpu in PTQ Mode

Throughput is reported in samples/sec, measured with 1 socket, 4 cores/instance, 14 instances, and batch size 1 (1s4c14ins1bs).

| Model | Example | INT8 Accuracy | FP32 Accuracy | Accuracy Ratio [(INT8-FP32)/FP32] | INT8 Throughput | FP32 Throughput | Performance Ratio [INT8/FP32] |
|---|---|---|---|---|---|---|---|
| ResNet18 | static | 69.61% | 69.76% | -0.22% | 1631.83 | 662.13 | 2.46x |
| ResNet50 | static | 75.92% | 76.15% | -0.30% | 1162.83 | 330.92 | 3.51x |
| Inception V3 | static | 69.47% | 69.52% | -0.07% | 968.67 | 334.53 | 2.90x |
| ResNeSt50 | static | 80.80% | 81.04% | -0.30% | 394.38 | 40.76 | 9.67x |
| ResNeXt101_32x8d | static | 78.94% | 79.31% | -0.46% | 558.59 | 108.42 | 5.15x |
| Efficientnet_b0 | static | 76.89% | 77.67% | -1.01% | 703.73 | 656.12 | 1.07x |
| Efficientnet_b3 | static | 77.82% | 78.54% | -0.93% | 510.58 | 391.05 | 1.31x |
| Efficientnet_b7 | static | 73.55% | 73.92% | -0.50% | 233.29 | 150.09 | 1.55x |
| Peleenet | static | 71.85% | 72.10% | -0.35% | 857.72 | 585.60 | 1.46x |
| YOLO V3 | static | 55.09% | 54.93% | 0.31% | 160.97 | 60.60 | 2.66x |
| SSD ResNet34 | static | 19.52 | 19.63 | -0.58% | 141.67 | 11.75 | 12.05x |
| Roberta base MRPC | static | 92.69% | 93.59% | -0.96% | 407.78 | 174.53 | 2.34x |
| CamemBERT base MRPC | static | 88.93% | 89.28% | -0.39% | 402.78 | 173.56 | 2.32x |
| DistilBERT base MRPC | dynamic | 90.20% | 90.27% | -0.07% | 748.28 | 343.54 | 2.18x |
| DistilBERT base MRPC | static | 89.53% | 90.27% | -0.82% | 804.57 | 343.24 | 2.34x |
| ALBERT base MRPC | static | 92.63% | 92.63% | 0.00% | 352.44 | 162.26 | 2.17x |
| | | 91.60% | 92.25% | -0.71% | 302.57 | 183.57 | 1.65x |
| Xlm Roberta MRPC | static | 88.36% | 88.62% | -0.29% | 404.61 | 173.71 | 2.33x |
| Xlm Roberta MRPC | dynamic | 88.24% | 88.24% | 0.00% | 382.72 | 174.63 | 2.19x |
| BERT base MRPC | static | 89.63% | 90.42% | -0.87% | 407.58 | 173.66 | 2.35x |
| BERT base COLA | static | 54.51% | 53.39% | 2.10% | 414.72 | 173.86 | 2.39x |
| BERT base STSB | static | 87.55% | 88.05% | -0.57% | 413.76 | 173.34 | 2.39x |
| BERT base SST-2 | static | 91.51% | 92.32% | -0.87% | 410.87 | 173.63 | 2.37x |
| BERT large COLA | static | 62.84% | 63.35% | -0.80% | 138.89 | 51.65 | 2.69x |
| BERT base RTE | static | 72.56% | 72.56% | 0.00% | 385.23 | 173.32 | 2.22x |
| BERT large MRPC | static | 90.22% | 90.38% | -0.17% | 141.61 | 51.67 | 2.74x |
| BERT large QNLI | static | 90.87% | 91.54% | -0.74% | 407.84 | 173.52 | 2.35x |
| BERT large RTE | static | 73.29% | 74.01% | -0.98% | 141.64 | 51.33 | 2.76x |
| BERT large RTE | dynamic | 71.48% | 74.01% | -3.41% | 126.49 | 51.34 | 2.46x |
| BERT large SQuAD | static | 92.27 | 93.16 | -0.95% | 37.61 | 16.57 | 2.27x |
| GPT J WikiText | static | 3.36 | 2.34 | NA | 0.87 | 0.28 | 3.15x |
| Reformer Crime and Punishment | static | 1.88 | 1.87 | 0.23% | 449.73 | 364.78 | 1.23x |
| lvwerra/pegasus-samsum | static | 42.50 | 42.67 | -0.39% | 101.32 | 37.80 | 2.68x |

Throughput is reported in samples/sec, measured with 1 socket, 56 cores/instance, 1 instance, and batch size 1 (1s56c1ins1bs).

| Model | Example | INT8 Accuracy | FP32 Accuracy | Accuracy Ratio [(INT8-FP32)/FP32] | INT8 Throughput | FP32 Throughput | Performance Ratio [INT8/FP32] |
|---|---|---|---|---|---|---|---|
| openai/whisper-large | dynamic | 97.07% | 96.96% | 0.12% | 0.60 | 0.47 | 1.28x |
| abeja/gpt-neox-japanese-2.7b | static | 4.30 | 3.52 | 22.06% | 1.03 | 0.56 | 1.84x |
PyTorch Models with Torch 2.0.1+cpu in QAT Mode
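The QAT rows below come from fine-tuning with fake quantization inserted into the model. A hedged sketch of the 2.x training-time flow, assuming the documented `prepare_compression` pattern (`model` and `train_one_epoch` are user-supplied placeholders):

```python
# Illustrative quantization-aware training (QAT) flow with the INC 2.x API.
from neural_compressor import QuantizationAwareTrainingConfig
from neural_compressor.training import prepare_compression

compression_manager = prepare_compression(model, QuantizationAwareTrainingConfig())
compression_manager.callbacks.on_train_begin()    # insert fake-quant ops
model = compression_manager.model
train_one_epoch(model)                            # user-defined fine-tuning loop
compression_manager.callbacks.on_train_end()      # convert to a real INT8 model
compression_manager.save("./saved_qat_int8_model")
```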

Throughput is reported in samples/sec, measured with 1 socket, 4 cores/instance, 14 instances, and batch size 1 (1s4c14ins1bs).

| Model | Example | INT8 Accuracy | FP32 Accuracy | Accuracy Ratio [(INT8-FP32)/FP32] | INT8 Throughput | FP32 Throughput | Performance Ratio [INT8/FP32] |
|---|---|---|---|---|---|---|---|
| ResNet18 | static | 69.74% | 69.76% | -0.03% | 1723.70 | 654.17 | 2.63x |
| ResNet50 | static | 76.05% | 76.15% | -0.12% | 1141.22 | 306.04 | 3.73x |
| ResNeXt101_32x8d | static | 79.28% | 79.31% | -0.04% | 558.92 | 106.82 | 5.23x |
| MobileNet V2 | static | 69.73% | 71.84% | -2.93% | 1379.34 | 729.22 | 1.89x |
| BERT base MRPC | static | 89.70% | 90.40% | -0.77% | 389.77 | 173.54 | 2.25x |
PyTorch Models with Intel® Extension for PyTorch* 2.0.1+cpu

Throughput is reported in samples/sec, measured with 1 socket, 4 cores/instance, 14 instances, and batch size 1 (1s4c14ins1bs).

| Model | Example | INT8 Accuracy | FP32 Accuracy | Accuracy Ratio [(INT8-FP32)/FP32] | INT8 Throughput | FP32 Throughput | Performance Ratio [INT8/FP32] |
|---|---|---|---|---|---|---|---|
| ResNet18 | static | 75.98% | 76.15% | -0.22% | 1980.94 | 672.93 | 2.94x |
| ResNet50 | static | 69.56% | 69.76% | -0.29% | 5032.32 | 1500.16 | 3.35x |
| ResNeXt101_32x16d_wsl | static | 84.04% | 84.17% | -0.15% | 533.60 | 78.84 | 6.77x |
| SSD ResNet34 | static | 19.93% | 20.00% | -0.38% | 84.02 | 15.68 | 5.36x |
| bert-large-uncased-whole-word-masking-finetuned-squad | static | 92.93 | 93.16 | -0.25% | 161.44 | 22.19 | 7.27x |
| distilbert-base-uncased-distilled-squad | static | 86.09 | 86.84 | -0.86% | 556.19 | 149.79 | 3.71x |

Throughput is reported in samples/sec, measured with 1 socket, 56 cores/instance, 1 instance, and batch size 1 (1s56c1ins1bs).

| Model | Example | INT8 Accuracy | FP32 Accuracy | Accuracy Ratio [(INT8-FP32)/FP32] | INT8 Throughput | FP32 Throughput | Performance Ratio [INT8/FP32] |
|---|---|---|---|---|---|---|---|
| EleutherAI/gpt-j-6B | static | 78.60% | 79.20% | -0.76% | 4.87 | 1.55 | 3.14x |
PyTorch Models with Torch 2.0.1+cpu in WOQ Mode
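The configurations below (for example, W4G128Asym = 4-bit weights, group size 128, asymmetric scheme) are GPTQ weight-only recipes. A hedged sketch of expressing such a recipe, assuming the 2.x weight-only `op_type_dict` schema (`model` and `dataloader` are placeholders):

```python
# Illustrative 4-bit GPTQ weight-only quantization (WOQ) configuration.
from neural_compressor import PostTrainingQuantConfig, quantization

conf = PostTrainingQuantConfig(
    approach="weight_only",
    op_type_dict={
        ".*": {                        # apply to all matching op types
            "weight": {
                "bits": 4,             # W4
                "group_size": 128,     # G128
                "scheme": "asym",      # Asym
                "algorithm": "GPTQ",
            },
        },
    },
)
q_model = quantization.fit(model, conf, calib_dataloader=dataloader)  # placeholders
```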

| Model name | Configuration | Lambada_openai Accuracy | Hellaswag Accuracy | Winogrande Accuracy | Piqa Accuracy | Average Accuracy (mean of the previous four tasks) | Accuracy Ratio [INT4/FP32] | Wikitext Word Perplexity |
|---|---|---|---|---|---|---|---|---|
| EleutherAI/gpt-j-6b | FP32 | 0.6831 | 0.4954 | 0.6409 | 0.7541 | 0.6434 | / | 10.8816 |
| | GPTQ W4G128Asym | 0.679 | 0.4895 | 0.6433 | 0.7476 | 0.6399 | 0.9945 | 11.0999 |
| | GPTQ W4G32Asym | 0.6829 | 0.4923 | 0.6401 | 0.7486 | 0.6410 | 0.9963 | 11.0141 |
| | GPTQ W4G128Sym | 0.685 | 0.4907 | 0.6361 | 0.7443 | 0.6390 | 0.9932 | 11.1498 |
| | GPTQ W4G32Sym | 0.6911 | 0.4899 | 0.6448 | 0.7497 | 0.6439 | 1.0008 | 11.0927 |
| facebook/opt-6.7b | FP32 | 0.6769 | 0.5049 | 0.6543 | 0.7628 | 0.6497 | / | 12.2862 |
| | GPTQ W4G32Asym | 0.6804 | 0.4984 | 0.6535 | 0.7568 | 0.6473 | 0.9962 | 12.4193 |
| | GPTQ W4G32Sym | 0.6885 | 0.4973 | 0.6433 | 0.753 | 0.6455 | 0.9935 | 12.4607 |
| decapoda-research/llama-7b-hf | FP32 | 0.7361 | 0.5642 | 0.6709 | 0.7835 | 0.6887 | / | 9.4202 |
| | GPTQ W4G32Asym | 0.7244 | 0.5603 | 0.6614 | 0.7835 | 0.6824 | 0.9909 | 9.5881 |
| decapoda-research/llama-13b-hf | FP32 | 0.7627 | 0.5911 | 0.7009 | 0.7878 | 0.7106 | / | 8.212 |
| | GPTQ W4G128Asym | 0.7518 | 0.5843 | 0.6961 | 0.7911 | 0.7058 | 0.9932 | 8.4319 |
| | GPTQ W4G32Asym | 0.7572 | 0.5898 | 0.7056 | 0.7894 | 0.7105 | 0.9998 | 8.3429 |
| | GPTQ W4G128Sym | 0.7596 | 0.5841 | 0.6977 | 0.7905 | 0.7080 | 0.9963 | 8.4916 |
| decapoda-research/llama-30b-hf | FP32 | 0.7759 | 0.6266 | 0.7277 | 0.8096 | 0.7350 | / | 6.2384 |
| | GPTQ W4G128Asym | 0.778 | 0.624 | 0.7269 | 0.8047 | 0.7334 | 0.9979 | 6.4237 |
| | GPTQ W4G32Asym | 0.7706 | 0.6239 | 0.7285 | 0.8058 | 0.7322 | 0.9963 | 6.4697 |
| | GPTQ W4G128Sym | 0.7836 | 0.6195 | 0.7269 | 0.8047 | 0.7337 | 0.9983 | 6.5604 |
| meta-llama/Llama-2-7b-chat-hf | FP32 | 0.7058 | 0.5732 | 0.648 | 0.7715 | 0.6746 | / | 11.7107 |
| | GPTQ W4G128Asym | 0.6982 | 0.5637 | 0.6527 | 0.7704 | 0.6713 | 0.9950 | 11.9702 |
| | GPTQ W4G32Asym | 0.6953 | 0.5682 | 0.6575 | 0.7758 | 0.6742 | 0.9994 | 11.9317 |
| meta-llama/Llama-2-7b-hf | FP32 | 0.7392 | 0.567 | 0.6709 | 0.7835 | 0.6902 | / | 8.7911 |
| | GPTQ W4G32Asym | 0.7353 | 0.5642 | 0.6622 | 0.7829 | 0.6862 | 0.9942 | 8.9635 |
| | GPTQ W4G128Sym | 0.7246 | 0.5617 | 0.6756 | 0.7797 | 0.6854 | 0.9931 | 9.2799 |
| meta-llama/Llama-2-13b-chat-hf | FP32 | 0.7312 | 0.6059 | 0.7103 | 0.7835 | 0.7077 | / | 10.2213 |
| | GPTQ W4G128Asym | 0.7273 | 0.6018 | 0.7088 | 0.7742 | 0.7030 | 0.9934 | 2538.083 |
| | GPTQ W4G32Asym | 0.7283 | 0.6053 | 0.7024 | 0.7764 | 0.7031 | 0.9935 | 1889.374 |
| | GPTQ W4G128Sym | 0.727 | 0.5997 | 0.7024 | 0.778 | 0.7018 | 0.9916 | 2504.497 |
| meta-llama/Llama-2-13b-hf | FP32 | 0.7677 | 0.5972 | 0.6961 | 0.7878 | 0.7122 | / | 7.8984 |
| | GPTQ W4G128Asym | 0.7627 | 0.5933 | 0.689 | 0.7851 | 0.7075 | 0.9934 | 1556.448 |
| | GPTQ W4G32Asym | 0.7675 | 0.5934 | 0.6977 | 0.7856 | 0.7111 | 0.9984 | 1514.927 |
| | GPTQ W4G128Sym | 0.7566 | 0.5899 | 0.7032 | 0.7856 | 0.7088 | 0.9953 | 1374.728 |
| bigscience/bloom-7b1 | FP32 | 0.5764 | 0.4628 | 0.6456 | 0.7269 | 0.6029 | / | 30.6438 |
| | GPTQ W4G32Sym | 0.5799 | 0.4542 | 0.6361 | 0.7312 | 0.6004 | 0.9957 | 32.0626 |
| bigscience/bloomz-7b1 | FP32 | 0.5593 | 0.4789 | 0.6527 | 0.7628 | 0.6134 | / | 51.7432 |
| | GPTQ W4G32Asym | 0.5525 | 0.4731 | 0.6504 | 0.7617 | 0.6094 | 0.9935 | 52.7828 |
| databricks/dolly-v1-6b | FP32 | 0.6866 | 0.5098 | 0.6433 | 0.7622 | 0.6505 | / | 11.3242 |
| | GPTQ W4G128Asym | 0.6878 | 0.5058 | 0.6393 | 0.7633 | 0.6491 | 0.9978 | 11.5514 |
| | GPTQ W4G32Asym | 0.6864 | 0.5084 | 0.6519 | 0.7568 | 0.6509 | 1.0006 | 11.4728 |
| | GPTQ W4G128Sym | 0.6876 | 0.5045 | 0.6433 | 0.7541 | 0.6474 | 0.9952 | 11.6474 |
| databricks/dolly-v2-7b | FP32 | 0.6379 | 0.5282 | 0.614 | 0.7448 | 0.6312 | / | 16.161 |
| | GPTQ W4G32Asym | 0.6377 | 0.5228 | 0.5991 | 0.7448 | 0.6261 | 0.9919 | 16.4096 |
| EleutherAI/gpt-neo-2.7b | FP32 | 0.6224 | 0.4271 | 0.577 | 0.722 | 0.5871 | / | 13.9359 |
| | GPTQ W4G128Asym | 0.6123 | 0.4227 | 0.5738 | 0.7203 | 0.5823 | 0.9917 | 14.3377 |
| | GPTQ W4G32Asym | 0.615 | 0.4259 | 0.5714 | 0.7247 | 0.5843 | 0.9951 | 14.2083 |
| | GPTQ W4G32Sym | 0.6154 | 0.4208 | 0.5777 | 0.7198 | 0.5834 | 0.9937 | 14.3121 |
| EleutherAI/gpt-neox-20b | FP32 | 0.7233 | 0.5359 | 0.6614 | 0.7753 | 0.6740 | / | 9.195 |
| | GPTQ W4G128Asym | 0.7186 | 0.5328 | 0.6535 | 0.7699 | 0.6687 | 0.9922 | 9.3463 |
| | GPTQ W4G32Asym | 0.7268 | 0.533 | 0.659 | 0.7715 | 0.6726 | 0.9979 | 9.2897 |
| mosaicml/mpt-7b | FP32 | 0.7056 | 0.5718 | 0.6859 | 0.7927 | 0.6890 | / | 9.9324 |
| | GPTQ W4G128Asym | 0.7006 | 0.5655 | 0.6803 | 0.7965 | 0.6857 | 0.9952 | 10.1515 |
| mosaicml/mpt-7b-chat | FP32 | 0.655 | 0.5752 | 0.6748 | 0.7845 | 0.6724 | / | 13.5951 |
| | GPTQ W4G128Asym | 0.6472 | 0.5716 | 0.6685 | 0.784 | 0.6678 | 0.9932 | 13.8539 |
| mosaicml/mpt-7b-instruct | FP32 | 0.6918 | 0.5819 | 0.678 | 0.7927 | 0.6861 | / | 10.8863 |
| | GPTQ W4G128Asym | 0.6864 | 0.5765 | 0.6827 | 0.7873 | 0.6832 | 0.9958 | 11.1451 |
| mosaicml/mpt-7b-storywriter | FP32 | 0.693 | 0.5477 | 0.663 | 0.784 | 0.6719 | / | 9.9125 |
| | GPTQ W4G128Asym | 0.6854 | 0.5443 | 0.6661 | 0.7813 | 0.6693 | 0.9961 | 10.1137 |
| tiiuae/falcon-rw-7b | FP32 | 0.6604 | 0.5419 | 0.6598 | 0.7753 | 0.6594 | / | 11.7616 |
| | GPTQ W4G128Asym | 0.6484 | 0.5369 | 0.6575 | 0.7807 | 0.6559 | 0.9947 | 11.9411 |
| | GPTQ W4G32Asym | 0.6571 | 0.5398 | 0.6582 | 0.7764 | 0.6579 | 0.9978 | 11.8809 |
| | GPTQ W4G128Sym | 0.652 | 0.535 | 0.6575 | 0.7682 | 0.6532 | 0.9906 | 12.0048 |
| tiiuae/falcon-7b-instruct | FP32 | 0.6437 | 0.5177 | 0.6669 | 0.7824 | 0.6527 | / | 14.5053 |
| | GPTQ W4G128Asym | 0.6301 | 0.5142 | 0.6654 | 0.7835 | 0.6483 | 0.9933 | 14.8146 |
| | GPTQ W4G32Asym | 0.6377 | 0.517 | 0.6598 | 0.7807 | 0.6488 | 0.9941 | 14.6953 |
ONNX Models with ONNX Runtime 1.15.0
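The Example column below distinguishes the quantization format: `qlinearops` (QOperator), `qdq` (QDQ), and `integerops` (dynamic quantization). A hedged sketch of selecting these with the 2.x API, assuming the `quant_format` option for ONNX models (`model.onnx` and `calib_loader` are placeholders):

```python
# Illustrative format selection for ONNX Runtime quantization with INC 2.x.
from neural_compressor import PostTrainingQuantConfig, quantization

qdq_conf = PostTrainingQuantConfig(approach="static", quant_format="QDQ")        # qdq rows
qop_conf = PostTrainingQuantConfig(approach="static", quant_format="QOperator")  # qlinearops rows
dyn_conf = PostTrainingQuantConfig(approach="dynamic")                           # integerops rows

q_model = quantization.fit("model.onnx", qdq_conf, calib_dataloader=calib_loader)
q_model.save("model_qdq.onnx")
```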

Throughput is reported in samples/sec, measured with 1 socket, 4 cores/instance, 14 instances, and batch size 1 (1s4c14ins1bs).

| Model | Example | INT8 Accuracy | FP32 Accuracy | Accuracy Ratio [(INT8-FP32)/FP32] | INT8 Throughput | FP32 Throughput | Performance Ratio [INT8/FP32] |
|---|---|---|---|---|---|---|---|
| ResNet50 V1.5 | qlinearops | 72.16% | 72.29% | -0.19% | 1412.05 | 710.02 | 1.99x |
| ResNet50 V1.5 | qdq | 72.14% | 72.29% | -0.22% | 1564.39 | 712.38 | 2.20x |
| ResNet50 V1.5 MLPerf | qlinearops | 76.11% | 76.46% | -0.46% | 1377.47 | 719.66 | 1.91x |
| ResNet50 V1.5 MLPerf | qdq | 76.13% | 76.46% | -0.44% | 1446.69 | 703.40 | 2.06x |
| ResNet50 V1.5 (ONNX Model Zoo) | qlinearops | 74.82% | 74.99% | -0.22% | 1579.31 | 747.73 | 2.11x |
| ResNet50 V1.5 (ONNX Model Zoo) | qdq | 74.82% | 74.99% | -0.23% | 1508.21 | 749.43 | 2.01x |
| MobileNet V2 | qlinearops | 65.49% | 66.89% | -2.09% | 6950.77 | 4214.56 | 1.65x |
| MobileNet V2 | qdq | 65.49% | 66.89% | -2.10% | 6881.60 | 4192.78 | 1.64x |
| MobileNet V2 (ONNX Model Zoo) | qlinearops | 68.38% | 69.48% | -1.59% | 6563.24 | 3804.18 | 1.73x |
| MobileNet V2 (ONNX Model Zoo) | qdq | 68.38% | 69.48% | -1.59% | 6631.12 | 3922.70 | 1.69x |
| VGG16 | qlinearops | 66.56% | 66.69% | -0.19% | 423.44 | 158.01 | 2.68x |
| VGG16 | qdq | 66.59% | 66.69% | -0.15% | 571.02 | 161.69 | 3.53x |
| VGG16 (ONNX Model Zoo) | qlinearops | 72.33% | 72.40% | -0.09% | 598.92 | 163.53 | 3.66x |
| VGG16 (ONNX Model Zoo) | qdq | 72.33% | 72.40% | -0.09% | 594.66 | 164.39 | 3.62x |
| MobileNet V3 MLPerf | qlinearops | 75.56% | 75.74% | -0.24% | 5473.90 | 2567.96 | 2.13x |
| MobileNet V3 MLPerf | qdq | 75.56% | 75.74% | -0.24% | 5455.36 | 2563.80 | 2.13x |
| ShuffleNet V2 (ONNX Model Zoo) | qlinearops | 66.09% | 66.36% | -0.41% | 6818.46 | 3839.67 | 1.78x |
| ShuffleNet V2 (ONNX Model Zoo) | qdq | 66.09% | 66.36% | -0.41% | 5750.72 | 3861.83 | 1.49x |
| GoogleNet (ONNX Model Zoo) | qlinearops | 67.71% | 67.79% | -0.12% | 1783.63 | 1095.06 | 1.63x |
| GoogleNet (ONNX Model Zoo) | qdq | 67.73% | 67.79% | -0.09% | 1755.03 | 1071.04 | 1.64x |
| SqueezeNet (ONNX Model Zoo) | qlinearops | 56.54% | 56.87% | -0.57% | 9918.09 | 5639.89 | 1.76x |
| SqueezeNet (ONNX Model Zoo) | qdq | 56.54% | 56.87% | -0.57% | 9423.22 | 5501.30 | 1.71x |
| CaffeNet (ONNX Model Zoo) | qlinearops | 56.21% | 56.30% | -0.16% | 3363.62 | 1015.06 | 3.31x |
| CaffeNet (ONNX Model Zoo) | qdq | 56.25% | 56.30% | -0.09% | 3276.82 | 798.28 | 4.10x |
| AlexNet (ONNX Model Zoo) | qlinearops | 54.73% | 54.79% | -0.10% | 2104.66 | 985.33 | 2.14x |
| AlexNet (ONNX Model Zoo) | qdq | 54.71% | 54.79% | -0.14% | 2054.60 | 745.36 | 2.76x |
| ZFNet (ONNX Model Zoo) | qlinearops | 55.84% | 55.96% | -0.21% | 864.73 | 456.41 | 1.89x |
| ZFNet (ONNX Model Zoo) | qdq | 55.86% | 55.96% | -0.18% | 866.80 | 455.75 | 1.90x |
| Inception V1 (ONNX Model Zoo) | qlinearops | 67.21% | 67.24% | -0.05% | 1802.03 | 1170.74 | 1.54x |
| Inception V1 (ONNX Model Zoo) | qdq | 67.21% | 67.24% | -0.05% | 1813.29 | 1164.87 | 1.56x |
| EfficientNet (ONNX Model Zoo) | qlinearops | 76.98% | 77.11% | -0.17% | 2615.12 | 1349.97 | 1.94x |
| EfficientNet (ONNX Model Zoo) | qdq | 76.99% | 77.11% | -0.16% | 2343.94 | 1322.86 | 1.77x |
| DenseNet (ONNX Model Zoo) | qlinearops | 60.53% | 60.96% | -0.70% | 630.80 | 499.98 | 1.26x |
| SSD (ONNX Model Zoo) | qlinearops | 18.83% | 18.98% | -0.77% | 56.69 | 14.56 | 3.89x |
| SSD (ONNX Model Zoo) | qdq | 18.62% | 18.98% | -1.89% | 57.54 | 14.55 | 3.95x |
| SSD MobileNet V1 | qlinearops | 22.44% | 23.10% | -2.86% | 1288.14 | 878.69 | 1.47x |
| SSD MobileNet V1 | qdq | 22.44% | 23.10% | -2.86% | 1173.88 | 851.00 | 1.38x |
| SSD MobileNet V1 (ONNX Model Zoo) | qlinearops | 22.96% | 23.02% | -0.27% | 1114.65 | 825.47 | 1.35x |
| SSD MobileNet V1 (ONNX Model Zoo) | qdq | 22.96% | 23.02% | -0.27% | 1056.30 | 792.66 | 1.33x |
| SSD MobileNet V2 | qlinearops | 23.87% | 24.67% | -3.25% | 788.51 | 669.72 | 1.18x |
| YOLOv3 (ONNX Model Zoo) | qlinearops | 27.01% | 28.73% | -5.99% | 140.21 | 110.43 | 1.27x |
| YOLOv4 (ONNX Model Zoo) | qlinearops | 32.30% | 33.71% | -4.19% | 72.95 | 64.95 | 1.12x |
| DUC (ONNX Model Zoo) | qlinearops | 81.63% | 81.92% | -0.36% | 9.12 | 4.96 | 1.84x |
| Tiny YOLOv3 (ONNX Model Zoo) | qlinearops | 11.83% | 12.42% | -4.73% | 1163.39 | 993.96 | 1.17x |
| Ultra Face (ONNX Model Zoo) | qlinearops | 83.23% | 83.65% | -0.49% | 8501.08 | 1922.19 | 4.42x |
| Emotion FERPlus (ONNX Model Zoo) | qlinearops | 7.97% | 8.00% | -0.35% | 3552.60 | 3114.19 | 1.14x |
| ArcFace (ONNX Model Zoo) | qlinearops | 99.80% | 99.80% | 0.00% | 558.78 | 246.87 | 2.26x |
| BERT base MRPC | qlinearops | 85.54% | 86.03% | -0.57% | 399.04 | 226.03 | 1.77x |
| BERT base MRPC | qdq | 85.54% | 86.03% | -0.57% | 392.26 | 223.21 | 1.76x |
| BERT base MRPC | integerops | 85.29% | 86.03% | -0.85% | 474.99 | 222.71 | 2.13x |
| DistilBERT base MRPC | qdq | 84.56% | 84.56% | 0.00% | 557.05 | 399.46 | 1.39x |
| DistilBERT base MRPC | integerops | 85.54% | 84.56% | 1.16% | 963.92 | 399.36 | 2.41x |
| Mobile bert MRPC | qdq | 85.54% | 86.28% | -0.85% | 529.98 | 394.46 | 1.34x |
| Mobile bert MRPC | integerops | 85.54% | 86.28% | -0.85% | 603.66 | 398.15 | 1.52x |
| Roberta base MRPC | integerops | 90.93% | 89.95% | 1.09% | 485.74 | 223.54 | 2.17x |
| BERT SQuAD (ONNX Model Zoo) | integerops | 80.29 | 80.67 | -0.47% | 187.63 | 95.88 | 1.96x |
| MobileBERT SQuAD MLPerf (ONNX Model Zoo) | integerops | 89.87 | 90.03 | -0.17% | 144.88 | 124.08 | 1.17x |
| BiDAF (ONNX Model Zoo) | integerops | 65.93% | 66.08% | -0.23% | 2757.83 | 2279.38 | 1.21x |
| GPT2 lm head WikiText (ONNX Model Zoo) | integerops | 31.98 | 29.00 | 10.31% | 15.35 | 9.73 | 1.58x |
| BERT base cased MRPC (HuggingFace) | qlinearops | 90.21% | 90.42% | -0.23% | 357.89 | 211.81 | 1.69x |
| BERT base uncased MRPC (HuggingFace) | integerops | 89.58% | 90.42% | -0.93% | 472.44 | 211.65 | 2.23x |
| Roberta base MRPC (HuggingFace) | qlinearops | 91.00% | 91.38% | -0.41% | 365.03 | 214.66 | 1.70x |
| Roberta base MRPC (HuggingFace) | integerops | 90.85% | 91.38% | -0.58% | 489.85 | 212.20 | 2.31x |
| XLM Roberta base MRPC (HuggingFace) | qlinearops | 89.37% | 90.10% | -0.81% | 302.49 | 212.76 | 1.42x |
| XLM Roberta base MRPC (HuggingFace) | integerops | 89.66% | 90.10% | -0.50% | 343.75 | 213.09 | 1.61x |
| Camembert base MRPC (HuggingFace) | qlinearops | 89.28% | 89.28% | 0.00% | 270.01 | 215.48 | 1.25x |
| Camembert base MRPC (HuggingFace) | integerops | 89.19% | 89.28% | -0.10% | 491.01 | 212.92 | 2.31x |
| MiniLM L12 H384 uncased MRPC (HuggingFace) | qlinearops | 90.13% | 90.97% | -0.93% | 1051.67 | 583.85 | 1.80x |
| MiniLM L12 H384 uncased MRPC (HuggingFace) | integerops | 91.07% | 90.97% | 0.10% | 1076.27 | 589.80 | 1.82x |
| DistilBERT base uncased SST-2 (HuggingFace) | qlinearops | 90.71% | 91.06% | -0.38% | 896.69 | 396.85 | 2.26x |
| DistilBERT base uncased SST-2 (HuggingFace) | integerops | 90.25% | 91.06% | -0.88% | 753.88 | 396.59 | 1.90x |
| Albert base v2 SST-2 (HuggingFace) | qlinearops | 91.40% | 92.32% | -0.99% | 274.17 | 210.87 | 1.30x |
| Albert base v2 SST-2 (HuggingFace) | integerops | 91.86% | 92.32% | -0.50% | 271.85 | 211.18 | 1.29x |
| MiniLM L6 H384 uncased SST-2 (HuggingFace) | qlinearops | 89.45% | 90.14% | -0.76% | 2022.40 | 1124.12 | 1.80x |
| MiniLM L6 H384 uncased SST-2 (HuggingFace) | integerops | 89.91% | 90.14% | -0.26% | 2010.50 | 1127.41 | 1.78x |
| MiniLM L6 H384 uncased SST-2 (HuggingFace) | qlinearops | 87.70% | 88.29% | -0.67% | 401.24 | 211.92 | 1.89x |
| MiniLM L6 H384 uncased SST-2 (HuggingFace) | integerops | 88.19% | 88.29% | -0.12% | 494.84 | 212.01 | 2.33x |
| Electra small discriminator MRPC (HuggingFace) | qlinearops | 89.57% | 89.83% | -0.29% | 1804.17 | 1154.99 | 1.56x |
| Electra small discriminator MRPC (HuggingFace) | integerops | 89.27% | 89.83% | -0.63% | 1961.57 | 1158.86 | 1.69x |
| BERT mini MRPC (HuggingFace) | qlinearops | 86.70% | 86.52% | 0.21% | 4986.29 | 3444.92 | 1.45x |
| BERT mini MRPC (HuggingFace) | integerops | 86.16% | 86.52% | -0.41% | 5603.86 | 3320.38 | 1.69x |
| Xlnet base cased MRPC (HuggingFace) | qlinearops | 89.74% | 89.86% | -0.13% | 108.36 | 91.63 | 1.18x |
| Xlnet base cased MRPC (HuggingFace) | integerops | 89.58% | 89.86% | -0.31% | 108.27 | 92.24 | 1.17x |
| BART large MRPC (HuggingFace) | qlinearops | 91.77% | 91.20% | 0.63% | 58.98 | 51.23 | 1.15x |
| BART large MRPC (HuggingFace) | integerops | 92.36% | 91.20% | 1.28% | 96.02 | 51.12 | 1.88x |
| DeBERTa v3 base MRPC (HuggingFace) | qlinearops | 91.85% | 92.23% | -0.40% | 161.42 | 147.11 | 1.10x |
| DeBERTa v3 base MRPC (HuggingFace) | integerops | 92.39% | 92.23% | 0.17% | 170.50 | 147.28 | 1.16x |
| Spanbert SQuAD (HuggingFace) | qlinearops | 91.14 | 91.98 | -0.91% | 69.94 | 42.36 | 1.65x |
| Spanbert SQuAD (HuggingFace) | integerops | 91.40 | 91.98 | -0.63% | 80.06 | 42.62 | 1.88x |
| Bert base multilingual cased SQuAD (HuggingFace) | qlinearops | 88.42 | 89.13 | -0.79% | 71.67 | 42.36 | 1.69x |
| Bert base multilingual cased SQuAD (HuggingFace) | integerops | 88.70 | 89.13 | -0.48% | 79.42 | 42.32 | 1.88x |
| DistilBert base uncased SQuAD (HuggingFace) | qlinearops | 86.33 | 86.86 | -0.62% | 112.14 | 67.59 | 1.66x |
| DistilBert base uncased SQuAD (HuggingFace) | integerops | 86.05 | 86.86 | -0.94% | 159.29 | 67.70 | 2.35x |
| BERT large uncased whole word masking SQuAD (HuggingFace) | qlinearops | 92.34 | 93.16 | -0.88% | 24.56 | 12.71 | 1.93x |
| BERT large uncased whole word masking SQuAD (HuggingFace) | integerops | 92.99 | 93.16 | -0.18% | 26.76 | 12.72 | 2.10x |
| Roberta large SQuAD v2 (HuggingFace) | qlinearops | 89.03 | 89.02 | 0.02% | 16.85 | 12.95 | 1.30x |
| Roberta large SQuAD v2 (HuggingFace) | integerops | 89.04 | 89.02 | 0.02% | 26.85 | 12.95 | 2.07x |
| GPT2 WikiText (HuggingFace) | qlinearops | 30.25 | 29.00 | 4.33% | 12.63 | 9.76 | 1.29x |
| GPT2 WikiText (HuggingFace) | integerops | 29.68 | 29.00 | 2.36% | 13.54 | 9.72 | 1.39x |
| DistilGPT2 WikiText (HuggingFace) | qlinearops | 44.93 | 43.43 | 3.46% | 20.45 | 16.72 | 1.22x |
| DistilGPT2 WikiText (HuggingFace) | integerops | 44.62 | 43.43 | 2.74% | 21.91 | 16.73 | 1.31x |
| LayoutLM FUNSD (HuggingFace) | qlinearops | 78.15% | 78.35% | -0.25% | 60.41 | 43.95 | 1.37x |
| LayoutLM FUNSD (HuggingFace) | integerops | 77.58% | 78.35% | -0.98% | 65.82 | 43.83 | 1.50x |
| LayoutLMv3 FUNSD (HuggingFace) | qlinearops | 89.85% | 90.49% | -0.71% | 31.12 | 29.13 | 1.07x |
| LayoutLMv3 FUNSD (HuggingFace) | integerops | 90.07% | 90.49% | -0.46% | 35.01 | 27.92 | 1.25x |

Throughput is reported in samples/sec, measured with 1 socket, 56 cores/instance, 1 instance, and batch size 1 (1s56c1ins1bs).

| Model | Example | INT8 Accuracy | FP32 Accuracy | Accuracy Ratio [(INT8-FP32)/FP32] | INT8 Throughput | FP32 Throughput | Performance Ratio [INT8/FP32] |
|---|---|---|---|---|---|---|---|
| Faster R-CNN (ONNX Model Zoo) | qlinearops | 34.06% | 34.37% | -0.88% | 3.99 | 3.28 | 1.21x |
| Faster R-CNN (ONNX Model Zoo) | qdq | 33.98% | 34.37% | -1.12% | 4.00 | 3.37 | 1.19x |
| Mask R-CNN (ONNX Model Zoo) | qlinearops | 33.13% | 33.72% | -1.74% | 3.36 | 2.95 | 1.14x |
| Mask R-CNN (ONNX Model Zoo) | qdq | 33.29% | 33.72% | -1.28% | 3.38 | 2.98 | 1.14x |
| FCN (ONNX Model Zoo) | qlinearops | 64.54% | 64.98% | -0.67% | 28.19 | 12.60 | 2.24x |
| FCN (ONNX Model Zoo) | qdq | 64.54% | 64.98% | -0.67% | 28.22 | 12.56 | 2.25x |
ONNX Models with ONNX Runtime 1.15.0 in WOQ Mode

| Model name | Configuration | Lambada_openai Accuracy | Lambada_openai Perplexity | Accuracy Ratio [INT4/FP32] |
|---|---|---|---|---|
| meta-llama/Llama-2-7b-chat-hf | FP32 | 0.7058 | 3.2788 | / |
| | GPTQ W4G32Asym | 0.7002 | 3.4124 | 0.9921 |
| meta-llama/Llama-2-7b-hf | FP32 | 0.7392 | 3.3950 | / |
| | GPTQ W4G32Asym | 0.7312 | 3.5711 | 0.9892 |
| meta-llama/Llama-2-13b-chat-hf | FP32 | 0.7312 | 2.9163 | / |
| | GPTQ W4G128Asym | 0.7240 | 2.9945 | 0.9902 |
| meta-llama/Llama-2-13b-hf | FP32 | 0.7677 | 3.0438 | / |
| | GPTQ W4G128Asym | 0.7634 | 3.1186 | 0.9944 |
| | GPTQ W4G32Asym | 0.7615 | 3.1276 | 0.9919 |
| meta-llama/Llama-2-70b-chat-hf | FP32 | 0.7543 | 2.6181 | / |
| | RTN W4G32Asym | 0.7518 | 2.6496 | 0.9967 |
| meta-llama/Llama-2-70b-hf | FP32 | 0.7964 | 2.6612 | / |
| | RTN W4G32Sym | 0.7941 | 2.7243 | 0.9971 |
Validated Pruning Examples
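Most rows in the table below use the snip-momentum criterion with structured patterns such as 4x1 or 2:4. A hedged sketch of an 80% 4x1 snip-momentum setup with the 2.x training API (the model, dataloader, optimizer, and step counts are placeholders, not the exact settings behind any row):

```python
# Illustrative snip-momentum 4x1 structured pruning during fine-tuning.
from neural_compressor import WeightPruningConfig
from neural_compressor.training import prepare_compression

config = WeightPruningConfig(
    pruning_configs=[{
        "target_sparsity": 0.8,        # 80% sparsity
        "pattern": "4x1",              # structured 4x1 blocks
        "pruning_type": "snip_momentum",
        "start_step": 0,
        "end_step": 10000,             # placeholder schedule
    }]
)
compression_manager = prepare_compression(model, config)
compression_manager.callbacks.on_train_begin()
for step, batch in enumerate(train_dataloader):                 # user-supplied loop
    compression_manager.callbacks.on_step_begin(step)
    loss = model(**batch).loss
    loss.backward()
    compression_manager.callbacks.on_before_optimizer_step()
    optimizer.step()
    compression_manager.callbacks.on_after_optimizer_step()
    optimizer.zero_grad()
compression_manager.callbacks.on_train_end()
```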

| Model | Task | Dataset | Dense Accuracy | Sparse Accuracy | Relative Drop | Sparsity Ratio | Sparsity Pattern | Comments (Pruning Algorithm) | Balanced or Unbalanced Ratio |
|---|---|---|---|---|---|---|---|---|---|
| Bert-Mini | question answering | SQuAD-v1.1 | f1=76.87 | f1=76.2 | -0.80% | 80% | structured 4x1 | snip momentum | unbalanced |
| Bert-Mini | question answering | SQuAD-v1.1 | f1=76.87 | f1=76.2 | -0.80% | 80% | structured 4x1 | snip momentum | unbalanced |
| Bert-Mini | question answering | SQuAD-v1.1 | f1=76.87 | f1=77.62 | +0.98% | 50% | structured 2:4 | snip momentum | balanced |
| Distilbert-base-uncased | question answering | SQuAD-v1.1 | f1=86.90 | f1=86.15 | -0.86% | 80% | structured 4x1 | snip momentum | unbalanced |
| Distilbert-base-uncased | question answering | SQuAD-v1.1 | f1=86.90 | f1=87.50 | +0.69% | 50% | structured 2:4 | snip momentum | balanced |
| Bert-base-uncased | question answering | SQuAD-v1.1 | f1=88.59 | f1=87.78 | -0.92% | 80% | structured 4x1 | snip momentum | unbalanced |
| Bert-base-uncased | question answering | SQuAD-v1.1 | f1=88.59 | f1=89.40 | +0.91% | 50% | structured 2:4 | snip momentum | balanced |
| Bert-large | question answering | SQuAD-v1.1 | f1=91.23 | f1=90.91 | -0.35% | 80% | structured 4x1 | snip momentum | unbalanced |
| Bert-large | question answering | SQuAD-v1.1 | f1=91.23 | f1=91.67 | +0.48% | 50% | structured 2:4 | snip momentum | balanced |
| Bert-Mini | text classification | MRPC | f1=87.52 | f1=87.22 | -0.34% | 90% | structured 4x1 | snip momentum | unbalanced |
| Bert-Mini | text classification | MRPC | f1=87.52 | f1=87.33 | -0.22% | 90% | structured 4x1 | snip momentum | balanced |
| Bert-Mini | text classification | MRPC | f1=87.52 | f1=86.89 | -0.72% | 50% | structured 2:4 | snip momentum | balanced |
| Bert-Mini | text classification | MRPC | f1=87.52 | f1=86.8 | -0.83% | 60% | structured per channel | snip momentum | unbalanced |
| Distilbert-base-uncased | text classification | MRPC | f1=90.26 | f1=89.85 | -0.46% | 90% | structured 4x1 | snip momentum | unbalanced |
| Distilbert-base-uncased | text classification | MRPC | f1=90.26 | f1=90.88 | +0.69% | 50% | structured 2:4 | snip momentum | balanced |
| Bert-Mini | text classification | SST-2 | accuracy=87.61 | accuracy=86.92 | -0.79% | 90% | structured 4x1 | snip momentum | unbalanced |
| Bert-Mini | text classification | SST-2 | accuracy=87.61 | accuracy=87.73 | +0.14% | 50% | structured 2:4 | snip momentum | balanced |
| Bert-Mini | text classification | SST-2 | accuracy=87.61 | accuracy=86.92 | -0.79% | 50% | structured per channel | snip momentum | unbalanced |
| ResNet50 | image recognition | ImageNet | top1 acc = 78.95 | top1 acc = 80.10 | -1.43% | 75% | structured 2x1 | snip momentum | unbalanced |
| YOLO-v5s6 | object detection | COCO | AP0.50:0.95/AP0.50=0.404/0.6 | AP0.50:0.95/AP0.50=0.393/0.584 | -2.72% | 80% | unstructured | snip momentum | unbalanced |
| Bert-Large | question answering | SQuAD-v1.1 | f1=91.34 | f1=90.7 | -0.07% | 80% | structured 2x1 | group lasso | unbalanced |
| Bert-Base | text classification | MNLI | [m, mm] = [84.57, 84.79] | [m, mm] = [82.45, 83.27] | [-2.51%, -1.80%] | 70% | unstructured | Prune once for all | balanced |
| Bert-Base | text classification | MNLI | [m, mm] = [84.57, 84.79] | [m, mm] = [83.20, 84.11] | [-1.62%, -0.80%] | 50% | structured 1:2 | Prune once for all | balanced |
| Bert-Base | text classification | SST-2 | accuracy = 92.32 | accuracy = 91.51 | -0.88% | 70% | unstructured | Prune once for all | balanced |
| Bert-Base | text classification | SST-2 | accuracy = 92.32 | accuracy = 92.20 | -0.13% | 50% | structured 1:2 | Prune once for all | balanced |
| Bert-Base | text classification | SST-2 | accuracy = 92.32 | accuracy = 91.97 | -0.38% | 20% | unstructured | gradient sensitivity | balanced |
| Bert-Base | text classification | QQP | [accuracy, f1] = [91.10, 88.05] | [accuracy, f1] = [90.48, 87.06] | [-0.68%, -1.12%] | 70% | unstructured | Prune once for all | balanced |
| Bert-Base | text classification | QQP | [accuracy, f1] = [91.10, 88.05] | [accuracy, f1] = [90.92, 87.78] | [-0.20%, -0.31%] | 50% | structured 1:2 | Prune once for all | balanced |
| Bert-Base | text classification | QNLI | accuracy = 91.54 | accuracy = 90.39 | -1.26% | 70% | unstructured | Prune once for all | balanced |
| Bert-Base | text classification | QNLI | accuracy = 91.54 | accuracy = 90.87 | -0.73% | 50% | structured 1:2 | Prune once for all | balanced |
| Bert-Base | question answering | | [em, f1] = [79.34, 87.10] | [em, f1] = [77.27, 85.75] | [-2.61%, -1.54%] | 70% | unstructured | Prune once for all | balanced |
| Bert-Base | question answering | | [em, f1] = [79.34, 87.10] | [em, f1] = [78.03, 86.50] | [-1.65%, -0.69%] | 50% | structured 1:2 | Prune once for all | balanced |
Validated Knowledge Distillation Examples
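A hedged sketch of a knowledge distillation run with the 2.x training API, assuming the documented `DistillationConfig`/`KnowledgeDistillationLossConfig` interfaces (the student/teacher models, dataloader, task loss, and loss weights are placeholders, not the settings behind the rows below):

```python
# Illustrative knowledge distillation: student learns from hard labels + teacher logits.
from neural_compressor.config import DistillationConfig, KnowledgeDistillationLossConfig
from neural_compressor.training import prepare_compression

criterion_conf = KnowledgeDistillationLossConfig(
    temperature=1.0,
    loss_types=["CE", "KL"],      # cross-entropy on labels + KL to teacher logits
    loss_weights=[0.5, 0.5],
)
conf = DistillationConfig(teacher_model=teacher_model, criterion=criterion_conf)
compression_manager = prepare_compression(student_model, conf)
compression_manager.callbacks.on_train_begin()
for inputs, labels in train_dataloader:                          # user-supplied loop
    outputs = compression_manager.model(inputs)
    loss = task_loss_fn(outputs, labels)
    loss = compression_manager.callbacks.on_after_compute_loss(inputs, outputs, loss)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
compression_manager.callbacks.on_train_end()
```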

| Example Name | Dataset | Student (Metrics) | Teacher (Metrics) | Student With Distillation (Metrics Improvement) | Student With Distributed Distillation (Metrics Improvement) |
|---|---|---|---|---|---|
| MobileNet example | CIFAR-10 | MobileNetV2-0.35 (0.7965 ACC) | WideResNet40-2 (0.9522 ACC) | 0.8178 ACC (0.0213 ACC) | 0.8235 ACC (0.027 ACC) |
| CNN example | CIFAR-100 | CNN-2 (0.5494 ACC) | CNN-10 (0.7153 ACC) | 0.5540 ACC (0.0046 ACC) | 0.5523 ACC (0.0029 ACC) |
| VGG example | CIFAR-100 | VGG-8-BN (0.7022 ACC) | VGG-13-BN (0.7415 ACC) | 0.7025 ACC (0.0003 ACC) | NA |
| ResNet example | ImageNet | ResNet18 (0.6739 ACC) | ResNet50 (0.7399 ACC) | 0.6845 ACC (0.0106 ACC) | NA |
| BlendCnn example | MRPC | BlendCnn (0.7034 ACC) | BERT-Base (0.8382 ACC) | 0.7034 ACC (0 ACC) | NA |
| BiLSTM example | SST-2 | BiLSTM (0.8314 ACC) | RoBERTa-Base (0.9403 ACC) | 0.9048 ACC (0.0734 ACC) | NA |
| DistilBERT example | SQuAD | DistilBERT (0.7323/0.8256 EM/F1) | BERT-Base (0.8084/0.8814 EM/F1) | 0.7442/0.8371 EM/F1 (0.0119/0.0115 EM/F1) | NA |
| TinyBERT example | MNLI | TinyBERT (0.8018/0.8044 m/mm) | BERT-Base (0.8363/0.8411 m/mm) | 0.8025/0.8074 m/mm (0.0007/0.0030 m/mm) | NA |
| BERT-3 example | QQP | BERT-3 (0.8626/0.8213 EM/F1) | BERT-Base (0.9091/0.8782 EM/F1) | 0.8684/0.8259 EM/F1 (0.0058/0.0046 EM/F1) | NA |
| DistilRoBERTa example | COLA | DistilRoBERTa (0.6057 ACC) | RoBERTa-Large (0.6455 ACC) | 0.6187 ACC (0.0130 ACC) | NA |
Validated ONNX QDQ INT8 Models on Multiple Hardware through ONNX Runtime
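The accuracies below come from running the same ONNX QDQ INT8 models through different ONNX Runtime execution providers. A minimal loading sketch (the model path and provider list are illustrative; provider availability depends on the onnxruntime build and host hardware):

```python
# Run an ONNX QDQ INT8 model on a chosen ONNX Runtime execution provider.
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession(
    "resnet50_qdq.onnx",                                          # placeholder path
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],  # falls back to CPU
)
input_name = sess.get_inputs()[0].name
dummy = np.random.rand(1, 3, 224, 224).astype(np.float32)
outputs = sess.run(None, {input_name: dummy})
print(outputs[0].shape)
```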

| Model (ONNX QDQ) | AWS c6i.2xlarge (Intel) CPU Execution Provider | AWS c6a.2xlarge (AMD) CPU Execution Provider | AWS c6g.2xlarge (ARM) CPU Execution Provider | NVidia A100 CUDA Execution Provider |
|---|---|---|---|---|
| ResNet50 | 74.76% | 68.95% | 74.76% | 74.75% |
| BERT-base | 85.54% | 84.56% | 85.54% | 84.31% |
| ResNet50 V1.5 | 72.20% | 67.70% | 72.20% | 72.29% |
| MobileNet V2 | 65.82% | 58.56% | 65.83% | 65.63% |
| SSD MobileNet V1 | 22.45% | 16.53% | 22.45% | 22.35% |
| DistilBERT base MRPC | 84.56% | 83.82% | 84.56% | 84.56% |
| SqueezeNet | 56.54% | 53.52% | 56.54% | 56.55% |
| SSD | 18.63% | 18.54% | 18.63% | 18.61% |
| AlexNet | 54.71% | 47.06% | 54.71% | 54.79% |
| CaffeNet | 56.25% | 52.35% | 56.27% | 56.24% |
| GoogleNet | 67.73% | 63.56% | 67.72% | 67.76% |
| ZFNet | 55.86% | 45.09% | 55.86% | 55.89% |
| Inception V1 | 67.21% | 63.03% | 67.20% | 67.21% |
| SSD MobileNet V1 (ONNX Model Zoo) | 22.86% | 16.94% | 22.80% | 22.87% |
| Mobile bert MRPC | 85.54% | 84.56% | 85.54% | 85.54% |
| Roberta base MRPC | 89.46% | 90.44% | 89.71% | 89.71% |
| ResNet50 V1.5 MLPerf | 76.14% | 72.80% | 76.14% | 76.17% |
| VGG16 | 66.69% | 64.25% | 66.69% | 66.64% |
| VGG16 (ONNX Model Zoo) | 72.31% | 69.35% | 72.32% | 72.34% |
| MobileNet V3 MLPerf | 75.57% | 70.78% | 75.56% | 75.52% |
| EfficientNet | 77.61% | 76.52% | 77.56% | 77.60% |
| MobileNet V2 (ONNX Model Zoo) | 68.51% | 62.48% | 68.58% | 68.48% |
| ShuffleNet V2 | 66.12% | 58.41% | 66.11% | 66.11% |