Validated Models

Intel® Neural Compressor has validated examples across multiple compression techniques. Links to typical examples can be found in the example tables, and the performance and accuracy results are available here.

  1. Validated Quantization Examples

    1.1. TensorFlow Models with TensorFlow 2.16.1

    1.2. Keras Models with keras 2.15.1

    1.3. PyTorch Models with Torch 2.3.0+cpu in PTQ Mode

    1.4. PyTorch Models with Torch 2.3.0+cpu in QAT Mode

    1.5. PyTorch Models with Torch 2.3.0+cpu in IPEX Mode

    1.6. ONNX Models with ONNX Runtime 1.18.1

  2. Validated Pruning Examples

  3. Validated Knowledge Distillation Examples

  4. Validated ONNX QDQ INT8 Models on Multiple Hardware through ONNX Runtime

Validated Quantization Examples

System summary: Test by Intel on 7/22/2024. 1-node, 1x Intel(R) Xeon(R) Platinum 8480+ @3.8GHz, 56 cores/socket, HT On, Turbo On, Total Memory 512GB (16x32GB DDR5 4800 MT/s [4800 MT/s]), BIOS EGSDCRB1.SYS.0081.D18.2205301336, microcode 0x2b000590,
Ubuntu 24.04 LTS, gcc (GCC) 13.2.0 (Ubuntu 13.2.0-23ubuntu4), DL Models, Frameworks: TensorFlow/ONNXRT/PyTorch, Datatype: FP32/INT8/BF16.
Most models are benchmarked using 1 socket, 4 cores/instance, 14 instances, and batch size 1.

Performance varies by use, configuration and other factors.
For more complete information about performance and benchmark results, visit www.intel.com/benchmarks
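
The benchmark configuration described above is abbreviated as 1s4c14ins1bs in the table headers. As a minimal sketch, assuming the tag encodes sockets, cores per instance, instances, and batch size in that order, the hypothetical helper `parse_benchmark_tag` below (not part of Intel® Neural Compressor) expands the tag and shows that 14 instances of 4 cores each occupy all 56 cores of the socket.

```python
import re


def parse_benchmark_tag(tag: str) -> dict:
    """Expand a tag such as '1s4c14ins1bs' into named integer fields.

    The field order (sockets, cores/instance, instances, batch size) is an
    assumption inferred from the system summary, not a documented format.
    """
    match = re.fullmatch(r"(\d+)s(\d+)c(\d+)ins(\d+)bs", tag)
    if match is None:
        raise ValueError(f"unrecognized benchmark tag: {tag!r}")
    sockets, cores_per_instance, instances, batch_size = map(int, match.groups())
    return {
        "sockets": sockets,
        "cores_per_instance": cores_per_instance,
        "instances": instances,
        "batch_size": batch_size,
        # Total cores kept busy when every instance runs concurrently.
        "cores_used": cores_per_instance * instances,
    }


config = parse_benchmark_tag("1s4c14ins1bs")
print(config["cores_used"])  # 56
```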

TensorFlow Models with TensorFlow 2.16.1

Model | Example | INT8 Accuracy | FP32 Accuracy | Accuracy Ratio [(INT8-FP32)/FP32] | INT8 Throughput (samples/sec, 1s4c14ins1bs) | FP32 Throughput (samples/sec, 1s4c14ins1bs) | Performance Ratio [INT8/FP32]
ResNet50 v1.0 pb 74.11% 74.27% -0.22% 1732.92 578.88 2.99x
ResNet50 v1.5 pb 76.25% 76.46% -0.28% 1535.20 530.00 2.90x
ResNet101 pb 77.52% 76.45% +1.41% 1048.36 384.02 2.73x
Inception V1 pb 70.45% 69.74% +1.03% 2079.24 927.82 2.24x
Inception V2 pb 74.33% 73.97% +0.49% 1644.36 840.53 1.96x
Inception V3 pb 76.72% 76.75% -0.03% 1076.10 401.89 2.68x
Inception V4 pb 80.13% 80.27% -0.18% 704.96 199.28 3.54x
Inception ResNet V2 pb 80.25% 80.40% -0.18% 313.97 178.27 1.76x
DenseNet-161 pb 76.29% 76.29% +0.00% 279.20 214.03 1.30x
MobileNet V1 pb 71.79% 70.96% +1.18% 4199.13 1506.68 2.79x
MobileNet V2 pb 72.48% 71.76% +1.01% 2170.39 1445.05 1.50x
VGG16 pb 72.69% 70.89% +2.55% 1388.62 203.39 6.83x
VGG19 pb 72.67% 71.01% +2.33% 1236.12 169.74 7.28x
ResNet50 pb 69.09% 69.03% +0.09% 411.79 284.53 1.45x
ResNetV2 50 pb 70.37% 69.64% +1.05% 779.42 539.54 1.44x
ResNetV2 101 pb 72.64% 71.87% +1.08% 492.00 295.77 1.66x
ResNetV2 152 pb 73.12% 72.37% +1.04% 348.39 205.72 1.69x
ViT pb 81.39% 81.92% -0.64% 230.53 132.66 1.74x
SSD ResNet50 V1 pb 37.91% 38.00% -0.24% 135.71 28.75 4.72x
SSD MobileNet V1 pb 23.00% 23.13% -0.57% 1237.70 719.30 1.72x
SSD ResNet50 v1 ckpt 37.88% 38.00% -0.31% 130.54 22.05 5.92x
SSD MobileNet v1 ckpt 22.96% 23.13% -0.71% 1234.56 529.34 2.33x
Faster R-CNN ResNet101 pb 30.32% 30.39% -0.22% 144.21 22.64 6.37x
Faster R-CNN ResNet50 pb 26.61% 26.59% +0.09% 164.55 28.38 5.80x
YOLOv3 pb 83.28% 82.35% +1.12% 247.56 81.45 3.04x
BERT large SQuAD pb 92.44% 92.99% -0.58% 49.17 17.52 2.81x
BERT large SQuAD (ONNX Model Zoo) pb 92.36% 92.98% -0.67% 45.06 17.55 2.57x
Transformer LT pb 25.82% 25.86% -0.15% 28.99 15.77 1.84x
Transformer lt MLPerf pb 27.13% 27.17% -0.13% 10.27 5.08 2.02x
Mask R-CNN Inception V2 pb 28.46% 28.73% -0.91% 195.68 50.72 3.86x
Mask R-CNN Inception V2 ckpt 28.46% 28.73% -0.91% 206.14 47.04 4.38x
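
The two ratio columns in the quantization tables follow the formulas given in their headers: accuracy ratio is (INT8-FP32)/FP32 and performance ratio is INT8/FP32. The sketch below applies those formulas to the ResNet50 v1.0 row; the function names are illustrative and are not part of any library.

```python
def accuracy_ratio(int8_acc: float, fp32_acc: float) -> float:
    """Relative accuracy change of the INT8 model versus the FP32 baseline."""
    return (int8_acc - fp32_acc) / fp32_acc


def performance_ratio(int8_throughput: float, fp32_throughput: float) -> float:
    """Throughput speedup of the INT8 model over the FP32 baseline."""
    return int8_throughput / fp32_throughput


# Values taken from the ResNet50 v1.0 row of the TensorFlow table.
acc = accuracy_ratio(74.11, 74.27)
perf = performance_ratio(1732.92, 578.88)

print(f"{acc:+.2%}")    # -0.22%
print(f"{perf:.2f}x")   # 2.99x
```

A negative accuracy ratio means the quantized model is less accurate than the baseline, and a performance ratio above 1 means it is faster.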

Keras Models with keras 2.15.1

Model | Example | INT8 Accuracy | FP32 Accuracy | Accuracy Ratio [(INT8-FP32)/FP32] | INT8 Throughput (samples/sec, 1s4c14ins1bs) | FP32 Throughput (samples/sec, 1s4c14ins1bs) | Performance Ratio [INT8/FP32]
Inception ResNet V2 pb 80.25% 80.40% -0.18% 313.97 178.27 1.76x
Inception V3 pb 76.72% 76.75% -0.03% 1076.10 401.89 2.68x
MobileNet V2 pb 71.49% 71.76% -0.37% 947.44 779.51 1.22x
ResNet101 pb 77.52% 76.45% +1.41% 1048.36 384.02 2.73x
ResNet50 pb 69.09% 69.03% +0.09% 411.79 284.53 1.45x
ResNet50 pb 78.07% 78.12% -0.06% 680.56 498.08 1.37x
ResNetV2 101 pb 72.64% 71.87% +1.08% 492.00 295.77 1.66x
ResNetV2 50 pb 70.37% 69.64% +1.05% 779.42 539.54 1.44x
VGG16 pb 72.69% 70.89% +2.55% 1388.62 203.39 6.83x
VGG19 pb 72.67% 71.01% +2.33% 1236.12 169.74 7.28x

PyTorch Models with Torch 2.3.0+cpu in PTQ Mode

Model | Example | INT8 Accuracy | FP32 Accuracy | Accuracy Ratio [(INT8-FP32)/FP32] | INT8 Throughput (samples/sec, 1s4c14ins1bs) | FP32 Throughput (samples/sec, 1s4c14ins1bs) | Performance Ratio [INT8/FP32]
ResNet18 static 69.59% 69.76% -0.24% 1707.52 602.47 2.83x
EfficientNet-B3 static 77.78% 78.54% -0.98% 513.82 360.02 1.43x
PeleeNet static 71.83% 72.10% -0.37% 837.83 541.66 1.55x
ResNet50 static 75.98% 76.15% -0.21% 1135.22 311.47 3.64x
Inception V3 static 69.46% 69.52% -0.09% 948.03 322.55 2.94x
ResNeSt50 static 80.76% 81.04% -0.35% 406.11 39.66 10.24x
ResNeXt101_32x8d static 78.92% 79.31% -0.49% 582.22 106.73 5.45x
YOLO V3 static 55.10% 54.93% +0.31% 156.29 60.30 2.59x
Roberta base MRPC static 93.14% 93.59% -0.48% 396.85 176.80 2.24x
CamemBERT base MRPC static 88.58% 89.28% -0.78% 405.37 182.87 2.22x
DistilBERT base MRPC static 90.64% 90.27% +0.41% 799.05 346.50 2.31x
DistilBERT base MRPC dynamic 90.02% 90.27% -0.28% 705.91 348.16 2.03x
ALBERT base MRPC static 92.28% 92.28% 0.00% 350.78 164.32 2.13x
Xlm Roberta MRPC static 87.80% 88.62% -0.93% 396.06 175.96 2.25x
Xlm Roberta MRPC dynamic 88.54% 88.24% +0.35% 381.19 175.96 2.17x
BERT base MRPC static 89.59% 90.42% -0.91% 402.42 177.73 2.26x
BERT base COLA static 53.47% 53.39% +0.16% 395.25 177.02 2.23x
BERT base STSB static 87.61% 88.05% -0.49% 397.62 177.23 2.24x
BERT base SST-2 static 91.97% 92.32% -0.37% 407.66 182.93 2.23x
BERT large COLA static 63.39% 63.35% +0.06% 147.86 56.01 2.64x
BERT base RTE static 71.84% 72.56% -1.00% 397.83 177.40 2.24x
BERT large MRPC static 90.07% 90.38% -0.34% 146.84 52.97 2.77x
BERT large QNLI static 91.12% 91.54% -0.46% 394.51 176.92 2.23x
BERT large RTE static 73.65% 74.01% -0.49% 148.84 55.83 2.67x
Funnel MRPC 91.94% 92.25% -0.34% 294.76 187.41 1.57x
BERT large SQuAD static 92.34% 93.16% -0.88% 50.21 18.69 2.69x
lvwerra/pegasus-samsum static 42.32% 42.67% -0.82% 102.73 37.99 2.70x
ResNet18 PT2E static 69.49% 69.76% -0.39% 1873.51 1106.97 1.69x
OPT-125M PT2E static 37.07% 37.90% -2.20% 42.09 29.68 1.42x

PyTorch Models with Torch 2.3.0+cpu in QAT Mode

Model | Example | INT8 Accuracy | FP32 Accuracy | Accuracy Ratio [(INT8-FP32)/FP32] | INT8 Throughput (samples/sec, 1s4c14ins1bs) | FP32 Throughput (samples/sec, 1s4c14ins1bs) | Performance Ratio [INT8/FP32]
ResNet18 static 69.74% 69.76% -0.03% 1717.59 602.65 2.85x
ResNet50 static 76.03% 76.15% -0.15% 1091.62 305.83 3.57x
ResNeXt101_32x8d static 79.31% 79.31% 0.00% 584.54 107.38 5.44x

PyTorch Models with Torch 2.3.0+cpu in IPEX Mode

Model | Example | INT8 Accuracy | FP32 Accuracy | Accuracy Ratio [(INT8-FP32)/FP32] | INT8 Throughput (samples/sec, 1s4c14ins1bs) | FP32 Throughput (samples/sec, 1s4c14ins1bs) | Performance Ratio [INT8/FP32]
bert-large-uncased-whole-word-masking-finetuned-squad static 93.01% 93.16% -0.16% 150.05 22.42 6.69x
distilbert-base-uncased-distilled-squad static 86.10% 86.84% -0.85% 1034.60 151.13 6.85x

ONNX Models with ONNX Runtime 1.18.1

Model | Example | INT8 Accuracy | FP32 Accuracy | Accuracy Ratio [(INT8-FP32)/FP32] | INT8 Throughput (samples/sec, 1s4c14ins1bs) | FP32 Throughput (samples/sec, 1s4c14ins1bs) | Performance Ratio [INT8/FP32]
ResNet50 V1.5 qlinearops 72.18% 72.29% -0.16% 1495.72 715.94 2.09x
ResNet50 V1.5 qdq 72.13% 72.29% -0.23% 1547.30 717.03 2.16x
ResNet50 V1.5 MLPerf qlinearops 76.15% 76.46% -0.41% 1365.56 718.55 1.90x
ResNet50 V1.5 MLPerf qdq 76.13% 76.46% -0.44% 1445.75 718.96 2.01x
ResNet50 V1.5 (ONNX Model Zoo) qlinearops 74.77% 74.99% -0.29% 1574.38 749.36 2.10x
ResNet50 V1.5 (ONNX Model Zoo) qdq 74.78% 74.99% -0.27% 1564.15 755.58 2.07x
VGG16 qlinearops 66.55% 66.69% -0.20% 526.57 162.64 3.24x
VGG16 qdq 66.62% 66.69% -0.11% 520.09 172.42 3.02x
VGG16 (ONNX Model Zoo) qlinearops 72.37% 72.40% -0.04% 558.81 162.87 3.43x
VGG16 (ONNX Model Zoo) qdq 72.36% 72.40% -0.04% 556.58 176.92 3.15x
MobileNet V3 MLPerf qlinearops 75.51% 75.74% -0.30% 5421.72 2578.08 2.10x
MobileNet V3 MLPerf qdq 75.51% 75.74% -0.30% 5382.87 2567.48 2.10x
ShuffleNet V2 (ONNX Model Zoo) qlinearops 66.13% 66.36% -0.36% 6426.22 3725.69 1.72x
ShuffleNet V2 (ONNX Model Zoo) qdq 66.22% 66.36% -0.22% 6534.24 3707.74 1.76x
GoogleNet (ONNX Model Zoo) qlinearops 67.69% 67.79% -0.14% 1842.90 1137.58 1.62x
GoogleNet (ONNX Model Zoo) qdq 67.71% 67.79% -0.11% 1818.99 1136.37 1.60x
SqueezeNet (ONNX Model Zoo) qlinearops 56.49% 56.87% -0.67% 9521.99 5530.36 1.72x
SqueezeNet (ONNX Model Zoo) qdq 56.49% 56.87% -0.67% 9391.07 5519.79 1.70x
CaffeNet (ONNX Model Zoo) qlinearops 56.26% 56.30% -0.07% 2949.36 893.77 3.30x
CaffeNet (ONNX Model Zoo) qdq 56.26% 56.30% -0.08% 2847.24 901.15 3.16x
AlexNet (ONNX Model Zoo) qlinearops 54.73% 54.79% -0.10% 2070.17 816.71 2.53x
AlexNet (ONNX Model Zoo) qdq 54.71% 54.79% -0.14% 2059.13 844.97 2.44x
ZFNet (ONNX Model Zoo) qlinearops 55.83% 55.96% -0.24% 858.76 461.25 1.86x
ZFNet (ONNX Model Zoo) qdq 55.87% 55.96% -0.16% 853.77 457.91 1.86x
Inception V1 (ONNX Model Zoo) qlinearops 67.23% 67.24% -0.02% 1891.36 1205.95 1.57x
Inception V1 (ONNX Model Zoo) qdq 67.23% 67.24% -0.02% 1879.27 1202.19 1.56x
BEiT (ONNX Model Zoo) qlinearops 85.07% 85.28% -0.25% 205.15 126.59 1.62x
EfficientNet (ONNX Model Zoo) qlinearops 77.02% 77.11% -0.12% 2428.32 1344.03 1.81x
EfficientNet (ONNX Model Zoo) qdq 76.99% 77.11% -0.16% 2286.73 1307.18 1.75x
DenseNet (ONNX Model Zoo) qlinearops 60.53% 60.96% -0.71% 626.26 499.76 1.25x
SSD MobileNet V1 (ONNX Model Zoo) qlinearops 22.96% 23.02% -0.27% 1121.43 841.32 1.33x
SSD MobileNet V1 (ONNX Model Zoo) qdq 22.96% 23.02% -0.27% 1048.50 798.22 1.31x
DUC (ONNX Model Zoo) qlinearops 81.62% 81.92% -0.37% 9.26 4.99 1.86x
Ultra Face (ONNX Model Zoo) qlinearops 83.33% 83.65% -0.38% 8993.58 1988.46 4.52x
Emotion FERPlus (ONNX Model Zoo) qlinearops 7.94% 8.00% -0.70% 6113.74 3087.50 1.98x
ArcFace (ONNX Model Zoo) qlinearops 99.82% 99.80% +0.02% 442.85 230.75 1.92x
BERT base MRPC qlinearops 85.54% 86.03% -0.57% 483.81 219.45 2.20x
BERT base MRPC qdq 85.54% 86.03% -0.57% 485.08 218.33 2.22x
BERT base MRPC integerops 85.29% 86.03% -0.85% 684.46 218.86 3.13x
DistilBERT base MRPC qdq 84.07% 84.56% -0.58% 633.28 399.31 1.59x
DistilBERT base MRPC integerops 85.54% 84.56% +1.16% 1388.44 401.08 3.46x
Mobile bert MRPC qdq 85.54% 86.28% -0.85% 505.62 387.43 1.31x
Mobile bert MRPC integerops 85.54% 86.28% -0.85% 565.46 386.39 1.46x
Roberta base MRPC integerops 90.93% 89.95% +1.09% 702.17 219.50 3.20x
BERT SQuAD (ONNX Model Zoo) integerops 80.29% 80.67% -0.47% 242.58 97.71 2.48x
MobileBERT SQuAD MLPerf (ONNX Model Zoo) integerops 89.87% 90.03% -0.17% 151.69 125.35 1.21x
GPT2 lm head WikiText (ONNX Model Zoo) integerops 31.98% 29.00% +10.31% 17.96 10.21 1.76x
BERT base uncased MRPC (HuggingFace) qlinearops 90.21% 90.42% -0.23% 434.65 210.58 2.06x
BERT base uncased MRPC (HuggingFace) integerops 89.58% 90.42% -0.93% 708.66 210.74 3.36x
Roberta base MRPC (HuggingFace) qlinearops 91.00% 91.38% -0.41% 431.37 211.03 2.04x
Roberta base MRPC (HuggingFace) integerops 90.85% 91.38% -0.58% 711.11 210.71 3.37x
XLM Roberta base MRPC (HuggingFace) qlinearops 89.37% 90.10% -0.81% 334.88 211.56 1.58x
XLM Roberta base MRPC (HuggingFace) integerops 89.66% 90.10% -0.50% 401.99 211.43 1.90x
Camembert base MRPC (HuggingFace) qlinearops 89.28% 89.28% 0.00% 282.30 213.33 1.32x
Camembert base MRPC (HuggingFace) integerops 89.19% 89.28% -0.10% 707.22 214.23 3.30x
MiniLM L12 H384 uncased MRPC (HuggingFace) qlinearops 90.13% 90.97% -0.93% 1188.05 578.35 2.05x
MiniLM L12 H384 uncased MRPC (HuggingFace) integerops 91.07% 90.97% +0.10% 1285.13 576.04 2.23x
DistilBERT base uncased SST-2 (HuggingFace) qlinearops 90.71% 91.06% -0.38% 1259.69 396.60 3.18x
DistilBERT base uncased SST-2 (HuggingFace) integerops 90.25% 91.06% -0.88% 914.63 395.09 2.32x
Albert base v2 SST-2 (HuggingFace) qlinearops 92.09% 92.32% -0.25% 284.62 210.52 1.35x
Albert base v2 SST-2 (HuggingFace) integerops 91.74% 92.32% -0.62% 284.69 210.00 1.36x
MiniLM L6 H384 uncased SST-2 (HuggingFace) qlinearops 89.45% 90.14% -0.76% 2172.98 1121.66 1.94x
MiniLM L6 H384 uncased SST-2 (HuggingFace) integerops 89.91% 90.14% -0.26% 2326.27 1114.57 2.09x
BERT base cased MRPC (HuggingFace) qlinearops 87.70% 88.29% -0.67% 494.96 210.80 2.35x
BERT base cased MRPC (HuggingFace) integerops 88.19% 88.29% -0.12% 714.61 210.99 3.39x
Electra small discriminator MRPC (HuggingFace) qlinearops 89.92% 89.83% +0.09% 1998.71 1115.18 1.79x
Electra small discriminator MRPC (HuggingFace) integerops 89.27% 89.83% -0.63% 2202.81 1121.41 1.96x
BERT mini MRPC (HuggingFace) qlinearops 86.21% 86.52% -0.35% 5767.23 3254.79 1.77x
BERT mini MRPC (HuggingFace) integerops 86.16% 86.52% -0.41% 6354.66 3424.42 1.86x
Xlnet base cased MRPC (HuggingFace) qlinearops 90.05% 89.86% +0.21% 121.24 95.56 1.27x
Xlnet base cased MRPC (HuggingFace) integerops 89.58% 89.86% -0.31% 123.06 95.60 1.29x
BART large MRPC (HuggingFace) integerops 92.36% 91.20% +1.28% 126.14 51.06 2.47x
DeBERTa v3 base MRPC (HuggingFace) integerops 92.39% 92.23% +0.17% 193.16 153.16 1.26x
Spanbert SQuAD (HuggingFace) qlinearops 91.14% 91.98% -0.91% 81.96 43.36 1.89x
Spanbert SQuAD (HuggingFace) integerops 91.40% 91.98% -0.63% 101.71 43.37 2.35x
Bert base multilingual cased SQuAD (HuggingFace) qlinearops 88.42% 89.13% -0.79% 86.33 43.27 2.00x
Bert base multilingual cased SQuAD (HuggingFace) integerops 88.70% 89.13% -0.48% 101.78 43.24 2.35x
DistilBert base uncased SQuAD (HuggingFace) qlinearops 86.33% 86.86% -0.62% 120.71 69.72 1.73x
DistilBert base uncased SQuAD (HuggingFace) integerops 86.05% 86.86% -0.94% 203.71 69.68 2.92x
BERT large uncased whole word masking SQuAD (HuggingFace) qlinearops 92.34% 93.16% -0.88% 31.81 12.94 2.46x
BERT large uncased whole word masking SQuAD (HuggingFace) integerops 92.99% 93.16% -0.18% 35.83 12.94 2.77x
Roberta large SQuAD v2 (HuggingFace) qlinearops 89.03% 89.02% +0.02% 17.61 13.27 1.33x
Roberta large SQuAD v2 (HuggingFace) integerops 89.04% 89.02% +0.02% 35.85 13.26 2.70x
GPT2 WikiText (HuggingFace) qlinearops 30.25% 29.00% +4.33% 13.85 10.17 1.36x
GPT2 WikiText (HuggingFace) integerops 29.68% 29.00% +2.36% 14.64 10.09 1.45x
DistilGPT2 WikiText (HuggingFace) qlinearops 44.93% 43.43% +3.46% 21.80 17.13 1.27x
DistilGPT2 WikiText (HuggingFace) integerops 44.62% 43.43% +2.74% 23.02 17.09 1.35x
LayoutLMv3 FUNSD (HuggingFace) integerops 90.07% 90.49% -0.46% 39.50 28.00 1.41x
CodeBert (HuggingFace) qlinearops 64.97% 65.41% -0.67% 75.69 45.10 1.68x
CodeBert (HuggingFace) integerops 64.93% 65.41% -0.73% 94.47 45.10 2.09x
FCN (ONNX Model Zoo) qlinearops 64.54% 64.98% -0.67% 25.83 12.90 2.00x
FCN (ONNX Model Zoo) qdq 64.54% 64.98% -0.67% 25.97 12.99 2.00x

Validated Pruning Examples

Model | Task | Dataset | Dense Accuracy | Sparse Accuracy | Relative Drop | Sparsity Ratio | Sparsity Pattern | Comments | Balanced or Unbalanced Ratio
Bert-Mini | question answering | SQuAD-v1.1 | f1=76.87 | f1=76.2 | -0.80% | 80% | structured 4x1 | snip momentum | unbalanced
Bert-Mini | question answering | SQuAD-v1.1 | f1=76.87 | f1=76.2 | -0.80% | 80% | structured 4x1 | snip momentum | unbalanced
Bert-Mini | question answering | SQuAD-v1.1 | f1=76.87 | f1=77.62 | +0.98% | 50% | structured 2:4 | snip momentum | balanced
Distilbert-base-uncased | question answering | SQuAD-v1.1 | f1=86.90 | f1=86.15 | -0.86% | 80% | structured 4x1 | snip momentum | unbalanced
Distilbert-base-uncased | question answering | SQuAD-v1.1 | f1=86.90 | f1=87.50 | +0.69% | 50% | structured 2:4 | snip momentum | balanced
Bert-base-uncased | question answering | SQuAD-v1.1 | f1=88.59 | f1=87.78 | -0.92% | 80% | structured 4x1 | snip momentum | unbalanced
Bert-base-uncased | question answering | SQuAD-v1.1 | f1=88.59 | f1=89.40 | +0.91% | 50% | structured 2:4 | snip momentum | balanced
Bert-large | question answering | SQuAD-v1.1 | f1=91.23 | f1=90.91 | -0.35% | 80% | structured 4x1 | snip momentum | unbalanced
Bert-large | question answering | SQuAD-v1.1 | f1=91.23 | f1=91.67 | +0.48% | 50% | structured 2:4 | snip momentum | balanced
Bert-Mini | text classification | MRPC | f1=87.52 | f1=87.22 | -0.34% | 90% | structured 4x1 | snip momentum | unbalanced
Bert-Mini | text classification | MRPC | f1=87.52 | f1=87.33 | -0.22% | 90% | structured 4x1 | snip momentum | balanced
Bert-Mini | text classification | MRPC | f1=87.52 | f1=86.89 | -0.72% | 50% | structured 2:4 | snip momentum | balanced
Bert-Mini | text classification | MRPC | f1=87.52 | f1=86.8 | -0.83% | 60% | structured per channel | snip momentum | unbalanced
Distilbert-base-uncased | text classification | MRPC | f1=90.26 | f1=89.85 | -0.46% | 90% | structured 4x1 | snip momentum | unbalanced
Distilbert-base-uncased | text classification | MRPC | f1=90.26 | f1=90.88 | +0.69% | 50% | structured 2:4 | snip momentum | balanced
Bert-Mini | text classification | SST-2 | accuracy=87.61 | accuracy=86.92 | -0.79% | 90% | structured 4x1 | snip momentum | unbalanced
Bert-Mini | text classification | SST-2 | accuracy=87.61 | accuracy=87.73 | +0.14% | 50% | structured 2:4 | snip momentum | balanced
Bert-Mini | text classification | SST-2 | accuracy=87.61 | accuracy=86.92 | -0.79% | 50% | structured per channel | snip momentum | unbalanced
ResNet50 | image recognition | ImageNet | top1 acc = 78.95 | top1 acc = 80.10 | -1.43% | 75% | structured 2x1 | snip momentum | unbalanced
YOLO-v5s6 | object detection | COCO | AP0.50:0.95/AP0.50=0.404/0.6 | AP0.50:0.95/AP0.50=0.393/0.584 | -2.72% | 80% | unstructured | snip momentum | unbalanced
Bert-Large | question answering | SQuAD-v1.1 | f1=91.34 | f1=90.7 | -0.07% | 80% | structured 2x1 | group lasso | unbalanced
Bert-Base | text classification | MNLI | [m, mm] = [84.57, 84.79] | [m, mm] = [82.45, 83.27] | [-2.51%, -1.80%] | 70% | unstructured | Prune once for all | balanced
Bert-Base | text classification | MNLI | [m, mm] = [84.57, 84.79] | [m, mm] = [83.20, 84.11] | [-1.62%, -0.80%] | 50% | structured 1:2 | Prune once for all | balanced
Bert-Base | text classification | SST-2 | accuracy = 92.32 | accuracy = 91.51 | -0.88% | 70% | unstructured | Prune once for all | balanced
Bert-Base | text classification | SST-2 | accuracy = 92.32 | accuracy = 92.20 | -0.13% | 50% | structured 1:2 | Prune once for all | balanced
Bert-Base | text classification | SST-2 | accuracy = 92.32 | accuracy = 91.97 | -0.38% | 20% | unstructured | gradient sensitivity | balanced
Bert-Base | text classification | QQP | [accuracy, f1] = [91.10, 88.05] | [accuracy, f1] = [90.48, 87.06] | [-0.68%, -1.12%] | 70% | unstructured | Prune once for all | balanced
Bert-Base | text classification | QQP | [accuracy, f1] = [91.10, 88.05] | [accuracy, f1] = [90.92, 87.78] | [-0.20%, -0.31%] | 50% | structured 1:2 | Prune once for all | balanced
Bert-Base | text classification | QNLI | accuracy = 91.54 | accuracy = 90.39 | -1.26% | 70% | unstructured | Prune once for all | balanced
Bert-Base | text classification | QNLI | accuracy = 91.54 | accuracy = 90.87 | -0.73% | 50% | structured 1:2 | Prune once for all | balanced
Bert-Base | question answering |  | [em, f1] = [79.34, 87.10] | [em, f1] = [77.27, 85.75] | [-2.61%, -1.54%] | 70% | unstructured | Prune once for all | balanced
Bert-Base | question answering |  | [em, f1] = [79.34, 87.10] | [em, f1] = [78.03, 86.50] | [-1.65%, -0.69%] | 50% | structured 1:2 | Prune once for all | balanced
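
Rows with the "structured 2:4" pattern all report a 50% sparsity ratio, because two of every four consecutive weights are zero. The sketch below illustrates that relationship on a flat list of weights. It selects weights by magnitude purely for illustration, which is an assumption; the validated examples use criteria such as snip momentum. The helper `prune_2_of_4` is hypothetical and is not part of Intel® Neural Compressor.

```python
def prune_2_of_4(weights: list) -> list:
    """Zero the two smallest-magnitude weights in each group of four.

    Magnitude-based selection is an illustrative stand-in for the pruning
    criteria listed in the table (snip momentum, group lasso, and so on).
    """
    if len(weights) % 4 != 0:
        raise ValueError("weight count must be a multiple of 4")
    pruned = []
    for start in range(0, len(weights), 4):
        group = weights[start:start + 4]
        # Indices of the two largest-magnitude entries in this group.
        keep = sorted(range(4), key=lambda i: abs(group[i]), reverse=True)[:2]
        pruned.extend(v if i in keep else 0.0 for i, v in enumerate(group))
    return pruned


weights = [0.9, -0.1, 0.05, -0.7, 0.2, 0.3, -0.4, 0.01]
pruned = prune_2_of_4(weights)
sparsity = pruned.count(0.0) / len(pruned)

print(pruned)    # [0.9, 0.0, 0.0, -0.7, 0.0, 0.3, -0.4, 0.0]
print(sparsity)  # 0.5
```

Because every group of four keeps exactly two weights, the zeros are spread evenly across the tensor.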

Validated Knowledge Distillation Examples

Example Name | Dataset | Student (Metrics) | Teacher (Metrics) | Student With Distillation (Metrics Improvement) | Student With Distributed Distillation (Metrics Improvement)
MobileNet example | CIFAR-10 | MobileNetV2-0.35 (0.7965 ACC) | WideResNet40-2 (0.9522 ACC) | 0.8178 ACC (0.0213 ACC) | 0.8235 ACC (0.027 ACC)
CNN example | CIFAR-100 | CNN-2 (0.5494 ACC) | CNN-10 (0.7153 ACC) | 0.5540 ACC (0.0046 ACC) | 0.5523 ACC (0.0029 ACC)
VGG example | CIFAR-100 | VGG-8-BN (0.7022 ACC) | VGG-13-BN (0.7415 ACC) | 0.7025 ACC (0.0003 ACC) | NA
ResNet example | ImageNet | ResNet18 (0.6739 ACC) | ResNet50 (0.7399 ACC) | 0.6845 ACC (0.0106 ACC) | NA
BlendCnn example | MRPC | BlendCnn (0.7034 ACC) | BERT-Base (0.8382 ACC) | 0.7034 ACC (0 ACC) | NA
BiLSTM example | SST-2 | BiLSTM (0.8314 ACC) | RoBERTa-Base (0.9403 ACC) | 0.9048 ACC (0.0734 ACC) | NA
DistilBERT example | SQuAD | DistilBERT (0.7323/0.8256 EM/F1) | BERT-Base (0.8084/0.8814 EM/F1) | 0.7442/0.8371 EM/F1 (0.0119/0.0115 EM/F1) | NA
TinyBERT example | MNLI | TinyBERT (0.8018/0.8044 m/mm) | BERT-Base (0.8363/0.8411 m/mm) | 0.8025/0.8074 m/mm (0.0007/0.0030 m/mm) | NA
BERT-3 example | QQP | BERT-3 (0.8626/0.8213 EM/F1) | BERT-Base (0.9091/0.8782 EM/F1) | 0.8684/0.8259 EM/F1 (0.0058/0.0046 EM/F1) | NA
DistilRoBERTa example | COLA | DistilRoBERTa (0.6057 ACC) | RoBERTa-Large (0.6455 ACC) | 0.6187 ACC (0.0130 ACC) | NA
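
The "Metrics Improvement" values in parentheses appear to be the absolute difference between the distilled student and the baseline student, with slash-separated metrics handled one by one. The sketch below works under that assumption and reproduces the MobileNet and DistilBERT rows; `metrics_improvement` is a hypothetical helper, not a library function.

```python
def metrics_improvement(student: str, distilled: str) -> list:
    """Absolute gain per metric; multi-metric inputs are slash-separated."""
    baseline = [float(value) for value in student.split("/")]
    improved = [float(value) for value in distilled.split("/")]
    if len(baseline) != len(improved):
        raise ValueError("both inputs must report the same number of metrics")
    # Round to the four decimal places used in the table.
    return [round(after - before, 4) for before, after in zip(baseline, improved)]


# MobileNet example: single accuracy metric.
print(metrics_improvement("0.7965", "0.8178"))  # [0.0213]

# DistilBERT example: EM/F1 pair.
print(metrics_improvement("0.7323/0.8256", "0.7442/0.8371"))  # [0.0119, 0.0115]
```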

Validated ONNX QDQ INT8 Models on Multiple Hardware through ONNX Runtime

Model (ONNX QDQ) | AWS c6i.2xlarge (Intel), CPU Execution Provider | AWS c6a.2xlarge (AMD), CPU Execution Provider | AWS c6g.2xlarge (ARM), CPU Execution Provider | NVIDIA A100, CUDA Execution Provider
ResNet50 74.76% 68.95% 74.76% 74.75%
BERT-base 85.54% 84.56% 85.54% 84.31%
ResNet50 V1.5 72.20% 67.70% 72.20% 72.29%
MobileNet V2 65.82% 58.56% 65.83% 65.63%
SSD MobileNet V1 22.45% 16.53% 22.45% 22.35%
DistilBERT base MRPC 84.56% 83.82% 84.56% 84.56%
SqueezeNet 56.54% 53.52% 56.54% 56.55%
SSD 18.63% 18.54% 18.63% 18.61%
AlexNet 54.71% 47.06% 54.71% 54.79%
CaffeNet 56.25% 52.35% 56.27% 56.24%
GoogleNet 67.73% 63.56% 67.72% 67.76%
ZFNet 55.86% 45.09% 55.86% 55.89%
Inception V1 67.21% 63.03% 67.20% 67.21%
SSD MobileNet V1 (ONNX Model Zoo) 22.86% 16.94% 22.80% 22.87%
Mobile bert MRPC 85.54% 84.56% 85.54% 85.54%
Roberta base MRPC 89.46% 90.44% 89.71% 89.71%
ResNet50 V1.5 MLPerf 76.14% 72.80% 76.14% 76.17%
VGG16 66.69% 64.25% 66.69% 66.64%
VGG16 (ONNX Model Zoo) 72.31% 69.35% 72.32% 72.34%
MobileNet V3 MLPerf 75.57% 70.78% 75.56% 75.52%
EfficientNet 77.61% 76.52% 77.56% 77.60%
MobileNet V2 (ONNX Model Zoo) 68.51% 62.48% 68.58% 68.48%
ShuffleNet V2 66.12% 58.41% 66.11% 66.11%