Validated Models
Validated Quantization Examples
1.1. TensorFlow Models with TensorFlow 2.10.0
1.2. PyTorch Models with Torch 1.12.1+cpu in PTQ Mode
1.3. PyTorch Models with Torch 1.12.1+cpu in QAT Mode
1.4. PyTorch Models with Torch and Intel® Extension for PyTorch* 1.11.0+cpu
Validated ONNX QDQ INT8 Models on Multiple Hardware through ONNX Runtime
Validated Quantization Examples
Performance results were measured on 09/24/2022 with an Intel Xeon Platinum 8380 Scalable processor, using 1 socket, 4 cores per instance, 8 instances, and batch size 1.
Performance varies by use, configuration, and other factors. See platform configuration for configuration details. For more complete information about performance and benchmark results, visit www.intel.com/benchmarks.
TensorFlow Models with TensorFlow 2.10.0
Model | Example | INT8 Accuracy | FP32 Accuracy | Accuracy Ratio [(INT8-FP32)/FP32] | INT8 Throughput (samples/sec) | FP32 Throughput (samples/sec) | Performance Ratio [INT8/FP32] |
---|---|---|---|---|---|---|---|
EfficientNet | pb | 76.74% | 76.76% | -0.03% | 91.43 | 69.41 | 1.32x |
Faster R-CNN Inception ResNet V2 | pb | 37.65% | 38.33% | -1.77% | 2.53 | 1.62 | 1.57x |
Faster R-CNN Inception ResNet V2 | SavedModel | 37.77% | 38.33% | -1.46% | 2.54 | 1.61 | 1.58x |
Faster R-CNN ResNet101 | pb | 30.34% | 30.39% | -0.16% | 27.63 | 13.12 | 2.11x |
Faster R-CNN ResNet101 | SavedModel | 30.33% | 30.39% | -0.20% | 27.71 | 11.06 | 2.51x |
Faster R-CNN ResNet50 | pb | 26.65% | 26.59% | 0.23% | 33.64 | 16.33 | 2.06x |
Inception ResNet V2 | pb | 80.34% | 80.40% | -0.07% | 29.25 | 23.43 | 1.25x |
Inception V1 | pb | 70.44% | 69.74% | 1.00% | 163.14 | 133.44 | 1.22x |
Inception V2 | pb | 74.34% | 73.97% | 0.50% | 133.49 | 111.5 | 1.20x |
Inception V3 | pb | 76.71% | 76.75% | -0.05% | 91.67 | 64.02 | 1.43x |
Inception V4 | pb | 80.18% | 80.27% | -0.11% | 56.87 | 37.09 | 1.53x |
Mask R-CNN Inception V2 | pb | 28.50% | 28.73% | -0.80% | 36.06 | 27.15 | 1.33x |
Mask R-CNN Inception V2 | CKPT | 28.50% | 28.73% | -0.80% | 36.1 | 25.06 | 1.44x |
MobileNet V1 | pb | 71.85% | 70.96% | 1.25% | 374.38 | 226.03 | 1.66x |
MobileNet V2 | pb | 71.85% | 70.96% | 1.25% | 374.38 | 226.03 | 1.66x |
ResNet101 | pb | 77.50% | 76.45% | 1.37% | 92.47 | 65.56 | 1.41x |
ResNet50 Fashion | pb | 78.04% | 78.12% | -0.10% | 359.18 | 244.38 | 1.47x |
ResNet50 V1.0 | pb | 74.11% | 74.27% | -0.22% | 172.66 | 87.28 | 1.98x |
ResNet50 V1.5 | pb | 76.23% | 76.46% | -0.30% | 153.37 | 87.24 | 1.76x |
SSD MobileNet V1 | pb | 23.12% | 23.13% | -0.04% | 151.92 | 112.24 | 1.35x |
SSD MobileNet V1 | CKPT | 23.11% | 23.13% | -0.09% | 153.18 | 67.79 | 2.26x |
SSD ResNet34 | pb | 21.71% | 22.09% | -1.72% | 30.99 | 8.65 | 3.58x |
SSD ResNet50 V1 | pb | 37.76% | 38.00% | -0.63% | 23.04 | 14.75 | 1.56x |
SSD ResNet50 V1 | CKPT | 37.82% | 38.00% | -0.47% | 23 | 11.94 | 1.93x |
VGG16 | pb | 72.64% | 70.89% | 2.47% | 178.99 | 83.67 | 2.14x |
VGG19 | pb | 72.69% | 71.01% | 2.37% | 156.11 | 71.5 | 2.18x |
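The two ratio columns follow directly from the formulas in the table header. As a minimal sketch (the function names are illustrative and not part of any library), the EfficientNet row above can be reproduced like this:

```python
def accuracy_ratio(int8_acc, fp32_acc):
    """Relative accuracy change, (INT8 - FP32) / FP32, as a percentage."""
    return (int8_acc - fp32_acc) / fp32_acc * 100.0


def performance_ratio(int8_throughput, fp32_throughput):
    """Throughput speedup of INT8 over FP32, INT8 / FP32."""
    return int8_throughput / fp32_throughput


# EfficientNet (pb) row: accuracies in percent, throughput in samples/sec.
acc = accuracy_ratio(76.74, 76.76)
perf = performance_ratio(91.43, 69.41)
print(f"{acc:.2f}%  {perf:.2f}x")  # prints "-0.03%  1.32x"
```

Recomputing from the rounded table values can differ from a published ratio in the last digit, presumably because the published ratios were derived from unrounded measurements.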
PyTorch Models with Torch 1.12.1+cpu in PTQ Mode
Model | Example | INT8 Accuracy | FP32 Accuracy | Accuracy Ratio [(INT8-FP32)/FP32] | INT8 Throughput (samples/sec) | FP32 Throughput (samples/sec) | Performance Ratio [INT8/FP32] |
---|---|---|---|---|---|---|---|
ALBERT base MRPC | EAGER | 88.85% | 88.50% | 0.40% | 26 | 21.22 | 1.23x |
Barthez MRPC | EAGER | 83.92% | 83.81% | 0.14% | 128.66 | 70.86 | 1.82x |
BERT base MRPC | FX | 89.90% | 90.69% | -0.88% | 203.38 | 101.29 | 2.01x |
BERT base RTE | FX | 69.31% | 69.68% | -0.53% | 216.22 | 102.72 | 2.10x |
BERT base SST2 | FX | 91.06% | 91.86% | -0.88% | 218.2 | 101.86 | 2.14x |
BERT base STSB | FX | 64.12% | 62.57% | 2.48% | 73.65 | 29.61 | 2.49x |
BERT large COLA | FX | 92.79 | 93.16 | -0.39% | 36.54 | 9.89 | 3.70x |
BERT large MRPC | FX | 89.50% | 90.38% | -0.97% | 74.11 | 29.69 | 2.50x |
BERT large QNLI | FX | 90.90% | 91.82% | -1.00% | 72.45 | 29.66 | 2.44x |
BERT large RTE | FX | 73.65% | 74.01% | -0.49% | 41.53 | 29.67 | 1.40x |
BlendCNN | EAGER | 68.40% | 68.40% | 0.00% | 3878.48 | 3717.52 | 1.04x |
CamemBERT base MRPC | EAGER | 86.70% | 86.82% | -0.14% | 188.97 | 98.9 | 1.91x |
Ctrl MRPC | EAGER | 81.87% | 81.22% | 0.80% | 18.68 | 7.25 | 2.58x |
Deberta MRPC | EAGER | 90.88% | 90.91% | -0.04% | 124.43 | 68.74 | 1.81x |
DistilBERT base MRPC | EAGER | 88.23% | 89.16% | -1.05% | 347.47 | 200.76 | 1.73x |
DistilBERT base MRPC FX | FX | 88.54% | 89.16% | -0.69% | 382.74 | 198.25 | 1.93x |
FlauBERT MRPC | EAGER | 79.87% | 80.19% | -0.40% | 561.35 | 370.2 | 1.52x |
HuBERT | FX | 97.69% | 97.84% | -0.15% | 9.82 | 7.2 | 1.36x |
Inception V3 | EAGER | 69.43% | 69.52% | -0.13% | 409.34 | 181.95 | 2.25x |
Longformer MRPC | EAGER | 91.01% | 91.46% | -0.49% | 18.73 | 14.66 | 1.28x |
mBart WNLI | EAGER | 56.34% | 56.34% | 0.00% | 54.35 | 25.14 | 2.16x |
MobileNet V2 | EAGER | 70.54% | 71.84% | -1.81% | 639.87 | 490.05 | 1.31x |
lvwerra/pegasus-samsum | EAGER | 42.1 | 42.67 | -1.35% | 3.41 | 1.07 | 3.19x |
PeleeNet | EAGER | 71.64% | 72.10% | -0.64% | 419.42 | 316.98 | 1.32x |
ResNet18 | EAGER | 69.57% | 69.76% | -0.27% | 686.03 | 332.13 | 2.07x |
ResNet18 | FX | 69.54% | 69.76% | -0.31% | 611.36 | 333.27 | 1.83x |
ResNet50 | EAGER | 75.98% | 76.15% | -0.21% | 327.14 | 162.46 | 2.01x |
ResNeXt101_32x8d | EAGER | 79.08% | 79.31% | -0.29% | 175.93 | 61.09 | 2.88x |
Roberta Base MRPC | EAGER | 88.25% | 88.18% | 0.08% | 197.96 | 99.35 | 1.99x |
Se_ResNeXt50_32x4d | EAGER | 78.98% | 79.08% | -0.13% | 308.19 | 144.6 | 2.13x |
SqueezeBERT MRPC | EAGER | 86.87% | 87.65% | -0.89% | 186.26 | 155.67 | 1.20x |
SSD ResNet 34 | FX | 19.52 | 19.63 | -0.59% | 19.09 | 6.88 | 2.78x |
Transfo-xl MRPC | EAGER | 81.97% | 81.20% | 0.94% | 9.65 | 7.06 | 1.37x |
Wave2Vec2 | FX | 95.71% | 96.60% | -0.92% | 23.69 | 19.58 | 1.21x |
Xlm Roberta base MRPC | EAGER | 88.03% | 88.62% | -0.67% | 114.31 | 99.34 | 1.15x |
YOLO V3 | EAGER | 24.60% | 24.54% | 0.21% | 71.81 | 31.38 | 2.29x |
PyTorch Models with Torch 1.12.1+cpu in QAT Mode
Model | Example | INT8 Accuracy | FP32 Accuracy | Accuracy Ratio [(INT8-FP32)/FP32] | INT8 Throughput (samples/sec) | FP32 Throughput (samples/sec) | Performance Ratio [INT8/FP32] |
---|---|---|---|---|---|---|---|
ResNet18 | EAGER | 69.84% | 69.76% | 0.11% | 690.73 | 330.85 | 2.09x |
ResNet18 | FX | 69.74% | 69.76% | -0.03% | 614.83 | 334.35 | 1.84x |
BERT base MRPC QAT | FX | 89.70% | 89.46% | 0.27% | 127.45 | 82.68 | 1.54x |
ResNet50 | EAGER | 76.05% | 76.15% | -0.13% | 410.44 | 168.81 | 2.43x |
PyTorch Models with Torch and Intel® Extension for PyTorch* 1.11.0+cpu
Model | Example | INT8 Accuracy | FP32 Accuracy | Accuracy Ratio [(INT8-FP32)/FP32] | INT8 Throughput (samples/sec) | FP32 Throughput (samples/sec) | Performance Ratio [INT8/FP32] |
---|---|---|---|---|---|---|---|
bert-large-uncased-whole-word-masking-finetuned-squad | IPEX | 92.9 | 93.16 | -0.28% | 31.35 | 9.97 | 3.14x |
ResNeXt101_32x16d_wsl | IPEX | 69.48% | 69.76% | -0.40% | 1189.15 | 680 | 1.75x |
ResNet50 | IPEX | 76.07% | 76.15% | -0.10% | 677.69 | 381.59 | 1.78x |
SSD ResNet34 | IPEX | 19.95% | 20.00% | -0.25% | 24.07 | 6.71 | 3.59x |
DistilBERT base MRPC | IPEX | 86 | 86.84 | -0.96% | 98.02 | 62.4 | 1.57x |
ONNX Models with ONNX Runtime 1.12.1
Model | Example | INT8 Accuracy | FP32 Accuracy | Accuracy Ratio [(INT8-FP32)/FP32] | INT8 Throughput (samples/sec) | FP32 Throughput (samples/sec) | Performance Ratio [INT8/FP32] |
---|---|---|---|---|---|---|---|
AlexNet | QLinear | 54.73% | 54.79% | -0.11% | 960.18 | 469.17 | 2.05x |
AlexNet | QDQ | 54.71% | 54.79% | -0.15% | 962.71 | 466.56 | 2.06x |
ArcFace | QLinear | 99.80% | 99.80% | 0.00% | 235.14 | 130 | 1.81x |
BERT base MRPC DYNAMIC | QLinear | 85.29% | 86.03% | -0.86% | 294.05 | 125.85 | 2.34x |
BERT base MRPC STATIC | QLinear | 85.29% | 86.03% | -0.86% | 604.07 | 256.93 | 2.35x |
BERT SQuAD | QLinear | 80.44 | 80.67 | -0.29% | 93.21 | 51.45 | 1.81x |
BERT SQuAD | QDQ | 80.44 | 80.67 | -0.29% | 93.27 | 51.67 | 1.80x |
CaffeNet | QLinear | 56.21% | 56.30% | -0.16% | 1501.21 | 536.1 | 2.80x |
CaffeNet | QDQ | 56.25% | 56.30% | -0.09% | 1493.36 | 533.09 | 2.80x |
DistilBERT base MRPC | QLinear | 84.80% | 84.56% | 0.28% | 1372.84 | 485.95 | 2.83x |
DistilBERT base MRPC | QDQ | 84.56% | 84.56% | 0.00% | 541.43 | 480.25 | 1.13x |
EfficientNet | QLinear | 77.57% | 77.70% | -0.17% | 1250.63 | 753.09 | 1.66x |
EfficientNet | QDQ | 77.61% | 77.70% | -0.12% | 1130.67 | 748.12 | 1.51x |
Emotion Ferplus | QLinear | 7.86% | 8.00% | -1.75% | 336.52 | 163.72 | 2.06x |
Faster R-CNN | QLinear | 34.05% | 34.37% | -0.93% | 16.36 | 6.18 | 2.65x |
Faster R-CNN | QDQ | 33.97% | 34.37% | -1.16% | 10.26 | 6.18 | 1.66x |
FCN | QLinear | 64.54% | 64.98% | -0.67% | 40.05 | 12.08 | 3.31x |
FCN QDQ | QDQ | 64.65% | 64.98% | -0.50% | 26.73 | 12.04 | 2.22x |
GoogleNet | QLinear | 67.71% | 67.79% | -0.12% | 740.16 | 587.54 | 1.26x |
GoogleNet | QDQ | 67.73% | 67.79% | -0.09% | 770.51 | 567.88 | 1.36x |
Inception V1 | QLinear | 67.21% | 67.24% | -0.04% | 824.15 | 601.92 | 1.37x |
Inception V1 | QDQ | 67.21% | 67.24% | -0.04% | 819.85 | 597.46 | 1.37x |
Mask R-CNN | QLinear | 33.41% | 33.72% | -0.92% | 14.18 | 5.78 | 2.45x |
Mask R-CNN | QDQ | 33.30% | 33.72% | -1.25% | 9.42 | 5.7 | 1.65x |
Mobile bert MRPC | QLinear | 86.27% | 86.27% | 0.00% | 613.72 | 506.41 | 1.21x |
MobileBERT SQuAD MLPerf | QLinear | 89.82 | 90.03 | -0.23% | 88.41 | 76.07 | 1.16x |
MobileNet V2 | QLinear | 65.59% | 66.89% | -1.94% | 2454.53 | 1543.79 | 1.59x |
MobileNet V2 | QDQ | 65.82% | 66.89% | -1.60% | 2164.97 | 1564.21 | 1.38x |
MobileNet V3 MLPerf | QLinear | 75.58% | 75.74% | -0.21% | 2147.42 | 1046.69 | 2.05x |
MobileNet V3 MLPerf | QDQ | 75.57% | 75.74% | -0.22% | 1877.1 | 1054.88 | 1.78x |
MobileNetV2 (ONNX Model Zoo) | QLinear | 68.38% | 69.48% | -1.58% | 2751.7 | 1797.64 | 1.53x |
MobileNetV2 (ONNX Model Zoo) | QDQ | 68.51% | 69.48% | -1.40% | 2656.23 | 1835.74 | 1.45x |
ResNet50 v1.5 MLPerf | QLinear | 76.15% | 76.46% | -0.41% | 764.901 | 434.141 | 1.76x |
ResNet50 v1.5 MLPerf | QDQ | 76.14% | 76.46% | -0.42% | 575.952 | 433.75 | 1.33x |
ResNet50 V1.5 | QLinear | 72.26% | 72.29% | -0.04% | 761.12 | 432.615 | 1.76x |
ResNet50 V1.5 | QDQ | 72.20% | 72.29% | -0.12% | 575.032 | 432.894 | 1.33x |
ResNet50 V1.5 (ONNX Model Zoo) | QLinear | 74.81% | 74.99% | -0.24% | 885.64 | 454.02 | 1.95x |
ResNet50 V1.5 (ONNX Model Zoo) | QDQ | 74.76% | 74.99% | -0.31% | 603.72 | 455.86 | 1.32x |
Roberta Base MRPC | QLinear | 89.71% | 89.95% | -0.27% | 644.636 | 254.791 | 2.53x |
ShuffleNet V2 | QLinear | 66.13% | 66.36% | -0.35% | 2298.55 | 1480.87 | 1.55x |
ShuffleNet V2 | QDQ | 66.12% | 66.36% | -0.36% | 1951.11 | 1490.78 | 1.31x |
SqueezeNet | QLinear | 56.54% | 56.87% | -0.58% | 2588.97 | 1605.92 | 1.61x |
SqueezeNet | QDQ | 56.54% | 56.87% | -0.58% | 2566.18 | 1936.79 | 1.32x |
SSD MobileNet V1 | QLinear | 22.45% | 23.10% | -2.81% | 725.83 | 570.24 | 1.27x |
SSD MobileNet V1 | QDQ | 22.45% | 23.10% | -2.81% | 666.01 | 539.77 | 1.23x |
SSD MobileNet V1 (ONNX Model Zoo) | QLinear | 22.86% | 23.03% | -0.74% | 641.56 | 519.93 | 1.23x |
SSD MobileNet V1 (ONNX Model Zoo) | QDQ | 22.86% | 23.03% | -0.74% | 633.61 | 492.5 | 1.29x |
SSD MobileNet V2 | QLinear | 24.04% | 24.68% | -2.59% | 542.68 | 401.56 | 1.35x |
SSD | QLinear | 18.84% | 18.98% | -0.74% | 31.33 | 8.87 | 3.53x |
SSD | QDQ | 18.63% | 18.98% | -1.84% | 23.98 | 8.95 | 2.68x |
Tiny YOLOv3 | QLinear | 12.08% | 12.43% | -2.82% | 648.62 | 518.97 | 1.25x |
VGG16 | QLinear | 66.67% | 66.69% | -0.03% | 221.93 | 99.51 | 2.23x |
VGG16 (ONNX Model Zoo) | QLinear | 72.32% | 72.40% | -0.11% | 319.54 | 99.9 | 3.20x |
VGG16 (ONNX Model Zoo) | QDQ | 72.31% | 72.40% | -0.12% | 319.41 | 99.94 | 3.20x |
VGG16 | QDQ | 66.69% | 66.69% | 0.00% | 307.52 | 99.24 | 3.10x |
YOLOv3 | QLinear | 26.82% | 28.74% | -6.68% | 124.24 | 54.03 | 2.30x |
YOLOv4 | QLinear | 33.25% | 33.71% | -1.36% | 49.76 | 32.99 | 1.51x |
ZFNet | QLinear | 55.84% | 55.96% | -0.21% | 459.38 | 261.93 | 1.75x |
ZFNet | QDQ | 55.86% | 55.96% | -0.18% | 460.66 | 264.34 | 1.74x |
MXNet Models with MXNet 1.7.0
Model | INT8 Accuracy | FP32 Accuracy | Accuracy Ratio [(INT8-FP32)/FP32] | INT8 Throughput (samples/sec) | FP32 Throughput (samples/sec) | Performance Ratio [INT8/FP32] |
---|---|---|---|---|---|---|
Inception V3 | 77.80% | 77.65% | 0.20% | 86.52 | 47.98 | 1.80x |
MobileNet V1 | 71.60% | 72.23% | -0.86% | 441.59 | 337.52 | 1.31x |
MobileNet V3 MLPerf | 70.80% | 70.87% | -0.10% | 272.87 | 211.51 | 1.29x |
ResNet v1 152 | 78.28% | 78.54% | -0.33% | 65.2 | 37.05 | 1.76x |
ResNet18 V1.0 | 70.01% | 70.14% | -0.19% | 423.98 | 235.98 | 1.80x |
ResNet50 V1.0 | 75.91% | 76.33% | -0.55% | 180.69 | 100.49 | 1.80x |
SqueezeNet | 56.80% | 56.97% | -0.28% | 311.23 | 198.61 | 1.57x |
SSD MobileNet V1 | 74.94% | 75.54% | -0.79% | 43.5 | 25.77 | 1.69x |
SSD ResNet50 V1.0 | 80.21% | 80.23% | -0.03% | 31.64 | 15.13 | 2.09x |
Validated Pruning Examples
Model | Task / Dataset | Dense Accuracy / Sparse Accuracy | Relative Drop | Sparsity Ratio / Sparsity Pattern | Comments / Balanced or Unbalanced Ratio |
---|---|---|---|---|---|
ResNet18 | image classification / ImageNet | top-1% acc = 69.76 / top-1% acc = 69.47 | -0.42% | 30% | magnitude |
ResNet50 | image classification / ImageNet | top-1% acc = 76.13 / top-1% acc = 76.11 | -0.03% | 30% | magnitude |
ResNet50 | image classification / ImageNet | top-1% acc = 76.13 / top-1% acc = 76.01 | -0.16% | 30% | magnitude / Post Training Quantization |
ResNet50 | image classification / ImageNet | top-1% acc = 76.13 / top-1% acc = 75.90 | -0.30% | 30% | magnitude / Quantization Aware Training |
Bert-Large | question answering / SQuAD-v1.1 | f1=91.34 / f1=90.7 | -0.70% | 80% / structured 2x1 | group lasso / unbalanced |
Bert-Base | text classification / MNLI | [m, mm] = [84.57, 84.79] / [m, mm] = [82.45, 83.27] | [-2.51%, -1.80%] | 70% / unstructured | Prune once for all / balanced |
Bert-Base | text classification / MNLI | [m, mm] = [84.57, 84.79] / [m, mm] = [83.20, 84.11] | [-1.62%, -0.80%] | 50% / structured 1:2 | Prune once for all / balanced |
Bert-Base | text classification / SST-2 | accuracy = 92.32 / accuracy = 91.51 | -0.88% | 70% / unstructured | Prune once for all / balanced |
Bert-Base | text classification / SST-2 | accuracy = 92.32 / accuracy = 92.20 | -0.13% | 50% / structured 1:2 | Prune once for all / balanced |
Bert-Base | text classification / SST-2 | accuracy = 92.32 / accuracy = 91.97 | -0.38% | 20% / unstructured | gradient sensitivity / balanced |
Bert-Base | text classification / QQP | [accuracy, f1] = [91.10, 88.05] / [accuracy, f1] = [90.48, 87.06] | [-0.68%, -1.12%] | 70% / unstructured | Prune once for all / balanced |
Bert-Base | text classification / QQP | [accuracy, f1] = [91.10, 88.05] / [accuracy, f1] = [90.92, 87.78] | [-0.20%, -0.31%] | 50% / structured 1:2 | Prune once for all / balanced |
Bert-Base | text classification / QNLI | accuracy = 91.54 / accuracy = 90.39 | -1.26% | 70% / unstructured | Prune once for all / balanced |
Bert-Base | text classification / QNLI | accuracy = 91.54 / accuracy = 90.87 | -0.73% | 50% / structured 1:2 | Prune once for all / balanced |
Bert-Base | question answering | [em, f1] = [79.34, 87.10] / [em, f1] = [77.27, 85.75] | [-2.61%, -1.54%] | 70% / unstructured | Prune once for all / balanced |
Bert-Base | question answering | [em, f1] = [79.34, 87.10] / [em, f1] = [78.03, 86.50] | [-1.65%, -0.69%] | 50% / structured 1:2 | Prune once for all / balanced |
Bert-Mini | question answering / SQuAD-v1.1 | f1=76.87 / f1=76.2 | -0.80% | 80% / structured 4x1 | snip momentum / unbalanced |
Bert-Mini | question answering / SQuAD-v1.1 | f1=76.87 / f1=77.62 | +0.98% | 50% / structured 2:4 | snip momentum / balanced |
Distilbert-base-uncased | question answering / SQuAD-v1.1 | f1=86.90 / f1=86.15 | -0.86% | 80% / structured 4x1 | snip momentum / unbalanced |
Distilbert-base-uncased | question answering / SQuAD-v1.1 | f1=86.90 / f1=87.50 | +0.69% | 50% / structured 2:4 | snip momentum / balanced |
Bert-base-uncased | question answering / SQuAD-v1.1 | f1=88.59 / f1=87.78 | -0.92% | 80% / structured 4x1 | snip momentum / unbalanced |
Bert-base-uncased | question answering / SQuAD-v1.1 | f1=88.59 / f1=89.40 | +0.91% | 50% / structured 2:4 | snip momentum / balanced |
Bert-large | question answering / SQuAD-v1.1 | f1=91.23 / f1=90.91 | -0.35% | 80% / structured 4x1 | snip momentum / unbalanced |
Bert-large | question answering / SQuAD-v1.1 | f1=91.23 / f1=91.67 | +0.48% | 50% / structured 2:4 | snip momentum / balanced |
Bert-Mini | text classification / MRPC | f1=87.52 / f1=87.22 | -0.34% | 90% / structured 4x1 | snip momentum / unbalanced |
Bert-Mini | text classification / MRPC | f1=87.52 / f1=87.33 | -0.22% | 90% / structured 4x1 | snip momentum / balanced |
Bert-Mini | text classification / MRPC | f1=87.52 / f1=86.89 | -0.72% | 50% / structured 2:4 | snip momentum / balanced |
Bert-Mini | text classification / MRPC | f1=87.52 / f1=86.8 | -0.83% | 60% / structured per channel | snip momentum / unbalanced |
Distilbert-base-uncased | text classification / MRPC | f1=90.26 / f1=89.85 | -0.46% | 90% / structured 4x1 | snip momentum / unbalanced |
Distilbert-base-uncased | text classification / MRPC | f1=90.26 / f1=90.88 | +0.69% | 50% / structured 2:4 | snip momentum / balanced |
Bert-Mini | text classification / SST-2 | accuracy=87.61 / accuracy=86.92 | -0.79% | 90% / structured 4x1 | snip momentum / unbalanced |
Bert-Mini | text classification / SST-2 | accuracy=87.61 / accuracy=87.73 | +0.14% | 50% / structured 2:4 | snip momentum / balanced |
Bert-Mini | text classification / SST-2 | accuracy=87.61 / accuracy=86.92 | -0.79% | 50% / structured per channel | snip momentum / unbalanced |
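The "structured 2:4" and "structured 1:2" patterns in the table are N:M sparsity patterns: within every group of M consecutive weights, at most N are non-zero, which is why both correspond to a 50% sparsity ratio. The sketch below illustrates the pattern only. It selects weights by plain magnitude as a stand-in, not by the snip momentum or other criteria listed in the table, and the helper names `prune_n_of_m` and `sparsity` are hypothetical.

```python
def prune_n_of_m(weights, n=2, m=4):
    """Keep the n largest-magnitude values in each group of m; zero the rest."""
    if len(weights) % m != 0:
        raise ValueError("length must be a multiple of m")
    pruned = []
    for start in range(0, len(weights), m):
        group = weights[start:start + m]
        # Indices of the n entries with the largest absolute value.
        order = sorted(range(m), key=lambda i: abs(group[i]), reverse=True)
        keep = set(order[:n])
        pruned.extend(w if i in keep else 0.0 for i, w in enumerate(group))
    return pruned


def sparsity(weights):
    """Fraction of entries that are exactly zero."""
    return sum(1 for w in weights if w == 0) / len(weights)


row = [0.9, -0.1, 0.05, -0.7, 0.2, 0.3, -0.4, 0.01]
pruned_row = prune_n_of_m(row, n=2, m=4)
print(pruned_row)            # [0.9, 0.0, 0.0, -0.7, 0.0, 0.3, -0.4, 0.0]
print(sparsity(pruned_row))  # 0.5
```

Calling `prune_n_of_m(row, n=1, m=2)` gives the 1:2 pattern in the same way, again at 50% sparsity.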
Validated Knowledge Distillation Examples
Example Name | Dataset | Student (Metrics) | Teacher (Metrics) | Student With Distillation (Metrics Improvement) | Student With Distributed Distillation (Metrics Improvement) |
---|---|---|---|---|---|
MobileNet example | CIFAR-10 | MobileNetV2-0.35 (0.7965 ACC) | WideResNet40-2 (0.9522 ACC) | 0.8178 ACC (0.0213 ACC) | 0.8235 ACC (0.027 ACC) |
CNN example | CIFAR-100 | CNN-2 (0.5494 ACC) | CNN-10 (0.7153 ACC) | 0.5540 ACC (0.0046 ACC) | 0.5523 ACC (0.0029 ACC) |
VGG example | CIFAR-100 | VGG-8-BN (0.7022 ACC) | VGG-13-BN (0.7415 ACC) | 0.7025 ACC (0.0003 ACC) | WIP |
ResNet example | ImageNet | ResNet18 (0.6739 ACC) | ResNet50 (0.7399 ACC) | 0.6845 ACC (0.0106 ACC) | WIP |
BlendCnn example | MRPC | BlendCnn (0.7034 ACC) | BERT-Base (0.8382 ACC) | 0.7034 ACC (0 ACC) | WIP |
BiLSTM example | SST-2 | BiLSTM (0.8314 ACC) | RoBERTa-Base (0.9403 ACC) | 0.9048 ACC (0.0734 ACC) | WIP |
DistilBERT example | SQuAD | DistilBERT (0.7323/0.8256 EM/F1) | BERT-Base (0.8084/0.8814 EM/F1) | 0.7442/0.8371 EM/F1 (0.0119/0.0115 EM/F1) | WIP |
TinyBERT example | MNLI | TinyBERT (0.8018/0.8044 m/mm) | BERT-Base (0.8363/0.8411 m/mm) | 0.8025/0.8074 m/mm (0.0007/0.0030 m/mm) | WIP |
BERT-3 example | QQP | BERT-3 (0.8626/0.8213 EM/F1) | BERT-Base (0.9091/0.8782 EM/F1) | 0.8684/0.8259 EM/F1 (0.0058/0.0046 EM/F1) | WIP |
DistilRoBERTa example | COLA | DistilRoBERTa (0.6057 ACC) | RoBERTa-Large (0.6455 ACC) | 0.6187 ACC (0.0130 ACC) | WIP |
Validated ONNX QDQ INT8 Models on Multiple Hardware through ONNX Runtime
Model (ONNX QDQ) | AWS c6i.2xlarge (Intel) CPU Execution Provider | AWS c6a.2xlarge (AMD) CPU Execution Provider | AWS c6g.2xlarge (ARM) CPU Execution Provider | NVIDIA A100 CUDA Execution Provider |
---|---|---|---|---|
ResNet50 | 74.76% | 68.95% | 74.76% | 74.75% |
BERT-base | 85.54% | 84.56% | 85.54% | 84.31% |
ResNet50 V1.5 | 72.20% | 67.70% | 72.20% | 72.29% |
MobileNet V2 | 65.82% | 58.56% | 65.83% | 65.63% |
SSD MobileNet V1 | 22.45% | 16.53% | 22.45% | 22.35% |
DistilBERT base MRPC | 84.56% | 83.82% | 84.56% | 84.56% |
SqueezeNet | 56.54% | 53.52% | 56.54% | 56.55% |
SSD | 18.63% | 18.54% | 18.63% | 18.61% |
AlexNet | 54.71% | 47.06% | 54.71% | 54.79% |
CaffeNet | 56.25% | 52.35% | 56.27% | 56.24% |
GoogleNet | 67.73% | 63.56% | 67.72% | 67.76% |
ZFNet | 55.86% | 45.09% | 55.86% | 55.89% |
Inception V1 | 67.21% | 63.03% | 67.20% | 67.21% |
SSD MobileNet V1 (ONNX Model Zoo) | 22.86% | 16.94% | 22.80% | 22.87% |
Mobile bert MRPC | 85.54% | 84.56% | 85.54% | 85.54% |
Roberta base MRPC | 89.46% | 90.44% | 89.71% | 89.71% |
ResNet50 V1.5 MLPerf | 76.14% | 72.80% | 76.14% | 76.17% |
VGG16 | 66.69% | 64.25% | 66.69% | 66.64% |
VGG16 (ONNX Model Zoo) | 72.31% | 69.35% | 72.32% | 72.34% |
MobileNet V3 MLPerf | 75.57% | 70.78% | 75.56% | 75.52% |
EfficientNet | 77.61% | 76.52% | 77.56% | 77.60% |
MobileNet V2 (ONNX Model Zoo) | 68.51% | 62.48% | 68.58% | 68.48% |
ShuffleNet V2 | 66.12% | 58.41% | 66.11% | 66.11% |