Validated Models

Intel® Neural Compressor has validated examples covering multiple compression techniques. Links to typical examples can be found in the example tables, and the performance and accuracy results are available here.

1. Validated Quantization Examples
1.1. TensorFlow Models with TensorFlow 2.16.1
1.2. Keras Models with keras 2.15.1
1.3. PyTorch Models with Torch 2.3.0+cpu in PTQ Mode
1.4. PyTorch Models with Torch 2.3.0+cpu in QAT Mode
1.5. PyTorch Models with Torch 2.3.0+cpu in IPEX Mode
1.6. ONNX Models with ONNX Runtime 1.18.1
2. Validated Pruning Examples
3. Validated Knowledge Distillation Examples
4. Validated ONNX QDQ INT8 Models on Multiple Hardware through ONNX Runtime

Validated Quantization Examples

System summary: Tested by Intel on 7/22/2024. 1-node, 1x Intel(R) Xeon(R) Platinum 8480+ @3.8GHz, 56 cores/socket, HT On, Turbo On, Total Memory 512GB (16x32GB DDR5 4800 MT/s [4800 MT/s]), BIOS EGSDCRB1.SYS.0081.D18.2205301336, microcode 0x2b000590, Ubuntu 24.04 LTS, gcc (GCC) 13.2.0 (Ubuntu 13.2.0-23ubuntu4), DL Models, Frameworks: TensorFlow/ONNXRT/PyTorch, Datatype: FP32/INT8/BF16.
Most models are benchmarked using 1 socket, 4 cores/instance, 14 instances, and batch size 1 (abbreviated as 1s4c14ins1bs in the table headers).
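The instance count follows from the core count: 56 cores on one socket divided into groups of 4 cores gives 14 instances. The sketch below only illustrates that arithmetic. The function name `plan_instances` and the contiguous core numbering are assumptions made for illustration, not the way the benchmark scripts actually pin cores.

```python
def plan_instances(cores_per_socket: int, cores_per_instance: int) -> list[range]:
    """Split one socket's cores into equal, non-overlapping groups.

    Hypothetical helper: assumes cores are numbered 0..N-1 and that each
    instance receives a contiguous block of cores.
    """
    if cores_per_instance <= 0:
        raise ValueError("cores_per_instance must be positive")
    n_instances = cores_per_socket // cores_per_instance
    return [
        range(i * cores_per_instance, (i + 1) * cores_per_instance)
        for i in range(n_instances)
    ]


plan = plan_instances(cores_per_socket=56, cores_per_instance=4)
print(len(plan))                       # 14
print(list(plan[0]), list(plan[-1]))   # [0, 1, 2, 3] [52, 53, 54, 55]
```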

Performance varies by use, configuration and other factors.
For more complete information about performance and benchmark results, visit www.intel.com/benchmarks

TensorFlow Models with TensorFlow 2.16.1

Model	Example	INT8 Accuracy	FP32 Accuracy	Accuracy Ratio [(INT8-FP32)/FP32]	INT8 Throughput, 1s4c14ins1bs (samples/sec)	FP32 Throughput, 1s4c14ins1bs (samples/sec)	Performance Ratio [INT8/FP32]
ResNet50 v1.0	pb	74.11%	74.27%	-0.22%	1732.92	578.88	2.99x
ResNet50 v1.5	pb	76.25%	76.46%	-0.28%	1535.20	530.00	2.90x
ResNet101	pb	77.52%	76.45%	+1.41%	1048.36	384.02	2.73x
Inception V1	pb	70.45%	69.74%	+1.03%	2079.24	927.82	2.24x
Inception V2	pb	74.33%	73.97%	+0.49%	1644.36	840.53	1.96x
Inception V3	pb	76.72%	76.75%	-0.03%	1076.10	401.89	2.68x
Inception V4	pb	80.13%	80.27%	-0.18%	704.96	199.28	3.54x
Inception ResNet V2	pb	80.25%	80.40%	-0.18%	313.97	178.27	1.76x
DenseNet-161	pb	76.29%	76.29%	+0.00%	279.20	214.03	1.30x
MobileNet V1	pb	71.79%	70.96%	+1.18%	4199.13	1506.68	2.79x
MobileNet V2	pb	72.48%	71.76%	+1.01%	2170.39	1445.05	1.50x
VGG16	pb	72.69%	70.89%	+2.55%	1388.62	203.39	6.83x
VGG19	pb	72.67%	71.01%	+2.33%	1236.12	169.74	7.28x
ResNet50	pb	69.09%	69.03%	+0.09%	411.79	284.53	1.45x
ResNetV2 50	pb	70.37%	69.64%	+1.05%	779.42	539.54	1.44x
ResNetV2 101	pb	72.64%	71.87%	+1.08%	492.00	295.77	1.66x
ResNetV2 152	pb	73.12%	72.37%	+1.04%	348.39	205.72	1.69x
ViT	pb	81.39%	81.92%	-0.64%	230.53	132.66	1.74x
SSD ResNet50 V1	pb	37.91%	38.00%	-0.24%	135.71	28.75	4.72x
SSD MobileNet V1	pb	23.00%	23.13%	-0.57%	1237.70	719.30	1.72x
SSD ResNet50 v1	ckpt	37.88%	38.00%	-0.31%	130.54	22.05	5.92x
SSD MobileNet v1	ckpt	22.96%	23.13%	-0.71%	1234.56	529.34	2.33x
Faster R-CNN ResNet101	pb	30.32%	30.39%	-0.22%	144.21	22.64	6.37x
Faster R-CNN ResNet50	pb	26.61%	26.59%	+0.09%	164.55	28.38	5.80x
YOLOv3	pb	83.28%	82.35%	+1.12%	247.56	81.45	3.04x
BERT large SQuAD	pb	92.44%	92.99%	-0.58%	49.17	17.52	2.81x
BERT large SQuAD (ONNX Model Zoo)	pb	92.36%	92.98%	-0.67%	45.06	17.55	2.57x
Transformer LT	pb	25.82%	25.86%	-0.15%	28.99	15.77	1.84x
Transformer lt MLPerf	pb	27.13%	27.17%	-0.13%	10.27	5.08	2.02x
Mask R-CNN Inception V2	pb	28.46%	28.73%	-0.91%	195.68	50.72	3.86x
Mask R-CNN Inception V2	ckpt	28.46%	28.73%	-0.91%	206.14	47.04	4.38x
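
The two ratio columns can be reproduced from the raw columns. As a minimal sketch, the snippet below recomputes them for the ResNet50 v1.0 row using the formulas given in the table header; the function names are illustrative and not part of any library.

```python
def accuracy_ratio(int8_acc: float, fp32_acc: float) -> float:
    """Relative accuracy change, (INT8 - FP32) / FP32."""
    return (int8_acc - fp32_acc) / fp32_acc


def performance_ratio(int8_throughput: float, fp32_throughput: float) -> float:
    """Throughput speedup of INT8 over FP32."""
    return int8_throughput / fp32_throughput


# ResNet50 v1.0 row: accuracy in percent, throughput in samples/sec.
acc = accuracy_ratio(74.11, 74.27)
perf = performance_ratio(1732.92, 578.88)

print(f"{acc:+.2%}")    # -0.22%
print(f"{perf:.2f}x")   # 2.99x
```

A negative accuracy ratio means the INT8 model is less accurate than the FP32 baseline; a performance ratio above 1 means INT8 is faster.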

Keras Models with keras 2.15.1

Model	Example	INT8 Accuracy	FP32 Accuracy	Accuracy Ratio [(INT8-FP32)/FP32]	INT8 Throughput, 1s4c14ins1bs (samples/sec)	FP32 Throughput, 1s4c14ins1bs (samples/sec)	Performance Ratio [INT8/FP32]
Inception ResNet V2	pb	80.25%	80.40%	-0.18%	313.97	178.27	1.76x
Inception V3	pb	76.72%	76.75%	-0.03%	1076.10	401.89	2.68x
MobileNet V2	pb	71.49%	71.76%	-0.37%	947.44	779.51	1.22x
ResNet101	pb	77.52%	76.45%	+1.41%	1048.36	384.02	2.73x
ResNet50	pb	69.09%	69.03%	+0.09%	411.79	284.53	1.45x
ResNet50	pb	78.07%	78.12%	-0.06%	680.56	498.08	1.37x
ResNetV2 101	pb	72.64%	71.87%	+1.08%	492.00	295.77	1.66x
ResNetV2 50	pb	70.37%	69.64%	+1.05%	779.42	539.54	1.44x
VGG16	pb	72.69%	70.89%	+2.55%	1388.62	203.39	6.83x
VGG19	pb	72.67%	71.01%	+2.33%	1236.12	169.74	7.28x

PyTorch Models with Torch 2.3.0+cpu in PTQ Mode

Model	Example	INT8 Accuracy	FP32 Accuracy	Accuracy Ratio [(INT8-FP32)/FP32]	INT8 Throughput, 1s4c14ins1bs (samples/sec)	FP32 Throughput, 1s4c14ins1bs (samples/sec)	Performance Ratio [INT8/FP32]
ResNet18	static	69.59%	69.76%	-0.24%	1707.52	602.47	2.83x
EfficientNet-B3	static	77.78%	78.54%	-0.98%	513.82	360.02	1.43x
PeleeNet	static	71.83%	72.10%	-0.37%	837.83	541.66	1.55x
ResNet50	static	75.98%	76.15%	-0.21%	1135.22	311.47	3.64x
Inception V3	static	69.46%	69.52%	-0.09%	948.03	322.55	2.94x
ResNeSt50	static	80.76%	81.04%	-0.35%	406.11	39.66	10.24x
ResNeXt101_32x8d	static	78.92%	79.31%	-0.49%	582.22	106.73	5.45x
YOLO V3	static	55.10%	54.93%	+0.31%	156.29	60.30	2.59x
Roberta base MRPC	static	93.14%	93.59%	-0.48%	396.85	176.80	2.24x
CamemBERT base MRPC	static	88.58%	89.28%	-0.78%	405.37	182.87	2.22x
DistilBERT base MRPC	static	90.64%	90.27%	+0.41%	799.05	346.50	2.31x
DistilBERT base MRPC	dynamic	90.02%	90.27%	-0.28%	705.91	348.16	2.03x
ALBERT base MRPC	static	92.28%	92.28%	0.00%	350.78	164.32	2.13x
Xlm Roberta MRPC	static	87.80%	88.62%	-0.93%	396.06	175.96	2.25x
Xlm Roberta MRPC	dynamic	88.54%	88.24%	+0.35%	381.19	175.96	2.17x
BERT base MRPC	static	89.59%	90.42%	-0.91%	402.42	177.73	2.26x
BERT base COLA	static	53.47%	53.39%	+0.16%	395.25	177.02	2.23x
BERT base STSB	static	87.61%	88.05%	-0.49%	397.62	177.23	2.24x
BERT base SST-2	static	91.97%	92.32%	-0.37%	407.66	182.93	2.23x
BERT large COLA	static	63.39%	63.35%	+0.06%	147.86	56.01	2.64x
BERT base RTE	static	71.84%	72.56%	-1.00%	397.83	177.40	2.24x
BERT large MRPC	static	90.07%	90.38%	-0.34%	146.84	52.97	2.77x
BERT large QNLI	static	91.12%	91.54%	-0.46%	394.51	176.92	2.23x
BERT large RTE	static	73.65%	74.01%	-0.49%	148.84	55.83	2.67x
Funnel MRPC		91.94%	92.25%	-0.34%	294.76	187.41	1.57x
BERT large SQuAD	static	92.34%	93.16%	-0.88%	50.21	18.69	2.69x
lvwerra/pegasus-samsum	static	42.32%	42.67%	-0.82%	102.73	37.99	2.70x
ResNet18 PT2E	static	69.49%	69.76%	-0.39%	1873.51	1106.97	1.69x
OPT-125M PT2E	static	37.07%	37.90%	-2.20%	42.09	29.68	1.42x

PyTorch Models with Torch 2.3.0+cpu in QAT Mode

Model	Example	INT8 Accuracy	FP32 Accuracy	Accuracy Ratio [(INT8-FP32)/FP32]	INT8 Throughput, 1s4c14ins1bs (samples/sec)	FP32 Throughput, 1s4c14ins1bs (samples/sec)	Performance Ratio [INT8/FP32]
ResNet18	static	69.74%	69.76%	-0.03%	1717.59	602.65	2.85x
ResNet50	static	76.03%	76.15%	-0.15%	1091.62	305.83	3.57x
ResNeXt101_32x8d	static	79.31%	79.31%	0.00%	584.54	107.38	5.44x

PyTorch Models with Torch 2.3.0+cpu in IPEX Mode

Model	Example	INT8 Accuracy	FP32 Accuracy	Accuracy Ratio [(INT8-FP32)/FP32]	INT8 Throughput, 1s4c14ins1bs (samples/sec)	FP32 Throughput, 1s4c14ins1bs (samples/sec)	Performance Ratio [INT8/FP32]
bert-large-uncased-whole-word-masking-finetuned-squad	static	93.01%	93.16%	-0.16%	150.05	22.42	6.69x
distilbert-base-uncased-distilled-squad	static	86.10%	86.84%	-0.85%	1034.60	151.13	6.85x

ONNX Models with ONNX Runtime 1.18.1

Model	Example	INT8 Accuracy	FP32 Accuracy	Accuracy Ratio [(INT8-FP32)/FP32]	INT8 Throughput, 1s4c14ins1bs (samples/sec)	FP32 Throughput, 1s4c14ins1bs (samples/sec)	Performance Ratio [INT8/FP32]
ResNet50 V1.5	qlinearops	72.18%	72.29%	-0.16%	1495.72	715.94	2.09x
ResNet50 V1.5	qdq	72.13%	72.29%	-0.23%	1547.30	717.03	2.16x
ResNet50 V1.5 MLPerf	qlinearops	76.15%	76.46%	-0.41%	1365.56	718.55	1.90x
ResNet50 V1.5 MLPerf	qdq	76.13%	76.46%	-0.44%	1445.75	718.96	2.01x
ResNet50 V1.5 (ONNX Model Zoo)	qlinearops	74.77%	74.99%	-0.29%	1574.38	749.36	2.10x
ResNet50 V1.5 (ONNX Model Zoo)	qdq	74.78%	74.99%	-0.27%	1564.15	755.58	2.07x
VGG16	qlinearops	66.55%	66.69%	-0.20%	526.57	162.64	3.24x
VGG16	qdq	66.62%	66.69%	-0.11%	520.09	172.42	3.02x
VGG16 (ONNX Model Zoo)	qlinearops	72.37%	72.40%	-0.04%	558.81	162.87	3.43x
VGG16 (ONNX Model Zoo)	qdq	72.36%	72.40%	-0.04%	556.58	176.92	3.15x
MobileNet V3 MLPerf	qlinearops	75.51%	75.74%	-0.30%	5421.72	2578.08	2.10x
MobileNet V3 MLPerf	qdq	75.51%	75.74%	-0.30%	5382.87	2567.48	2.10x
ShuffleNet V2 (ONNX Model Zoo)	qlinearops	66.13%	66.36%	-0.36%	6426.22	3725.69	1.72x
ShuffleNet V2 (ONNX Model Zoo)	qdq	66.22%	66.36%	-0.22%	6534.24	3707.74	1.76x
GoogleNet (ONNX Model Zoo)	qlinearops	67.69%	67.79%	-0.14%	1842.90	1137.58	1.62x
GoogleNet (ONNX Model Zoo)	qdq	67.71%	67.79%	-0.11%	1818.99	1136.37	1.60x
SqueezeNet (ONNX Model Zoo)	qlinearops	56.49%	56.87%	-0.67%	9521.99	5530.36	1.72x
SqueezeNet (ONNX Model Zoo)	qdq	56.49%	56.87%	-0.67%	9391.07	5519.79	1.70x
CaffeNet (ONNX Model Zoo)	qlinearops	56.26%	56.30%	-0.07%	2949.36	893.77	3.30x
CaffeNet (ONNX Model Zoo)	qdq	56.26%	56.30%	-0.08%	2847.24	901.15	3.16x
AlexNet (ONNX Model Zoo)	qlinearops	54.73%	54.79%	-0.10%	2070.17	816.71	2.53x
AlexNet (ONNX Model Zoo)	qdq	54.71%	54.79%	-0.14%	2059.13	844.97	2.44x
ZFNet (ONNX Model Zoo)	qlinearops	55.83%	55.96%	-0.24%	858.76	461.25	1.86x
ZFNet (ONNX Model Zoo)	qdq	55.87%	55.96%	-0.16%	853.77	457.91	1.86x
Inception V1 (ONNX Model Zoo)	qlinearops	67.23%	67.24%	-0.02%	1891.36	1205.95	1.57x
Inception V1 (ONNX Model Zoo)	qdq	67.23%	67.24%	-0.02%	1879.27	1202.19	1.56x
BEiT (ONNX Model Zoo)	qlinearops	85.07%	85.28%	-0.25%	205.15	126.59	1.62x
EfficientNet (ONNX Model Zoo)	qlinearops	77.02%	77.11%	-0.12%	2428.32	1344.03	1.81x
EfficientNet (ONNX Model Zoo)	qdq	76.99%	77.11%	-0.16%	2286.73	1307.18	1.75x
DenseNet (ONNX Model Zoo)	qlinearops	60.53%	60.96%	-0.71%	626.26	499.76	1.25x
SSD MobileNet V1 (ONNX Model Zoo)	qlinearops	22.96%	23.02%	-0.27%	1121.43	841.32	1.33x
SSD MobileNet V1 (ONNX Model Zoo)	qdq	22.96%	23.02%	-0.27%	1048.50	798.22	1.31x
DUC (ONNX Model Zoo)	qlinearops	81.62%	81.92%	-0.37%	9.26	4.99	1.86x
Ultra Face (ONNX Model Zoo)	qlinearops	83.33%	83.65%	-0.38%	8993.58	1988.46	4.52x
Emotion FERPlus (ONNX Model Zoo)	qlinearops	7.94%	8.00%	-0.70%	6113.74	3087.50	1.98x
ArcFace (ONNX Model Zoo)	qlinearops	99.82%	99.80%	+0.02%	442.85	230.75	1.92x
BERT base MRPC	qlinearops	85.54%	86.03%	-0.57%	483.81	219.45	2.20x
BERT base MRPC	qdq	85.54%	86.03%	-0.57%	485.08	218.33	2.22x
BERT base MRPC	integerops	85.29%	86.03%	-0.85%	684.46	218.86	3.13x
DistilBERT base MRPC	qdq	84.07%	84.56%	-0.58%	633.28	399.31	1.59x
DistilBERT base MRPC	integerops	85.54%	84.56%	+1.16%	1388.44	401.08	3.46x
Mobile bert MRPC	qdq	85.54%	86.28%	-0.85%	505.62	387.43	1.31x
Mobile bert MRPC	integerops	85.54%	86.28%	-0.85%	565.46	386.39	1.46x
Roberta base MRPC	integerops	90.93%	89.95%	+1.09%	702.17	219.50	3.20x
BERT SQuAD (ONNX Model Zoo)	integerops	80.29%	80.67%	-0.47%	242.58	97.71	2.48x
MobileBERT SQuAD MLPerf (ONNX Model Zoo)	integerops	89.87%	90.03%	-0.17%	151.69	125.35	1.21x
GPT2 lm head WikiText (ONNX Model Zoo)	integerops	31.98%	29.00%	+10.31%	17.96	10.21	1.76x
BERT base uncased MRPC (HuggingFace)	qlinearops	90.21%	90.42%	-0.23%	434.65	210.58	2.06x
BERT base uncased MRPC (HuggingFace)	integerops	89.58%	90.42%	-0.93%	708.66	210.74	3.36x
Roberta base MRPC (HuggingFace)	qlinearops	91.00%	91.38%	-0.41%	431.37	211.03	2.04x
Roberta base MRPC (HuggingFace)	integerops	90.85%	91.38%	-0.58%	711.11	210.71	3.37x
XLM Roberta base MRPC (HuggingFace)	qlinearops	89.37%	90.10%	-0.81%	334.88	211.56	1.58x
XLM Roberta base MRPC (HuggingFace)	integerops	89.66%	90.10%	-0.50%	401.99	211.43	1.90x
Camembert base MRPC (HuggingFace)	qlinearops	89.28%	89.28%	0.00%	282.30	213.33	1.32x
Camembert base MRPC (HuggingFace)	integerops	89.19%	89.28%	-0.10%	707.22	214.23	3.30x
MiniLM L12 H384 uncased MRPC (HuggingFace)	qlinearops	90.13%	90.97%	-0.93%	1188.05	578.35	2.05x
MiniLM L12 H384 uncased MRPC (HuggingFace)	integerops	91.07%	90.97%	+0.10%	1285.13	576.04	2.23x
DistilBERT base uncased SST-2 (HuggingFace)	qlinearops	90.71%	91.06%	-0.38%	1259.69	396.60	3.18x
DistilBERT base uncased SST-2 (HuggingFace)	integerops	90.25%	91.06%	-0.88%	914.63	395.09	2.32x
Albert base v2 SST-2 (HuggingFace)	qlinearops	92.09%	92.32%	-0.25%	284.62	210.52	1.35x
Albert base v2 SST-2 (HuggingFace)	integerops	91.74%	92.32%	-0.62%	284.69	210.00	1.36x
MiniLM L6 H384 uncased SST-2 (HuggingFace)	qlinearops	89.45%	90.14%	-0.76%	2172.98	1121.66	1.94x
MiniLM L6 H384 uncased SST-2 (HuggingFace)	integerops	89.91%	90.14%	-0.26%	2326.27	1114.57	2.09x
BERT base cased MRPC (HuggingFace)	qlinearops	87.70%	88.29%	-0.67%	494.96	210.80	2.35x
BERT base cased MRPC (HuggingFace)	integerops	88.19%	88.29%	-0.12%	714.61	210.99	3.39x
Electra small discriminator MRPC (HuggingFace)	qlinearops	89.92%	89.83%	+0.09%	1998.71	1115.18	1.79x
Electra small discriminator MRPC (HuggingFace)	integerops	89.27%	89.83%	-0.63%	2202.81	1121.41	1.96x
BERT mini MRPC (HuggingFace)	qlinearops	86.21%	86.52%	-0.35%	5767.23	3254.79	1.77x
BERT mini MRPC (HuggingFace)	integerops	86.16%	86.52%	-0.41%	6354.66	3424.42	1.86x
Xlnet base cased MRPC (HuggingFace)	qlinearops	90.05%	89.86%	+0.21%	121.24	95.56	1.27x
Xlnet base cased MRPC (HuggingFace)	integerops	89.58%	89.86%	-0.31%	123.06	95.60	1.29x
BART large MRPC (HuggingFace)	integerops	92.36%	91.20%	+1.28%	126.14	51.06	2.47x
DeBERTa v3 base MRPC (HuggingFace)	integerops	92.39%	92.23%	+0.17%	193.16	153.16	1.26x
Spanbert SQuAD (HuggingFace)	qlinearops	91.14%	91.98%	-0.91%	81.96	43.36	1.89x
Spanbert SQuAD (HuggingFace)	integerops	91.40%	91.98%	-0.63%	101.71	43.37	2.35x
Bert base multilingual cased SQuAD (HuggingFace)	qlinearops	88.42%	89.13%	-0.79%	86.33	43.27	2.00x
Bert base multilingual cased SQuAD (HuggingFace)	integerops	88.70%	89.13%	-0.48%	101.78	43.24	2.35x
DistilBert base uncased SQuAD (HuggingFace)	qlinearops	86.33%	86.86%	-0.62%	120.71	69.72	1.73x
DistilBert base uncased SQuAD (HuggingFace)	integerops	86.05%	86.86%	-0.94%	203.71	69.68	2.92x
BERT large uncased whole word masking SQuAD (HuggingFace)	qlinearops	92.34%	93.16%	-0.88%	31.81	12.94	2.46x
BERT large uncased whole word masking SQuAD (HuggingFace)	integerops	92.99%	93.16%	-0.18%	35.83	12.94	2.77x
Roberta large SQuAD v2 (HuggingFace)	qlinearops	89.03%	89.02%	+0.02%	17.61	13.27	1.33x
Roberta large SQuAD v2 (HuggingFace)	integerops	89.04%	89.02%	+0.02%	35.85	13.26	2.70x
GPT2 WikiText (HuggingFace)	qlinearops	30.25%	29.00%	+4.33%	13.85	10.17	1.36x
GPT2 WikiText (HuggingFace)	integerops	29.68%	29.00%	+2.36%	14.64	10.09	1.45x
DistilGPT2 WikiText (HuggingFace)	qlinearops	44.93%	43.43%	+3.46%	21.80	17.13	1.27x
DistilGPT2 WikiText (HuggingFace)	integerops	44.62%	43.43%	+2.74%	23.02	17.09	1.35x
LayoutLMv3 FUNSD (HuggingFace)	integerops	90.07%	90.49%	-0.46%	39.50	28.00	1.41x
CodeBert (HuggingFace)	qlinearops	64.97%	65.41%	-0.67%	75.69	45.10	1.68x
CodeBert (HuggingFace)	integerops	64.93%	65.41%	-0.73%	94.47	45.10	2.09x
FCN (ONNX Model Zoo)	qlinearops	64.54%	64.98%	-0.67%	25.83	12.90	2.00x
FCN (ONNX Model Zoo)	qdq	64.54%	64.98%	-0.67%	25.97	12.99	2.00x

Validated Pruning Examples

Model	Task	Dataset	Dense Accuracy	Sparse Accuracy	Relative Drop	Sparsity Ratio	Sparsity Pattern	Comments	Balanced or Unbalanced Ratio
Bert-Mini	question answering	SQuAD-v1.1	f1=76.87	f1=76.2	-0.80%	80%	structured 4x1	snip momentum	unbalanced
Bert-Mini	question answering	SQuAD-v1.1	f1=76.87	f1=77.62	+0.98%	50%	structured 2:4	snip momentum	balanced
Distilbert-base-uncased	question answering	SQuAD-v1.1	f1=86.90	f1=86.15	-0.86%	80%	structured 4x1	snip momentum	unbalanced
Distilbert-base-uncased	question answering	SQuAD-v1.1	f1=86.90	f1=87.50	+0.69%	50%	structured 2:4	snip momentum	balanced
Bert-base-uncased	question answering	SQuAD-v1.1	f1=88.59	f1=87.78	-0.92%	80%	structured 4x1	snip momentum	unbalanced
Bert-base-uncased	question answering	SQuAD-v1.1	f1=88.59	f1=89.40	+0.91%	50%	structured 2:4	snip momentum	balanced
Bert-large	question answering	SQuAD-v1.1	f1=91.23	f1=90.91	-0.35%	80%	structured 4x1	snip momentum	unbalanced
Bert-large	question answering	SQuAD-v1.1	f1=91.23	f1=91.67	+0.48%	50%	structured 2:4	snip momentum	balanced
Bert-Mini	text classification	MRPC	f1=87.52	f1=87.22	-0.34%	90%	structured 4x1	snip momentum	unbalanced
Bert-Mini	text classification	MRPC	f1=87.52	f1=87.33	-0.22%	90%	structured 4x1	snip momentum	balanced
Bert-Mini	text classification	MRPC	f1=87.52	f1=86.89	-0.72%	50%	structured 2:4	snip momentum	balanced
Bert-Mini	text classification	MRPC	f1=87.52	f1=86.8	-0.83%	60%	structured per channel	snip momentum	unbalanced
Distilbert-base-uncased	text classification	MRPC	f1=90.26	f1=89.85	-0.46%	90%	structured 4x1	snip momentum	unbalanced
Distilbert-base-uncased	text classification	MRPC	f1=90.26	f1=90.88	+0.69%	50%	structured 2:4	snip momentum	balanced
Bert-Mini	text classification	SST-2	accuracy=87.61	accuracy=86.92	-0.79%	90%	structured 4x1	snip momentum	unbalanced
Bert-Mini	text classification	SST-2	accuracy=87.61	accuracy=87.73	+0.14%	50%	structured 2:4	snip momentum	balanced
Bert-Mini	text classification	SST-2	accuracy=87.61	accuracy=86.92	-0.79%	50%	structured per channel	snip momentum	unbalanced
ResNet50	image recognition	ImageNet	top1 acc = 78.95	top1 acc = 80.10	-1.43%	75%	structured 2x1	snip momentum	unbalanced
YOLO-v5s6	object detection	COCO	AP0.50:0.95/AP0.50=0.404/0.6	AP0.50:0.95/AP0.50=0.393/0.584	-2.72%	80%	unstructured	snip momentum	unbalanced
Bert-Large	question answering	SQuAD-v1.1	f1=91.34	f1=90.7	-0.07%	80%	structured 2x1	group lasso	unbalanced
Bert-Base	text classification	MNLI	[m, mm] = [84.57, 84.79]	[m, mm] = [82.45, 83.27]	[-2.51%, -1.80%]	70%	unstructured	Prune once for all	balanced
Bert-Base	text classification	MNLI	[m, mm] = [84.57, 84.79]	[m, mm] = [83.20, 84.11]	[-1.62%, -0.80%]	50%	structured 1:2	Prune once for all	balanced
Bert-Base	text classification	SST-2	accuracy = 92.32	accuracy = 91.51	-0.88%	70%	unstructured	Prune once for all	balanced
Bert-Base	text classification	SST-2	accuracy = 92.32	accuracy = 92.20	-0.13%	50%	structured 1:2	Prune once for all	balanced
Bert-Base	text classification	SST-2	accuracy = 92.32	accuracy = 91.97	-0.38%	20%	unstructured	gradient sensitivity	balanced
Bert-Base	text classification	QQP	[accuracy, f1] = [91.10, 88.05]	[accuracy, f1] = [90.48, 87.06]	[-0.68%, -1.12%]	70%	unstructured	Prune once for all	balanced
Bert-Base	text classification	QQP	[accuracy, f1] = [91.10, 88.05]	[accuracy, f1] = [90.92, 87.78]	[-0.20%, -0.31%]	50%	structured 1:2	Prune once for all	balanced
Bert-Base	text classification	QNLI	accuracy = 91.54	accuracy = 90.39	-1.26%	70%	unstructured	Prune once for all	balanced
Bert-Base	text classification	QNLI	accuracy = 91.54	accuracy = 90.87	-0.73%	50%	structured 1:2	Prune once for all	balanced
Bert-Base	question answering		[em, f1] = [79.34, 87.10]	[em, f1] = [77.27, 85.75]	[-2.61%, -1.54%]	70%	unstructured	Prune once for all	balanced
Bert-Base	question answering		[em, f1] = [79.34, 87.10]	[em, f1] = [78.03, 86.50]	[-1.65%, -0.69%]	50%	structured 1:2	Prune once for all	balanced
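
The N:M entries in the sparsity pattern column (2:4, 1:2) pair with a 50% sparsity ratio in every row above. The sketch below shows why, under two stated assumptions: n:m is read as "keep n of every m consecutive weights", and weights are ranked by plain magnitude. Magnitude ranking is a stand-in chosen for brevity; it is not the snip momentum or prune-once-for-all criterion named in the table, and `prune_n_of_m` is a hypothetical helper.

```python
def prune_n_of_m(weights: list[float], n: int, m: int) -> list[float]:
    """Zero all but the n largest-magnitude values in each group of m weights."""
    if len(weights) % m != 0:
        raise ValueError("weight count must be a multiple of m")
    pruned: list[float] = []
    for start in range(0, len(weights), m):
        group = weights[start:start + m]
        # Indices of the n entries with the largest absolute value.
        ranked = sorted(range(m), key=lambda i: abs(group[i]), reverse=True)
        keep = set(ranked[:n])
        pruned.extend(w if i in keep else 0.0 for i, w in enumerate(group))
    return pruned


def sparsity(weights: list[float]) -> float:
    """Fraction of weights that are exactly zero."""
    return sum(1 for w in weights if w == 0) / len(weights)


row = [0.9, -0.1, 0.4, 0.05, -0.7, 0.2, 0.3, -0.6]
pruned = prune_n_of_m(row, n=2, m=4)
print(pruned)            # [0.9, 0.0, 0.4, 0.0, -0.7, 0.0, 0.0, -0.6]
print(sparsity(pruned))  # 0.5
```

Because every group of m weights loses the same number of entries, the zeros are spread evenly across the tensor, which matches the 50% sparsity ratio listed for the 2:4 and 1:2 rows.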

Validated Knowledge Distillation Examples

Example Name	Dataset	Student (Metrics)	Teacher (Metrics)	Student With Distillation (Metrics Improvement)	Student With Distributed Distillation (Metrics Improvement)
MobileNet example	CIFAR-10	MobileNetV2-0.35 (0.7965 ACC)	WideResNet40-2 (0.9522 ACC)	0.8178 ACC (0.0213 ACC)	0.8235 ACC (0.027 ACC)
CNN example	CIFAR-100	CNN-2 (0.5494 ACC)	CNN-10 (0.7153 ACC)	0.5540 ACC (0.0046 ACC)	0.5523 ACC (0.0029 ACC)
VGG example	CIFAR-100	VGG-8-BN (0.7022 ACC)	VGG-13-BN (0.7415 ACC)	0.7025 ACC (0.0003 ACC)	NA
ResNet example	ImageNet	ResNet18 (0.6739 ACC)	ResNet50 (0.7399 ACC)	0.6845 ACC (0.0106 ACC)	NA
BlendCnn example	MRPC	BlendCnn (0.7034 ACC)	BERT-Base (0.8382 ACC)	0.7034 ACC (0 ACC)	NA
BiLSTM example	SST-2	BiLSTM (0.8314 ACC)	RoBERTa-Base (0.9403 ACC)	0.9048 ACC (0.0734 ACC)	NA
DistilBERT example	SQuAD	DistilBERT (0.7323/0.8256 EM/F1)	BERT-Base (0.8084/0.8814 EM/F1)	0.7442/0.8371 EM/F1 (0.0119/0.0115 EM/F1)	NA
TinyBERT example	MNLI	TinyBERT (0.8018/0.8044 m/mm)	BERT-Base (0.8363/0.8411 m/mm)	0.8025/0.8074 m/mm (0.0007/0.0030 m/mm)	NA
BERT-3 example	QQP	BERT-3 (0.8626/0.8213 EM/F1)	BERT-Base (0.9091/0.8782 EM/F1)	0.8684/0.8259 EM/F1 (0.0058/0.0046 EM/F1)	NA
DistilRoBERTa example	COLA	DistilRoBERTa (0.6057 ACC)	RoBERTa-Large (0.6455 ACC)	0.6187 ACC (0.0130 ACC)	NA
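
Unlike the quantization tables, the improvement shown in parentheses here is an absolute difference between the distilled student and the baseline student, not a relative ratio. A minimal sketch that recomputes it for the MobileNet and DistilBERT rows follows; the helper name `improvement` is illustrative.

```python
def improvement(baseline: float, distilled: float) -> float:
    """Absolute metric gain of the distilled student over the baseline student."""
    return round(distilled - baseline, 4)


# MobileNet example on CIFAR-10 (accuracy).
plain_gain = improvement(0.7965, 0.8178)
distributed_gain = improvement(0.7965, 0.8235)
print(plain_gain, distributed_gain)  # 0.0213 0.027

# DistilBERT example on SQuAD (EM, F1).
em_gain = improvement(0.7323, 0.7442)
f1_gain = improvement(0.8256, 0.8371)
print(em_gain, f1_gain)  # 0.0119 0.0115
```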

Validated ONNX QDQ INT8 Models on Multiple Hardware through ONNX Runtime

Model (ONNX QDQ)	AWS c6i.2xlarge (Intel) CPU Execution Provider	AWS c6a.2xlarge (AMD) CPU Execution Provider	AWS c6g.2xlarge (ARM) CPU Execution Provider	NVIDIA A100 CUDA Execution Provider
ResNet50	74.76%	68.95%	74.76%	74.75%
BERT-base	85.54%	84.56%	85.54%	84.31%
ResNet50 V1.5	72.20%	67.70%	72.20%	72.29%
MobileNet V2	65.82%	58.56%	65.83%	65.63%
SSD MobileNet V1	22.45%	16.53%	22.45%	22.35%
DistilBERT base MRPC	84.56%	83.82%	84.56%	84.56%
SqueezeNet	56.54%	53.52%	56.54%	56.55%
SSD	18.63%	18.54%	18.63%	18.61%
AlexNet	54.71%	47.06%	54.71%	54.79%
CaffeNet	56.25%	52.35%	56.27%	56.24%
GoogleNet	67.73%	63.56%	67.72%	67.76%
ZFNet	55.86%	45.09%	55.86%	55.89%
Inception V1	67.21%	63.03%	67.20%	67.21%
SSD MobileNet V1 (ONNX Model Zoo)	22.86%	16.94%	22.80%	22.87%
Mobile bert MRPC	85.54%	84.56%	85.54%	85.54%
Roberta base MRPC	89.46%	90.44%	89.71%	89.71%
ResNet50 V1.5 MLPerf	76.14%	72.80%	76.14%	76.17%
VGG16	66.69%	64.25%	66.69%	66.64%
VGG16 (ONNX Model Zoo)	72.31%	69.35%	72.32%	72.34%
MobileNet V3 MLPerf	75.57%	70.78%	75.56%	75.52%
EfficientNet	77.61%	76.52%	77.56%	77.60%
MobileNet V2 (ONNX Model Zoo)	68.51%	62.48%	68.58%	68.48%
ShuffleNet V2	66.12%	58.41%	66.11%	66.11%
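
One way to read this table is to look at how far the accuracy of the same QDQ model moves between execution providers. The sketch below computes that spread for three rows copied from the table. The dictionary keys and the 1.0-point threshold are arbitrary choices made for illustration, not values defined by Intel Neural Compressor or ONNX Runtime.

```python
# Accuracy (%) per execution provider, copied from three table rows.
results = {
    "ResNet50": {"intel_cpu": 74.76, "amd_cpu": 68.95, "arm_cpu": 74.76, "a100_cuda": 74.75},
    "SSD": {"intel_cpu": 18.63, "amd_cpu": 18.54, "arm_cpu": 18.63, "a100_cuda": 18.61},
    "Roberta base MRPC": {"intel_cpu": 89.46, "amd_cpu": 90.44, "arm_cpu": 89.71, "a100_cuda": 89.71},
}

THRESHOLD = 1.0  # hypothetical cutoff, in percentage points


def spread(accuracies: dict[str, float]) -> float:
    """Gap between the best and worst accuracy across providers."""
    return max(accuracies.values()) - min(accuracies.values())


flagged = [model for model, accs in results.items() if spread(accs) > THRESHOLD]

for model, accs in results.items():
    print(f"{model}: spread {spread(accs):.2f} points")
print("Models above threshold:", flagged)
```

With these three rows only ResNet50 exceeds the threshold, driven by the lower accuracy on the AMD CPU column.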