Validated Models

  1. Validated Quantization Examples

    1.1. TensorFlow Models with TensorFlow 2.10.0

    1.2. PyTorch Models with Torch 1.12.1+cpu in PTQ Mode

    1.3. PyTorch Models with Torch 1.12.1+cpu in QAT Mode

    1.4. PyTorch Models with Torch and Intel® Extension for PyTorch* 1.11.0+cpu

    1.5. ONNX Models with ONNX Runtime 1.12.1

    1.6. MXNet Models with MXNet 1.7.0

  2. Validated Pruning Examples

  3. Validated Knowledge Distillation Examples

  4. Validated ONNX QDQ INT8 Models on Multiple Hardware through ONNX Runtime

Validated Quantization Examples

Performance results were tested on 09/24/2022 with an Intel Xeon Platinum 8380 Scalable processor, using 1 socket, 4 cores per instance, 8 instances, and batch size 1.

Performance varies by use, configuration, and other factors. See the platform configuration for details. For more complete information about performance and benchmark results, visit www.intel.com/benchmarks.
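
The ratio columns in the tables below follow the formulas given in the column headers: Accuracy Ratio = (INT8 - FP32)/FP32 and Performance Ratio = INT8/FP32 throughput. As a quick sanity check, the numbers from the TensorFlow EfficientNet row can be plugged in directly (the values are copied from the table; the script itself is only illustrative):

```python
# Reproduce the two ratio columns from the raw INT8/FP32 measurements
# of the EfficientNet row in the TensorFlow table below.
int8_acc, fp32_acc = 76.74, 76.76      # top-1 accuracy, %
int8_tput, fp32_tput = 91.43, 69.41    # throughput, samples/sec

acc_ratio = (int8_acc - fp32_acc) / fp32_acc   # (INT8 - FP32) / FP32
perf_ratio = int8_tput / fp32_tput             # INT8 / FP32

print(f"Accuracy ratio:    {acc_ratio:+.2%}")  # -0.03%
print(f"Performance ratio: {perf_ratio:.2f}x") # 1.32x
```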

TensorFlow Models with TensorFlow 2.10.0

| Model | Example | INT8 Accuracy | FP32 Accuracy | Accuracy Ratio [(INT8-FP32)/FP32] | INT8 Throughput (samples/sec) | FP32 Throughput (samples/sec) | Performance Ratio [INT8/FP32] |
|---|---|---|---|---|---|---|---|
| EfficientNet | pb | 76.74% | 76.76% | -0.03% | 91.43 | 69.41 | 1.32x |
| Faster R-CNN Inception ResNet V2 | pb | 37.65% | 38.33% | -1.77% | 2.53 | 1.62 | 1.57x |
| Faster R-CNN Inception ResNet V2 | SavedModel | 37.77% | 38.33% | -1.46% | 2.54 | 1.61 | 1.58x |
| Faster R-CNN ResNet101 | pb | 30.34% | 30.39% | -0.16% | 27.63 | 13.12 | 2.11x |
| Faster R-CNN ResNet101 | SavedModel | 30.33% | 30.39% | -0.20% | 27.71 | 11.06 | 2.51x |
| Faster R-CNN ResNet50 | pb | 26.65% | 26.59% | 0.23% | 33.64 | 16.33 | 2.06x |
| Inception ResNet V2 | pb | 80.34% | 80.40% | -0.07% | 29.25 | 23.43 | 1.25x |
| Inception V1 | pb | 70.44% | 69.74% | 1.00% | 163.14 | 133.44 | 1.22x |
| Inception V2 | pb | 74.34% | 73.97% | 0.50% | 133.49 | 111.5 | 1.20x |
| Inception V3 | pb | 76.71% | 76.75% | -0.05% | 91.67 | 64.02 | 1.43x |
| Inception V4 | pb | 80.18% | 80.27% | -0.11% | 56.87 | 37.09 | 1.53x |
| Mask R-CNN Inception V2 | pb | 28.50% | 28.73% | -0.80% | 36.06 | 27.15 | 1.33x |
| Mask R-CNN Inception V2 | CKPT | 28.50% | 28.73% | -0.80% | 36.1 | 25.06 | 1.44x |
| MobileNet V1 | pb | 71.85% | 70.96% | 1.25% | 374.38 | 226.03 | 1.66x |
| MobileNet V2 | pb | 71.85% | 70.96% | 1.25% | 374.38 | 226.03 | 1.66x |
| ResNet101 | pb | 77.50% | 76.45% | 1.37% | 92.47 | 65.56 | 1.41x |
| ResNet50 Fashion | pb | 78.04% | 78.12% | -0.10% | 359.18 | 244.38 | 1.47x |
| ResNet50 V1.0 | pb | 74.11% | 74.27% | -0.22% | 172.66 | 87.28 | 1.98x |
| ResNet50 V1.5 | pb | 76.23% | 76.46% | -0.30% | 153.37 | 87.24 | 1.76x |
| SSD MobileNet V1 | pb | 23.12% | 23.13% | -0.04% | 151.92 | 112.24 | 1.35x |
| SSD MobileNet V1 | CKPT | 23.11% | 23.13% | -0.09% | 153.18 | 67.79 | 2.26x |
| SSD ResNet34 | pb | 21.71% | 22.09% | -1.72% | 30.99 | 8.65 | 3.58x |
| SSD ResNet50 V1 | pb | 37.76% | 38.00% | -0.63% | 23.04 | 14.75 | 1.56x |
| SSD ResNet50 V1 | CKPT | 37.82% | 38.00% | -0.47% | 23 | 11.94 | 1.93x |
| VGG16 | pb | 72.64% | 70.89% | 2.47% | 178.99 | 83.67 | 2.14x |
| VGG19 | pb | 72.69% | 71.01% | 2.37% | 156.11 | 71.5 | 2.18x |

PyTorch Models with Torch 1.12.1+cpu in PTQ Mode

| Model | Example | INT8 Accuracy | FP32 Accuracy | Accuracy Ratio [(INT8-FP32)/FP32] | INT8 Throughput (samples/sec) | FP32 Throughput (samples/sec) | Performance Ratio [INT8/FP32] |
|---|---|---|---|---|---|---|---|
| ALBERT base MRPC | EAGER | 88.85% | 88.50% | 0.40% | 26 | 21.22 | 1.23x |
| Barthez MRPC | EAGER | 83.92% | 83.81% | 0.14% | 128.66 | 70.86 | 1.82x |
| BERT base MRPC | FX | 89.90% | 90.69% | -0.88% | 203.38 | 101.29 | 2.01x |
| BERT base RTE | FX | 69.31% | 69.68% | -0.53% | 216.22 | 102.72 | 2.10x |
| BERT base SST2 | FX | 91.06% | 91.86% | -0.88% | 218.2 | 101.86 | 2.14x |
| BERT base STSB | FX | 64.12% | 62.57% | 2.48% | 73.65 | 29.61 | 2.49x |
| BERT large COLA | FX | 92.79 | 93.16 | -0.39% | 36.54 | 9.89 | 3.70x |
| BERT large MRPC | FX | 89.50% | 90.38% | -0.97% | 74.11 | 29.69 | 2.50x |
| BERT large QNLI | FX | 90.90% | 91.82% | -1.00% | 72.45 | 29.66 | 2.44x |
| BERT large RTE | FX | 73.65% | 74.01% | -0.49% | 41.53 | 29.67 | 1.40x |
| BlendCNN | EAGER | 68.40% | 68.40% | 0.00% | 3878.48 | 3717.52 | 1.04x |
| CamemBERT base MRPC | EAGER | 86.70% | 86.82% | -0.14% | 188.97 | 98.9 | 1.91x |
| Ctrl MRPC | EAGER | 81.87% | 81.22% | 0.80% | 18.68 | 7.25 | 2.58x |
| Deberta MRPC | EAGER | 90.88% | 90.91% | -0.04% | 124.43 | 68.74 | 1.81x |
| DistilBERT base MRPC | EAGER | 88.23% | 89.16% | -1.05% | 347.47 | 200.76 | 1.73x |
| DistilBERT base MRPC | FX | 88.54% | 89.16% | -0.69% | 382.74 | 198.25 | 1.93x |
| FlauBERT MRPC | EAGER | 79.87% | 80.19% | -0.40% | 561.35 | 370.2 | 1.52x |
| HuBERT | FX | 97.69% | 97.84% | -0.15% | 9.82 | 7.2 | 1.36x |
| Inception V3 | EAGER | 69.43% | 69.52% | -0.13% | 409.34 | 181.95 | 2.25x |
| Longformer MRPC | EAGER | 91.01% | 91.46% | -0.49% | 18.73 | 14.66 | 1.28x |
| mBart WNLI | EAGER | 56.34% | 56.34% | 0.00% | 54.35 | 25.14 | 2.16x |
| MobileNet V2 | EAGER | 70.54% | 71.84% | -1.81% | 639.87 | 490.05 | 1.31x |
| lvwerra/pegasus-samsum | EAGER | 42.1 | 42.67 | -1.35% | 3.41 | 1.07 | 3.19x |
| PeleeNet | EAGER | 71.64% | 72.10% | -0.64% | 419.42 | 316.98 | 1.32x |
| ResNet18 | EAGER | 69.57% | 69.76% | -0.27% | 686.03 | 332.13 | 2.07x |
| ResNet18 | FX | 69.54% | 69.76% | -0.31% | 611.36 | 333.27 | 1.83x |
| ResNet50 | EAGER | 75.98% | 76.15% | -0.21% | 327.14 | 162.46 | 2.01x |
| ResNeXt101_32x8d | EAGER | 79.08% | 79.31% | -0.29% | 175.93 | 61.09 | 2.88x |
| Roberta Base MRPC | EAGER | 88.25% | 88.18% | 0.08% | 197.96 | 99.35 | 1.99x |
| Se_ResNeXt50_32x4d | EAGER | 78.98% | 79.08% | -0.13% | 308.19 | 144.6 | 2.13x |
| SqueezeBERT MRPC | EAGER | 86.87% | 87.65% | -0.89% | 186.26 | 155.67 | 1.20x |
| SSD ResNet 34 | FX | 19.52 | 19.63 | -0.59% | 19.09 | 6.88 | 2.78x |
| Transfo-xl MRPC | EAGER | 81.97% | 81.20% | 0.94% | 9.65 | 7.06 | 1.37x |
| Wave2Vec2 | FX | 95.71% | 96.60% | -0.92% | 23.69 | 19.58 | 1.21x |
| Xlm Roberta base MRPC | EAGER | 88.03% | 88.62% | -0.67% | 114.31 | 99.34 | 1.15x |
| YOLO V3 | EAGER | 24.60% | 24.54% | 0.21% | 71.81 | 31.38 | 2.29x |
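
The Example column distinguishes PyTorch's eager-mode and FX-graph-mode quantization workflows. As a rough illustration of what eager-mode post-training static quantization involves, here is a minimal sketch on a toy module with random calibration data (assumptions for illustration only, not the recipe used to produce the rows above):

```python
import torch
import torch.nn as nn

# Toy module used only to illustrate the PTQ flow; not one of the models above.
model = nn.Sequential(
    torch.quantization.QuantStub(),    # marks the fp32 -> int8 boundary
    nn.Conv2d(3, 8, kernel_size=3),
    nn.ReLU(),
    torch.quantization.DeQuantStub(),  # marks the int8 -> fp32 boundary
).eval()

model.qconfig = torch.quantization.get_default_qconfig("fbgemm")  # x86 backend
prepared = torch.quantization.prepare(model)                      # insert observers

with torch.no_grad():                                             # calibration passes
    for _ in range(8):
        prepared(torch.randn(1, 3, 32, 32))

int8_model = torch.quantization.convert(prepared)                 # swap in INT8 kernels
print(int8_model(torch.randn(1, 3, 32, 32)).shape)
```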

PyTorch Models with Torch 1.12.1+cpu in QAT Mode

| Model | Example | INT8 Accuracy | FP32 Accuracy | Accuracy Ratio [(INT8-FP32)/FP32] | INT8 Throughput (samples/sec) | FP32 Throughput (samples/sec) | Performance Ratio [INT8/FP32] |
|---|---|---|---|---|---|---|---|
| ResNet18 | EAGER | 69.84% | 69.76% | 0.11% | 690.73 | 330.85 | 2.09x |
| ResNet18 | FX | 69.74% | 69.76% | -0.03% | 614.83 | 334.35 | 1.84x |
| BERT base MRPC QAT | FX | 89.70% | 89.46% | 0.27% | 127.45 | 82.68 | 1.54x |
| ResNet50 | EAGER | 76.05% | 76.15% | -0.13% | 410.44 | 168.81 | 2.43x |
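
Quantization-aware training (QAT) differs from PTQ in that fake-quantization modules are inserted before fine-tuning, so the weights adapt to the quantization error. A minimal eager-mode sketch on the same kind of toy module (again an illustration with placeholder data and hyperparameters, not the exact training setup behind these rows):

```python
import torch
import torch.nn as nn

# Toy module used only to illustrate the QAT flow; not one of the models above.
model = nn.Sequential(
    torch.quantization.QuantStub(),
    nn.Conv2d(3, 8, kernel_size=3),
    nn.ReLU(),
    torch.quantization.DeQuantStub(),
).train()

model.qconfig = torch.quantization.get_default_qat_qconfig("fbgemm")
prepared = torch.quantization.prepare_qat(model)      # insert fake-quant modules

optimizer = torch.optim.SGD(prepared.parameters(), lr=1e-3)
for _ in range(10):                                   # stand-in for real fine-tuning
    loss = prepared(torch.randn(4, 3, 32, 32)).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

int8_model = torch.quantization.convert(prepared.eval())  # fold fake-quant into INT8 ops
```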

PyTorch Models with Torch and Intel® Extension for PyTorch* 1.11.0+cpu

| Model | Example | INT8 Accuracy | FP32 Accuracy | Accuracy Ratio [(INT8-FP32)/FP32] | INT8 Throughput (samples/sec) | FP32 Throughput (samples/sec) | Performance Ratio [INT8/FP32] |
|---|---|---|---|---|---|---|---|
| bert-large-uncased-whole-word-masking-finetuned-squad | IPEX | 92.9 | 93.16 | -0.28% | 31.35 | 9.97 | 3.14x |
| ResNeXt101_32x16d_wsl | IPEX | 69.48% | 69.76% | -0.40% | 1189.15 | 680 | 1.75x |
| ResNet50 | IPEX | 76.07% | 76.15% | -0.10% | 677.69 | 381.59 | 1.78x |
| SSD ResNet34 | IPEX | 19.95% | 20.00% | -0.25% | 24.07 | 6.71 | 3.59x |
| DistilBERT base MRPC | IPEX | 86 | 86.84 | -0.96% | 98.02 | 62.4 | 1.57x |

ONNX Models with ONNX Runtime 1.12.1

| Model | Example | INT8 Accuracy | FP32 Accuracy | Accuracy Ratio [(INT8-FP32)/FP32] | INT8 Throughput (samples/sec) | FP32 Throughput (samples/sec) | Performance Ratio [INT8/FP32] |
|---|---|---|---|---|---|---|---|
| AlexNet | QLinear | 54.73% | 54.79% | -0.11% | 960.18 | 469.17 | 2.05x |
| AlexNet | QDQ | 54.71% | 54.79% | -0.15% | 962.71 | 466.56 | 2.06x |
| ArcFace | QLinear | 99.80% | 99.80% | 0.00% | 235.14 | 130 | 1.81x |
| BERT base MRPC DYNAMIC | QLinear | 85.29% | 86.03% | -0.86% | 294.05 | 125.85 | 2.34x |
| BERT base MRPC STATIC | QLinear | 85.29% | 86.03% | -0.86% | 604.07 | 256.93 | 2.35x |
| BERT SQuAD | QLinear | 80.44 | 80.67 | -0.29% | 93.21 | 51.45 | 1.81x |
| BERT SQuAD | QDQ | 80.44 | 80.67 | -0.29% | 93.27 | 51.67 | 1.80x |
| CaffeNet | QLinear | 56.21% | 56.30% | -0.16% | 1501.21 | 536.1 | 2.80x |
| CaffeNet | QDQ | 56.25% | 56.30% | -0.09% | 1493.36 | 533.09 | 2.80x |
| DistilBERT base MRPC | QLinear | 84.80% | 84.56% | 0.28% | 1372.84 | 485.95 | 2.83x |
| DistilBERT base MRPC | QDQ | 84.56% | 84.56% | 0.00% | 541.43 | 480.25 | 1.13x |
| EfficientNet | QLinear | 77.57% | 77.70% | -0.17% | 1250.63 | 753.09 | 1.66x |
| EfficientNet | QDQ | 77.61% | 77.70% | -0.12% | 1130.67 | 748.12 | 1.51x |
| Emotion Ferplus | QLinear | 7.86% | 8.00% | -1.75% | 336.52 | 163.72 | 2.06x |
| Faster R-CNN | QLinear | 34.05% | 34.37% | -0.93% | 16.36 | 6.18 | 2.65x |
| Faster R-CNN | QDQ | 33.97% | 34.37% | -1.16% | 10.26 | 6.18 | 1.66x |
| FCN | QLinear | 64.54% | 64.98% | -0.67% | 40.05 | 12.08 | 3.31x |
| FCN | QDQ | 64.65% | 64.98% | -0.50% | 26.73 | 12.04 | 2.22x |
| GoogleNet | QLinear | 67.71% | 67.79% | -0.12% | 740.16 | 587.54 | 1.26x |
| GoogleNet | QDQ | 67.73% | 67.79% | -0.09% | 770.51 | 567.88 | 1.36x |
| Inception V1 | QLinear | 67.21% | 67.24% | -0.04% | 824.15 | 601.92 | 1.37x |
| Inception V1 | QDQ | 67.21% | 67.24% | -0.04% | 819.85 | 597.46 | 1.37x |
| Mask R-CNN | QLinear | 33.41% | 33.72% | -0.92% | 14.18 | 5.78 | 2.45x |
| Mask R-CNN | QDQ | 33.30% | 33.72% | -1.25% | 9.42 | 5.7 | 1.65x |
| Mobile bert MRPC | QLinear | 86.27% | 86.27% | 0.00% | 613.72 | 506.41 | 1.21x |
| MobileBERT SQuAD MLPerf | QLinear | 89.82 | 90.03 | -0.23% | 88.41 | 76.07 | 1.16x |
| MobileNet V2 | QLinear | 65.59% | 66.89% | -1.94% | 2454.53 | 1543.79 | 1.59x |
| MobileNet V2 | QDQ | 65.82% | 66.89% | -1.60% | 2164.97 | 1564.21 | 1.38x |
| MobileNet V3 MLPerf | QLinear | 75.58% | 75.74% | -0.21% | 2147.42 | 1046.69 | 2.05x |
| MobileNet V3 MLPerf | QDQ | 75.57% | 75.74% | -0.22% | 1877.1 | 1054.88 | 1.78x |
| MobileNetV2 (ONNX Model Zoo) | QLinear | 68.38% | 69.48% | -1.58% | 2751.7 | 1797.64 | 1.53x |
| MobileNetV2 (ONNX Model Zoo) | QDQ | 68.51% | 69.48% | -1.40% | 2656.23 | 1835.74 | 1.45x |
| ResNet50 v1.5 MLPerf | QLinear | 76.15% | 76.46% | -0.41% | 764.901 | 434.141 | 1.76x |
| ResNet50 v1.5 MLPerf | QDQ | 76.14% | 76.46% | -0.42% | 575.952 | 433.75 | 1.33x |
| ResNet50 V1.5 | QLinear | 72.26% | 72.29% | -0.04% | 761.12 | 432.615 | 1.76x |
| ResNet50 V1.5 | QDQ | 72.20% | 72.29% | -0.12% | 575.032 | 432.894 | 1.33x |
| ResNet50 V1.5 (ONNX Model Zoo) | QLinear | 74.81% | 74.99% | -0.24% | 885.64 | 454.02 | 1.95x |
| ResNet50 V1.5 (ONNX Model Zoo) | QDQ | 74.76% | 74.99% | -0.31% | 603.72 | 455.86 | 1.32x |
| Roberta Base MRPC | QLinear | 89.71% | 89.95% | -0.27% | 644.636 | 254.791 | 2.53x |
| ShuffleNet V2 | QLinear | 66.13% | 66.36% | -0.35% | 2298.55 | 1480.87 | 1.55x |
| ShuffleNet V2 | QDQ | 66.12% | 66.36% | -0.36% | 1951.11 | 1490.78 | 1.31x |
| SqueezeNet | QLinear | 56.54% | 56.87% | -0.58% | 2588.97 | 1605.92 | 1.61x |
| SqueezeNet | QDQ | 56.54% | 56.87% | -0.58% | 2566.18 | 1936.79 | 1.32x |
| SSD MobileNet V1 | QLinear | 22.45% | 23.10% | -2.81% | 725.83 | 570.24 | 1.27x |
| SSD MobileNet V1 | QDQ | 22.45% | 23.10% | -2.81% | 666.01 | 539.77 | 1.23x |
| SSD MobileNet V1 (ONNX Model Zoo) | QLinear | 22.86% | 23.03% | -0.74% | 641.56 | 519.93 | 1.23x |
| SSD MobileNet V1 (ONNX Model Zoo) | QDQ | 22.86% | 23.03% | -0.74% | 633.61 | 492.5 | 1.29x |
| SSD MobileNet V2 | QLinear | 24.04% | 24.68% | -2.59% | 542.68 | 401.56 | 1.35x |
| SSD | QLinear | 18.84% | 18.98% | -0.74% | 31.33 | 8.87 | 3.53x |
| SSD | QDQ | 18.63% | 18.98% | -1.84% | 23.98 | 8.95 | 2.68x |
| Tiny YOLOv3 | QLinear | 12.08% | 12.43% | -2.82% | 648.62 | 518.97 | 1.25x |
| VGG16 | QLinear | 66.67% | 66.69% | -0.03% | 221.93 | 99.51 | 2.23x |
| VGG16 (ONNX Model Zoo) | QLinear | 72.32% | 72.40% | -0.11% | 319.54 | 99.9 | 3.20x |
| VGG16 (ONNX Model Zoo) | QDQ | 72.31% | 72.40% | -0.12% | 319.41 | 99.94 | 3.20x |
| VGG16 | QDQ | 66.69% | 66.69% | 0.00% | 307.52 | 99.24 | 3.10x |
| YOLOv3 | QLinear | 26.82% | 28.74% | -6.68% | 124.24 | 54.03 | 2.30x |
| YOLOv4 | QLinear | 33.25% | 33.71% | -1.36% | 49.76 | 32.99 | 1.51x |
| ZFNet | QLinear | 55.84% | 55.96% | -0.21% | 459.38 | 261.93 | 1.75x |
| ZFNet | QDQ | 55.86% | 55.96% | -0.18% | 460.66 | 264.34 | 1.74x |
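
The QLinear and QDQ entries refer to the two INT8 operator formats ONNX Runtime can emit: fused QLinear* operators versus explicit QuantizeLinear/DequantizeLinear pairs. A minimal sketch of producing both formats with ONNX Runtime's static quantizer follows; the model path, input name, input shape, and random calibration reader are placeholders for illustration, not the actual recipes used for the rows above:

```python
import numpy as np
from onnxruntime.quantization import CalibrationDataReader, QuantFormat, quantize_static

class RandomCalibrationReader(CalibrationDataReader):
    """Feeds a few random tensors as calibration data (illustration only)."""
    def __init__(self, input_name="input", n=8):
        self.data = iter(
            [{input_name: np.random.rand(1, 3, 224, 224).astype(np.float32)} for _ in range(n)]
        )

    def get_next(self):
        return next(self.data, None)

# Same FP32 model, two output formats.
quantize_static("model_fp32.onnx", "model_qdq.onnx", RandomCalibrationReader(),
                quant_format=QuantFormat.QDQ)        # QuantizeLinear/DequantizeLinear pairs
quantize_static("model_fp32.onnx", "model_qlinear.onnx", RandomCalibrationReader(),
                quant_format=QuantFormat.QOperator)  # fused QLinear* operators
```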

MXNet Models with MXNet 1.7.0

| Model | INT8 Accuracy | FP32 Accuracy | Accuracy Ratio [(INT8-FP32)/FP32] | INT8 Throughput (samples/sec) | FP32 Throughput (samples/sec) | Performance Ratio [INT8/FP32] |
|---|---|---|---|---|---|---|
| Inception V3 | 77.80% | 77.65% | 0.20% | 86.52 | 47.98 | 1.80x |
| MobileNet V1 | 71.60% | 72.23% | -0.86% | 441.59 | 337.52 | 1.31x |
| MobileNet V3 MLPerf | 70.80% | 70.87% | -0.10% | 272.87 | 211.51 | 1.29x |
| ResNet v1 152 | 78.28% | 78.54% | -0.33% | 65.2 | 37.05 | 1.76x |
| ResNet18 V1.0 | 70.01% | 70.14% | -0.19% | 423.98 | 235.98 | 1.80x |
| ResNet50 V1.0 | 75.91% | 76.33% | -0.55% | 180.69 | 100.49 | 1.80x |
| SqueezeNet | 56.80% | 56.97% | -0.28% | 311.23 | 198.61 | 1.57x |
| SSD MobileNet V1 | 74.94% | 75.54% | -0.79% | 43.5 | 25.77 | 1.69x |
| SSD ResNet50 V1.0 | 80.21% | 80.23% | -0.03% | 31.64 | 15.13 | 2.09x |

Validated Pruning Examples

| Model | Task | Dataset | Dense Accuracy | Sparse Accuracy | Relative Drop | Sparsity Ratio | Sparsity Pattern | Comments | Balanced or Unbalanced Ratio |
|---|---|---|---|---|---|---|---|---|---|
| ResNet18 | image classification | ImageNet | top-1 acc = 69.76 | top-1 acc = 69.47 | -0.42% | 30% | magnitude | | |
| ResNet50 | image classification | ImageNet | top-1 acc = 76.13 | top-1 acc = 76.11 | -0.03% | 30% | magnitude | | |
| ResNet50 | image classification | ImageNet | top-1 acc = 76.13 | top-1 acc = 76.01 | -0.16% | 30% | magnitude | Post Training Quantization | |
| ResNet50 | image classification | ImageNet | top-1 acc = 76.13 | top-1 acc = 75.90 | -0.30% | 30% | magnitude | Quantization Aware Training | |
| Bert-Large | question answering | SQuAD-v1.1 | f1 = 91.34 | f1 = 90.7 | -0.70% | 80% | structured 2x1 | group lasso | unbalanced |
| Bert-Base | text classification | MNLI | [m, mm] = [84.57, 84.79] | [m, mm] = [82.45, 83.27] | [-2.51%, -1.80%] | 70% | unstructured | Prune once for all | balanced |
| Bert-Base | text classification | MNLI | [m, mm] = [84.57, 84.79] | [m, mm] = [83.20, 84.11] | [-1.62%, -0.80%] | 50% | structured 1:2 | Prune once for all | balanced |
| Bert-Base | text classification | SST-2 | accuracy = 92.32 | accuracy = 91.51 | -0.88% | 70% | unstructured | Prune once for all | balanced |
| Bert-Base | text classification | SST-2 | accuracy = 92.32 | accuracy = 92.20 | -0.13% | 50% | structured 1:2 | Prune once for all | balanced |
| Bert-Base | text classification | SST-2 | accuracy = 92.32 | accuracy = 91.97 | -0.38% | 20% | unstructured | gradient sensitivity | balanced |
| Bert-Base | text classification | QQP | [accuracy, f1] = [91.10, 88.05] | [accuracy, f1] = [90.48, 87.06] | [-0.68%, -1.12%] | 70% | unstructured | Prune once for all | balanced |
| Bert-Base | text classification | QQP | [accuracy, f1] = [91.10, 88.05] | [accuracy, f1] = [90.92, 87.78] | [-0.20%, -0.31%] | 50% | structured 1:2 | Prune once for all | balanced |
| Bert-Base | text classification | QNLI | accuracy = 91.54 | accuracy = 90.39 | -1.26% | 70% | unstructured | Prune once for all | balanced |
| Bert-Base | text classification | QNLI | accuracy = 91.54 | accuracy = 90.87 | -0.73% | 50% | structured 1:2 | Prune once for all | balanced |
| Bert-Base | question answering | | [em, f1] = [79.34, 87.10] | [em, f1] = [77.27, 85.75] | [-2.61%, -1.54%] | 70% | unstructured | Prune once for all | balanced |
| Bert-Base | question answering | | [em, f1] = [79.34, 87.10] | [em, f1] = [78.03, 86.50] | [-1.65%, -0.69%] | 50% | structured 1:2 | Prune once for all | balanced |
| Bert-Mini | question answering | SQuAD-v1.1 | f1 = 76.87 | f1 = 76.2 | -0.80% | 80% | structured 4x1 | snip momentum | unbalanced |
| Bert-Mini | question answering | SQuAD-v1.1 | f1 = 76.87 | f1 = 77.62 | +0.98% | 50% | structured 2:4 | snip momentum | balanced |
| Distilbert-base-uncased | question answering | SQuAD-v1.1 | f1 = 86.90 | f1 = 86.15 | -0.86% | 80% | structured 4x1 | snip momentum | unbalanced |
| Distilbert-base-uncased | question answering | SQuAD-v1.1 | f1 = 86.90 | f1 = 87.50 | +0.69% | 50% | structured 2:4 | snip momentum | balanced |
| Bert-base-uncased | question answering | SQuAD-v1.1 | f1 = 88.59 | f1 = 87.78 | -0.92% | 80% | structured 4x1 | snip momentum | unbalanced |
| Bert-base-uncased | question answering | SQuAD-v1.1 | f1 = 88.59 | f1 = 89.40 | +0.91% | 50% | structured 2:4 | snip momentum | balanced |
| Bert-large | question answering | SQuAD-v1.1 | f1 = 91.23 | f1 = 90.91 | -0.35% | 80% | structured 4x1 | snip momentum | unbalanced |
| Bert-large | question answering | SQuAD-v1.1 | f1 = 91.23 | f1 = 91.67 | +0.48% | 50% | structured 2:4 | snip momentum | balanced |
| Bert-Mini | text classification | MRPC | f1 = 87.52 | f1 = 87.22 | -0.34% | 90% | structured 4x1 | snip momentum | unbalanced |
| Bert-Mini | text classification | MRPC | f1 = 87.52 | f1 = 87.33 | -0.22% | 90% | structured 4x1 | snip momentum | balanced |
| Bert-Mini | text classification | MRPC | f1 = 87.52 | f1 = 86.89 | -0.72% | 50% | structured 2:4 | snip momentum | balanced |
| Bert-Mini | text classification | MRPC | f1 = 87.52 | f1 = 86.8 | -0.83% | 60% | structured per channel | snip momentum | unbalanced |
| Distilbert-base-uncased | text classification | MRPC | f1 = 90.26 | f1 = 89.85 | -0.46% | 90% | structured 4x1 | snip momentum | unbalanced |
| Distilbert-base-uncased | text classification | MRPC | f1 = 90.26 | f1 = 90.88 | +0.69% | 50% | structured 2:4 | snip momentum | balanced |
| Bert-Mini | text classification | SST-2 | accuracy = 87.61 | accuracy = 86.92 | -0.79% | 90% | structured 4x1 | snip momentum | unbalanced |
| Bert-Mini | text classification | SST-2 | accuracy = 87.61 | accuracy = 87.73 | +0.14% | 50% | structured 2:4 | snip momentum | balanced |
| Bert-Mini | text classification | SST-2 | accuracy = 87.61 | accuracy = 86.92 | -0.79% | 50% | structured per channel | snip momentum | unbalanced |
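
Several of the sparsity patterns above are N:M block patterns; for example, "structured 2:4" keeps at most 2 non-zero weights in every group of 4 consecutive weights, which yields the 50% sparsity reported in those rows. A small sketch of enforcing that pattern by magnitude (this illustrates the pattern itself, not the snip-momentum or group-lasso algorithms used to produce the results):

```python
import torch

def enforce_2_to_4(weight: torch.Tensor) -> torch.Tensor:
    """Zero out the 2 smallest-magnitude weights in each group of 4 along the input dim."""
    out_f, in_f = weight.shape           # assumes in_f is divisible by 4
    groups = weight.reshape(out_f, in_f // 4, 4)
    idx = groups.abs().topk(2, dim=-1).indices          # keep the top-2 magnitudes per group
    mask = torch.zeros_like(groups).scatter_(-1, idx, 1.0)
    return (groups * mask).reshape(out_f, in_f)

w = torch.randn(8, 16)
w_sparse = enforce_2_to_4(w)
print((w_sparse == 0).float().mean())    # 0.50 sparsity, as in the 50% "structured 2:4" rows
```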

Validated Knowledge Distillation Examples

| Example Name | Dataset | Student (Metrics) | Teacher (Metrics) | Student With Distillation (Metrics Improvement) | Student With Distributed Distillation (Metrics Improvement) |
|---|---|---|---|---|---|
| MobileNet example | CIFAR-10 | MobileNetV2-0.35 (0.7965 ACC) | WideResNet40-2 (0.9522 ACC) | 0.8178 ACC (0.0213 ACC) | 0.8235 ACC (0.027 ACC) |
| CNN example | CIFAR-100 | CNN-2 (0.5494 ACC) | CNN-10 (0.7153 ACC) | 0.5540 ACC (0.0046 ACC) | 0.5523 ACC (0.0029 ACC) |
| VGG example | CIFAR-100 | VGG-8-BN (0.7022 ACC) | VGG-13-BN (0.7415 ACC) | 0.7025 ACC (0.0003 ACC) | WIP |
| ResNet example | ImageNet | ResNet18 (0.6739 ACC) | ResNet50 (0.7399 ACC) | 0.6845 ACC (0.0106 ACC) | WIP |
| BlendCnn example | MRPC | BlendCnn (0.7034 ACC) | BERT-Base (0.8382 ACC) | 0.7034 ACC (0 ACC) | WIP |
| BiLSTM example | SST-2 | BiLSTM (0.8314 ACC) | RoBERTa-Base (0.9403 ACC) | 0.9048 ACC (0.0734 ACC) | WIP |
| DistilBERT example | SQuAD | DistilBERT (0.7323/0.8256 EM/F1) | BERT-Base (0.8084/0.8814 EM/F1) | 0.7442/0.8371 EM/F1 (0.0119/0.0115 EM/F1) | WIP |
| TinyBERT example | MNLI | TinyBERT (0.8018/0.8044 m/mm) | BERT-Base (0.8363/0.8411 m/mm) | 0.8025/0.8074 m/mm (0.0007/0.0030 m/mm) | WIP |
| BERT-3 example | QQP | BERT-3 (0.8626/0.8213 EM/F1) | BERT-Base (0.9091/0.8782 EM/F1) | 0.8684/0.8259 EM/F1 (0.0058/0.0046 EM/F1) | WIP |
| DistilRoBERTa example | COLA | DistilRoBERTa (0.6057 ACC) | RoBERTa-Large (0.6455 ACC) | 0.6187 ACC (0.0130 ACC) | WIP |
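
These examples follow the usual knowledge-distillation setup: the student is trained against a blend of the hard-label loss and a temperature-softened match to the teacher's logits. A minimal sketch of that objective (the temperature T and weight alpha are assumed placeholder values, not the configurations used for the rows above):

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    # KL divergence between temperature-softened student and teacher distributions,
    # rescaled by T^2 so its gradient magnitude matches the hard-label term.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)   # ordinary loss on ground-truth labels
    return alpha * soft + (1.0 - alpha) * hard

# Toy call with random logits and labels, purely to show the shapes involved.
loss = distillation_loss(torch.randn(8, 10), torch.randn(8, 10), torch.randint(0, 10, (8,)))
print(loss)
```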

Validated ONNX QDQ INT8 Models on Multiple Hardware through ONNX Runtime

| Model (ONNX QDQ) | AWS c6i.2xlarge (Intel), CPU Execution Provider | AWS c6a.2xlarge (AMD), CPU Execution Provider | AWS c6g.2xlarge (ARM), CPU Execution Provider | NVidia A100, CUDA Execution Provider |
|---|---|---|---|---|
| ResNet50 | 74.76% | 68.95% | 74.76% | 74.75% |
| BERT-base | 85.54% | 84.56% | 85.54% | 84.31% |
| ResNet50 V1.5 | 72.20% | 67.70% | 72.20% | 72.29% |
| MobileNet V2 | 65.82% | 58.56% | 65.83% | 65.63% |
| SSD MobileNet V1 | 22.45% | 16.53% | 22.45% | 22.35% |
| DistilBERT base MRPC | 84.56% | 83.82% | 84.56% | 84.56% |
| SqueezeNet | 56.54% | 53.52% | 56.54% | 56.55% |
| SSD | 18.63% | 18.54% | 18.63% | 18.61% |
| AlexNet | 54.71% | 47.06% | 54.71% | 54.79% |
| CaffeNet | 56.25% | 52.35% | 56.27% | 56.24% |
| GoogleNet | 67.73% | 63.56% | 67.72% | 67.76% |
| ZFNet | 55.86% | 45.09% | 55.86% | 55.89% |
| Inception V1 | 67.21% | 63.03% | 67.20% | 67.21% |
| SSD MobileNet V1 (ONNX Model Zoo) | 22.86% | 16.94% | 22.80% | 22.87% |
| Mobile bert MRPC | 85.54% | 84.56% | 85.54% | 85.54% |
| Roberta base MRPC | 89.46% | 90.44% | 89.71% | 89.71% |
| ResNet50 V1.5 MLPerf | 76.14% | 72.80% | 76.14% | 76.17% |
| VGG16 | 66.69% | 64.25% | 66.69% | 66.64% |
| VGG16 (ONNX Model Zoo) | 72.31% | 69.35% | 72.32% | 72.34% |
| MobileNet V3 MLPerf | 75.57% | 70.78% | 75.56% | 75.52% |
| EfficientNet | 77.61% | 76.52% | 77.56% | 77.60% |
| MobileNet V2 (ONNX Model Zoo) | 68.51% | 62.48% | 68.58% | 68.48% |
| ShuffleNet V2 | 66.12% | 58.41% | 66.11% | 66.11% |
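
The columns above correspond to ONNX Runtime execution providers on different hardware. A minimal sketch of loading the same QDQ model against different providers; the model path and input shape are placeholders, and the CUDA provider requires the onnxruntime-gpu build:

```python
import numpy as np
import onnxruntime as ort

def run(model_path, providers):
    # Create a session pinned to the requested execution providers and run one random input.
    sess = ort.InferenceSession(model_path, providers=providers)
    name = sess.get_inputs()[0].name
    x = np.random.rand(1, 3, 224, 224).astype(np.float32)
    return sess.run(None, {name: x})[0]

cpu_out = run("resnet50_qdq.onnx", ["CPUExecutionProvider"])
# gpu_out = run("resnet50_qdq.onnx", ["CUDAExecutionProvider", "CPUExecutionProvider"])
```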