# Examples

1. [Quantization](#quantization)
   1.1 [Stock PyTorch Examples](#stock-pytorch-examples)
   1.2 [Intel Extension for Pytorch (IPEX) Examples](#intel-extension-for-pytorch-ipex-examples)
   1.3 [Intel TensorFlow Examples](#intel-tensorflow-examples)
2. [Length Adaptive Transformers](#length-adaptive-transformers)
3. [Pruning](#pruning)
4. [Distillation](#distillation)
   4.1 [Knowledge Distillation](#knowledge-distillation)
   4.2 [Auto Distillation (NAS Based)](#auto-distillation-nas-based)
5. [Orchestrate](#orchestrate)
6. [Reference Deployment on Neural Engine](#reference-deployment-on-neural-engine)
   6.1 [Dense Reference](#dense-reference-deployment-on-neural-engine)
   6.2 [Sparse Reference](#sparse-reference-deployment-on-neural-engine)
7. [Early-Exit](#early-exit)

Intel Extension for Transformers is a powerful toolkit that provides multiple model optimization techniques for Natural Language Processing (NLP) models, including quantization, pruning, distillation, auto distillation, and orchestration of these techniques. Intel Extension for Transformers also provides the Transformers-accelerated Neural Engine, an optimized backend for NLP models, to demonstrate deployment.

## Quantization

### Stock PyTorch Examples
| Model | Task | Dataset | PostTrainingDynamic | PostTrainingStatic |
|---|---|---|---|---|
| gpt-j-6B | language-modeling(CLM) | wikitext | | |
| t5-large-finetuned-xsum-cnn | summarization | cnn_dailymail | | |
| t5-base-cnn-dm | summarization | cnn_dailymail | | |
| lambdalabs/sd-pokemon-diffusers | text-to-image | image | | |
| bert-base-uncased | language-modeling(MLM) | wikitext | | |
| xlnet-base-cased | language-modeling(PLM) | wikitext | | |
| EleutherAI/gpt-neo-125M | language-modeling(CLM) | wikitext | | |
| sshleifer/tiny-ctrl | language-modeling(CLM) | wikitext | WIP :star: | |
| ehdwns1516/bert-base-uncased_SWAG | multiple-choice | swag | | |
| distilbert-base-uncased-distilled-squad | question-answering | SQuAD | | |
| valhalla/longformer-base-4096-finetuned-squadv1 | question-answering | SQuAD | | |
| lvwerra/pegasus-samsum | summarization | samsum | WIP :star: | |
| textattack/bert-base-uncased-MRPC | text-classification | MRPC | | |
| echarlaix/bert-base-uncased-sst2-acc91.1-d37-hybrid | text-classification | SST-2 | | |
| distilbert-base-uncased-finetuned-sst-2-english | text-classification | SST-2 | | |
| elastic/distilbert-base-uncased-finetuned-conll03-english | token-classification | conll2003 | | |
| t5-small | translation | wmt16 | WIP :star: | |
| Helsinki-NLP/opus-mt-en-ro | translation | wmt16 | WIP :star: | |
| Model | Task | Dataset | QuantizationAwareTraining | No Trainer quantization |
|---|---|---|---|---|
| textattack/bert-base-uncased-MRPC | text-classification | MRPC | | |
| echarlaix/bert-base-uncased-sst2-acc91.1-d37-hybrid | text-classification | SST-2 | | |
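As an illustration of what PostTrainingDynamic quantization does, below is a minimal sketch using stock PyTorch's `torch.quantization.quantize_dynamic`. A toy module stands in for a Hugging Face transformer; this is not the toolkit's exact API, which drives quantization through its own trainer interface.

```python
import torch

# A tiny stand-in model; in a real example this would be a Hugging Face
# transformer loaded with AutoModelForSequenceClassification.
model = torch.nn.Sequential(
    torch.nn.Linear(16, 32),
    torch.nn.ReLU(),
    torch.nn.Linear(32, 2),
)
model.eval()

# PostTrainingDynamic: weights are quantized to int8 ahead of time,
# activations are quantized on the fly at inference.
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 16)
with torch.no_grad():
    out = quantized(x)
print(out.shape)  # torch.Size([1, 2])
```

Dynamic quantization needs no calibration data, which is why it suits the language-modeling and summarization rows above; the PostTrainingStatic column additionally requires a calibration pass over sample inputs.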
### Intel Extension for Pytorch (IPEX) Examples
| Model | Task | Dataset | PostTrainingStatic |
|---|---|---|---|
| distilbert-base-uncased-distilled-squad | question-answering | SQuAD |
| bert-large-uncased-whole-word-masking-finetuned-squad | question-answering | SQuAD |
### Intel TensorFlow Examples
| Model | Task | Dataset | PostTrainingStatic |
|---|---|---|---|
| bert-base-cased-finetuned-mrpc | text-classification | MRPC |
| xlnet-base-cased | text-classification | MRPC |
| distilgpt2 | language-modeling(CLM) | wikitext |
| distilbert-base-cased | language-modeling(MLM) | wikitext |
| Rocketknight1/bert-base-uncased-finetuned-swag | multiple-choice | swag |
| dslim/bert-base-NER | token-classification | conll2003 |
## Length Adaptive Transformers
| Model Name | Datatype | Optimization Method | Model size (MB) | Accuracy (F1) | Latency (ms) | GFLOPS\*\* | Speedup (compared with BERT Base) |
|---|---|---|---|---|---|---|---|
| BERT Base | fp32 | None | 415.47 | 88.58 | 56.56 | 35.3 | 1x |
| LA-MiniLM | fp32 | Drop and restore (base: MiniLMv2) | 115.04 | 89.28 | 16.99 | 4.76 | 3.33x |
| LA-MiniLM(269, 253, 252, 202, 104, 34)\* | fp32 | Evolution search (best config) | 115.04 | 87.76 | 11.44 | 2.49 | 4.94x |
| QuaLA-MiniLM | int8 | Quantization (base: LA-MiniLM) | 84.85 | 88.85 | 7.84 | 4.76 | 7.21x |
| QuaLA-MiniLM(315,251,242,159,142,33)\* | int8 | Evolution search (best config) | 84.86 | 87.68 | 6.41 | 2.55 | 8.82x |
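The Speedup column can be reproduced from the Latency column, since speedup is reported relative to BERT Base latency:

```python
# Latencies (ms) from the table above; speedup = BERT Base latency / model latency.
bert_base_latency_ms = 56.56
la_minilm_latency_ms = 16.99       # LA-MiniLM, fp32
qua_la_minilm_latency_ms = 6.41    # QuaLA-MiniLM, int8, best evolution-search config

print(f"{bert_base_latency_ms / la_minilm_latency_ms:.2f}x")      # 3.33x
print(f"{bert_base_latency_ms / qua_la_minilm_latency_ms:.2f}x")  # 8.82x
```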
>**Note**: \* The length config applies to the Length Adaptive model.
>
>**Note**: \*\* GFLOPS counts the multiplication and addition operations performed at model inference (obtained with the torchprofile tool).

Data was tested on an Intel Xeon Platinum 8280 Scalable processor. For configuration details, please refer to the [examples](../examples/huggingface/pytorch/question-answering/dynamic/README.html).

## Pruning
| Model | Task | Dataset | Pruning Approach | Pruning Type | Framework |
|---|---|---|---|---|---|
| distilbert-base-uncased-distilled-squad | question-answering | SQuAD | BasicMagnitude | Unstructured | Stock PyTorch |
| bert-large-uncased | question-answering | SQuAD | Group LASSO | Structured | Stock PyTorch |
| distilbert-base-uncased-finetuned-sst-2-english | text-classification | SST-2 | BasicMagnitude | Unstructured | Stock PyTorch / Intel TensorFlow |
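The BasicMagnitude/Unstructured combination can be illustrated with stock PyTorch's `torch.nn.utils.prune` utilities. This is a sketch on a toy layer, not the toolkit's trainer-integrated API, and the 90% sparsity target is an illustrative assumption.

```python
import torch
from torch.nn.utils import prune

# Toy layer standing in for one transformer weight matrix.
layer = torch.nn.Linear(64, 64)

# BasicMagnitude / unstructured pruning: zero out the 90% of weights
# with the smallest absolute value, regardless of their position.
prune.l1_unstructured(layer, name="weight", amount=0.9)
prune.remove(layer, "weight")  # bake the mask into the weight tensor

sparsity = (layer.weight == 0).float().mean().item()
print(f"sparsity: {sparsity:.2f}")  # ~0.90
```

Structured approaches like Group LASSO instead remove whole groups of weights (rows, heads, blocks), which is harder to illustrate in a few lines but maps better to hardware speedups.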
## Distillation

### Knowledge Distillation
| Student Model | Teacher Model | Task | Dataset |
|---|---|---|---|
| distilbert-base-uncased | bert-base-uncased-SST-2 | text-classification | SST-2 |
| distilbert-base-uncased | bert-base-uncased-QNLI | text-classification | QNLI |
| distilbert-base-uncased | bert-base-uncased-QQP | text-classification | QQP |
| distilbert-base-uncased | bert-base-uncased-MNLI-v1 | text-classification | MNLI |
| distilbert-base-uncased | bert-base-uncased-squad-v1 | question-answering | SQuAD |
| TinyBERT_General_4L_312D | bert-base-uncased-MNLI-v1 | text-classification | MNLI |
| distilroberta-base | roberta-large-cola-krishna2020 | text-classification | COLA |
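Knowledge distillation trains the student on a blend of the teacher's softened output distribution and the ground-truth labels. A minimal sketch of that classic objective follows; the `temperature` and `alpha` values are illustrative assumptions, not the examples' actual hyperparameters.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Blend soft-target KL loss (teacher) with hard-label cross-entropy."""
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)  # rescale gradients after temperature softening
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# Toy logits: 4 examples, 2 classes (an SST-2-style binary task).
student = torch.randn(4, 2, requires_grad=True)
teacher = torch.randn(4, 2)
labels = torch.tensor([0, 1, 1, 0])

loss = distillation_loss(student, teacher, labels)
loss.backward()  # gradients flow to the student only; the teacher is frozen
```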
### Auto Distillation (NAS Based)
| Model | Task | Dataset | Distillation Teacher |
|---|---|---|---|
| google/mobilebert-uncased | language-modeling(MLM) | wikipedia | bert-large-uncased |
| prajjwal1/bert-tiny | language-modeling(MLM) | wikipedia | bert-base-uncased |
## Orchestrate
| Model | Task | Dataset | Distillation Teacher | Pruning Approach | Pruning Type |
|---|---|---|---|---|---|
| Intel/distilbert-base-uncased-sparse-90-unstructured-pruneofa | question-answering | SQuAD | distilbert-base-uncased-distilled-squad | PatternLock | Unstructured |
| Intel/distilbert-base-uncased-sparse-90-unstructured-pruneofa | question-answering | SQuAD | distilbert-base-uncased-distilled-squad | BasicMagnitude | Unstructured |
| Intel/distilbert-base-uncased-sparse-90-unstructured-pruneofa | text-classification | SST-2 | distilbert-base-uncased-finetuned-sst-2-english | PatternLock | Unstructured |
| Intel/distilbert-base-uncased-sparse-90-unstructured-pruneofa | text-classification | SST-2 | distilbert-base-uncased-finetuned-sst-2-english | BasicMagnitude | Unstructured |
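Orchestration composes several optimization techniques on one model. Below is a minimal stock-PyTorch sketch of a prune-then-quantize sequence; the toy model and the 80% sparsity ratio are illustrative assumptions, and the actual examples drive the combination through the toolkit rather than these raw utilities.

```python
import torch
from torch.nn.utils import prune

# Toy stand-in for a transformer.
model = torch.nn.Sequential(
    torch.nn.Linear(32, 32),
    torch.nn.ReLU(),
    torch.nn.Linear(32, 2),
)
model.eval()

# Step 1: unstructured magnitude pruning on every Linear weight.
for module in model.modules():
    if isinstance(module, torch.nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.8)
        prune.remove(module, "weight")

# Step 2: post-training dynamic quantization of the pruned model,
# so the deployed weights are both sparse and int8.
model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

with torch.no_grad():
    out = model(torch.randn(1, 32))
```

Ordering matters: pruning before quantization lets the quantizer observe the final (sparse) weight distribution.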
## Reference Deployment on Neural Engine

### Dense Reference Deployment on Neural Engine
| Model | Task | Dataset | INT8 | BF16 |
|---|---|---|---|---|
| bert-large-uncased-whole-word-masking-finetuned-squad | question-answering | SQuAD | | |
| bhadresh-savani/distilbert-base-uncased-emotion | text-classification | emotion | | |
| textattack/bert-base-uncased-MRPC | text-classification | MRPC | | |
| textattack/distilbert-base-uncased-MRPC | text-classification | MRPC | | |
| Intel/roberta-base-mrpc | text-classification | MRPC | | |
| M-FAC/bert-mini-finetuned-mrpc | text-classification | MRPC | | |
| gchhablani/bert-base-cased-finetuned-mrpc | text-classification | MRPC | | |
| distilbert-base-uncased-finetuned-sst-2-english | text-classification | SST-2 | | |
| philschmid/MiniLM-L6-H384-uncased-sst2 | text-classification | SST-2 | | |
| moshew/bert-mini-sst2-distilled | text-classification | SST-2 | | |
### Sparse Reference Deployment on Neural Engine
| Model | Task | Dataset | INT8 | BF16 |
|---|---|---|---|---|
| Intel/distilbert-base-uncased-squadv1.1-sparse-80-1x4-block-pruneofa | question-answering | SQuAD | | WIP :star: |
| Intel/bert-mini-sst2-distilled-sparse-90-1X4-block | text-classification | SST-2 | | WIP :star: |
## Early-Exit
| Model | Task | Dataset | Early-Exit Type | Notebook |
|---|---|---|---|---|
| bert-base-uncased | text-classification | MNLI | SWEET | notebook |
| philschmid/tiny-bert-sst2-distilled, textattack/roberta-base-SST-2 | text-classification | SST-2 | TangoBERT | notebook |
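The TangoBERT example pairs a small model with a larger one: the small model answers when it is confident and defers otherwise. A minimal sketch of that cascade follows, with toy linear models standing in for the two transformers; the 0.9 confidence threshold is an illustrative assumption.

```python
import torch

def cascade_predict(small, large, x, threshold=0.9):
    """TangoBERT-style cascade: trust the small model when it is
    confident, otherwise fall back to the large model."""
    with torch.no_grad():
        probs = torch.softmax(small(x), dim=-1)
        conf, pred = probs.max(dim=-1)
        if conf.item() >= threshold:
            return pred.item(), "small"   # cheap path, no large-model call
        probs = torch.softmax(large(x), dim=-1)
        return probs.argmax(dim=-1).item(), "large"

small = torch.nn.Linear(8, 2)  # stands in for the small SST-2 student
large = torch.nn.Linear(8, 2)  # stands in for the larger model
pred, route = cascade_predict(small, large, torch.randn(1, 8))
```

Most inputs taking the cheap path is what yields the latency savings; SWEET instead exits early from intermediate layers of a single model.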