# Examples

1. [Quantization](#quantization)
    1.1 [Stock PyTorch Examples](#stock-pytorch-examples)
    1.2 [Intel Extension for Pytorch (IPEX) Examples](#intel-extension-for-pytorch-ipex-examples)
    1.3 [Intel TensorFlow Examples](#intel-tensorflow-examples)
2. [Length Adaptive Transformers](#length-adaptive-transformers)
3. [Pruning](#pruning)
4. [Distillation](#distillation)
    4.1 [Knowledge Distillation](#knowledge-distillation)
    4.2 [Auto Distillation (NAS Based)](#auto-distillation-nas-based)
5. [Orchestrate](#orchestrate)
6. [Reference Deployment on Neural Engine](#reference-deployment-on-neural-engine)
    6.1 [Dense Reference](#dense-reference-deployment-on-neural-engine)
    6.2 [Sparse Reference](#sparse-reference-deployment-on-neural-engine)
7. [Early-Exit](#early-exit)

Intel Extension for Transformers is a toolkit that provides multiple model optimization techniques for Natural Language Processing (NLP) models, including quantization, pruning, distillation, auto distillation, and orchestration of these methods. It also provides the Transformers-accelerated Neural Engine, an optimized inference backend for NLP models, to demonstrate deployment.

## Quantization

### Stock PyTorch Examples

| Model | Task | Dataset | QuantizationAwareTraining | No Trainer quantization |
|---|---|---|---|---|
| textattack/bert-base-uncased-MRPC | text-classification | MRPC | ✔ | |
| echarlaix/bert-base-uncased-sst2-acc91.1-d37-hybrid | text-classification | SST-2 | | ✔ |
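
The examples in the table above use the trainer-based quantization API, where the column names correspond to the `approach` setting. Below is a minimal sketch of quantization-aware training on the MRPC example, assuming the `NLPTrainer`/`QuantizationConfig` interfaces documented for the library; exact module paths, argument names, and the preprocessing shown here are assumptions that may differ from the actual example scripts.

```python
# Minimal sketch: quantization-aware training via the trainer-based API.
# NLPTrainer / QuantizationConfig follow the library's documented usage; module
# paths, argument names and the preprocessing shown here are assumptions that
# may differ from the actual example scripts.
from datasets import load_dataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer

from intel_extension_for_transformers.transformers import (
    QuantizationConfig,
    metrics,
    objectives,
)
from intel_extension_for_transformers.transformers.trainer import NLPTrainer

model_name = "textattack/bert-base-uncased-MRPC"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Tokenize the MRPC sentence-pair classification dataset.
raw_datasets = load_dataset("glue", "mrpc")
encoded = raw_datasets.map(
    lambda batch: tokenizer(
        batch["sentence1"],
        batch["sentence2"],
        truncation=True,
        padding="max_length",
        max_length=128,
    ),
    batched=True,
)

# NLPTrainer is intended as a drop-in replacement for transformers.Trainer.
trainer = NLPTrainer(
    model=model,
    train_dataset=encoded["train"],
    eval_dataset=encoded["validation"],
)

quant_config = QuantizationConfig(
    approach="QuantizationAwareTraining",  # or "PostTrainingStatic" / "PostTrainingDynamic"
    metrics=[metrics.Metric(name="eval_f1", is_relative=True, criterion=0.01)],
    objectives=[objectives.performance],
)

quantized_model = trainer.quantize(quant_config=quant_config)
trainer.save_model("./quantized_mrpc")  # hypothetical output directory
```
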

### Intel Extension for Pytorch (IPEX) Examples

| Model | Task | Dataset | PostTrainingStatic |
|---|---|---|---|
| distilbert-base-uncased-distilled-squad | question-answering | SQuAD | ✔ |
| bert-large-uncased-whole-word-masking-finetuned-squad | question-answering | SQuAD | ✔ |

### Intel TensorFlow Examples

| Model | Task | Dataset | PostTrainingStatic |
|---|---|---|---|
| bert-base-cased-finetuned-mrpc | text-classification | MRPC | ✔ |
| xlnet-base-cased | text-classification | MRPC | ✔ |
| distilgpt2 | language-modeling (CLM) | wikitext | ✔ |
| distilbert-base-cased | language-modeling (MLM) | wikitext | ✔ |
| Rocketknight1/bert-base-uncased-finetuned-swag | multiple-choice | swag | ✔ |
| dslim/bert-base-NER | token-classification | conll2003 | ✔ |

## Length Adaptive Transformers

| Model Name | Datatype | Optimization Method | Model Size (MB) | Accuracy (F1) | Latency (ms) | GFLOPS** | Speedup (compared with BERT Base) |
|---|---|---|---|---|---|---|---|
| BERT Base | fp32 | None | 415.47 | 88.58 | 56.56 | 35.3 | 1x |
| LA-MiniLM | fp32 | Drop and restore, based on MiniLMv2 | 115.04 | 89.28 | 16.99 | 4.76 | 3.33x |
| LA-MiniLM(269, 253, 252, 202, 104, 34)* | fp32 | Evolution search (best config) | 115.04 | 87.76 | 11.44 | 2.49 | 4.94x |
| QuaLA-MiniLM | int8 | Quantization, based on LA-MiniLM | 84.85 | 88.85 | 7.84 | 4.76 | 7.21x |
| QuaLA-MiniLM(315,251,242,159,142,33)* | int8 | Evolution search (best config) | 84.86 | 87.68 | 6.41 | 2.55 | 8.82x |
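
Speedup is the latency ratio relative to BERT Base (for example, 56.56 ms / 16.99 ms ≈ 3.33x for LA-MiniLM and 56.56 ms / 7.84 ms ≈ 7.21x for QuaLA-MiniLM); the numbers in parentheses are the per-layer length configuration selected by the evolution search.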

## Pruning

| Model | Task | Dataset | Pruning Approach | Pruning Type | Framework |
|---|---|---|---|---|---|
| distilbert-base-uncased-distilled-squad | question-answering | SQuAD | BasicMagnitude | Unstructured | Stock PyTorch |
| bert-large-uncased | question-answering | SQuAD | Group LASSO | Structured | Stock PyTorch |
| distilbert-base-uncased-finetuned-sst-2-english | text-classification | SST-2 | BasicMagnitude | Unstructured | Stock PyTorch / Intel TensorFlow |
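
As a rough illustration of how the pruning approaches above are invoked, the following is a minimal sketch of `BasicMagnitude` unstructured pruning through the trainer-based API; the class names, arguments, and the `model`/dataset variables (prepared as in the quantization sketch earlier) are assumptions that may differ between releases.

```python
# Minimal sketch: BasicMagnitude (unstructured) pruning via the trainer-based API.
# PrunerConfig / PruningConfig follow the library's documented usage; the model
# and tokenized datasets are assumed to be prepared as in the quantization sketch.
from intel_extension_for_transformers.transformers import (
    PrunerConfig,
    PruningConfig,
    metrics,
)
from intel_extension_for_transformers.transformers.trainer import NLPTrainer

trainer = NLPTrainer(
    model=model,                  # e.g. distilbert-base-uncased-distilled-squad
    train_dataset=train_dataset,  # tokenized training split
    eval_dataset=eval_dataset,    # tokenized evaluation split
)

# Remove 90% of the weights by magnitude without imposing a structured pattern.
pruner_config = PrunerConfig(prune_type="BasicMagnitude", target_sparsity_ratio=0.9)
pruning_config = PruningConfig(
    pruner_config=[pruner_config],
    metrics=metrics.Metric(name="eval_f1"),
)

pruned_model = trainer.prune(pruning_config=pruning_config)
```
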

## Distillation

### Knowledge Distillation

| Student Model | Teacher Model | Task | Dataset |
|---|---|---|---|
| distilbert-base-uncased | bert-base-uncased-SST-2 | text-classification | SST-2 |
| distilbert-base-uncased | bert-base-uncased-QNLI | text-classification | QNLI |
| distilbert-base-uncased | bert-base-uncased-QQP | text-classification | QQP |
| distilbert-base-uncased | bert-base-uncased-MNLI-v1 | text-classification | MNLI |
| distilbert-base-uncased | bert-base-uncased-squad-v1 | question-answering | SQuAD |
| TinyBERT_General_4L_312D | bert-base-uncased-MNLI-v1 | text-classification | MNLI |
| distilroberta-base | roberta-large-cola-krishna2020 | text-classification | CoLA |
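
The knowledge distillation examples above follow the same trainer-based flow. The sketch below is a minimal illustration for an SST-2 student/teacher pair; the `DistillationConfig`/`trainer.distill()` usage, the teacher checkpoint, and the dataset variables are assumptions.

```python
# Minimal sketch: knowledge distillation via the trainer-based API.
# DistillationConfig / trainer.distill() follow the library's documented usage;
# the teacher checkpoint and dataset variables are illustrative assumptions.
from transformers import AutoModelForSequenceClassification

from intel_extension_for_transformers.transformers import DistillationConfig, metrics
from intel_extension_for_transformers.transformers.trainer import NLPTrainer

student = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased")
teacher = AutoModelForSequenceClassification.from_pretrained(
    "textattack/bert-base-uncased-SST-2"  # example SST-2 teacher checkpoint
)

trainer = NLPTrainer(
    model=student,
    train_dataset=train_dataset,  # tokenized SST-2 training split
    eval_dataset=eval_dataset,    # tokenized SST-2 validation split
)

distillation_config = DistillationConfig(metrics=metrics.Metric(name="eval_accuracy"))
distilled_model = trainer.distill(
    distillation_config=distillation_config,
    teacher_model=teacher,
)
```
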

### Auto Distillation (NAS Based)

| Model | Task | Dataset | Distillation Teacher |
|---|---|---|---|
| google/mobilebert-uncased | language-modeling (MLM) | wikipedia | bert-large-uncased |
| prajjwal1/bert-tiny | language-modeling (MLM) | wikipedia | bert-base-uncased |

## Orchestrate

| Model | Task | Dataset | Distillation Teacher | Pruning Approach | Pruning Type |
|---|---|---|---|---|---|
| Intel/distilbert-base-uncased-sparse-90-unstructured-pruneofa | question-answering | SQuAD | distilbert-base-uncased-distilled-squad | PatternLock | Unstructured |
| Intel/distilbert-base-uncased-sparse-90-unstructured-pruneofa | question-answering | SQuAD | distilbert-base-uncased-distilled-squad | BasicMagnitude | Unstructured |
| Intel/distilbert-base-uncased-sparse-90-unstructured-pruneofa | text-classification | SST-2 | distilbert-base-uncased-finetuned-sst-2-english | PatternLock | Unstructured |
| Intel/distilbert-base-uncased-sparse-90-unstructured-pruneofa | text-classification | SST-2 | distilbert-base-uncased-finetuned-sst-2-english | BasicMagnitude | Unstructured |
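
Orchestration applies several of the above optimizations in a single pass. The sketch below combines pattern-lock pruning with distillation, roughly mirroring the table; the `orchestrate_optimizations()` entry point and the variable setup are assumptions based on the library's documented usage.

```python
# Minimal sketch: orchestrating pruning and distillation in one fine-tuning pass.
# trainer.orchestrate_optimizations() follows the library's documented usage and
# is an assumption here; configs and variables are as in the sketches above.
from intel_extension_for_transformers.transformers import (
    DistillationConfig,
    PrunerConfig,
    PruningConfig,
    metrics,
)
from intel_extension_for_transformers.transformers.trainer import NLPTrainer

trainer = NLPTrainer(
    model=model,                  # e.g. Intel/distilbert-base-uncased-sparse-90-unstructured-pruneofa
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
)

pruning_config = PruningConfig(
    pruner_config=[PrunerConfig(prune_type="PatternLock")],  # lock the existing sparsity pattern
    metrics=metrics.Metric(name="eval_f1"),
)
distillation_config = DistillationConfig(metrics=metrics.Metric(name="eval_f1"))

# Both optimizations are applied together during a single orchestrated run.
optimized_model = trainer.orchestrate_optimizations(
    config_list=[pruning_config, distillation_config],
    teacher_model=teacher_model,  # e.g. distilbert-base-uncased-distilled-squad
)
```
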

## Reference Deployment on Neural Engine

### Dense Reference Deployment on Neural Engine

| Model | Task | Dataset | INT8 | BF16 |
|---|---|---|---|---|
| bert-large-uncased-whole-word-masking-finetuned-squad | question-answering | SQuAD | ✔ | ✔ |
| bhadresh-savani/distilbert-base-uncased-emotion | text-classification | emotion | ✔ | ✔ |
| textattack/bert-base-uncased-MRPC | text-classification | MRPC | ✔ | ✔ |
| textattack/distilbert-base-uncased-MRPC | text-classification | MRPC | ✔ | ✔ |
| Intel/roberta-base-mrpc | text-classification | MRPC | ✔ | ✔ |
| M-FAC/bert-mini-finetuned-mrpc | text-classification | MRPC | ✔ | ✔ |
| gchhablani/bert-base-cased-finetuned-mrpc | text-classification | MRPC | ✔ | ✔ |
| distilbert-base-uncased-finetuned-sst-2-english | text-classification | SST-2 | ✔ | ✔ |
| philschmid/MiniLM-L6-H384-uncased-sst2 | text-classification | SST-2 | ✔ | ✔ |
| moshew/bert-mini-sst2-distilled | text-classification | SST-2 | ✔ | ✔ |
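
For reference, the sketch below shows how an exported model is typically loaded and run on the Transformers-accelerated Neural Engine; the `compile()`/`inference()` entry points, the model path, and the input order are assumptions taken from the project's deployment documentation and depend on how the model was exported.

```python
# Minimal sketch: running an exported model on the Transformers-accelerated
# Neural Engine. The compile()/inference() entry points follow the project's
# deployment documentation and are assumptions here; the model path and the
# input order depend on how the model was exported.
import numpy as np

from intel_extension_for_transformers.backends.neural_engine.compile import compile

graph = compile("./model.onnx")  # hypothetical path to an exported (e.g. INT8) model

batch_size, seq_len = 1, 128
input_ids = np.zeros((batch_size, seq_len), dtype=np.int32)
token_type_ids = np.zeros((batch_size, seq_len), dtype=np.int32)
attention_mask = np.ones((batch_size, seq_len), dtype=np.int32)

# The list order must match the input signature of the exported graph.
outputs = graph.inference([input_ids, token_type_ids, attention_mask])
```
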

### Sparse Reference Deployment on Neural Engine

| Model | Task | Dataset | INT8 | BF16 |
|---|---|---|---|---|
| Intel/distilbert-base-uncased-squadv1.1-sparse-80-1x4-block-pruneofa | question-answering | SQuAD | ✔ | WIP :star: |
| Intel/bert-mini-sst2-distilled-sparse-90-1X4-block | text-classification | SST-2 | ✔ | WIP :star: |

## Early-Exit

| Model | Task | Dataset | Early-Exit Type |
|---|---|---|---|
| bert-base-uncased | text-classification | MNLI | SWEET notebook |
| philschmid/tiny-bert-sst2-distilled<br>textattack/roberta-base-SST-2 | text-classification | SST-2 | TangoBERT notebook |