# Examples

1. Quantization
    1. Stock PyTorch Examples
    2. Intel Extension for PyTorch (IPEX) Examples
    3. Intel TensorFlow Examples
2. Length Adaptive Transformers
3. Pruning
4. Distillation
    1. Knowledge Distillation
5. Orchestrate
6. Reference Deployment on Neural Engine
    1. Dense Reference
    2. Sparse Reference
7. Early-Exit

Intel Extension for Transformers is a powerful toolkit that provides multiple model optimization techniques for Natural Language Processing (NLP) models, including quantization, pruning, distillation, auto distillation, and orchestration. It also provides the Transformers-accelerated Neural Engine, an optimized backend for NLP models, to demonstrate deployment.

## Quantization

### Stock PyTorch Examples

| Model | Task | Dataset | dynamic | static |
|---|---|---|---|---|
| gpt-j-6B | language-modeling (CLM) | wikitext | | |
| t5-large-finetuned-xsum-cnn | summarization | cnn_dailymail | | |
| t5-base-cnn-dm | summarization | cnn_dailymail | | |
| lambdalabs/sd-pokemon-diffusers | text-to-image | image | | |
| bert-base-uncased | language-modeling (MLM) | wikitext | | |
| xlnet-base-cased | language-modeling (PLM) | wikitext | | |
| EleutherAI/gpt-neo-125M | language-modeling (CLM) | wikitext | | |
| sshleifer/tiny-ctrl | language-modeling (CLM) | wikitext | | WIP :star: |
| ehdwns1516/bert-base-uncased_SWAG | multiple-choice | swag | | |
| distilbert-base-uncased-distilled-squad | question-answering | SQuAD | | |
| valhalla/longformer-base-4096-finetuned-squadv1 | question-answering | SQuAD | | |
| lvwerra/pegasus-samsum | summarization | samsum | | WIP :star: |
| textattack/bert-base-uncased-MRPC | text-classification | MRPC | | |
| echarlaix/bert-base-uncased-sst2-acc91.1-d37-hybrid | text-classification | SST-2 | | |
| distilbert-base-uncased-finetuned-sst-2-english | text-classification | SST-2 | | |
| elastic/distilbert-base-uncased-finetuned-conll03-english | token-classification | conll2003 | | |
| t5-small | translation | wmt16 | | WIP :star: |
| Helsinki-NLP/opus-mt-en-ro | translation | wmt16 | | WIP :star: |

| Model | Task | Dataset | QAT | No Trainer quantization |
|---|---|---|---|---|
| textattack/bert-base-uncased-MRPC | text-classification | MRPC | | |
| echarlaix/bert-base-uncased-sst2-acc91.1-d37-hybrid | text-classification | SST-2 | | |
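
The dynamic column above refers to post-training dynamic quantization, where weights are stored as int8 and activations are quantized on the fly at inference time. As a minimal sketch of the idea using stock PyTorch directly (rather than the toolkit's own example scripts), with a model name taken from the table:

```python
import torch
from transformers import AutoModelForQuestionAnswering

# Load a model from the table above and quantize its Linear layers:
# weights become int8, activations are quantized dynamically at runtime.
model = AutoModelForQuestionAnswering.from_pretrained(
    "distilbert-base-uncased-distilled-squad"
)
model.eval()

quantized_model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)
```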

### Intel Extension for PyTorch (IPEX) Examples

| Model | Task | Dataset | static |
|---|---|---|---|
| distilbert-base-uncased-distilled-squad | question-answering | SQuAD | |
| bert-large-uncased-whole-word-masking-finetuned-squad | question-answering | SQuAD | |
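
The static column corresponds to post-training static quantization, which needs a calibration pass over representative inputs. A rough sketch of the IPEX flow (API names as of IPEX >= 1.12; treat the exact signatures as an assumption and check the IPEX documentation for your version):

```python
import torch
import intel_extension_for_pytorch as ipex
from intel_extension_for_pytorch.quantization import prepare, convert
from transformers import AutoModelForQuestionAnswering, AutoTokenizer

model_name = "distilbert-base-uncased-distilled-squad"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForQuestionAnswering.from_pretrained(model_name).eval()

# Static quantization requires example inputs and a calibration step.
example = tokenizer("What does IPEX stand for?",
                    "Intel Extension for PyTorch.", return_tensors="pt")
qconfig = ipex.quantization.default_static_qconfig
prepared = prepare(model, qconfig,
                   example_inputs=(example["input_ids"],
                                   example["attention_mask"]),
                   inplace=False)

with torch.no_grad():
    # Calibrate; in practice, loop over a representative dataset here.
    prepared(example["input_ids"], example["attention_mask"])

quantized = convert(prepared)
```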

### Intel TensorFlow Examples

| Model | Task | Dataset | static |
|---|---|---|---|
| bert-base-cased-finetuned-mrpc | text-classification | MRPC | |
| xlnet-base-cased | text-classification | MRPC | |
| distilgpt2 | language-modeling (CLM) | wikitext | |
| distilbert-base-cased | language-modeling (MLM) | wikitext | |
| Rocketknight1/bert-base-uncased-finetuned-swag | multiple-choice | swag | |
| dslim/bert-base-NER | token-classification | conll2003 | |

## Length Adaptive Transformers

| Model Name | Datatype | Optimization Method | Model Size (MB) | Accuracy (F1) | Latency (ms) | GFLOPS** | Speedup (compared with BERT Base) |
|---|---|---|---|---|---|---|---|
| BERT Base | fp32 | None | 415.47 | 88.58 | 56.56 | 35.3 | 1x |
| LA-MiniLM | fp32 | Drop and restore, based on MiniLMv2 | 115.04 | 89.28 | 16.99 | 4.76 | 3.33x |
| LA-MiniLM (269, 253, 252, 202, 104, 34)* | fp32 | Evolution search (best config) | 115.04 | 87.76 | 11.44 | 2.49 | 4.94x |
| QuaLA-MiniLM | int8 | Quantization, based on LA-MiniLM | 84.85 | 88.85 | 7.84 | 4.76 | 7.21x |
| QuaLA-MiniLM (315, 251, 242, 159, 142, 33)* | int8 | Evolution search (best config) | 84.86 | 87.68 | 6.41 | 2.55 | 8.82x |

Note: * The length configuration applies to the Length Adaptive model.

Note: ** GFLOPS counts the multiply and add operations performed during model inference (measured with the torchprofile tool).

The numbers above were measured on an Intel Xeon Platinum 8280 Scalable processor; for configuration details, please refer to the examples.
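
The Speedup column is simply BERT Base latency divided by each model's latency, which can be checked directly against the table:

```python
# Speedup in the table above = BERT Base latency / model latency.
bert_base_latency_ms = 56.56
for name, latency_ms in [
    ("LA-MiniLM", 16.99),
    ("LA-MiniLM (best config)", 11.44),
    ("QuaLA-MiniLM", 7.84),
    ("QuaLA-MiniLM (best config)", 6.41),
]:
    print(f"{name}: {bert_base_latency_ms / latency_ms:.2f}x")
# Prints 3.33x, 4.94x, 7.21x, 8.82x, matching the Speedup column.
```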

## Pruning

| Model | Task | Dataset | Pruning Approach | Pruning Type | Framework |
|---|---|---|---|---|---|
| distilbert-base-uncased-distilled-squad | question-answering | SQuAD | BasicMagnitude | Unstructured | Stock PyTorch |
| bert-large-uncased | question-answering | SQuAD | Group LASSO | Structured | Stock PyTorch |
| distilbert-base-uncased-finetuned-sst-2-english | text-classification | SST-2 | BasicMagnitude | Unstructured | Stock PyTorch / Intel TensorFlow |
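
The BasicMagnitude / Unstructured rows correspond to classic magnitude pruning: the weights with the smallest absolute values are zeroed out. The toolkit drives this through its pruning configuration; the following is only a minimal stock-PyTorch sketch of the underlying idea (the 80% sparsity target is illustrative):

```python
import torch
import torch.nn.utils.prune as prune
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased-finetuned-sst-2-english"
)

for _, module in model.named_modules():
    if isinstance(module, torch.nn.Linear):
        # Zero the fraction of weights with the smallest absolute magnitude.
        prune.l1_unstructured(module, name="weight", amount=0.8)
        prune.remove(module, "weight")  # bake the zeros into the weight tensor
```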

## Distillation

### Knowledge Distillation

| Student Model | Teacher Model | Task | Dataset |
|---|---|---|---|
| distilbert-base-uncased | bert-base-uncased-SST-2 | text-classification | SST-2 |
| distilbert-base-uncased | bert-base-uncased-QNLI | text-classification | QNLI |
| distilbert-base-uncased | bert-base-uncased-QQP | text-classification | QQP |
| distilbert-base-uncased | bert-base-uncased-MNLI-v1 | text-classification | MNLI |
| distilbert-base-uncased | bert-base-uncased-squad-v1 | question-answering | SQuAD |
| TinyBERT_General_4L_312D | bert-base-uncased-MNLI-v1 | text-classification | MNLI |
| distilroberta-base | roberta-large-cola-krishna2020 | text-classification | COLA |
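
Knowledge distillation trains the student to match the teacher's temperature-softened output distribution in addition to the hard labels. The standard loss looks like the sketch below (the temperature T and mixing weight alpha are hyperparameters; the values shown are illustrative):

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft term: KL divergence between temperature-softened distributions,
    # scaled by T^2 to keep gradient magnitudes comparable across temperatures.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard term: ordinary cross-entropy against the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```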

## Orchestrate

| Model | Task | Dataset | Distillation Teacher | Pruning Approach | Pruning Type |
|---|---|---|---|---|---|
| Intel/distilbert-base-uncased-sparse-90-unstructured-pruneofa | question-answering | SQuAD | distilbert-base-uncased-distilled-squad | PatternLock | Unstructured |
| | | | | BasicMagnitude | Unstructured |
| | text-classification | SST-2 | distilbert-base-uncased-finetuned-sst-2-english | PatternLock | Unstructured |
| | | | | BasicMagnitude | Unstructured |
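
Orchestration runs distillation and pruning inside the same fine-tuning loop. For the PatternLock rows, the sparsity pattern of an already-pruned model is frozen and re-imposed after every optimizer step while the distillation loss is minimized. A hedged sketch of that mechanism (the helper names are hypothetical, not toolkit API):

```python
import torch

def capture_masks(model):
    # Hypothetical helper: record the existing zero pattern of a sparse model.
    return {name: (param != 0).float()
            for name, param in model.named_parameters() if "weight" in name}

def reapply_masks(model, masks):
    # Hypothetical helper: call after optimizer.step() so weights pruned to
    # zero stay zero (the "locked" pattern) throughout distillation.
    with torch.no_grad():
        for name, param in model.named_parameters():
            if name in masks:
                param.mul_(masks[name])
```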

## Reference Deployment on Neural Engine

### Dense Reference Deployment on Neural Engine

| Model | Task | Dataset | INT8 | BF16 |
|---|---|---|---|---|
| bert-large-uncased-whole-word-masking-finetuned-squad | question-answering | SQuAD | | |
| bhadresh-savani/distilbert-base-uncased-emotion | text-classification | emotion | | |
| textattack/bert-base-uncased-MRPC | text-classification | MRPC | | |
| textattack/distilbert-base-uncased-MRPC | text-classification | MRPC | | |
| Intel/roberta-base-mrpc | text-classification | MRPC | | |
| M-FAC/bert-mini-finetuned-mrpc | text-classification | MRPC | | |
| gchhablani/bert-base-cased-finetuned-mrpc | text-classification | MRPC | | |
| distilbert-base-uncased-finetuned-sst-2-english | text-classification | SST-2 | | |
| philschmid/MiniLM-L6-H384-uncased-sst2 | text-classification | SST-2 | | |
| moshew/bert-mini-sst2-distilled | text-classification | SST-2 | | |

### Sparse Reference Deployment on Neural Engine

| Model | Task | Dataset | INT8 | BF16 |
|---|---|---|---|---|
| Intel/distilbert-base-uncased-squadv1.1-sparse-80-1x4-block-pruneofa | question-answering | SQuAD | | WIP :star: |
| Intel/bert-mini-sst2-distilled-sparse-90-1X4-block | text-classification | SST-2 | | WIP :star: |

## Early-Exit

| Model | Task | Dataset | Early-Exit Type | Notebook |
|---|---|---|---|---|
| bert-base-uncased | text-classification | MNLI | SWEET | notebook |
| philschmid/tiny-bert-sst2-distilled<br>textattack/roberta-base-SST-2 | text-classification | SST-2 | TangoBERT | notebook |
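
The TangoBERT row pairs a small model with a larger fallback: the small model answers whenever it is confident, and the input escalates to the larger model otherwise. A sketch of that cascade using the two checkpoints from the table (the 0.9 confidence threshold is illustrative):

```python
from transformers import pipeline

small = pipeline("text-classification",
                 model="philschmid/tiny-bert-sst2-distilled")
large = pipeline("text-classification",
                 model="textattack/roberta-base-SST-2")

def classify(text, threshold=0.9):
    result = small(text)[0]
    if result["score"] >= threshold:
        return result          # early exit: the cheap model is confident
    return large(text)[0]      # escalate to the larger model
```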