Fast BERT (Experimental)

Feature Description

Intel proposed a technique to speed up BERT workloads. Implementation leverages the idea from Tensor Processing Primitives: A Programming Abstraction for Efficiency and Portability in Deep Learning & HPC Workloads.

The Implementation is integrated into Intel® Extension for PyTorch*. BERT could benefit from this new technique, for both training and inference.

Prerequisite

  • Transformers 4.6.0 ~ 4.31.0

Usage Example

An API ipex.fast_bert is provided for a simple usage. Usage of this API follows the pattern of ipex.optimize function. More detailed description of API is available at Fast BERT API doc

import torch
from transformers import BertModel

model = BertModel.from_pretrained("bert-base-uncased")
model.eval()

vocab_size = model.config.vocab_size
batch_size = 1
seq_length = 512
data = torch.randint(vocab_size, size=[batch_size, seq_length])
torch.manual_seed(43)

#################### code changes ####################  # noqa F401
import intel_extension_for_pytorch as ipex
model = ipex.fast_bert(model, dtype=torch.bfloat16)
######################################################  # noqa F401

with torch.no_grad():
    model(data)

print("Execution finished")