Pipeline
Introduction
The pipeline is inherited from huggingface/transformers pipeline, it is simple to use any model from Hub for inference on any language, computer vision, speech, and multimodal tasks. Two features for int8 model inference and model inference on executor backend have been added to the extension.
Examples
Pipeline Inference for INT8 Model
Initialize a pipeline instance with a model name and specific task.
from intel_extension_for_transformers.transformers.pipeline import pipeline text_classifier = pipeline( task="text-classification", model="Intel/distilbert-base-uncased-finetuned-sst-2-english-int8-static", framework="pt", device=torch.device("cpu"), )
Pass your input text to the pipeline instance for inference.
outputs = text_classifier("This is great !") # output: [{'label': 1, 'score': 0.9998425245285034}]
Pipeline Inference for Executor Backend
For executor, we only accept ONNX model now for pipeline. Users can get ONNX model from PyTorch model with our existing API. Right now, pipeline for executor only supports text-classification task.
Initialize a pipeline instance with an ONNX model, model config, model tokenizer and specific backend. The MODEL_NAME is the pytorch model name you used for exporting the ONNX model.
from intel_extension_for_transformers.transformers.pipeline import pipeline from transformers import AutoConfig, AutoTokenizer config = AutoConfig.from_pretrained(MODEL_NAME) tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME) text_classifier = pipeline( task="text-classification", config=config, tokenizer=tokenizer, model='fp32.onnx', model_kwargs={'backend': "executor"}, )
Pass your input text to the pipeline instance for inference.
outputs = text_classifier( "But believe it or not , it 's one of the most " "beautiful , evocative works I 've seen ." ) # output: [{'label': 'POSITIVE', 'score': 0.9998886585235596}]