Benchmark
======
1. [Introduction](#introduction)
2. [Get Started](#get-started-with-benchmark-api)
3. [Examples](#examples)

    3.1. [Stock PyTorch Model](#stock-pytorch-model)

    3.2. [IPEX Model](#ipex-model)

    3.3. [Benchmark Output](#benchmark-output)

## Introduction
Benchmark is used to measure model performance against user-specified objective settings. It is inherited from the Intel® Neural Compressor [Benchmark](https://github.com/intel/neural-compressor/blob/master/docs/source/benchmark.md).

## Get Started with Benchmark API
The class `BenchmarkConfig` allows users to adjust the following parameters, which define the objective settings used to measure model performance:

`backend` (str, optional): the backend used for benchmarking. Defaults to "torch". \
`warmup` (int, optional): number of iterations to skip when collecting latency. Defaults to 5. \
`iteration` (int, optional): total number of iterations when collecting latency. Defaults to 20. \
`cores_per_instance` (int, optional): number of CPU cores used by one instance. Defaults to 4. \
`num_of_instance` (int, optional): number of instances. Defaults to -1. \
`torchscript` (bool, optional): enable it to JIT-trace the model before benchmarking. Defaults to False. \
`generate` (bool, optional): enable it to use the model's generate method when benchmarking. Defaults to False.

>**Note**: Benchmark provides the capability to automatically run with multiple instances through the `cores_per_instance` and `num_of_instance` configurations (CPU only). Please make sure that `cores_per_instance * num_of_instance` is less than the number of physical CPU cores.

## Examples
Example inputs or a dataloader is required for benchmarking.
```py
import transformers
from datasets import load_dataset

def get_example_inputs(model_name, dataset_name='sst2'):
    tokenizer = transformers.AutoTokenizer.from_pretrained(model_name)
    dataset = load_dataset(dataset_name, split='validation')
    # 'lambada' stores its input under 'text'; 'sst2' stores it under 'sentence'
    text = dataset[0]['text'] if dataset_name == 'lambada' else dataset[0]['sentence']
    example_inputs = tokenizer(text, padding='max_length', max_length=195, return_tensors='pt')
    return example_inputs
```

### Stock PyTorch Model
```py
from intel_extension_for_transformers.transformers import BenchmarkConfig
from intel_extension_for_transformers.transformers.benchmark import benchmark

# model_name_or_path: a Hugging Face model ID or a local model path
config = BenchmarkConfig(
    batch_size=16,
    cores_per_instance=4,
    num_of_instance=-1,
)
example_inputs = get_example_inputs(model_name_or_path)
benchmark(model_name_or_path, config, example_inputs=example_inputs)
```

### IPEX Model
```py
from intel_extension_for_transformers.transformers import BenchmarkConfig
from intel_extension_for_transformers.transformers.benchmark import benchmark

config = BenchmarkConfig(
    backend='ipex',
    batch_size=16,
    cores_per_instance=4,
    num_of_instance=-1,
)
example_inputs = get_example_inputs(model_name_or_path)
benchmark(model_name_or_path, config, example_inputs=example_inputs)
```

### Benchmark Output
```bash
**********************************************
|****Multiple Instance Benchmark Summary*****|
+---------------------------------+----------+
| Items                           | Result   |
+---------------------------------+----------+
| Latency average [second/sample] | 0.003    |
| Throughput sum [samples/second] | 5071.933 |
+---------------------------------+----------+
```
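
Putting it all together, the following sketch combines every `BenchmarkConfig` parameter documented above in one place. The values are illustrative only, not recommendations, and `model_name_or_path` remains a placeholder for any Hugging Face model ID or local path:

```py
from intel_extension_for_transformers.transformers import BenchmarkConfig
from intel_extension_for_transformers.transformers.benchmark import benchmark

# Illustrative values only; tune them for your own hardware and model.
config = BenchmarkConfig(
    backend='torch',      # or 'ipex' for an IPEX-optimized run
    batch_size=16,
    warmup=5,             # iterations skipped before latency is collected
    iteration=20,         # total iterations used for latency collection
    cores_per_instance=4,
    num_of_instance=2,    # 4 cores x 2 instances must stay below physical core count
    torchscript=False,    # set True to JIT-trace the model before benchmarking
    generate=False,       # set True to benchmark the model's generate method
)
example_inputs = get_example_inputs(model_name_or_path)
benchmark(model_name_or_path, config, example_inputs=example_inputs)
```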