Benchmark

Introduction
Supported Matrix
Usage

Introduction

Intel Neural Compressor provides the incbench command for launching performance benchmarks on Intel CPUs.

To reach peak performance on an Intel Xeon CPU, a single instance should not span multiple NUMA nodes. Therefore, by default, incbench launches 1 instance on the first NUMA node.

Supported Matrix

Platform	Status
Linux	✔
Windows	✔

Usage

Parameters	Default	Comments
num_instances	1	Number of instances
num_cores_per_instance	None	Number of cores in each instance
C, cores	0-${num_cores_on_NUMA-1}	Range of cores visible to the benchmark
cross_memory	False	Whether to allocate memory across NUMA nodes

Note: cross_memory is set to True only when memory is insufficient.

General Use Cases

incbench main.py: run 1 instance on NUMA:0.
incbench --num_i 2 main.py: run 2 instances on NUMA:0.
incbench --num_c 2 main.py: run multiple instances with 2 cores per instance on NUMA:0.
incbench -C 24-47 main.py: run 1 instance on cores 24-47.
incbench -C 24-47 --num_c 4 main.py: run multiple instances with 4 cores per instance on cores 24-47.

Note: num_i works the same as num_instances, and num_c works the same as num_cores_per_instance.
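
To make the last use case concrete, the following sketch shows one plausible way a core range could be divided into fixed-size instances. It is an illustration only, not incbench's implementation: the helper names parse_core_range and split_into_instances are hypothetical, and the handling of leftover cores (dropped here) is an assumption.

```python
def parse_core_range(spec):
    """Expand a core range such as "24-47" into a list of core IDs."""
    start, end = (int(part) for part in spec.split("-"))
    return list(range(start, end + 1))


def split_into_instances(cores, num_cores_per_instance):
    """Group cores into consecutive chunks, one chunk per instance.

    Assumption: cores that do not fill a whole instance are left unused.
    """
    usable = len(cores) - len(cores) % num_cores_per_instance
    return [
        cores[i:i + num_cores_per_instance]
        for i in range(0, usable, num_cores_per_instance)
    ]


# Mirrors: incbench -C 24-47 --num_c 4 main.py
instances = split_into_instances(parse_core_range("24-47"), 4)
for index, instance_cores in enumerate(instances):
    print(f"instance {index}: cores {instance_cores[0]}-{instance_cores[-1]}")
```

Under these assumptions, 24 visible cores at 4 cores per instance yield 6 instances, the first on cores 24-27 and the last on cores 44-47.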

Dump Throughput and Latency Summary

To merge benchmark results from multiple instances, incbench automatically scans the log files for throughput and latency information matching the following patterns.

throughput_pattern = r"[T,t]hroughput:\s*([0-9]*\.?[0-9]+)\s*([a-zA-Z/]*)"
latency_pattern = r"[L,l]atency:\s*([0-9]*\.?[0-9]+)\s*([a-zA-Z/]*)"
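
To check whether a script's output will be recognized, the two patterns can be tried against sample log lines with Python's re module. The sketch below only demonstrates the matching. The sample log text and the extract helper are made up for illustration, and how incbench combines the values across instances is not shown.

```python
import re

throughput_pattern = r"[T,t]hroughput:\s*([0-9]*\.?[0-9]+)\s*([a-zA-Z/]*)"
latency_pattern = r"[L,l]atency:\s*([0-9]*\.?[0-9]+)\s*([a-zA-Z/]*)"

# Hypothetical log output from one benchmark instance.
sample_log = """\
Throughput: 123.456 samples/sec
Latency: 8.100 ms
"""


def extract(pattern, text):
    """Return a (value, unit) pair for every match of pattern in text."""
    return [(float(value), unit) for value, unit in re.findall(pattern, text)]


throughputs = extract(throughput_pattern, sample_log)
latencies = extract(latency_pattern, sample_log)
print(throughputs)
print(latencies)
```

Each pattern captures two groups: the numeric value and an optional unit made of letters and slashes. Note that the character class [T,t] also accepts a literal comma, so the patterns are slightly more permissive than "T or t".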

Demo usage

print("Throughput: {:.3f} samples/sec".format(throughput))
print("Latency: {:.3f} ms".format(latency * 10**3))
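
For context, a minimal main.py that produces these two lines might look like the sketch below. The workload function run_batch, the batch size, and the iteration count are placeholders chosen for illustration, and a real script would time its own model or kernel instead.

```python
import time


def run_batch(batch_size):
    """Hypothetical stand-in for one inference call on a batch."""
    return sum(i * i for i in range(batch_size * 1000))


batch_size = 32   # assumed batch size
iterations = 50   # assumed number of timed iterations

start = time.perf_counter()
for _ in range(iterations):
    run_batch(batch_size)
elapsed = time.perf_counter() - start

latency = elapsed / iterations                   # seconds per batch
throughput = batch_size * iterations / elapsed   # samples per second

# Same output format as the demo lines, so the patterns can match them.
print("Throughput: {:.3f} samples/sec".format(throughput))
print("Latency: {:.3f} ms".format(latency * 10**3))
```

Here latency is measured in seconds and multiplied by 10**3 so that the printed value is in milliseconds.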