Benchmark
Introduction
Intel Neural Compressor provides the incbench command to launch performance benchmarks on Intel CPUs.
To get peak performance on Intel Xeon CPUs, a single instance should not span multiple NUMA nodes.
Therefore, by default, incbench launches 1 instance on the first NUMA node.
Supported Matrix
| Platform | Status |
|---|---|
| Linux | ✔ |
| Windows | ✔ |
Usage
| Parameter | Default | Comments |
|---|---|---|
| num_instances | 1 | Number of instances |
| num_cores_per_instance | None | Number of cores in each instance |
| C, cores | 0-${num_cores_on_NUMA-1} | Range of cores visible to the instances |
| cross_memory | False | Whether to allocate memory across NUMA nodes |
Note: cross_memory is set to True only when memory is insufficient.
General Use Cases
- `incbench main.py`: run 1 instance on NUMA:0.
- `incbench --num_i 2 main.py`: run 2 instances on NUMA:0.
- `incbench --num_c 2 main.py`: run multiple instances with 2 cores per instance on NUMA:0.
- `incbench -C 24-47 main.py`: run 1 instance on cores 24-47.
- `incbench -C 24-47 --num_c 4 main.py`: run multiple instances with 4 cores per instance on cores 24-47.
Note:
- `num_i` works the same as `num_instances`
- `num_c` works the same as `num_cores_per_instance`
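To make the arithmetic behind the last use case concrete, here is a minimal sketch of how a core range could be divided into instances. The helper names `parse_core_range` and `split_cores` are hypothetical and are not part of incbench, and the handling of leftover cores is an assumption; incbench's actual partitioning logic may differ.

```python
def parse_core_range(spec):
    """Expand a range string such as "24-47" into a list of core ids."""
    start, end = (int(part) for part in spec.split("-"))
    return list(range(start, end + 1))


def split_cores(cores, num_cores_per_instance):
    """Split cores into consecutive, equally sized groups.

    Assumption: cores that do not fill a whole group are left unused.
    """
    size = num_cores_per_instance
    num_groups = len(cores) // size
    return [cores[i * size:(i + 1) * size] for i in range(num_groups)]


groups = split_cores(parse_core_range("24-47"), 4)
print(len(groups))   # 6 groups, one per instance
print(groups[0])     # [24, 25, 26, 27]
print(groups[-1])    # [44, 45, 46, 47]
```

Under these assumptions, the 24 cores in 24-47 with 4 cores per instance would yield 6 instances.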
Dump Throughput and Latency Summary
To merge benchmark results from multiple instances, incbench automatically scans the log file messages for throughput and latency information matching the following patterns.
throughput_pattern = r"[T,t]hroughput:\s*([0-9]*\.?[0-9]+)\s*([a-zA-Z/]*)"
latency_pattern = r"[L,l]atency:\s*([0-9]*\.?[0-9]+)\s*([a-zA-Z/]*)"
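The following sketch shows what these patterns capture when applied to a log with Python's `re` module. The two patterns are copied from above; the `extract_metrics` helper and the sample log text are illustrative assumptions, not incbench code, and the sketch does not attempt to reproduce how incbench combines values across instances.

```python
import re

# Patterns copied verbatim from the section above.
throughput_pattern = r"[T,t]hroughput:\s*([0-9]*\.?[0-9]+)\s*([a-zA-Z/]*)"
latency_pattern = r"[L,l]atency:\s*([0-9]*\.?[0-9]+)\s*([a-zA-Z/]*)"


def extract_metrics(log_text):
    """Return {"throughput": (value, unit), "latency": (value, unit)} for the
    first match of each pattern; a metric with no match is omitted."""
    metrics = {}
    for name, pattern in (("throughput", throughput_pattern),
                          ("latency", latency_pattern)):
        match = re.search(pattern, log_text)
        if match:
            # Group 1 is the numeric value, group 2 is the unit string.
            metrics[name] = (float(match.group(1)), match.group(2))
    return metrics


# Hypothetical log content written by one instance.
sample_log = "Throughput: 123.456 samples/sec\nLatency: 8.100 ms\n"
print(extract_metrics(sample_log))
```

Each pattern expects the metric name, a colon, a number, and then an optional unit made of letters and `/`, so a script's output has to follow that shape to be picked up.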
Demo usage
print("Throughput: {:.3f} samples/sec".format(throughput))
print("Latency: {:.3f} ms".format(latency * 10**3))
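For context, here is a minimal sketch of a complete `main.py` that produces the two lines above. The `run_inference` function is a hypothetical stand-in for a real model call, and the timing approach (wall-clock time over a fixed number of samples) is an assumption rather than a requirement of incbench.

```python
import time


def run_inference(sample):
    """Hypothetical stand-in for a real model call."""
    return sum(i * i for i in range(1000))


def benchmark(num_samples=100):
    """Time num_samples calls and return (throughput, latency).

    throughput is in samples per second; latency is in seconds per sample.
    """
    start = time.perf_counter()
    for sample in range(num_samples):
        run_inference(sample)
    elapsed = time.perf_counter() - start
    return num_samples / elapsed, elapsed / num_samples


if __name__ == "__main__":
    throughput, latency = benchmark()
    # Same output format as the demo lines above; latency is converted to ms.
    print("Throughput: {:.3f} samples/sec".format(throughput))
    print("Latency: {:.3f} ms".format(latency * 10**3))
```

A script of this shape could then be passed to incbench, for example as `incbench main.py`.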