Benchmark

  1. Introduction

  2. Supported Matrix

  3. Usage

Introduction

Intel Neural Compressor provides the incbench command to launch performance benchmarks on Intel CPUs.

To reach peak performance on Intel Xeon CPUs, a single instance should not span more than one NUMA node. Therefore, by default, incbench launches 1 instance on the first NUMA node.
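On Linux, the cores that belong to the first NUMA node can be inspected through sysfs. The sketch below is illustrative only and is not how incbench itself discovers cores; the helper names parse_cpulist and cores_on_numa_node are hypothetical. It expands the kernel's cpulist format (for example 0-23,48-71) into explicit core IDs.

```python
from pathlib import Path


def parse_cpulist(text):
    """Expand a Linux cpulist string such as '0-3,8-11' into sorted core IDs."""
    cores = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            start, end = part.split("-")
            cores.update(range(int(start), int(end) + 1))
        else:
            cores.add(int(part))
    return sorted(cores)


def cores_on_numa_node(node=0):
    """Return the core IDs of a NUMA node, or None if sysfs has no entry (e.g. non-Linux)."""
    path = Path(f"/sys/devices/system/node/node{node}/cpulist")
    if not path.exists():
        return None
    return parse_cpulist(path.read_text())


if __name__ == "__main__":
    # Prints the core IDs of NUMA node 0, or None where sysfs is unavailable.
    print(cores_on_numa_node(0))
```

Knowing which cores sit on node 0 makes it easier to choose a sensible value for the core range option described under Usage.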

Supported Matrix

Platform Status
Linux
Windows

Usage

Parameters               Default                    Comments
num_instances            1                          Number of instances
num_cores_per_instance   None                       Number of cores in each instance
C, cores                 0-${num_cores_on_NUMA-1}   Decides the visible core range
cross_memory             False                      Whether to allocate memory across NUMA nodes

Note: cross_memory is set to True only when memory is insufficient.

General Use Cases

  1. incbench main.py: run 1 instance on NUMA:0.

  2. incbench --num_i 2 main.py: run 2 instances on NUMA:0.

  3. incbench --num_c 2 main.py: run multi-instances with 2 cores per instance on NUMA:0.

  4. incbench -C 24-47 main.py: run 1 instance on COREs:24-47.

  5. incbench -C 24-47 --num_c 4 main.py: run multi-instances with 4 COREs per instance on COREs:24-47.

Note: num_i works the same as num_instances, and num_c works the same as num_cores_per_instance.
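The use cases above can be read as a partition of the visible cores into instances. The following is a minimal sketch of that reading, not incbench's actual implementation: parse_core_range and plan_instances are hypothetical helpers, and the sketch assumes that cores are handed out in contiguous blocks and that any remainder too small for a whole instance stays unused.

```python
def parse_core_range(spec):
    """Turn a range such as '24-47' into [24, 25, ..., 47]."""
    start, end = spec.split("-")
    return list(range(int(start), int(end) + 1))


def plan_instances(cores, num_instances=None, num_cores_per_instance=None):
    """Partition visible cores into one core list per instance.

    Assumption (not taken from incbench): contiguous blocks, leftovers unused.
    """
    if num_cores_per_instance is None:
        # No per-instance size given: share the cores evenly (1 instance by default).
        num_instances = num_instances or 1
        num_cores_per_instance = len(cores) // num_instances
    if num_cores_per_instance < 1:
        raise ValueError("each instance needs at least one core")
    if num_instances is None:
        # Only the per-instance size was given: fill the visible range.
        num_instances = len(cores) // num_cores_per_instance
    if num_instances * num_cores_per_instance > len(cores):
        raise ValueError("not enough visible cores for the requested layout")
    return [
        cores[i * num_cores_per_instance:(i + 1) * num_cores_per_instance]
        for i in range(num_instances)
    ]


if __name__ == "__main__":
    visible = parse_core_range("24-47")
    # Mirrors use case 5: 4 cores per instance on cores 24-47.
    for index, block in enumerate(plan_instances(visible, num_cores_per_instance=4)):
        print(f"instance {index}: cores {block[0]}-{block[-1]}")
```

Under these assumptions, use case 5 yields six instances of four cores each, while use case 4 yields a single instance that owns all 24 cores.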

Dump Throughput and Latency Summary

To merge benchmark results from multiple instances, incbench automatically scans the log messages for throughput and latency information matching the following patterns.

throughput_pattern = r"[T,t]hroughput:\s*([0-9]*\.?[0-9]+)\s*([a-zA-Z/]*)"
latency_pattern = r"[L,l]atency:\s*([0-9]*\.?[0-9]+)\s*([a-zA-Z/]*)"

Demo usage

# throughput (samples/sec) and latency (seconds) are values measured by your script
print("Throughput: {:.3f} samples/sec".format(throughput))
print("Latency: {:.3f} ms".format(latency * 10**3))
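To see how log lines in this format interact with the two patterns, the sketch below applies them to per-instance log text and combines the results. It is a sketch under stated assumptions rather than incbench's own code: extract_metric and summarize are hypothetical helpers, the last match in each log is taken as that instance's result, and throughput is summed while latency is averaged across instances.

```python
import re

# Patterns copied verbatim from the section above.
throughput_pattern = r"[T,t]hroughput:\s*([0-9]*\.?[0-9]+)\s*([a-zA-Z/]*)"
latency_pattern = r"[L,l]atency:\s*([0-9]*\.?[0-9]+)\s*([a-zA-Z/]*)"


def extract_metric(pattern, log_text):
    """Return (value, unit) from the last match in one instance's log, or None."""
    matches = re.findall(pattern, log_text)
    if not matches:
        return None
    value, unit = matches[-1]
    return float(value), unit


def summarize(instance_logs):
    """Combine per-instance metrics.

    Assumption (not confirmed by the text): throughput adds up across
    instances, latency is averaged.
    """
    throughputs = [extract_metric(throughput_pattern, log) for log in instance_logs]
    latencies = [extract_metric(latency_pattern, log) for log in instance_logs]
    throughputs = [m for m in throughputs if m is not None]
    latencies = [m for m in latencies if m is not None]
    summary = {}
    if throughputs:
        summary["throughput"] = (sum(v for v, _ in throughputs), throughputs[0][1])
    if latencies:
        mean = sum(v for v, _ in latencies) / len(latencies)
        summary["latency"] = (mean, latencies[0][1])
    return summary


if __name__ == "__main__":
    logs = [
        "Throughput: {:.3f} samples/sec\nLatency: {:.3f} ms\n".format(100.0, 10.0),
        "Throughput: {:.3f} samples/sec\nLatency: {:.3f} ms\n".format(50.0, 20.0),
    ]
    print(summarize(logs))
```

Printing the unit directly after the number, as in the demo lines, lets the second capture group pick it up, so the merged summary can carry the unit along.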