15 Simulation Performance

This chapter covers various topics related to Simics performance and what can be done to measure and improve it. It discusses the general performance features provided by Simics. For ways to scale the simulation even further see chapter 16.

Simics is a fast simulator that uses techniques such as run-time code generation to optimize performance. In some cases Simics executes code faster than the target system it simulates; in other cases it can be considerably slower.

There are four major execution modes Simics uses to execute target instructions: hypersimulation, VMP, JIT and interpreted mode.

Hypersimulation means that Simics detects repetitive work performed by the target code and performs the effects of that code without actually running it. In the simplest case this is an idle loop, but the technique also applies to more complex patterns such as spin-locks and device polling, as in the sketch below. This is the fastest execution mode.
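As an illustration of the kind of code hypersimulation can catch, the following hypothetical C fragment shows target-side software polling a device flag. The flag name and the code itself are invented for illustration; they are not taken from any actual Simics target.

/* Hypothetical target code: a polling loop whose iterations
 * re-read one flag and have no other side effects. A simulator
 * that recognizes the pattern can advance virtual time straight
 * to the point where the flag changes, instead of executing
 * every iteration. */
volatile int device_ready;      /* set by an interrupt handler */

void wait_for_device(void)
{
    while (!device_ready)
        ;                       /* spin until the device signals */
}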

VMP, which is part of the Simics x86 models, utilizes the virtualization capabilities of modern processors to run target instructions directly. This typically results in high simulation performance, but the host and target need to have the same instruction set, and special setup is required to enable it. VMP is currently only supported on x86 hosts.

JIT mode uses run-time code generation to translate blocks of target instructions into blocks of host instructions; Simics is in JIT mode when it runs such translated blocks. This mode is supported by most target processor models in Simics.

Interpreted mode interprets the target instructions one by one. This mode is the slowest, but it is always available.


There are basically two ways to measure Simics performance:

- how fast Simics executes the target instructions, that is, how quickly the workload finishes in wall-clock time, and
- how fast virtual time advances on the simulated machine.

In most cases the user is mostly interested in the first: Simics should execute instructions as fast as possible to finish the workload in the shortest possible time. However, since Simics is a full system simulator, it is also important that virtual time on the simulated machine advances quickly. This matters when a program or operating system is waiting for a timer to expire, or for an interrupt from a device, in order to proceed with the workload.

If we divide the wall-clock time elapsed on the host that Simics runs on by the elapsed virtual time on the target machine, we get a slowdown number:

slowdown = time_host / time_virtual
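As a concrete example with made-up numbers: if a workload takes 46 seconds of wall-clock time while 20 seconds of virtual time elapse on the target, the slowdown is 46/20 = 2.3. A minimal C sketch of the same calculation:

/* Slowdown = host wall-clock time / elapsed virtual time.
 * The 46 s and 20 s figures are invented illustration values. */
#include <stdio.h>

int main(void)
{
    double time_host = 46.0;     /* wall-clock seconds on the host */
    double time_virtual = 20.0;  /* virtual seconds on the target */
    printf("slowdown = %.1f\n", time_host / time_virtual);  /* 2.3 */
    return 0;
}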
A slowdown number of 2.3 means that Simics is 2.3 times slower than the system it simulates. A slowdown value of less than 1.0 means that Simics manages to execute the corresponding code faster than the system it simulates. The slowdown depends on various factors, not least which of the execution modes described above Simics can use.

The default time model in Simics is that each target instruction takes one target cycle to execute; that is, the default Instructions Per Cycle (IPC) value is 1.0. This is a simplification (but in many cases an adequate approximation) of the time it actually takes the real hardware to execute instructions. It is possible to change the IPC value using the <cpu>.set-step-rate command. For example:
simics> board.mb.cpu0.core[0][0].set-step-rate ipc = 1.5
Setting step rate to 3/2 steps/cycle
simics> board.mb.cpu0.core[0][0].set-step-rate ipc = 0.5
Setting step rate to 1/2 steps/cycle

In the first example, an IPC value of 1.5 means that Simics must execute three instructions for two cycles to elapse. In the second example, two cycles elapse for each instruction executed. Thus, with a lower IPC value, virtual time progresses faster and the simulation slowdown decreases.
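The underlying relation is cycles = steps / IPC, and virtual time = cycles / clock frequency. The following C sketch reproduces the examples above; the 1 GHz clock and the 3-billion-instruction workload are assumed values, not taken from any particular target.

/* cycles = steps / IPC; virtual time = cycles / frequency.
 * The 1 GHz clock and the step count are assumptions. */
#include <stdio.h>

int main(void)
{
    double steps = 3e9;                /* instructions executed */
    double freq_hz = 1e9;              /* assumed 1 GHz target clock */
    double ipc[] = { 1.5, 1.0, 0.5 };
    for (int i = 0; i < 3; i++) {
        double cycles = steps / ipc[i];
        printf("IPC %.1f: %.1f virtual seconds\n",
               ipc[i], cycles / freq_hz);
    }
    return 0;
}

With the same number of executed instructions, lowering the IPC from 1.5 to 0.5 triples the elapsed virtual time, which is why the slowdown number drops correspondingly.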

Note that there is nothing wrong, from a simulation accuracy point of view, with changing the default IPC. In many cases the IPC observed for a given benchmark on real hardware is much lower than the 1.0 that Simics assumes, and matching it both brings the simulation closer to the real hardware and improves simulation speed, at least in virtual time. The simulations that profit most from this change are those involving devices and long memory latencies.
