
15.5 Hypersimulation

The term hypersimulation refers to a simulator feature that can detect, analyze, and understand frequently executed target instructions and fast-forward their simulation, thus providing the corresponding results more rapidly.

Being able to detect the idle loop (see chapter 15.4.1) is one example of where this technique applies. A much more extreme hypersimulation task would be to understand a complete program and simply provide its result without actually running the program. Naturally, this is hardly ever applicable, and it is impossible in general. Busy-wait loops and spin-locks are more realistic examples of code whose execution hypersimulation can easily optimize away.
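As a rough illustration of why a busy-wait loop is easy to fast-forward, the Python sketch below models a loop that polls a free-running counter until it reaches a deadline. Everything in it is hypothetical (the `State` class, `LOOP_COST`, `TICK`, and the counter model are not Simics APIs): `spin_naive` runs the loop one iteration at a time, while `spin_hypersim` computes the exiting iteration directly and jumps ahead. Both end in exactly the same state, so the shortcut changes only how fast the result is produced, not the result itself.

```python
from dataclasses import dataclass

# Hypothetical toy model, not the Simics API.
LOOP_COST = 3  # assumed cycles per spin iteration (load, compare, branch)
TICK = 10      # assumed: the polled counter increments once every 10 cycles


@dataclass
class State:
    cycle: int       # virtual time (cycles) when the loop exits
    reg: int         # last counter value loaded by the loop
    iterations: int  # number of loop iterations executed


def counter_at(cycle: int) -> int:
    """Value of the free-running counter at a given cycle."""
    return cycle // TICK


def spin_naive(start_cycle: int, deadline: int) -> State:
    """Simulate the spin loop one iteration at a time."""
    cycle, iterations = start_cycle, 0
    while True:
        reg = counter_at(cycle)  # load the counter
        cycle += LOOP_COST       # charge the cost of one iteration
        iterations += 1
        if reg >= deadline:      # exit condition of the loop
            return State(cycle, reg, iterations)


def spin_hypersim(start_cycle: int, deadline: int) -> State:
    """Fast-forward: compute the exiting iteration instead of running it."""
    # Smallest k >= 0 with start_cycle + k * LOOP_COST >= deadline * TICK.
    need = deadline * TICK - start_cycle
    k = max(0, -(-need // LOOP_COST))  # ceiling division
    load_cycle = start_cycle + k * LOOP_COST
    return State(load_cycle + LOOP_COST, counter_at(load_cycle), k + 1)


# Both print State(cycle=10005, reg=1000, iterations=3335)
print(spin_naive(0, 1_000))
print(spin_hypersim(0, 1_000))
```

The naive version needs 3335 host-side iterations for this deadline, and that number grows with the deadline; the fast-forward version does a constant amount of work.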

Hypersimulation can be achieved in several ways:

The following instructions are covered by CPU-handled instruction hypersimulation:

Target   Instruction  Comment
-------  -----------  -----------------------------
ARM      mcr          Enabling "Wait for Interrupt"
m68k     stop
MIPS     wait
PowerPC  mtmsr        Setting MSR[POW]
PowerPC  b 0          Branch to itself
PowerPC  wait
x86      hlt
x86      mwait
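The idea behind CPU-handled instruction hypersimulation can be sketched with a toy model: once the processor executes one of the halt-like instructions above, nothing happens until the next pending event (for example a timer interrupt), so the simulator can advance virtual time straight to that event instead of ticking cycle by cycle. The `ToyCpu` class and both halt functions below are hypothetical illustrations, not Simics code; the point is that both variants reach the same cycle and the same event log, while the fast-forward variant does a constant amount of host work.

```python
import heapq


class ToyCpu:
    """Hypothetical processor model with a queue of timed events."""

    def __init__(self, cycle: int = 0):
        self.cycle = cycle
        self.events: list[tuple[int, str]] = []  # min-heap of (cycle, name)
        self.log: list[tuple[int, str]] = []     # events that have fired

    def post(self, cycle: int, name: str) -> None:
        heapq.heappush(self.events, (cycle, name))

    def fire_due(self) -> None:
        # Fire every event whose time has been reached.
        while self.events and self.events[0][0] <= self.cycle:
            self.log.append(heapq.heappop(self.events))


def halt_stepwise(cpu: ToyCpu) -> int:
    """Idle one cycle at a time until the next event; return host steps."""
    assert cpu.events, "a halt with no pending event would never wake up"
    steps = 0
    while cpu.events[0][0] > cpu.cycle:
        cpu.cycle += 1
        steps += 1
    cpu.fire_due()
    return steps


def halt_hypersim(cpu: ToyCpu) -> int:
    """Jump directly to the next event; return host steps (always 1)."""
    assert cpu.events, "a halt with no pending event would never wake up"
    cpu.cycle = max(cpu.cycle, cpu.events[0][0])
    cpu.fire_due()
    return 1


a, b = ToyCpu(200), ToyCpu(200)
for cpu in (a, b):
    cpu.post(1500, "timer")
    cpu.post(4000, "timer")
print(halt_stepwise(a), halt_hypersim(b))  # 1300 1
print(a.cycle == b.cycle and a.log == b.log)  # True
```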

Hypersimulation should be as non-intrusive as possible: the only difference a Simics user should notice is the increased performance. Registers, timing, memory contents, exceptions, interrupts, etc. should be identical.

Hypersimulation using the hypersim-pattern-matcher may interfere with some Simics features:

Hypersimulation using the hypersim-pattern-matcher is enabled by default and can be turned on or off with the enable-hypersim and disable-hypersim commands.

The hypersim-status command shows which hypersim features are currently active.

Hypersim patterns are typically fragile, since they depend on an exact instruction sequence. Simply changing the compiler version or an optimization flag can prevent the pattern from being recognized.
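A minimal sketch of why exact-match patterns are fragile, assuming a matcher that compares an instruction sequence verbatim. The PowerPC-flavored instruction lists below are made-up illustrations of the same spin loop as two hypothetical compiler builds might emit it; they are not patterns shipped with Simics, and `matches` is not the interface of the real hypersim-pattern-matcher.

```python
# Hypothetical pattern: a spin loop that reloads a flag while it is zero.
# Each entry is (mnemonic, operands); "-8" branches back to the load.
SPIN_PATTERN = [
    ("lwz", "r9,0(r3)"),
    ("cmpwi", "r9,0"),
    ("beq", "-8"),
]


def matches(pattern, code, pc):
    """True if the instructions starting at index pc equal the pattern exactly."""
    return code[pc:pc + len(pattern)] == pattern


# The loop as built by a hypothetical "compiler A": matches.
build_a = [("li", "r3,0x100")] + SPIN_PATTERN

# The same loop after a compiler upgrade allocates r10 instead of r9.
build_b = [("li", "r3,0x100"),
           ("lwz", "r10,0(r3)"),
           ("cmpwi", "r10,0"),
           ("beq", "-8")]

print(matches(SPIN_PATTERN, build_a, 1))  # True
print(matches(SPIN_PATTERN, build_b, 1))  # False
```

The two loops behave identically on the target, yet a one-register difference is enough to make the verbatim match fail.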

The QSP-x86 machine does not use hypersim patterns, but on an older PPC-based machine we can run the following example:

simics> disable-hypersim
simics> system-perfmeter -realtime -mips
Using real time sample slice of 1.000000s
simics> c
SystemPerf: Total-vt Total-rt Sample-vt Sample-rt Slowdown  CPU Idle  MIPS
SystemPerf: -------- -------- --------- --------- -------- ---- ---- -----
SystemPerf:     0.1s     0.3s     0.09s     0.33s      3.4 100%   0%    29
SystemPerf:     0.7s     1.3s     0.56s     1.00s      1.8  97%   0%    55
SystemPerf:     0.8s     2.3s     0.13s     1.00s      7.6  99%   0%    13
SystemPerf:     2.0s     3.3s     1.22s     1.00s      0.8  95%   0%   122
SystemPerf:     4.2s     4.3s     2.24s     1.00s      0.4  78%   0%   223
SystemPerf:     5.8s     5.3s     1.54s     1.00s      0.6  97%   0%   153
SystemPerf:    11.3s     6.3s     5.46s     1.00s      0.2  99%   0%   543
SystemPerf:    15.9s     7.3s     4.65s     1.00s      0.2  98%   0%   462
SystemPerf:    21.7s     8.3s     5.82s     1.00s      0.2  99%   0%   579
SystemPerf:    27.5s     9.3s     5.82s     1.00s      0.2 100%   0%   579
SystemPerf:    33.3s    10.3s     5.80s     1.00s      0.2  99%   0%   579

simics> enable-hypersim
simics> c
SystemPerf:    65.6s    11.2s    32.23s     0.88s      0.0  98%  85%  3673
SystemPerf:   491.1s    12.2s   425.52s     1.00s      0.0 100% 100% 42382
SystemPerf:   908.4s    13.2s   417.36s     1.00s      0.0  99% 100% 41550
SystemPerf:  1305.9s    14.2s   397.44s     1.00s      0.0 100% 100% 39745
SystemPerf:  1746.3s    15.2s   440.44s     1.00s      0.0  99% 100% 44039
SystemPerf:  2200.9s    16.2s   454.59s     1.00s      0.0  99% 100% 45457

This configuration includes a Linux idle-loop optimizer by default. We first disable hypersim and execute the boot code "normally". After about 6 seconds of host time (about 12 seconds of virtual time), the boot is finished and the operating system starts executing the idle loop. Even without hypersim, the idle loop itself executes quickly in Simics, running at 579 MIPS. While idling, almost 6 virtual seconds are simulated for each host second; that is, Simics runs about 6 times faster than the real hardware would (the processor is configured to run at 100 MHz).
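The "6 times faster" figure follows from dividing the simulated instruction rate by the target's own instruction rate. A small sketch of that arithmetic, assuming one instruction per cycle (which is what the 579 MIPS and 100 MHz figures imply); `virtual_per_host_second` is just an illustrative helper, not a Simics function.

```python
def virtual_per_host_second(mips: float, clock_mhz: float, ipc: float = 1.0) -> float:
    """Virtual seconds simulated per host second.

    mips      -- simulated instructions per host second, in millions
    clock_mhz -- target clock frequency in MHz
    ipc       -- assumed target instructions per cycle (1.0 here)
    """
    target_mips = clock_mhz * ipc  # instructions per virtual second, in millions
    return mips / target_mips


ratio = virtual_per_host_second(579, 100)
print(f"{ratio:.2f} virtual s per host s")  # 5.79 virtual s per host s
print(f"slowdown {1 / ratio:.1f}")          # slowdown 0.2
```

The reciprocal, about 0.17, corresponds to the 0.2 shown in the Slowdown column of the idle samples.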

Next, we stop the execution, enable hypersim, and continue the simulation. Now the idle-loop optimizer kicks in, and roughly 400 virtual seconds are simulated per host second, about 70 times more than without hypersim.
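The speedup can be checked directly from the Sample-vt and Sample-rt columns of the two runs above. The sketch below parses steady-state SystemPerf lines copied from that output (the last idle samples without hypersim, and the last five samples with it) and compares the average ratio of virtual to host time; `sample_ratios` is a hypothetical helper written for this illustration, not part of Simics.

```python
from statistics import mean

# Steady-state idle samples copied from the output above.
WITHOUT_HYPERSIM = """\
SystemPerf:    21.7s     8.3s     5.82s     1.00s      0.2  99%   0%   579
SystemPerf:    27.5s     9.3s     5.82s     1.00s      0.2 100%   0%   579
SystemPerf:    33.3s    10.3s     5.80s     1.00s      0.2  99%   0%   579
"""

WITH_HYPERSIM = """\
SystemPerf:   491.1s    12.2s   425.52s     1.00s      0.0 100% 100% 42382
SystemPerf:   908.4s    13.2s   417.36s     1.00s      0.0  99% 100% 41550
SystemPerf:  1305.9s    14.2s   397.44s     1.00s      0.0 100% 100% 39745
SystemPerf:  1746.3s    15.2s   440.44s     1.00s      0.0  99% 100% 44039
SystemPerf:  2200.9s    16.2s   454.59s     1.00s      0.0  99% 100% 45457
"""


def sample_ratios(text: str) -> list[float]:
    """Sample-vt / Sample-rt for each SystemPerf data line; other lines are skipped."""
    ratios = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 9 or fields[0] != "SystemPerf:":
            continue
        try:
            sample_vt = float(fields[3].rstrip("s"))
            sample_rt = float(fields[4].rstrip("s"))
        except ValueError:  # header or separator line
            continue
        ratios.append(sample_vt / sample_rt)
    return ratios


before = mean(sample_ratios(WITHOUT_HYPERSIM))
after = mean(sample_ratios(WITH_HYPERSIM))
# Prints: 5.81 -> 427.1 virtual s per host s, speedup 73x
print(f"{before:.2f} -> {after:.1f} virtual s per host s, speedup {after / before:.0f}x")
```

A ratio in the low 70s is consistent with the "about 70 times" stated above.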
