The term hypersimulation refers to a simulator feature that detects, analyzes, and understands frequently executed target instructions and fast-forwards their simulation, thus providing the corresponding results more rapidly.
Being able to detect the idle loop (see chapter 15.4.1) is one example of when this technique is applicable. A much more extreme hypersimulation task would be to understand a complete program and simply provide the corresponding result without actually starting the program. Naturally, this is hardly ever applicable, and impossible in general. Busy-wait loops and spin-locks are more realistic examples of cases where it is easy to optimize away the execution with hypersimulation.
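The busy-wait case can be illustrated with a toy model. The sketch below is not Simics code: the `ToyCpu` class, its fields, and the one-cycle-per-instruction timing are assumptions made for illustration. It spins until a pending timer event fires, either one instruction at a time or by jumping virtual time straight to the event, and shows that both runs end in the same virtual state while the fast-forwarded run does far less host work.

```python
from dataclasses import dataclass


@dataclass
class ToyCpu:
    """Hypothetical processor model; none of these names come from Simics."""
    cycle: int = 0              # virtual time, assuming one cycle per instruction
    steps: int = 0              # host work: simulation steps actually performed
    next_event: int = 1_000     # cycle at which a timer interrupt is due
    interrupted: bool = False


def run_spin_loop(cpu: ToyCpu, hypersim: bool) -> ToyCpu:
    """Simulate a side-effect-free spin loop until the timer event fires."""
    while not cpu.interrupted:
        if hypersim:
            # Nothing observable happens inside the loop, so skip
            # virtual time forward to the next pending event.
            cpu.cycle = max(cpu.cycle, cpu.next_event)
        else:
            # Ordinary simulation: execute one loop instruction.
            cpu.cycle += 1
        cpu.steps += 1
        if cpu.cycle >= cpu.next_event:
            cpu.interrupted = True
    return cpu


slow = run_spin_loop(ToyCpu(), hypersim=False)
fast = run_spin_loop(ToyCpu(), hypersim=True)
print(slow.cycle, slow.steps)   # 1000 1000
print(fast.cycle, fast.steps)   # 1000 1
```

The interrupt is delivered at virtual cycle 1000 in both runs; only the number of host-side steps differs.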
Hypersimulation can be achieved in several ways:
The -no-auto switch for the enable-hypersim command disables automatic hypersimulation.
The following instructions are handled by CPU-handled instruction hypersimulation:
Target | Instruction | Comment |
ARM | mcr | Enabling "Wait for Interrupt" |
m68k | stop | |
MIPS | wait | |
PowerPC | mtmsr | Setting MSR[POW] |
PowerPC | b 0 | Branch to itself |
PowerPC | wait | |
x86 | hlt | |
x86 | mwait | |
Hypersimulation should be as non-intrusive as possible: the only difference a Simics user should notice is the increased performance. Registers, timing, memory contents, exceptions, interrupts, and so on should be identical.
Hypersimulation using the hypersim-pattern-matcher may have some intrusions regarding Simics features:
Hypersimulation using the hypersim-pattern-matcher is enabled by default and can be turned on or off with the enable-hypersim and disable-hypersim commands.
The hypersim-status command gives some details on which hypersim features are currently active.
Hypersim patterns are typically fragile, since they depend on an exact instruction pattern. Simply changing the compiler revision or a compiler optimization flag can prevent the pattern from being recognized.
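To see why exact matching is fragile, consider the following minimal sketch. It does not use the real hypersim-pattern-matcher format or API; the `IDLE_PATTERN` list, the `matches` helper, and the instruction sequences are hypothetical. The two builds of the same loop differ only in which register the compiler picked, yet the second one no longer matches.

```python
# Hypothetical idle-loop pattern: an exact list of (mnemonic, operands) pairs.
IDLE_PATTERN = [
    ("lwz", "r9, 0(r31)"),    # load the flag being polled
    ("cmpwi", "r9, 0"),       # compare it with zero
    ("beq", "-8"),            # loop back while it is still zero
]


def matches(pattern, code, pc=0):
    """Return True if `code` contains `pattern` exactly, starting at index `pc`."""
    return code[pc:pc + len(pattern)] == pattern


# Same source loop, as emitted by two (imaginary) compiler revisions.
built_with_old_compiler = [
    ("lwz", "r9, 0(r31)"),
    ("cmpwi", "r9, 0"),
    ("beq", "-8"),
]
built_with_new_compiler = [
    ("lwz", "r10, 0(r31)"),   # different register allocation
    ("cmpwi", "r10, 0"),
    ("beq", "-8"),
]

print(matches(IDLE_PATTERN, built_with_old_compiler))   # True
print(matches(IDLE_PATTERN, built_with_new_compiler))   # False
```

When the pattern stops matching, the loop is still simulated correctly; it is simply executed instruction by instruction again, without the speedup.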
The QSP-x86 machine does not use hypersim patterns, but on an old PPC-based machine we can run the following example:
simics> disable-hypersim
simics> system-perfmeter -realtime -mips
Using real time sample slice of 1.000000s
simics> c
SystemPerf: Total-vt Total-rt Sample-vt Sample-rt Slowdown  CPU Idle  MIPS
SystemPerf: -------- -------- --------- --------- -------- ---- ---- -----
SystemPerf:     0.1s     0.3s     0.09s     0.33s      3.4 100%   0%    29
SystemPerf:     0.7s     1.3s     0.56s     1.00s      1.8  97%   0%    55
SystemPerf:     0.8s     2.3s     0.13s     1.00s      7.6  99%   0%    13
SystemPerf:     2.0s     3.3s     1.22s     1.00s      0.8  95%   0%   122
SystemPerf:     4.2s     4.3s     2.24s     1.00s      0.4  78%   0%   223
SystemPerf:     5.8s     5.3s     1.54s     1.00s      0.6  97%   0%   153
SystemPerf:    11.3s     6.3s     5.46s     1.00s      0.2  99%   0%   543
SystemPerf:    15.9s     7.3s     4.65s     1.00s      0.2  98%   0%   462
SystemPerf:    21.7s     8.3s     5.82s     1.00s      0.2  99%   0%   579
SystemPerf:    27.5s     9.3s     5.82s     1.00s      0.2 100%   0%   579
SystemPerf:    33.3s    10.3s     5.80s     1.00s      0.2  99%   0%   579
simics> enable-hypersim
simics> c
SystemPerf:    65.6s    11.2s    32.23s     0.88s      0.0  98%  85%  3673
SystemPerf:   491.1s    12.2s   425.52s     1.00s      0.0 100% 100% 42382
SystemPerf:   908.4s    13.2s   417.36s     1.00s      0.0  99% 100% 41550
SystemPerf:  1305.9s    14.2s   397.44s     1.00s      0.0 100% 100% 39745
SystemPerf:  1746.3s    15.2s   440.44s     1.00s      0.0  99% 100% 44039
SystemPerf:  2200.9s    16.2s   454.59s     1.00s      0.0  99% 100% 45457
This configuration has a Linux idle loop optimizer by default. We disable hypersim and execute the code "normally" during boot. After about 6 host seconds (roughly 12 virtual seconds) the boot is finished and the operating system starts executing the idle loop. Even without hypersim, Simics executes the idle loop quickly, at 579 MIPS. While idling, almost 6 virtual seconds are executed for each host second. That is, Simics runs about 6 times faster than the hardware (the processor is configured to run at 100 MHz).
Next, we stop the execution, enable hypersim, and continue the simulation. Now we can see the idle loop optimizer kicking in: roughly 400 virtual seconds are executed each host second, which is about 70 times faster than without hypersim enabled.
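The figures quoted above can be rechecked from the Sample-vt column of the log. The short calculation below is only a sanity check: it averages the steady-state idle samples, leaves out the first sample after enable-hypersim because it covers a partial 0.88 s slice, and assumes one instruction per clock cycle when comparing 579 MIPS with the 100 MHz clock.

```python
# Sample-vt values (virtual seconds per 1.00 s host sample), copied from the log.
idle_without_hypersim = [5.82, 5.82, 5.80]
idle_with_hypersim = [425.52, 417.36, 397.44, 440.44, 454.59]


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


vt_plain = mean(idle_without_hypersim)   # virtual seconds per host second
vt_hyper = mean(idle_with_hypersim)
speedup = vt_hyper / vt_plain

# Assumption: one instruction per cycle, so 100 MHz corresponds to 100 MIPS.
mips_measured = 579
clock_mhz = 100
vs_hardware = mips_measured / clock_mhz

print(f"idle, no hypersim: {vt_plain:.2f} virtual s per host s")
print(f"idle, hypersim:    {vt_hyper:.2f} virtual s per host s")
print(f"hypersim speedup:  about {speedup:.0f}x")
print(f"vs. 100 MHz part:  about {vs_hardware:.1f}x real time")
```

The averaged speedup comes out slightly above 70, consistent with the rounded figures in the text (about 400 virtual seconds per host second against almost 6).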