Argument description:
Create a new system-perfmeter object with a name. If name is not given, a unique name will be created for it automatically. The system-perfmeter samples probes in the system either at a regular interval or when a notification is raised.
The sampling-mode argument specifies the mode used to perform sampling. The default is "realtime-sync", where the interval is measured in realtime (wallclock) but sampling is also synchronized so that all processors have executed at least one quantum since the last sample.
With the "realtime" sampling mode, sampling is based on the wallclock time only, without any synchronization. Some probes might yield strange results when some processors have not executed at all since the last sample.
Mode can also be "virtual", where the virtual time is used to perform sampling. The time is based on the virtual time of the first processor found in the system, unless clock is set to override the default one with another clock or processor.
In the "realtime-sync", "realtime" and "virtual" modes, the interval is set in seconds by the interval argument.
Another available mode is "notifier", where sampling is performed each time a notification is raised. The notifier-type argument specifies the notifier type, and notifier-obj specifies the object where the notifier is installed.
The timestamp-file argument specifies a file used for recording the specific time points at which sampling takes place. Together with realtime-sync, the file will be created and filled with the cycle count from the clock argument at the time each sample is taken.
With the time-stamp sampler, this file is instead used as an input file, and sampling takes place at the cycles specified in the file.
Probes to sample are added by the <system-perfmeter>.add-probe command.
Output handling. By default, each sample measured by the probe-monitor prints a table row on standard output. (A table row can consist of multiple printed lines, including repeated headers.)
The output-file argument specifies a file to which the run-time table rows, including any summary output, should be printed.
The -window switch will cause the run-time samples to be printed in a separate console instead of the standard output.
The -print-no-samples switch specifies that no samples are printed to standard output, or to a window, during execution. Any file output requested with the output-file argument will still occur.
If -summary is given, a summary of all sampled probes will be printed every time the simulation is stopped.
The sample data history is also stored in memory, so the data can be viewed at any time through the <system-perfmeter>.print-table command. When sampling at a high frequency, it is recommended to not produce any sample output while running, reducing the overhead of the probe-monitor.
The probe-based system perfmeter automatically adds the probes sim.time.virtual and sim.time.wallclock (both session and delta). These show the virtual time and wallclock time spent during the simulation. Note that any time spent when not simulating (standing at the Simics prompt) is removed from the wallclock time.
Further, the sim.slowdown delta probe is automatically shown, giving the ratio of the wallclock time spent to the virtual time passed. That is, a number below 1.0 means the virtual time passes faster than the wallclock time, while a figure of 5.0 means that one virtual second takes five wallclock seconds to simulate.
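As a rough illustration of the slowdown arithmetic described above, the following Python sketch computes the ratio from two deltas. The function name and its arguments are hypothetical and not part of any Simics API; only the ratio itself is taken from the description.

```python
def slowdown(wallclock_delta_s: float, virtual_delta_s: float) -> float:
    """Wallclock seconds spent per simulated (virtual) second.

    Hypothetical helper for illustration only; not a Simics function.
    """
    if virtual_delta_s <= 0:
        raise ValueError("virtual time must advance between samples")
    return wallclock_delta_s / virtual_delta_s


# One virtual second that took five wallclock seconds to simulate.
print(slowdown(5.0, 1.0))   # 5.0
# Two virtual seconds simulated in one wallclock second (faster than real time).
print(slowdown(1.0, 2.0))   # 0.5
```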
The sim.process.cpu_percent delta probe shows how much host processor usage the Simics process is taking. Any value below 100% indicates that Simics gets blocked on something, such as real-time mode. On a four-processor host, the maximum value would be 400%, indicating that Simics manages to schedule work on all processors simultaneously. Note that processor usage might come from other threads, not just the execution threads which are used for the actual simulation.
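The percentage above can be read as host CPU time consumed by the process, summed over all its threads, divided by the wallclock time elapsed. The sketch below assumes that interpretation; the function and its inputs are hypothetical and do not describe how the probe is implemented.

```python
def cpu_percent(cpu_time_delta_s: float, wallclock_delta_s: float) -> float:
    """Host CPU usage of a process as a percentage of one host processor.

    Assumption: cpu_time_delta_s is the CPU time summed over all threads
    of the process during the sample. Hypothetical helper, not Simics code.
    """
    if wallclock_delta_s <= 0:
        raise ValueError("wallclock time must advance between samples")
    return 100.0 * cpu_time_delta_s / wallclock_delta_s


# Four host processors kept fully busy for one wallclock second.
print(cpu_percent(4.0, 1.0))   # 400.0
# A process blocked half of the time on a single thread.
print(cpu_percent(0.5, 1.0))   # 50.0
```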
Finally, the sim.load_percent delta probe shows an average of how many instructions are actually simulated per cycle. At 100%, all simulation time is spent actually executing instructions. Processors might also wait for interrupts or other events, consuming cycles without executing any instructions, which reduces this value. This average takes into account how many cycles each processor actually consumes, so differences in frequencies matter. Any processor-specific IPC value (other than 1.0) is also taken into consideration. The IPC value may not change during simulation, however.
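To make the cycle weighting concrete, here is a minimal Python sketch of one plausible way such an average could be formed. The formula (executed instructions divided by the instructions the consumed cycles could have held at each processor's IPC) is an assumption made for illustration, as are the CpuSample type and load_percent function; the actual probe may be computed differently.

```python
from dataclasses import dataclass


@dataclass
class CpuSample:
    """Per-processor deltas between two samples (hypothetical structure)."""
    cycles: int          # cycles consumed since the last sample
    instructions: int    # instructions executed since the last sample
    ipc: float = 1.0     # fixed instructions-per-cycle setting


def load_percent(samples):
    # Assumed formula: executed instructions divided by the number of
    # instructions that would fit in the consumed cycles at each IPC.
    capacity = sum(s.cycles * s.ipc for s in samples)
    if capacity == 0:
        return 0.0
    executed = sum(s.instructions for s in samples)
    return 100.0 * executed / capacity


busy = CpuSample(cycles=1_000, instructions=1_000)
idle = CpuSample(cycles=3_000, instructions=0)   # e.g. waiting for interrupts
print(load_percent([busy, idle]))   # 25.0
```

Under this assumption the idle processor, which consumed three times as many cycles, pulls the result down to 25%, whereas a plain mean of the two per-processor loads would give 50%.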
There are a number of additional flags to easily add more probes to the system-perfmeter directly when starting the tool. All of these probes show the delta values, that is, the difference between each sample.
The -mips flag adds the sim.mips probe, which reports the overall number of instructions per wallclock second being executed. Similarly, the -cpu-mips flag adds the cpu.mips probe, which tells how many instructions per "second" each individual CPU is executing, based on the amount of time it is actually scheduled.
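The difference between the two probes is the time base used as the denominator. The sketch below assumes the conventional meaning of MIPS (millions of instructions per second); the function is hypothetical and the scaling is an assumption, not something stated in the description above.

```python
def mips(instructions: int, seconds: float) -> float:
    """Millions of instructions per second (assumed unit scaling).

    Hypothetical helper for illustration only; not a Simics function.
    """
    if seconds <= 0:
        raise ValueError("time base must be positive")
    return instructions / seconds / 1e6


# sim.mips style: all instructions over the wallclock delta of the sample.
print(mips(250_000_000, 0.5))    # 500.0
# cpu.mips style: one processor's instructions over the time it was scheduled.
print(mips(100_000_000, 0.25))   # 400.0
```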
The -exec-modes flag adds the sim.exec_mode.hypersim_percent, sim.exec_mode.vmp_percent, sim.exec_mode.jit_percent, and sim.exec_mode.interpreter_percent probes. These report a summary of the execution modes in which all processors have executed.
Similarly, the -cpu-exec-modes flag adds the corresponding cpu.exec_mode. probes, reporting the execution modes per individual processor in the system.
The -cpu-schedule-percent flag adds the cpu.schedule_percent probe, which reports the percentage of the scheduled simulation time spent in each specific processor. Processors with a high percentage simulate more slowly.
The -cpu-load flag adds the cpu.load_percent probe, which gives the individual load on each processor. See above for the description of the sim.load_percent probe.
The -module-profile flag adds the sim.module_profile probe, which gives a low-overhead performance profile showing in which shared objects the execution time is spent.
The -io flag adds the sim.io_intensity probe, reporting how frequently IO operations occur, as the number of executed instructions per detected IO operation. High values are good; low values could cause performance reductions.
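The IO intensity figure is a simple quotient, sketched below in Python. The function is hypothetical, and returning infinity when no IO operation was detected is an assumption made for this example rather than documented probe behavior.

```python
import math


def io_intensity(instructions: int, io_operations: int) -> float:
    """Executed instructions per detected IO operation.

    Hypothetical helper for illustration only; not a Simics function.
    """
    if io_operations == 0:
        return math.inf   # assumption: no IO at all is the best case
    return instructions / io_operations


# IO is rare: many instructions run between IO operations.
print(io_intensity(1_000_000, 50))       # 20000.0
# IO is frequent: few instructions run between IO operations.
print(io_intensity(1_000_000, 50_000))   # 20.0
```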
The probe-collection argument specifies a shortcut name for adding probes suitable for a given scenario.
The explore collection adds a large number of probes suitable for finding possible bottlenecks in the execution performance. Some of these probes can, however, have their own overhead when collected. The large number of probes collected also imposes some overhead.
The performance collection adds a few probes just to measure the performance of Simics, without much overhead.
These are just some generally useful switches for adding probes easily when creating the system-perfmeter. Once the system-perfmeter object has been created, it is possible to remove existing probes or add other probes to the sampling.