
6 Execution of SystemC Models in the Simics Simulator

When a Simics adapter is created inside Simics, the elaboration phase runs, creating the SystemC object hierarchy. The SystemC simulation phase consists of executing the SystemC scheduler and is driven by Simics. The Simics simulator has a concept of virtual time to which all models relate. This chapter covers how SystemC models are executed inside the Simics simulator.

6.1 SystemC simulation time

The SystemC scheduler is event-driven and events occur at precise points in simulation time. Simulation time in SystemC is an integer multiple of the time resolution and increases monotonically during simulation. Typically, outside of Simics, the SystemC simulation time is advanced by the sc_start function.
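
For reference, a standalone SystemC program outside of Simics advances time roughly as follows; inside Simics, the adapter takes over this role. This is a minimal sketch using the standard Accellera API:

#include <iostream>
#include <systemc>

int sc_main(int, char*[]) {
    // Set the time resolution explicitly (only allowed before any
    // sc_time object is created); 1 ps is also the Accellera default.
    sc_core::sc_set_time_resolution(1, sc_core::SC_PS);

    // Advance simulation time by running the scheduler for 10 ns.
    sc_core::sc_start(sc_core::sc_time(10, sc_core::SC_NS));

    std::cout << "now: " << sc_core::sc_time_stamp() << std::endl;
    return 0;
}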

The simulation time resolution used for SystemC in Simics is one picosecond by default. In the SystemC Library, sc_start is invoked by the adapter only; it must never be invoked from within a SystemC model. The adapter drives the simulation and keeps track of the current simulation time. To print the current simulation time, use the print-time (ptime) command on the adapter object with the -pico-seconds flag:

simics> psel dev
simics> r 1 ps
simics> ptime -pico-seconds
┌─────────┬───────────┐
│Processor│Picoseconds│
├─────────┼───────────┤
│dev      │          1│
└─────────┴───────────┘

With the -pico-seconds flag, ptime reports the time as an integer number of picoseconds; with the -t flag, it returns the time in seconds as a floating-point value. When the dev object is selected as the command line frontend, the simulation time can be advanced using the run (r) command.

The full description of these commands can be found in the Simics Reference Manual or via the help command in the Simics CLI.

The pending SystemC events currently registered with the kernel can be listed using the print-event-queue (peq) command:

simics> peq -i
┌──────────┬──────────┬──────────────┐
│  Cycle   │  Object  │ Description  │
├──────────┼──────────┼──────────────┤
│4999999999│dev.engine│Internal: stop│
└──────────┴──────────┴──────────────┘

┌─────────────┬──────┬───────────────────────────────────────────────────┐
│SystemC (ps) │Object│                    Description                    │
├─────────────┼──────┼───────────────────────────────────────────────────┤
│1234000000000│dev   │test_sc_devices.dummy_1_event                      │
│1234000000000│dev   │test_sc_devices.dummy_3_event                      │
│5678000000000│dev   │test_sc_devices.event_method (static method)       │
│5678000000000│dev   │test_sc_devices.trigger_method_event               │
│5678000000000│dev   │test_sc_devices.event_thread (dynamic thread)      │
│5678000000000│dev   │test_sc_devices.trigger_thread_event               │
│9876000000000│dev   │test_sc_devices.dummy_2_event                      │
│9999999999999│dev   │test_sc_devices.event_thread_timed (dynamic thread)│
└─────────────┴──────┴───────────────────────────────────────────────────┘

The events posted by the SystemC adapter are treated as Simics simulator internal events, which is why the -i flag is needed.

The times shown in the peq command output are relative to the current simulation time. Thus, the next SystemC event will trigger after 1234000000000 ps.
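
The SystemC-side entries come from ordinary kernel event notifications. As a hypothetical sketch (names modeled on the test_sc_devices output above, not its actual source), a timed notification plus a statically sensitive method is enough to produce such pending events:

#include <systemc>

struct EventDemo : sc_core::sc_module {
    sc_core::sc_event dummy_1_event;

    SC_CTOR(EventDemo) {
        SC_METHOD(event_method);      // a static method process
        sensitive << dummy_1_event;   // triggered by dummy_1_event
        dont_initialize();
        SC_THREAD(poster);
    }

    void poster() {
        // Schedule dummy_1_event 1234 ms (1234000000000 ps) from now;
        // until it fires, peq lists it as a pending kernel event.
        dummy_1_event.notify(1234, sc_core::SC_MS);
    }

    void event_method() {
        // Runs when dummy_1_event triggers.
    }
};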

The simulation time depends on the SystemC kernel context. Each adapter has its own SystemC kernel context, with its own simulation time and events.

simics> psel dev2
simics> ptime -pico-seconds
┌─────────┬───────────┐
│Processor│Picoseconds│
├─────────┼───────────┤
│dev2     │          0│
└─────────┴───────────┘
simics> peq -i
┌─────┬───────────┬──────────────┐
│Cycle│  Object   │ Description  │
├─────┼───────────┼──────────────┤
│    0│dev2.engine│Internal: stop│
└─────┴───────────┴──────────────┘

The results above show that dev and dev2 maintain separate simulation times and event queues.

6.2 Simics processors driving the SystemC simulation

In the Simics simulation framework, the processor concept includes all models that actively drive the simulation forward and manage simulation time. Each processor is event-driven and supports one or more types of event queues, representing cycles, steps, and/or picoseconds.

All Simics processors in the example configuration can be listed using the list-processors command.

Each adapter (SystemC subsystem) exposes two processors to the rest of the Simics simulation system.

simics> list-processors -all
┌───────────┬─┬─────────────────────────┬────────┬─────────┐
│ CPU Name  │ │        CPU Class        │  Freq  │Scheduled│
├───────────┼─┼─────────────────────────┼────────┼─────────┤
│clock      │ │clock                    │1.00 THz│yes      │
│dev        │ │test_sc_devices          │1.00 THz│no       │
│dev.engine │ │co-execute               │1.00 THz│yes      │
│dev2       │*│sample_tlm2_simple_device│1.00 THz│no       │
│dev2.engine│ │co-execute               │1.00 THz│yes      │
└───────────┴─┴─────────────────────────┴────────┴─────────┘
* = selected CPU

Processors dev and dev2 support two kinds of event queues: cycle-based and picosecond-based. The frequency is hardcoded to 1000000 MHz (or equivalently, 1 THz), so 1 cycle equals 1 ps. All SystemC events are posted on the SystemC clock using the ps event queue (see how to display SystemC events using peq above). Since the processor in dev/dev2 runs on SystemC simulation time, it is referred to as the SystemC clock in this document.

The previous example of advancing the SystemC simulation time can be achieved using cycles as well:

simics> r 1 cycles
simics> # using 'echo' below to illustrate that command return is a float value
simics> echo (ptime -t)
1e-12

Simics events can be posted on the SystemC clock using either the cycle-based or ps-based event queue. Below is an example showing how to post Simics events using the cycle-based event queue:

simics> bp.cycle.break 10
Breakpoint 1: dev2 will break at cycle 11
simics> peq -i
┌─────┬───────────┬───────────────────────────────┐
│Cycle│  Object   │          Description          │
├─────┼───────────┼───────────────────────────────┤
│   10│bp.cycle   │Break event on dev2 at cycle 11│
│  999│dev2.engine│Internal: stop                 │
└─────┴───────────┴───────────────────────────────┘

As shown in the above example, besides the user breakpoint set at cycle 10, the dev2.engine processor also posts events on the SystemC clock (represented by the dev2 processor). The engine object is another Simics processor, just like the SystemC clock; it supports both a cycle-based event queue and a ps-based event queue. The only difference between the two processors is how they are scheduled.

Figure 1. Simics schedules the processors

Figure 1 shows how Simics schedules the target processors in a single thread in default mode. Another mode (the free running mode) is described in section 6.4.3. All blue rectangles are Simics target processors which implement the execute interface. The thread calling the execute interface is a simulation thread managed by the Simics scheduler. clock, dev.engine and dev2.engine are three target processors scheduled directly by the Simics scheduler in a round-robin fashion. With temporal decoupling, each target processor runs multiple simulation steps or cycles (its time quantum) before handing over to the next processor.

The SystemC clock (dev and dev2) is not directly scheduled by the Simics scheduler. Instead it is indirectly scheduled via the adapter's engine object (dev.engine), which is referred to as the Simics clock. This scheduler decoupling enables the SystemC clock to be driven both by the Simics clock and by the adapter. As described in section 6.1, the SystemC clock drives one SystemC kernel context.

In most cases, the two clocks are in sync, but the SystemC clock can run ahead of the Simics clock when needed: for example, when a synchronous Simics interface calls into the SystemC device and invokes the b_transport function, which in turn invokes the wait function. In this case, SystemC time must run forward for b_transport to return, so that the Simics interface call can return. See Figure 2.

Figure 2. The SystemC clock can move ahead of the Simics clock
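
The following is a minimal sketch of such a blocking target using the standard TLM-2.0 API (the module and its timing are hypothetical, not the actual test device; it assumes the call arrives in a thread context, where wait is legal):

#include <systemc>
#include <tlm>
#include <tlm_utils/simple_target_socket.h>

struct BlockingTarget : sc_core::sc_module {
    tlm_utils::simple_target_socket<BlockingTarget> socket;

    SC_CTOR(BlockingTarget) : socket("socket") {
        socket.register_b_transport(this, &BlockingTarget::b_transport);
    }

    void b_transport(tlm::tlm_generic_payload& trans, sc_core::sc_time& delay) {
        // Model an access that takes 10 ns to complete. wait() suspends
        // the calling process, so the SystemC kernel must advance by
        // 10 ns before b_transport can return. Inside Simics this is
        // exactly the case where the SystemC clock runs ahead of the
        // Simics clock so that the synchronous interface call can return.
        wait(10, sc_core::SC_NS);
        trans.set_response_status(tlm::TLM_OK_RESPONSE);
    }
};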

Besides the processors, the object hierarchy contains some other objects that handle time: vtime, vtime.cycles, and vtime.ps. They provide functionality used by both clocks. vtime dispatches pending events and drives the cycle queues; vtime.cycles and vtime.ps contain the cycle-based event queue and the ps event queue, respectively. These objects are considered internal, and users should not interact with them.

simics> list-objects -show-port-objects substr = vtime -tree
┐
├ clock ┐
│       └ vtime ┐
│               ├ cycles 
│               └ ps 
├ dev ┐
│     ├ engine ┐
│     │        └ vtime ┐
│     │                ├ cycles 
│     │                └ ps 
│     └ vtime ┐
│             ├ cycles 
│             └ ps 
└ dev2 ┐
       ├ engine ┐
       │        └ vtime ┐
       │                ├ cycles 
       │                └ ps 
       └ vtime ┐
               ├ cycles 
               └ ps 

6.3 Performance tuning

The SystemC Library has been optimized to reduce the overhead of running SystemC models inside Simics. Normally, there is no need for performance tuning; this section targets advanced use cases.

6.3.1 Disable DMI

In SystemC, using the TLM-2.0 Direct Memory Interface (DMI) offers potentially significant increases in simulation speed for simple memory accesses, since it bypasses the normal b_transport calls. An initiator can check the DMI-allowed attribute of a TLM-2.0 transaction passed through the transport interface to see if the target supports it. Since an interconnect component is permitted to modify the address attribute and the extension pointers, the original transaction needs to be deep-copied for potential DMI purposes later on. This deep copy costs some performance. For a SystemC device that does not support DMI, the DMI check can be disabled to avoid this overhead.
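
For reference, the hint-and-request pattern this check implements looks roughly as follows in standard TLM-2.0; this is a hypothetical initiator, not the gasket implementation:

#include <systemc>
#include <tlm>
#include <tlm_utils/simple_initiator_socket.h>

struct DmiAwareInitiator : sc_core::sc_module {
    tlm_utils::simple_initiator_socket<DmiAwareInitiator> socket;

    SC_CTOR(DmiAwareInitiator) : socket("socket") {}

    void access(tlm::tlm_generic_payload& trans, sc_core::sc_time& delay) {
        socket->b_transport(trans, delay);
        // The target sets the DMI-allowed attribute as a hint.
        if (trans.is_dmi_allowed()) {
            tlm::tlm_dmi dmi_data;
            // Request a direct memory pointer; subsequent accesses can
            // then use dmi_data.get_dmi_ptr() and bypass b_transport.
            if (socket->get_direct_mem_ptr(trans, dmi_data)) {
                // ... cache dmi_data for fast-path accesses ...
            }
        }
    }
};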

For example, the following command disables the DMI check on the initiator implemented in the gasket:

simics> @conf.dev2.gasket_simple_device_target_socket.iface.sc_initiator_gasket.set_dmi(False)
None

6.3.2 Scaling

The SystemC simulation can be scaled. When the SystemC simulation runs very slowly, for example because too many SystemC events are posted, the overall Simics simulation performance is affected. Scaling down the SystemC simulation allows the other processors to run faster. This is achieved by setting the frequency attribute of dev.engine.vtime.

In the future, it will be possible to change the frequency attribute directly on dev.engine.

simics> ptime -all
┌───────────┬─────┬──────┬────────┐
│ Processor │Steps│Cycles│Time (s)│
├───────────┼─────┼──────┼────────┤
│clock      │n/a  │     0│   0.000│
│dev        │n/a  │  1001│   0.000│
│dev2       │n/a  │     1│   0.000│
│dev2.engine│n/a  │     1│   0.000│
│dev.engine │n/a  │  1001│   0.000│
└───────────┴─────┴──────┴────────┘
simics> dev.engine.vtime->frequency = 1e11
simics> r 1000 cycles
simics> ptime -all
┌───────────┬─────┬──────┬────────┐
│ Processor │Steps│Cycles│Time (s)│
├───────────┼─────┼──────┼────────┤
│clock      │n/a  │  1000│   0.000│
│dev        │n/a  │  1101│   0.000│
│dev2       │n/a  │  1001│   0.000│
│dev2.engine│n/a  │  1001│   0.000│
│dev.engine │n/a  │  1101│   0.000│
└───────────┴─────┴──────┴────────┘

Here the frequency does not relate to how one cycle maps to picoseconds; it determines how many cycles the processor advances in one delta tick. After dividing the frequency by a factor of 10, dev and dev.engine run only 1101 - 1001 = 100 cycles, while dev2 and dev2.engine run 1001 - 1 = 1000 cycles. This way, the rest of the simulation gets more wall-clock time to run.

6.4 Performance scaling

The SystemC Library supports the general Simics performance scaling feature, which is described in the chapter "Scaling Simics" of the Simics User's Guide. Only the SystemC-specific parts are covered here.

Simics Accelerator has two different mechanisms that can operate alone or work together to improve performance. The first is Simics® Multimachine Accelerator, which is based upon the cell concept. The other mechanism is Multicore Accelerator, which can parallelize the simulation even within cells.

6.4.1 Multimachine Accelerator for SystemC

Every Simics simulation is split into a set of cells, and every processor belongs to a cell. By default, all cells run in parallel with each other. SystemC-related processors from different Simics modules can reside in different cells and thus benefit from running in parallel, but SystemC-related processors from the same Simics module cannot; by default, they end up in the same cell. This limitation comes from the Accellera SystemC kernel, which is not thread safe (it contains global static variables/pointers). There is an automatic check of this requirement whenever the SystemC-related cell configuration changes.

simics> dev->cell
"default_cell0"
simics> dev.engine->cell
"default_cell0"
simics> @cell1=SIM_create_object('cell', 'cell1')
simics> dev->cell = cell1
[dev error] dev is not placed in the same cell with [dev.engine, ]. The simulation may run into errors or even segfault in multi-threading mode.

By default, all SystemC-related processors from one Simics module reside in the same cell (default_cell0 in the above example). If processor dev is moved to a different cell (cell1) while dev.engine is still in default_cell0, an error message is printed, as shown in the above example. Do not ignore this error, as the simulation will likely run into hard-to-debug errors or even segfaults in multi-threading mode.

The configuration is correct again when dev.engine is moved to cell1 as well. Since dev and dev2 belong to different Simics modules, they can reside in different cells.

simics> dev.engine->cell = cell1
simics> set-threading-mode serialized
simics> set-threading-mode
┌─────────────┬──────────┬───┬────────────┬─────────────┬───────────┐
│    cell     │   mode   │#td│time-quantum│max-time-span│min-latency│
├─────────────┼──────────┼───┼────────────┼─────────────┼───────────┤
│default_cell0│serialized│  1│      1.0 ns│     (1.0 ns)│    10.0 ms│
│cell1        │serialized│  1│    (1.0 ns)│     (1.0 ns)│    10.0 ms│
└─────────────┴──────────┴───┴────────────┴─────────────┴───────────┘
simics> list-thread-domains
┌─────────────┬──────┬───────────┐
│    Cell     │Domain│  Objects  │
├─────────────┼──────┼───────────┤
│default_cell0│    #0│clock      │
│             │      │dev2       │
│             │      │dev2.engine│
└─────────────┴──────┴───────────┘
┌─────┬──────┬──────────┐
│Cell │Domain│ Objects  │
├─────┼──────┼──────────┤
│cell1│    #0│dev       │
│     │      │dev.engine│
└─────┴──────┴──────────┘

6.4.2 Multicore Accelerator for SystemC

With subsystem threading, multiple host threads can be used to simulate multiple processors within each cell concurrently, provided that the processors do not share memory. The SystemC Library supports this threading model. Just like the cell partitioning limitation, all SystemC processors from the same module must reside in the same thread domain. This is guaranteed by the adapter class in the SystemC Library, so the user can never break this invariant. When Multicore Accelerator is enabled, all SystemC instances from the same module are by default grouped within the same thread domain. SystemC instances from different modules can reside in different thread domains and will then benefit from parallel multi-threading.

Figure 3. Cell and TD partitioning

The current thread domain partitioning can be checked with the list-thread-domains command.

simics> @SIM_create_object('sample_tlm2_simple_device', 'dev3')
simics> set-threading-mode subsystem
simics> set-threading-mode
┌─────────────┬───────────────────┬───┬────────────┬─────────────┬───────────┐
│    cell     │       mode        │#td│time-quantum│max-time-span│min-latency│
├─────────────┼───────────────────┼───┼────────────┼─────────────┼───────────┤
│default_cell0│     subsystem     │  2│      1.0 ns│       1.0 ns│    10.0 ms│
│cell1        │multicore/subsystem│  1│    (1.0 ns)│     (1.0 ns)│    10.0 ms│
└─────────────┴───────────────────┴───┴────────────┴─────────────┴───────────┘
simics> list-thread-domains
┌─────────────┬──────┬───────────┐
│    Cell     │Domain│  Objects  │
├─────────────┼──────┼───────────┤
│default_cell0│    #0│clock      │
├─────────────┼──────┼───────────┤
│             │    #1│dev2       │
│             │      │dev2.engine│
│             │      │dev3       │
│             │      │dev3.engine│
└─────────────┴──────┴───────────┘
┌─────┬──────┬──────────┐
│Cell │Domain│ Objects  │
├─────┼──────┼──────────┤
│cell1│    #0│dev       │
│     │      │dev.engine│
└─────┴──────┴──────────┘

In the example above, there are three Simics processors scheduled by Simics inside default_cell0. The clock resides in the cell's TD #0. Since dev2 and dev3 are instances of the same Simics module, dev2, dev2.engine, dev3, and dev3.engine all reside in TD #1. This allows objects in one cell to run in parallel using multiple threads.

6.4.3 Free running

The SystemC simulation can also run in free running mode. In this mode, SystemC time synchronization is decoupled from the rest of Simics; the SystemC simulation is no longer scheduled in round-robin with the other processors and clocks as shown in Figure 1. This mode is enabled by setting the run_continuously attribute of dev.engine.

Free running is only supported when the threading mode is subsystem or multicore.

simics> ptime -all
┌──────────┬─────┬──────┬────────┐
│Processor │Steps│Cycles│Time (s)│
├──────────┼─────┼──────┼────────┤
│clock     │n/a  │     0│   0.000│
│dev       │n/a  │     0│   0.000│
│dev.engine│n/a  │     0│   0.000│
└──────────┴─────┴──────┴────────┘
simics> psel dev
simics> r 10001 cycles
simics> ptime -all
┌──────────┬─────┬──────┬────────┐
│Processor │Steps│Cycles│Time (s)│
├──────────┼─────┼──────┼────────┤
│clock     │n/a  │ 10000│   0.000│
│dev       │n/a  │ 10001│   0.000│
│dev.engine│n/a  │ 10001│   0.000│
└──────────┴─────┴──────┴────────┘
simics> set-threading-mode subsystem
simics> dev.engine->run_continuously = TRUE
simics> r 10000 cycles
simics> ptime -all

In the above example, the SystemC model inside dev contains a heavy workload that slows down the simulation. When it runs in the default mode, clock and dev are coupled and advance by the same number of cycles. When switched to the free running mode, clock is decoupled from dev and can move forward at a much faster pace. In the above example, while dev and dev.engine move 10000 cycles forward, clock moves much further. The exact number of cycles for clock is not deterministic; the following is one example of such a run.

┌──────────┬─────┬─────────────┬────────┐
│Processor │Steps│   Cycles    │Time (s)│
├──────────┼─────┼─────────────┼────────┤
│clock     │n/a  │4294999999998│   4.295│
│dev       │n/a  │        20001│   0.000│
│dev.engine│n/a  │        20001│   0.000│
└──────────┴─────┴─────────────┴────────┘