
16.1 Intel® Simics® Multimachine Accelerator

The easiest way to parallelize a simulation is to use the Simics® Multimachine Accelerator feature. It requires that the models used in the simulation are marked as thread-safe. The rest of this section describes how to use Multimachine Accelerator.

With Multimachine Accelerator the simulation runs in a single Simics process: you control the entire simulation from a single point, and the entire simulation state gets saved in one checkpoint, just as when you run a single-threaded simulation.

To use Multimachine Accelerator the configuration must be partitioned into simulation cells. Each cell contains a subset of the configuration objects in the simulation. The only communication allowed between cells is over links. A link transmits messages between objects with a latency measured in simulated time, for example an Ethernet cable.

Dividing the system into cells can be done automatically via the Simics component system. This makes it easy to parallelize an existing model.

16.1.1 Multithread-Ready Models

Most models provided with Simics can run with Multimachine Accelerator enabled and are thus marked thread-safe. Loading modules that are not marked thread-safe will result in a warning message and Multimachine Accelerator will be disabled. Please contact your Simics provider if you are running a model that is not multithread-ready and you want to utilize Multimachine Accelerator.

If you have developed your own device models, refer to the Model Builder User's Guide to learn how to make them multithread-compatible.

Whenever possible, all default components provided with Simics create simulation cells for use with Multimachine Accelerator. For example, instantiating two MPC8641-Simple boards in the same Simics session will create two cells, which can be scheduled on two simulation threads. The maximum possible parallelism is limited by the number of cells in a session (as well as the number of processor cores on your host, of course). You can list the cells instantiated in a configuration with the following command:

simics> list-objects -all class = cell

16.1.2 Enabling and Disabling Multimachine Accelerator

Simics® Multimachine Accelerator is enabled by default. It can be turned off using the command

simics> disable-multithreading

and on again with

simics> enable-multithreading

The enable-multithreading command also checks that the configuration looks reasonable before switching on Multimachine Accelerator, and warns you if something is incorrect.

16.1.3 Controlling Cell Synchronization

To allow multi-cell simulation to perform well, Simics lets each thread run for a certain amount of virtual time on its own before it needs to resynchronize with the other cells. This time span is the synchronization latency. Because of the synchronization latency, Simics does not allow direct communication between objects in different cells. Even if all accesses were properly locked and performed in a thread-safe way, the objects would have no way of controlling at what point in virtual time the access took place in the other cell, and the simulation would no longer be deterministic.

The solution is to communicate via link objects. Link objects ensure that messages sent from one cell are delivered at the expected virtual time in the other cell, at the cost of a virtual time delay in the transmission. For links to send messages deterministically, the transmission delay must be greater than or equal to the synchronization latency. For this reason, the synchronization latency is often called the minimum latency for link communication.

The next two sections explain how to control the synchronization latency—and the link latencies—in multi-cell simulations.

The Simple Way

By default, Simics creates a single synchronization domain called default_sync_domain. Cells created later in the simulation will be attached to this synchronization domain, unless specified otherwise. Thus the synchronization latency in the simulation will be controlled by the min_latency attribute set in default_sync_domain.

The simplest way to control the synchronization latency is to use the set-min-latency command, which creates the default synchronization domain if it does not already exist and sets its min_latency attribute to the given latency. An error message is printed if the latency value fails the validity check.

simics> set-min-latency 0.01
simics> list-objects class = sync_domain
┌───────────────────┬─────────────┐
│      Object       │    Class    │
├───────────────────┼─────────────┤
│default_sync_domain│<sync_domain>│
└───────────────────┴─────────────┘


simics> default_sync_domain->min_latency
0.01

One important thing to remember is that the time quantum in each multiprocessor cell must be less than half the minimum latency. In other words: sync_latency > 2 × time_quantum for every multiprocessor cell in the system. Simics will print an error if this condition is not respected.
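
As an illustration, the rule can be checked from Python. The following is only a sketch: it assumes that cell objects expose their time quantum in seconds through a time_quantum attribute (attribute names may differ between Simics versions) and uses SIM_object_iterator to enumerate the objects.

@lat = conf.default_sync_domain.min_latency
@cells = [o for o in SIM_object_iterator(None) if o.classname == "cell"]
@print([c.name for c in cells if lat <= 2 * c.time_quantum])

Any cell printed by the last line would need a smaller time quantum, or the domain a larger min_latency.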

Understanding Synchronization Domains

Synchronization latencies can be controlled in a much finer way. Synchronization domains can be organized in a hierarchy that allows different cells to be synchronized with different latencies. This organization is the foundation of the domain-based distribution system, described in chapter 16.3.

Let us build a networked system with two tightly coupled machines communicating over a very fast network, together with a control server that sends them a command from time to time. The two machines require a low communication latency, while the communication latency between them and the server does not matter. Using a hierarchy of two domains allows all latency requirements to be fulfilled without sacrificing performance:

Top-domain (latency 1.0s)
 -> Server cell
 -> Sub-domain (latency 1e-6s)
     -> Machine0 cell
     -> Machine1 cell

In that configuration, the two machines can communicate with a latency of 1e-6 s while the communication latency between the machines and the server is 1 s. In practice, this allows Simics to give the server a 1 s synchronization window with the two machines, hence much less synchronization overhead and a better usage of parallel simulation.

More concretely, in Simics, the domains are set up in the following way (in Python):

@top_domain = pre_conf_object("top_domain", "sync_domain")
@top_domain.min_latency = 1.0

@sub_domain = pre_conf_object("sub_domain", "sync_domain")
@sub_domain.min_latency = 1e-6
@sub_domain.sync_domain = top_domain

@SIM_add_configuration([top_domain, sub_domain], None)

Cells created automatically can be assigned to a domain by using the domain attribute of the corresponding top-component. It is also possible to set a cell's sync_domain attribute when creating it manually.
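
For the manual case, a minimal sketch could look as follows; the cell name machine0_cell is hypothetical, and conf.sub_domain refers to the domain instantiated in the example above:

@cell0 = pre_conf_object("machine0_cell", "cell")
@cell0.sync_domain = conf.sub_domain
@SIM_add_configuration([cell0], None)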

Setting Latencies: the Complete Rules

Latencies must obey certain rules for the domain hierarchy to work properly:

16.1.4 Multimachine Accelerator and Scripting

Commands and script branches are never run multithreaded, so parallelism can be safely ignored most of the time when scripting Simics. However, using Simics® Multimachine Accelerator has side effects that may cause scripts to behave in a correct but non-deterministic way. Consider the following script, in a configuration consisting of two cells, cell0 and cell1:

cell0_console.break "foo"
c
cell1_console.input "bar"

Even with cell0 and cell1 running in parallel, the simulation will stop properly when the text breakpoint in cell0 is triggered. However, cell1 is not at a deterministic point in time: the only thing known about it is that it is within a certain window of virtual time in which it is allowed to drift without needing to re-synchronize with cell0, as explained in the previous section. So running this script twice in a row may not produce exactly the same results.

In many cases, it does not matter and the scripts will work fine. If perfect determinism is required, it is possible to save a checkpoint and run the sensitive part of the simulation single-threaded.
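
One possible sequence, sketched below with a placeholder checkpoint name, is to save a checkpoint, switch off multithreading, and continue:

simics> write-configuration before-sensitive-part.ckpt
simics> disable-multithreading
simics> c

Once the sensitive part has been passed, enable-multithreading restores parallel simulation.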

One aspect of Multimachine Accelerator that affects scripting directly is Python scripting. Hap handlers are run in the thread where they are triggered, which means that the same handler can run in parallel on different host processors. If the handler uses global state, it must use proper locks to access it. In general, this is not a problem since most haps are triggered for a specific object, so their handlers will only run in the thread where this object is scheduled. Some haps are triggered globally, however, and care must be taken when responding to them.
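
As a sketch of such locking, assuming a handler registered for the Core_Magic_Instruction hap and a module-level counter as the shared global state (for example in a Python file loaded into the Simics Python environment):

import threading

magic_count = 0                 # global state, shared between cells
count_lock = threading.Lock()   # protects magic_count

def on_magic(user_data, cpu, magic_number):
    # May run concurrently in the simulation threads of different cells,
    # so the shared counter is only updated while holding the lock.
    global magic_count
    with count_lock:
        magic_count += 1

SIM_hap_add_callback("Core_Magic_Instruction", on_magic, None)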

Python scripts are run with the global Python lock taken, so Python scripts never really run in parallel. However, the Python interpreter will schedule Python threads as it sees fit, so Python code that may run in several threads (device or extension code, hap handlers) should not assume that it has full control of the execution. The Python lock is also released every time a Simics API function is called (including implicit calls, such as reading an attribute value).

When running Python scripts in a simulation thread, the script should not access state that belongs to a different cell, since that cell might be running on another host processor. When access to the whole simulation state is needed, a callback function can be scheduled with SIM_run_alone() (this is currently how script branches and commands are handled).
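
A minimal sketch of that pattern; the callback name and what it inspects are hypothetical:

def inspect_all_cells(data):
    # Runs outside the simulation threads, so objects from any cell
    # may safely be accessed here.
    for obj in SIM_object_iterator(None):
        if obj.classname == "cell":
            print("cell:", obj.name)

# From code running in a simulation thread, for example a hap handler:
SIM_run_alone(inspect_all_cells, None)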

Finally, running commands in the simulation thread is not allowed, as the CLI parser is not thread-safe and might cause unexpected problems. Commands must be scheduled with SIM_run_alone(). It is also possible to rewrite scripts to access objects and attributes directly instead of going through commands.
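
For example, assuming the run_command function from the cli Python module, a command can be deferred in the same way:

import cli

def stop_simulation(data):
    # Executed outside the simulation threads, where the CLI may be used.
    cli.run_command("stop")

# From a hap handler or other code running in a simulation thread:
SIM_run_alone(stop_simulation, None)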

16.1.5 Dynamic Load Balancing

Simics uses dynamic load balancing to distribute the simulation workload across the available hardware resources (host threads). The dynamic load balancer optimizes the mapping of simulation threads onto available host resources.

When Simics is running with Simics® Multimachine Accelerator, CPUs belonging to the same cell cannot be simulated concurrently by separate host threads. The available concurrency in this mode of operation is between CPUs belonging to different cells. Using as many cells as possible can potentially improve performance, since this increases the parallelism of the simulation. Having many cells also makes it easier for the dynamic load balancer to keep all host threads fully loaded.

When Simics is running with Multicore Accelerator, CPUs belonging to the same cell can be simulated concurrently. Note that Multimachine Accelerator is implied by Multicore Accelerator.

Simics uses a non-hierarchical scheduling algorithm based on simulated time and available work. By default, Simics spawns at most as many threads as there are host threads, but it is possible to limit this number using the set-thread-limit command. Setting a thread limit may be useful if the physical machine is shared by multiple users.
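
For example, to cap the simulation at four threads on a shared host (the value is arbitrary):

simics> set-thread-limit 4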

Simics does not interact with the host operating system with regards to scheduling. The details of the scheduling are internal and there exists no API for controlling it.
