By default, Simics does not model any cache system. It uses its own memory system to achieve high simulation speed, and modeling a hardware cache would only slow it down. However, through the instrumentation API, Simics exposes the flow of memory operations coming from the processor, which allows users to write tracing tools and collect statistics on the memory behavior of their simulations.
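As a rough illustration of this idea, the sketch below shows a tiny callback-based tool that receives each memory operation and prints it. All names here (MemoryOp, InstrumentationConnection, register_callback) are invented for the example; they are not the actual Simics instrumentation API.

    # Minimal sketch of the callback idea behind a memory instrumentation tool.
    # All names are invented for illustration, not the Simics instrumentation API.
    from dataclasses import dataclass
    from typing import Callable, List

    @dataclass
    class MemoryOp:
        address: int      # physical address of the access
        size: int         # access size in bytes
        is_read: bool     # True for loads, False for stores

    class InstrumentationConnection:
        """Fans every memory operation out to the registered tools."""

        def __init__(self) -> None:
            self._callbacks: List[Callable[[MemoryOp], int]] = []

        def register_callback(self, cb: Callable[[MemoryOp], int]) -> None:
            self._callbacks.append(cb)

        def issue(self, op: MemoryOp) -> int:
            # A timing tool returns extra stall cycles; a pure tracer returns 0.
            return sum(cb(op) for cb in self._callbacks)

    def trace_tool(op: MemoryOp) -> int:
        kind = "read" if op.is_read else "write"
        print(f"{kind:5s} 0x{op.address:016x} size={op.size}")
        return 0  # tracing only, no stalling

    conn = InstrumentationConnection()
    conn.register_callback(trace_tool)
    conn.issue(MemoryOp(address=0x1000, size=8, is_read=True))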
Additionally, Simics lets user-written timing models control how long a memory transaction takes. Stalling the execution, as it is called in Simics, helps improve the timing accuracy of the simulation compared to a real system. Historically, the timing model and snoop device interfaces have been used in Simics to model caches. Since those interfaces prevent memory accesses from being looked up in Simics's fast address translation table, execution speed suffered considerably. Since Simics 6, the instrumentation API has been available as a core feature, which has improved the performance of cache modeling. The instrumentation API works together with the fast table lookup and makes Simics well suited for several types of cache simulation:
Cache Profiling
The goal is to gather information about the cache behavior of a system or an application, e.g., to explore different prefetching algorithms. Unless the application runs on multiple processors, takes many interrupts, or runs a lot of system-level code, the timing of the memory operations is often irrelevant, so no stalling is necessary.
Note that this type of simulation does not change the execution of the target program. It can be done by using Simics as a simple memory operation trace generator and computing the cache state evolution either afterwards or during the simulation.
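As a minimal sketch of this approach, the example below replays a recorded address trace through a small set-associative cache model and counts hits and misses. The cache geometry is an arbitrary example, and the code is independent of any Simics interface.

    # Offline cache profiling sketch: replay a recorded address trace through a
    # small set-associative cache model and count hits/misses. The geometry
    # (64-byte lines, 4 ways, 256 sets) is just an example configuration.
    from collections import OrderedDict

    LINE_SIZE = 64
    NUM_SETS = 256
    ASSOCIATIVITY = 4

    class SetAssociativeCache:
        def __init__(self) -> None:
            # One LRU-ordered dict of tags per set.
            self.sets = [OrderedDict() for _ in range(NUM_SETS)]
            self.hits = 0
            self.misses = 0

        def access(self, address: int) -> None:
            line = address // LINE_SIZE
            index = line % NUM_SETS
            tag = line // NUM_SETS
            ways = self.sets[index]
            if tag in ways:
                self.hits += 1
                ways.move_to_end(tag)          # refresh LRU position
            else:
                self.misses += 1
                if len(ways) >= ASSOCIATIVITY:
                    ways.popitem(last=False)   # evict least recently used line
                ways[tag] = True

    trace = [0x1000, 0x1008, 0x2000, 0x1000, 0x9000, 0x1040]  # example addresses
    cache = SetAssociativeCache()
    for addr in trace:
        cache.access(addr)
    print(f"hits={cache.hits} misses={cache.misses}")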
Cache Timing
The goal is to study the timing behavior of the transactions, in which case a transaction to memory should take much more time than, for example, a transaction to an L1 cache. This is useful when studying interactions between several CPUs, or when estimating the number of cycles per instruction (CPI) of an application. Simics models can be used for such a simulation.
This type of simulation modifies the execution, since interrupts and multi-processor interaction will be influenced by the timing provided by the cache model. However, as long as the target program is written properly, the execution will always be correct, although different from the execution obtained without any cache model.
There are two possible modeling techniques for this class: either a mechanistic model, where you implement as many details of the cache system as possible, possibly at the cost of simulation speed, or a statistical model, where you estimate the time from a set of events that happen in the cache system. The statistical approach can use regression models calculated from more detailed simulation models or from real hardware. In this case the simulation speed may be better, since you can cut down the number of events that need to be collected while the regression models still give a good approximation. Both of these methods trade off simulation speed against accuracy, and they can be combined so that different parts of the memory hierarchy are timed with either a mechanistic or a statistical approach.
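The sketch below contrasts the two approaches: a mechanistic per-access stall computation and a statistical estimate derived from aggregate event counts. The latencies and regression coefficients are invented placeholders, not values taken from any real model.

    # Two ways to turn cache events into time, sketched side by side. The latency
    # numbers and regression coefficients are invented for illustration only.

    # Mechanistic: charge a fixed penalty per event as the simulation runs.
    L1_HIT_CYCLES = 3
    L2_HIT_CYCLES = 12
    MEMORY_CYCLES = 150

    def mechanistic_stall(l1_hit: bool, l2_hit: bool) -> int:
        if l1_hit:
            return L1_HIT_CYCLES
        if l2_hit:
            return L2_HIT_CYCLES
        return MEMORY_CYCLES

    # Statistical: estimate total memory time from aggregate event counts using a
    # linear model fitted against a detailed simulator or hardware counters.
    def statistical_memory_cycles(events: dict) -> float:
        coefficients = {"l1_miss": 10.5, "l2_miss": 140.0, "tlb_miss": 30.0}
        return sum(coefficients[name] * count for name, count in events.items())

    print(mechanistic_stall(l1_hit=False, l2_hit=True))          # 12 cycles
    print(statistical_memory_cycles(
        {"l1_miss": 10_000, "l2_miss": 800, "tlb_miss": 120}))   # estimated cycles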
Cache Content Simulation
It is possible to change the Simics coherency model by allowing a cache model to contain data that differs from the contents of memory. Such a model needs to handle the memory transactions properly, since it must be able to change the values of loads and stores.
Note that this kind of simulation is difficult to do and requires a well-written, bug-free cache model, since it can prevent the target program from executing properly. The instrumentation API supports this kind of modeling, since it allows memory accesses to be redirected from memory into the cache model's data storage for each cache line.
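The following sketch only illustrates the underlying idea of a cache line holding data that can differ from memory until it is written back; it is not the mechanism the instrumentation API provides.

    # Content-holding cache line sketch: the line stores actual data that may
    # differ from backing memory until it is written back.
    LINE_SIZE = 64

    class DataCacheLine:
        def __init__(self, base_address: int, memory: bytearray) -> None:
            self.base = base_address
            self.data = bytearray(memory[base_address:base_address + LINE_SIZE])
            self.dirty = False

        def load(self, offset: int, size: int) -> bytes:
            return bytes(self.data[offset:offset + size])

        def store(self, offset: int, value: bytes) -> None:
            self.data[offset:offset + len(value)] = value
            self.dirty = True                  # memory is now stale

        def write_back(self, memory: bytearray) -> None:
            if self.dirty:
                memory[self.base:self.base + LINE_SIZE] = self.data
                self.dirty = False

    memory = bytearray(4096)
    line = DataCacheLine(0x100, memory)
    line.store(0, b"\xde\xad\xbe\xef")
    print(memory[0x100:0x104])   # still zeros: cache and memory disagree
    line.write_back(memory)
    print(memory[0x100:0x104])   # now holds the stored bytes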
Simics comes with a cache model called simple_cache, which allows cache profiling and cache timing. It handles one transaction at a time in a flat way: all needed operations (copy-back, fetch, etc.) are performed in order and at once, and the cache returns the sum of the stall times reported for each operation. This cache should not be considered a final solution for cache modeling; rather, it should be seen as a starting point with some of the basic cache concepts modeled. The source code is available, so it should be possible to extend it with new features (the module is called simple-cache-tool).
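As a sketch of this flat handling, the example below charges every operation triggered by one transaction at once and returns the sum. The penalty values are placeholders, not parameters of the actual simple-cache-tool module.

    # Sketch of "flat" transaction handling: all operations triggered by one
    # transaction (copy-back of the victim, fetch of the new line) are costed
    # immediately and the transaction stalls for their sum. The penalties are
    # placeholders, not parameters of the real simple-cache-tool module.
    PENALTY = {
        "read_hit": 0,
        "copy_back": 25,   # write dirty victim line to the next level
        "fetch": 100,      # bring the requested line in from the next level
    }

    def transaction_stall(hit: bool, victim_dirty: bool) -> int:
        if hit:
            return PENALTY["read_hit"]
        operations = ["fetch"]
        if victim_dirty:
            operations.insert(0, "copy_back")  # performed before the fetch
        # All needed operations are charged in order, at once, and summed.
        return sum(PENALTY[op] for op in operations)

    print(transaction_stall(hit=False, victim_dirty=True))   # 125 cycles
    print(transaction_stall(hit=False, victim_dirty=False))  # 100 cycles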
Before going further and describing the simple cache in more detail, a few things should be mentioned:
simple_cache.