Model Builder User's Guide / VII Extending Simics /

43 Memory Tracing and Timing

Simics provides extensive support for tracing and modifying memory transactions coming out of a processor. This chapter describes how to access memory transactions programmatically in order to write extensions such as trace tools, timing models, or cache simulations.

43.1 Tracing Instruction Execution

All processor models in Simics offer an interface that provides a registered listener with all executed instructions. This is used by the trace module, among others, to produce its execution trace.

Registering a function to listen to a trace interface is simple. Assuming that cpu is the traced processor, the following code will register the function trace_listener() to be called for each instruction executed by cpu:

void *data_for_trace_listener = some_data;
const exec_trace_interface_t *iface = 
        SIM_c_get_interface(cpu, EXEC_TRACE_INTERFACE);
iface->register_tracer(cpu, trace_listener, data_for_trace_listener);

Turning off tracing is just as simple:

void *data_for_trace_listener = some_data;
const exec_trace_interface_t *iface =
        SIM_c_get_interface(cpu, EXEC_TRACE_INTERFACE);
iface->unregister_tracer(cpu, trace_listener, data_for_trace_listener);

The listener function itself is expected to be of type instruction_trace_callback_t, defined as follows:

typedef void (*instruction_trace_callback_t)(lang_void *tracer_data,
                                             conf_object_t *cpu,
                                             linear_address_t la,
                                             logical_address_t va,
                                             physical_address_t pa,
                                             byte_string_t opcode);

It takes the following arguments:

tracer_data
The exact pointer passed as tracer_data when registering the listener function with register_tracer().
cpu
The cpu executing the instruction.
la
The linear address of the instruction for x86 processors. It carries no meaningful value for all other processors.
va
The virtual address of the instruction as contained in the program counter, or instruction pointer register.
pa
The physical address of the instruction after doing a MMU lookup.
opcode
The bytes composing the instruction.

Tracer functions are not expected to return any value to Simics.
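
As an illustration, a listener that only gathers statistics could look like the sketch below. It is not taken from the trace module. The typedefs at the top are stand-ins so that the block compiles without the Simics headers (the len and str fields of byte_string_t are an assumption of this sketch), and trace_stats_t is a hypothetical structure that would be passed as tracer_data at registration time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in declarations so this sketch compiles without the Simics headers.
   In a real module these types come from the Simics API; the byte_string_t
   layout (len, str) is an assumption of this sketch. */
typedef struct conf_object conf_object_t;
typedef void lang_void;
typedef uint64_t linear_address_t;
typedef uint64_t logical_address_t;
typedef uint64_t physical_address_t;
typedef struct { size_t len; const uint8_t *str; } byte_string_t;

/* Hypothetical per-tracer state, passed as tracer_data at registration. */
typedef struct {
        uint64_t instructions;       /* number of instructions seen */
        uint64_t opcode_bytes;       /* total number of opcode bytes seen */
        physical_address_t last_pa;  /* physical address of the last one */
} trace_stats_t;

/* Has the shape of instruction_trace_callback_t; returns nothing. */
static void
trace_listener(lang_void *tracer_data, conf_object_t *cpu,
               linear_address_t la, logical_address_t va,
               physical_address_t pa, byte_string_t opcode)
{
        trace_stats_t *stats = tracer_data;
        assert(stats != NULL);
        (void)cpu; (void)la; (void)va;   /* unused in this sketch */
        stats->instructions++;
        stats->opcode_bytes += opcode.len;
        stats->last_pa = pa;
}
```

Keeping all state behind tracer_data, rather than in globals, allows the same function to be registered on several processors with separate statistics for each.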

The trace module is provided along with Simics, both as a binary and source code. It is an excellent starting point for developing new tracing modules.

43.2 Tracing Memory Transactions

This section expects the reader to be familiar with memory spaces and how memory accesses are directed to the correct device or memory. More information on memory spaces is available in chapter 23.

43.2.1 Observing Memory Transactions

Memory-spaces provide a memory hierarchy interface for observing and modifying memory transactions passing through them. This interface is in fact composed of two different interfaces acting at different phases of a memory transaction execution:

The timing_model interface provides access to a transaction before it has been executed (i.e., it has just arrived at the memory-space).

To connect an object to the timing model interface, set the timing_model attribute of the corresponding memory-space to the object you want to connect:
```
simics> @conf.phys_mem0.timing_model = conf.listening_object
```
The timing model interface can also be used to change the timing and the execution of a memory transaction, as well as to modify the value of a store going to memory. This is described in more detail in section 43.3.2.

The snoop_memory interface provides access to a transaction after it has been executed.

Connecting an object to the snoop memory interface is done in the same way as for the timing model interface:
```
simics> @conf.phys_mem0.snoop_device = conf.listening_object
```
The trace module, for example, automatically connects itself (or rather one of its objects) to this interface when a trace is started. The advantage of using this interface is that the value of load operations is accessible, since the operation has already been performed.

The snoop memory interface can also be used to modify the value of a load operation. This is described in more detail in section 43.3.2.

Both interfaces can be used simultaneously, even by the same object. This property is used by the trace module, which is in fact connected both to the timing_model and the snoop_memory interfaces. The reason for this double connection is explained in section 43.2.4.

Information about implementing these two interfaces is available in section 43.2.6 and section 43.3.2.

43.2.2 Observing Instruction Fetches

For performance reasons, instruction fetches are not sent to the memory hierarchy by default.

Instruction fetches can be activated for each processor with the <cpu>.instruction-fetch-mode command. It can take several values:

no-instruction-fetch
No instruction fetches are sent to the memory hierarchy interface.
instruction-cache-access-trace
An instruction fetch is sent every time a different cache line is accessed by the processor. The size of the cache line is set by the processor attribute instruction_fetch_line_size.

This option is meant to be used for cache simulation where successive accesses to the same cache line do not modify the cache state.
instruction-fetch-trace
All instruction fetches are sent to the memory hierarchy interface. This option is often implemented as instruction-cache-access-trace with a line size equal to the size of one instruction.

This option is meant to provide a complete trace of fetch transactions.

Finally, instruction fetch transactions are not generated by all processor models. Section 43.4 contains a summary of which features are available on which models.
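
The difference between the two tracing modes can be illustrated with a small standalone model. The sketch below is not Simics code: count_fetches_sent() is a hypothetical helper, and it assumes that "a different cache line" means a line different from that of the previous fetch. With a line size equal to the instruction size, it degenerates into a full fetch trace.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Count how many fetch transactions a toy processor would send to the
   memory hierarchy for the program counter values in pcs[0..n-1], when a
   fetch is sent only if the cache line differs from the previous one.
   line_size plays the role of instruction_fetch_line_size and must be a
   power of two. Hypothetical helper, not part of the Simics API. */
static size_t
count_fetches_sent(const uint64_t *pcs, size_t n, uint64_t line_size)
{
        assert(line_size != 0 && (line_size & (line_size - 1)) == 0);
        size_t sent = 0;
        uint64_t last_line = 0;
        for (size_t i = 0; i < n; i++) {
                uint64_t line = pcs[i] & ~(line_size - 1);
                if (i == 0 || line != last_line) {
                        sent++;
                        last_line = line;
                }
        }
        return sent;
}
```

For a sequence of 4-byte instructions, a 64-byte line size only reports a fetch when execution moves to another line, while a 4-byte line size reports every instruction.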

43.2.3 Observing Page-table Accesses

For performance reasons, page-table reads are not sent to the memory hierarchy by default on some CPU models. For PPC models with classic MMU, you have to set the mmu_mode attribute to get page-table reads. See the attribute description in the Reference Manual for more information.

43.2.4 Simulator Translation Cache (STC)

In order to improve the speed of the simulation, Simics does not perform all accesses through the memory spaces. The Simulator Translation Caches (STCs) try to serve most memory operations directly by caching relevant information. In particular, an STC is intended to contain the following:

The current logical-to-physical translation for the address;
A count of the number of accesses to the address.

The general idea is that the STC will contain information about "harmless" memory addresses, i.e., addresses where an access would not cause any device state change or side-effect. A particular memory address is mapped by the STC only if:

The given logical-to-physical mapping is valid.
An access would not affect the MMU (TLB) state.
There are no breakpoints, callbacks, etc. associated with the address.

Memory transactions targeting devices are also mapped by the STC.

The contents of the STCs can be flushed at any time, so models using them to improve speed cannot rely on a specific address being cached. They can, however, let the STCs cache addresses when further accesses to these addresses do not change the state of the model (this is used by cache simulation with g-cache; see the Cache Simulation chapter in the Analyzer User's Guide).

The STCs are activated by default. They can be turned on or off at the command prompt with the stc-enable and stc-disable commands. An object connected to the timing_model interface can also mark a memory transaction so that it will not be cached by the STCs. For example, the trace module uses that method to ensure that no memory transaction will be cached, so that the trace will be complete.

Note that since information is inserted into the STCs when transactions are executed, only objects connected to the timing model interface can influence the STCs' behavior. Section 43.3 provides a complete description of the changes authorized on a memory transaction when using the memory hierarchy interface.

43.2.5 Summary of Simics Memory System

This diagram puts together the concepts introduced in chapter 23. It describes the path followed by a processor transaction through the Simics memory system.

Figure 27. Transaction Path through Simics Memory System

The CPU executes a load instruction.
A memory transaction is created.
If the address is in the STC, the data is read and returned to the CPU using the cached information.
If the address is not in the STC, the transaction is passed along to the CPU memory-space.
If a timing-model is connected to the memory-space, it receives the transaction.
1. If the timing model returns a non-zero stalling time, the processor is stalled and the transaction will be reissued when the stall time is finished (see also section 43.3.2).
2. If the timing model returns a zero stall time, the memory-space is free to execute the transaction.
The memory-space determines the target object (in this example, a RAM object).
The RAM object receives the transaction and executes it.
If possible, the transaction is inserted in the STC.
If a snoop-memory is connected to the memory-space, it receives the transaction.
The transaction is returned to the CPU with the correct data.

Store operations work in the same way, but no data is returned to the CPU.

Simics's memory system is more complex than what is presented here, but from the point of view of a user timing-model or snoop-memory, this diagram correctly shows at which point the main events happen.
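
The order of events can also be expressed as a toy model. The sketch below is not Simics code and every name in it is hypothetical. It assumes that a timing model and a snooper are both connected, that the timing model asks for the transaction to be reissued, and that it returns a zero stall time when it sees the reissued transaction.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of the transaction path. Every name here is hypothetical;
   nothing in this block is part of the Simics API. */
typedef struct {
        bool in_stc;        /* the address is currently cached by the STC */
        int timing_calls;   /* times the timing model saw the transaction */
        int executions;     /* times the target (RAM) executed it */
        int snoop_calls;    /* times the snooper saw the transaction */
        int stall_cycles;   /* total cycles the processor was stalled */
} toy_path_t;

/* Simulate one load. first_stall is the stall time the timing model
   returns the first time it sees the transaction (it returns 0 on the
   reissue). allow_stc tells whether the transaction may be inserted in
   the STC once it has been executed. */
static void
toy_load(toy_path_t *p, int first_stall, bool allow_stc)
{
        assert(p != NULL && first_stall >= 0);
        if (p->in_stc)
                return;                 /* STC hit: nobody else sees it */
        p->timing_calls++;              /* timing model receives it */
        if (first_stall > 0) {
                p->stall_cycles += first_stall;  /* processor is stalled */
                p->timing_calls++;      /* reissued, now with zero stall */
        }
        p->executions++;                /* RAM executes the transaction */
        if (allow_stc)
                p->in_stc = true;       /* inserted in the STC */
        p->snoop_calls++;               /* snooper receives it */
}
```

Calling toy_load() twice with allow_stc set shows why a tracer must keep transactions out of the STC: the second load is served from the STC and reaches neither the timing model nor the snooper.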

43.2.6 Implementing the Interface

The timing_model and snoop_memory interfaces each contain only one function, called operate():

static cycles_t
my_timing_model_operate(conf_object_t         *mem_hier,
                        conf_object_t         *mem_space,
                        map_list_t            *map_list,
                        generic_transaction_t *mem_op);

The four arguments are:

conf_object_t * mem_hier
This points to the timing model or snooper itself; it can be safely cast to the actual type of the listening object.
conf_object_t * mem_space
The mem_space argument is the memory space object that the timing model or snooper is connected to.
map_list_t * map_list
The map_list argument describes the entry in the map attribute list that this memory operation matched (as returned from a call to the space_lookup function in the memory_space interface).
generic_transaction_t * mem_op
This contains information about the current memory operation. It is manipulated using accessor functions defined in the API Reference Manual.

The return value is the number of cycles the transaction should stall before being executed (or reissued). Returning 0 disables all stalling.
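
A minimal body for such a function might look like the sketch below, which only counts the transactions it sees and never stalls. The typedefs are stand-ins so that the block compiles without the Simics headers, and my_timing_model_t is a hypothetical object structure. The sketch assumes the usual layout in which the conf_object_t is the first member of the object structure, which is what makes the cast of mem_hier valid.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins so the sketch compiles without the Simics headers; a real
   module gets these types from the Simics API. */
typedef struct conf_object { int placeholder; } conf_object_t;
typedef struct map_list map_list_t;
typedef struct generic_transaction generic_transaction_t;
typedef int64_t cycles_t;

/* Hypothetical listening object. The conf_object_t comes first so that a
   pointer to it can be cast to a pointer to the whole structure. */
typedef struct {
        conf_object_t obj;
        uint64_t transactions;   /* number of transactions observed */
} my_timing_model_t;

static cycles_t
my_timing_model_operate(conf_object_t *mem_hier,
                        conf_object_t *mem_space,
                        map_list_t *map_list,
                        generic_transaction_t *mem_op)
{
        assert(mem_hier != NULL);
        my_timing_model_t *tm = (my_timing_model_t *)mem_hier;
        (void)mem_space; (void)map_list; (void)mem_op;  /* unused here */
        tm->transactions++;
        return 0;   /* zero stall time: the transaction proceeds at once */
}
```

A snooper would use the same shape, always returning 0 since it cannot stall a transaction that has already been executed.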

43.2.7 Chaining Timing Models

Sometimes it is desirable to chain timing models, e.g., if you are implementing a multi-level cache model and want to model each level of the cache as an individual class. To do this, the operate() function must call the corresponding functions of the lower levels (a lower or next level cache means a cache further away from the CPU, closer to the actual memory).

The g-cache source code included with Simics is an example of how to do this. Whenever there is a miss in the cache, the g-cache object creates a new memory operation and calls the operate() method of the timing_model interface from the next level cache specified by the timing_model attribute.
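
The chaining principle can be sketched independently of g-cache. The toy model below is not the g-cache implementation and uses only hypothetical names: each level holds a single line, adds its own penalty on a miss, and forwards the miss to the level named by its next pointer, which plays the role of the timing_model attribute.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef int64_t cycles_t;   /* stand-in for the Simics cycle count type */

/* Hypothetical one-line cache level; not part of the Simics API. */
typedef struct toy_level toy_level_t;
struct toy_level {
        unsigned line_shift;    /* log2 of the line size at this level */
        bool valid;             /* whether cached_line holds a line */
        uint64_t cached_line;   /* the single line held by this level */
        cycles_t miss_penalty;  /* cycles added by a miss at this level */
        toy_level_t *next;      /* next level, further from the CPU */
        uint64_t misses;        /* number of misses at this level */
};

/* Return the total stall time for an access to paddr, starting at level
   and walking towards memory for as long as the access misses. */
static cycles_t
toy_chain_operate(toy_level_t *level, uint64_t paddr)
{
        assert(level != NULL);
        uint64_t line = paddr >> level->line_shift;
        if (level->valid && level->cached_line == line)
                return 0;                       /* hit: no stall */
        level->misses++;
        level->valid = true;
        level->cached_line = line;              /* fill the line */
        cycles_t stall = level->miss_penalty;
        if (level->next != NULL)                /* forward the miss */
                stall += toy_chain_operate(level->next, paddr);
        return stall;
}
```

Only the first level is connected to the memory-space; the lower levels are reached solely through the calls made by the level above them.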

43.3 Modifying Memory Transactions

43.3.1 Stalling Transactions

The precision of the simulation can be improved by adding timing controls for memory operations: memory-related instructions are no longer atomic operations, but actually take virtual time to execute.

Stalling is controlled via the timing_model interface. The interface simply allows the implementer to return a non-zero number of cycles to stall before the transaction is allowed to progress. During this time, the processor is given back control and lets time advance until the transaction's stall time has elapsed. The transaction is then reissued to the memory system.

Stalling a transaction is not always possible, depending on the processor model you are using in the simulation. Section 43.4 explains what is available for each model.

Cache models, described in the Analyzer User's Guide, are good examples of complex timing models. Finally, the Understanding Simics Timing application note goes into more detail on exactly how Simics handles timing and multiprocessor systems.

43.3.2 Changing the Behavior of a Memory Transaction

43.3.2.1 In a Timing Model

An object listening on the timing_model interface is presented with memory transactions before they have been executed, and may therefore change both their semantics and their timing. Here is a list of changes that a timing model is authorized to perform:

SIM_mem_op_may_stall
Before attempting to stall a memory transaction, the timing model should call this function to check whether the transaction is stallable. Typically only the first memory operation of a complex instruction is stallable. Page-crossing transactions and page-table walks are also unstallable. For these, the timing model could accumulate the stall time and stall after the current instruction has completed instead; see the SIM_stall_cycle function.
SIM_set_mem_op_reissue
If set, and the memory hierarchy is stalling, the memory transaction will be sent again to the timing model after the stalling was performed. If not set, the transaction will be executed without further calls to the timing model.
SIM_mem_op_ensure_future_visibility
If called, the transaction will not be cached in the STCs, ensuring that the next access to the same memory area will be sent to the timing model. Otherwise the next access to that area of memory may not be sent to the memory hierarchy at all. The address granularity for this purpose is given by the min_cacheline_size processor attribute, and is always a power of 2 no greater than the MMU page size.
Memory Store Value
Since the memory operation has not been executed yet, it is possible to change the value of a store operation in the timing model. However, it is important to restore the original value in the snoop device once the operation has been performed. The SIM_get_mem_op_value() and SIM_set_mem_op_value() functions (and their variants) can be used to get and set the value of a given memory transaction.
Stalling the Memory Transaction
When an object attached to the timing_model interface receives a memory transaction, it is allowed to modify the timing of the transaction by returning a stall time (as a number of processor cycles). This behavior is key to modeling caches and memory hierarchies in Simics.

When the operate() function returns a non-zero number of cycles, Simics stalls the memory transaction that was passed as argument for that amount of time.

If a zero stall time is returned, some additional operations are allowed:

SIM_set_mem_op_exception
If set to an exception, the transaction will be interrupted and an exception will be taken. Default is no exception (Sim_PE_No_Exception).
SIM_set_mem_op_user_data
This void * pointer is not touched by Simics during the memory operation execution. It can be used to pass information from a timing model to a snoop device.

A transaction may go through several memory-spaces in hierarchical order before being executed. Each of these memory-spaces may have a timing-model connected to it. However, if the transaction is stalled by one timing model, other timing models connected to other memory spaces may see the transaction being reissued before it is executed. Returning a non-zero stall time from these other timing models is not supported; that is, a transaction may be stalled by at most one timing model.
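
The stalling decision described above can be summarized by a toy model. The sketch below does not call the Simics API: toy_stall_op_t is a hypothetical stand-in whose may_stall field holds the answer a real timing model would obtain from SIM_mem_op_may_stall, and the deferred counter stands for stall time that a real model could charge later with SIM_stall_cycle.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef int64_t cycles_t;   /* stand-in for the Simics cycle count type */

/* Hypothetical stand-in for the transaction properties used here. */
typedef struct {
        bool may_stall;  /* what SIM_mem_op_may_stall would report */
        bool is_miss;    /* whether this toy model wants to add a penalty */
} toy_stall_op_t;

/* Hypothetical timing model state. */
typedef struct {
        cycles_t miss_penalty;  /* cycles charged for each miss */
        cycles_t deferred;      /* penalties that could not be applied */
} toy_stall_model_t;

/* Return the stall time for op. Penalties for unstallable transactions
   are accumulated instead of being returned. */
static cycles_t
toy_stall_operate(toy_stall_model_t *m, const toy_stall_op_t *op)
{
        assert(m != NULL && op != NULL);
        if (!op->is_miss)
                return 0;                 /* nothing to charge */
        if (op->may_stall)
                return m->miss_penalty;   /* stall this transaction */
        /* Unstallable: remember the penalty. A real model could charge
           it after the instruction completes, e.g. with SIM_stall_cycle. */
        m->deferred += m->miss_penalty;
        return 0;
}
```

Accumulating the penalty rather than dropping it keeps the total simulated time consistent even when individual transactions cannot be stalled.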

43.3.2.2 In a Snoop Device

An object listening on the snoop_memory interface is presented with memory transactions after they have completed. It cannot influence the execution of the operation and it may not return a non-zero value for stalling, but it is allowed to modify the value of the memory operation. Since the data returned by read operations are available at this stage, the snoop device is also an ideal place to trace memory transactions. Note that if you want to modify the properties of the memory transaction, such as future visibility and reissue, you have to do that in a timing_model interface operate function.

The following actions are allowed:

Change a Memory Store Value
If the value of a memory store has been changed in the timing model, it should be reset in the snoop device.
Change a Memory Load Value
Since the operation has been performed, the snoop device is the right place to change the value of a load. This is done with the usual SIM_get/set_mem_op_value() functions.
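
The cooperation between a timing model that changes a store value and a snoop device that restores it can be sketched as follows. The block is a toy model with hypothetical names only: the value and user_data fields stand for what a real model would access with SIM_get_mem_op_value(), SIM_set_mem_op_value() and SIM_set_mem_op_user_data.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a memory transaction; not the Simics type. */
typedef struct {
        bool is_store;     /* true for a store, false for a load */
        uint64_t value;    /* value carried by the transaction */
        void *user_data;   /* untouched while the operation executes */
} toy_store_op_t;

/* Hypothetical fault injector acting as timing model and snoop device. */
typedef struct {
        uint64_t flip_mask;   /* bits to invert in stored values */
        uint64_t original;    /* saved value of the store in flight */
} toy_injector_t;

/* Timing model side: runs before the store is executed. */
static void
toy_before_execute(toy_injector_t *inj, toy_store_op_t *op)
{
        assert(inj != NULL && op != NULL);
        if (!op->is_store)
                return;
        inj->original = op->value;        /* remember the real value */
        op->user_data = &inj->original;   /* hand it to the snoop side */
        op->value ^= inj->flip_mask;      /* value that memory receives */
}

/* Snoop device side: runs after the store has been executed. */
static void
toy_after_execute(toy_store_op_t *op)
{
        assert(op != NULL);
        if (op->user_data == NULL)
                return;                   /* nothing was changed */
        op->value = *(const uint64_t *)op->user_data;   /* restore */
        op->user_data = NULL;
}
```

Passing the saved value through the user data pointer means the snoop side needs no lookup to find out whether, and how, the transaction was modified.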

43.4 Memory Features Availability

Not all cache modeling features are supported by all processor types. A processor must support the instrumentation API for cache modeling to be possible on it.

Currently ARC, ARM, MIPS, PPC, X86 and Xtensa target architectures support instrumentation.
