Processor Model Integration Guide

1 Introduction

This document describes the API used to implement models of processors in Simics. Simics will handle the simulation framework, including the user interfaces, standardized APIs, plug-in administration, and commonly used utilities and resources. The processor models use and implement the standardized APIs, which allows mixing simulation models and creating a uniform interface to the user.

The processor API is the part of the API that allows a model to get control over simulator scheduling and includes processor specific interfaces to enable standard functions and commands for the processor models as well as efficient memory interfaces to allow fast simulation. The most important interfaces are covered in this document, for the other interfaces please refer to the relevant Simics reference Manual.

Depending on use, the processors should implement and use different parts of the interfaces described.

A number of areas that a processor model may support can be identified as follows:

Simulation scheduling
There must be basic support for advancing the simulation. This support must be present for any fast functional processor model. Cycle driven models may be designed using the built-in clock class that handles the scheduling interaction with the framework.
Direct Access to Memory
Interfaces used to cache pointers to memory, allowing accesses to be carried out without involving the API.
Inspection and control of processor models
As processors are very visible to the user of the simulator, there are several standardized ways for the user to access and control a processor. To support these, the processor models in turn have to implement interfaces and trigger haps.
Breakpoints
The processor model is fully responsible for triggering virtual address breakpoints. For physical address breakpoints, the processor model needs to check if ranges mapped with direct memory access contain breakpoints and then either request more fine grained regions of memory without breakpoints or fall back to explicit memory operations, where the explicit memory operations will handle breakpoints if they happen to match.
Simulation of models
Covers the mechanisms used to communicate between processors and devices.

Like any other model, processor models should be written to integrate well into the Simics environment. See Simics Model Development Checklist for a good checklist to follow when designing and implementing the model. See also the Simics User's Guide for more information about the features discussed in this document.

How to configure a system and create start scripts is described in the Model Builder User's Guide. Example system configurations can be found in [simics]/targets/cosim/.

2 Processor Scheduling Examples

This chapter describes how to interface processors to Simics depending on the requirements on the simulation.

A processor model defines one or more classes to integrate with the Simics environment. Simics classes have attributes and implement interfaces. Even though Simics classes can be written in DML, C, C++, or Python, only C and C++ is currently supported for writing processor models. See the Configuration and Checkpointing chapter in the Simics User's Guide for more information about classes in Simics and how to instantiate them into Simics objects.

A key issue for fast simulation is how it is scheduled. A scheduler is an object that implements the execute interface. Such an object advances virtual time for itself and for the objects associated with it. A processor that also is a scheduler can thus simulate a sequence of instructions that occur at different points in virtual time without having to return back to the framework.

Processors can be scheduled in any of the following ways:

Processors without custom scheduling
Processors models do not have any customized scheduling, but rather use event callbacks to perform simulation. This setup is suitable for clock cycle driven simulation as it performs simulation in strict virtual time order. See figure 1.
Scheduling a set of processors
The processors are scheduled together, and a single object implements the execute and cycle interfaces used by the framework to schedule the set of processors, see figure 2. This type of scheduling allows the simulation to run faster since it requires less synchronization and fewer context switches. The drawback is that both the cycle interface and the optional step interface are hard to implement.
Scheduling each processor
The execute and cycle interfaces are here implemented in each thread/core objects instead of in a central scheduler object, see figure 3. This is the model used by all Simics standard processors models. The multiple clocks in this type of setup are forced by Simics to stay within a defined window of virtual time. As the clocks are driven individually, this leads to temporal decoupling between the processors.

The documentation for the execute interface describes how the Simics core controls the simulation, see Section 3.1.

2.1 Processor Models without Custom Scheduling

Cycle driven models are typically slow models. The setup only has one clock object which handles timing for all objects in the setup. All objects run in lock step mode where each object run one cycle at a time. Figure 1 shows an example setup of event driven processors. The type of simulation where this setup should be used are for example clock cycle accurate HDL simulator models.

Figure 1. Event driven processors in Simics.

The objects in figure 1 are:

Simics core
The Simics core providing simulation services through the Simics API.
clock
The Simics clock object implementing the execute interface and the cycle interface. The clock handles all timing in the system.
thread/core
The thread/core objects are placeholders for each simulated thread/core. They implement standard processor interfaces such as the processor_info_v2, int_register, and exception interfaces. These interfaces are used by standard Simics commands for showing time, events, register contents, instruction disassemble, etc.
memory-space
Generic Simics memory-space objects handling memory accesses.
device
Device models for the system.

2.2 Scheduling a Set of Processors

In the setup shown in figure 2, the processors export one implementation of the time event queue instead having a Simics object do that. The result is that the processor model has better control of when new events are added and also that the processor model triggers the events. The Simics core instructs the scheduler to run the simulation forward through the execute interface. All objects in a cell implementing the execute interface will be kept within a time window defined by the time_quantum attribute of the cell. Having a quantum that is significantly larger than a single cycle (as in a cycle driven model) allows for faster simulation since optimizations such as binary translation and hypersimulation can then be employed.

Figure 2. Scheduling a set of processors.

The objects in figure 2 are:

Simics core
The Simics core providing simulation services through the Simics API.
scheduler
Implements the execute and cycle interfaces. Simics core and devices will use this object for posting events and for driving the simulation. The scheduler object may correspond to something particular in hardware, such as a processors with multiple cores.
thread/core
The thread/core objects are placeholders for each simulated thread/core. They implement standard processor interfaces for example the processor_info_v2 interface. These interfaces are used by standard Simics commands for showing time, events, register contents, instruction disassemble, etc.
memory-space
Generic Simics memory-space objects handling memory accesses.
device
Device models for the system.

The processor model has control of time and can optimize for running long quanta. The Simics core notifies the execute object that it can start to execute and it is up to the implementation to control how everything is scheduled. The threads/cores may be simulated in parallel with multiple threads. Note that care must be taken if typical Simics processor model features such as deterministic execution and reverse execution are to be supported.

An example system with this setup can be found in [simics]/targets/cosim/queues-in-cosimulator.simics. The source code for the example can be found in [simics]/src/extensions/sample-risc.

2.3 Scheduling Each Processor

In the setup shown in figure 3, each processor implements the scheduling interfaces. This is a special case of the previous mechanism where each set of processors only include a single processor. The Simics core instructs the scheduler to run the simulation forward through the execute interface. All objects in a cell implementing the execute interface will be kept within a time window defined by the time_quantum attribute of the cell. Having a quantum that is significantly larger than a single cycle (as in a cycle driven model) allows for faster simulation since optimizations such as binary translation and hypersimulation can then be employed.

Figure 3. Scheduling Each Processor.

The objects in figure 3 are:

Simics core
The Simics core providing simulation services through the Simics API.
thread/core
The thread/core objects implement both the execution related interfaces (execute and cycle), and the interfaces used to control and inspect the individual cores (processor_info_v2, step, int_register, etc).
memory-space
Generic Simics memory-space objects handling memory accesses.
device
Device models for the system.

This model is similar to the previous model with the difference that each thread/core defines its own time domain. This also means that each object will be driven individually by the Simics core. This is the type of model used by all standard processor models in Simics.

3 Simulation Scheduling

This chapter describes the interfaces used for customized simulation scheduling.

3.1 Execute Interface

The execute interface is implemented by objects that drive a simulation, which is often processor models. The object does not have to implement cycle or step.

An object implementing the execute interface must be coupled with one object implementing the cycle interface. It can be the same object that implements the cycle interface.

The run function is called when the simulator starts or restarts the execution.

By default the Simics scheduler will assume that the object being called in with the execute interface also implements the corresponding processor_info and step interfaces. If this assumption is incorrect, the implementation of the run function is responsible for maintaining the simulators view of the current objects implementing the processor_info and step interfaces. It does that by using the appropriate functions in the cell_inspection interface. The current objects must always be correctly set when either the run function returns, when any API method is called, or when any other object is called through an interface. Several Simics features, such as CLI commands, device logging, and hap handling make use of the current objects.

To handle asynchronous events, and thus allow for reasonable interactive performance, the implementor of execute needs to either make sure that run returns after not having run for too long, or preferably regularly call the VT_check_async_events method. In the Simics library CPU models, VT_check_async_events is called after servicing events from the cycle or step interfaces.

The simulator core will call stop when it detects a condition that should interrupt the simulation. The callee should stop as soon as possible when in a stable state, typically when the current executing instruction is finished after getting a request to stop. In some cases the callee might receive multiple stop requests in a rapid sequence. Conditions leading to a stop request include SIM_break_simulation being called from a device or hap-handler, breakpoint triggered, low-memory situations, the user interrupting the simulation with Ctrl-C, and the Simics core halting the object when it is at the end of the allowed time window in temporal decoupled simulation. It is forbidden to do anything in the stop function that can lead to a new stop request, this includes posting events, printing SIM_log-messages, etc. Before returning from the run method, the VT_stop_event_processing function should be called. The requirement to call VT_stop_event_processing is likely to be lifted in future versions of Simics.

The switch_in function is called whenever the execute object is about to gain control of the simulation from some other execute object in the cell. Similarly, switch_out is invoked before control is relinquished. It should be noted that these functions are called in a deterministic manner which is not true for run.

The switch_in and switch_out functions are not called at simulation start (or checkpoint load), in general.

SIM_INTERFACE(execute) {
        void (*run)(conf_object_t *obj);
        void (*stop)(conf_object_t *obj);

        void (*switch_in)(conf_object_t *obj);
        void (*switch_out)(conf_object_t *obj);
};

#define EXECUTE_INTERFACE "execute"

In figure 4 there is an example call-graph of Simics core and processor scheduler communication. Simics posts events on the processor's event queues to be triggered, for instance, at the end of the allowed time window. Simics then calls the run function in the execute interface and the processor starts executing instructions. The processor which implements the event queues will after a while execute an event callback for the event posted by the Simics core in one of the event queues. The Simics core handles the event which will trigger a stop request via the stop function. Note that not all events will trigger a stop request. Having received stop, the processor model must stop the simulation as soon as possible and return to the Simics core. This is typically done as soon as the currently executing instruction is committed.

Figure 4. An example of Simics core and scheduler communication.

3.2 Cell Inspection Interface

The cell_inspection interface is implemented by objects of the cell class. It is used by objects implementing the execute interface to update the currently executing objects when control is transferred outside of the execute object.

The current object implementing processor_info is set with set_current_processor_obj(). Similarly, the current object implementing step is set with set_current_step_obj().

SIM_INTERFACE(cell_inspection) {
        void (*set_current_processor_obj)(conf_object_t *obj,
                                          conf_object_t *cpu_obj);
        void (*set_current_step_obj)(conf_object_t *obj,
                                     conf_object_t *step_obj);
};

#define CELL_INSPECTION_INTERFACE "cell_inspection"

3.3 Cycle Interface

Interface Methods
The cycle interface is typically implemented by processors, but can be implemented by other objects as well. Its purpose is to handle events based on time. The cycle queue has a cycle as the smallest time unit. The cycle queue also has an associated frequency which makes it possible to define events based on seconds or picoseconds.

The get_frequency function returns the frequency in Hertz for the queue. Most objects implementing cycle also have a notification mechanism for frequency changes through the simple_dispatcher interface in the cpu_frequency port. It is recommended that such a notification mechanism is used to get updates rather than polling with get_frequency.

The current number of cycles executed by the queue is returned by get_cycle_count. Time elapsed since the queue was created is returned by get_time (in seconds) and get_time_in_ps (in picoseconds); this will be equal to the value returned by get_cycle_count divided by the value returned by get_frequency if the frequency has been constant since time zero.

The cycles_delta function returns the highest number of cycles obj can run before it passes the absolute local time when, assuming no frequency change occurs in the meantime. Note that cycles_delta can raise an exception if when was too far ahead in the future. The cycles_delta_from_ps function performs the same function, for an absolute local time expressed in picoseconds.

The post_cycle function will schedule an event that will occur after cycles counted from local current time at queue. The post_time function is similar but takes seconds as argument, while post_time_in_ps takes a number of picoseconds. The arguments cycles, seconds and picoseconds must be nonnegative.

An event previously posted can be removed by calling cancel. The cancel function takes a function pred as argument which is called when a matching event is found. The event is only removed if pred returns 1.

The find_next_cycle, find_next_time and find_next_time_as_ps functions take the same arguments as cancel but only return the number of cycles, seconds or picoseconds before the event occur. The evclass is the event class, obj is the object posting the event, and user_data is pointer to data used as a parameter when calling the callback function defined in the evclass. If no matching event was found, find_next_cycle and find_next_time return −1; find_next_time_as_ps returns ILLEGAL_DURATION.

The events method returns a list of all pending events in expiration order. Each element is a four-element list containing the event object, the event class name, the expiration time counted in cycles as an integer and the event description as given by the event class describe method, or NIL for events whose event class does not define that method.

What happens to already posted events when a frequency change occurs is implementation dependent. Simics processors will scale the cycle queue to keep the time left before triggering events equal across the frequency change. Note that the new times will be rounded to entire cycles during the scaling, so approximations may occur when switching between high and low frequencies.

Implementation
It is implementation dependent how the queue is implemented, whether cycles or seconds are used as underlying time unit, and what happens when the frequency is changed.

Objects implementing the cycle interface are usually meant to be scheduled by Simics itself. For this to happen, a number of conditions must be fulfilled:

SIM_INTERFACE(cycle) {

        cycles_t (*get_cycle_count)(conf_object_t *queue);
        double (*get_time)(conf_object_t *queue);
        cycles_t (*cycles_delta)(conf_object_t *NOTNULL clock,
                                 double when);

        uint64 (*get_frequency)(conf_object_t *queue);

        void (*post_cycle)(
                conf_object_t *NOTNULL queue,
                event_class_t *NOTNULL evclass,
                conf_object_t *NOTNULL obj,
                cycles_t cycles,
                lang_void *user_data);
        void (*post_time)(
                conf_object_t *NOTNULL queue,
                event_class_t *NOTNULL evclass,
                conf_object_t *NOTNULL obj,
                double seconds,
                lang_void *user_data);

        void (*cancel)(
                conf_object_t *NOTNULL queue,
                event_class_t *NOTNULL evclass,
                conf_object_t *NOTNULL obj,
                int (*pred)(lang_void *data, lang_void *match_data),
                lang_void *match_data);

        cycles_t (*find_next_cycle)(
                conf_object_t *NOTNULL queue,
                event_class_t *NOTNULL evclass,
                conf_object_t *NOTNULL obj,
                int (*pred)(lang_void *data, lang_void *match_data),
                lang_void *match_data);

        double (*find_next_time)(
                conf_object_t *NOTNULL queue,
                event_class_t *NOTNULL evclass,
                conf_object_t *NOTNULL obj,
                int (*pred)(lang_void *data, lang_void *match_data),
                lang_void *match_data);

        attr_value_t (*events)(conf_object_t *NOTNULL obj);

        /* new picoseconds based functions */
        local_time_t (*get_time_in_ps)(conf_object_t *queue);
        cycles_t (*cycles_delta_from_ps)(conf_object_t *NOTNULL clock,
                                         local_time_t when);
        void (*post_time_in_ps)(
                conf_object_t *NOTNULL queue,
                event_class_t *NOTNULL evclass,
                conf_object_t *NOTNULL obj,
                duration_t picoseconds,
                lang_void *user_data);

        duration_t (*find_next_time_in_ps)(
                conf_object_t *NOTNULL queue,
                event_class_t *NOTNULL evclass,
                conf_object_t *NOTNULL obj,
                int (*pred)(lang_void *data, lang_void *match_data),
                lang_void *match_data);
};

#define CYCLE_INTERFACE "cycle"

3.4 Co-Execute Interface

Experimental, may change without notice.

SIM_INTERFACE(co_execute) {
        void (*start_thread)(conf_object_t *NOTNULL obj,
                             void (*entry)(lang_void *arg), lang_void *arg);
        void (*yield)(conf_object_t *NOTNULL obj);
};
#define CO_EXECUTE_INTERFACE "co_execute"

3.5 Synchronous Mode Interface

Experimental, may change without notice.

SIM_INTERFACE(synchronous_mode) {
        int (*enter)(conf_object_t *NOTNULL obj);
        int (*leave)(conf_object_t *NOTNULL obj);
};
#define SYNCHRONOUS_MODE_INTERFACE "synchronous_mode"

4 Direct Access to Memory

The processor API includes support for caching pointers to simulated memory so that the model can perform accesses directly instead of having to use the API for every individual access.

This chapter describes the interfaces and API calls used to setup and handle direct access to memory.

4.1 Direct Memory Lookup Interface

The direct_memory_lookup interface is implemented by Simics memory-spaces. The interface is used by simulator objects that want to do fast accesses to memory and/or want to build up a cached representation of memory. These objects are referred to as memory users, e.g., processors. Fast accesses are done via host pointers to simulated memory. The direct_memory_lookup interface is used in conjunction with the direct_memory interface which is implemented by objects that own the actual data storage, e.g., RAM/ROM objects. These objects are called direct-memory objects.

To access data, a memory-user object first calls the lookup method on the memory space obj. The requester is the memory-user doing the lookup. The lookup method traces the range specified by addr and size through memory spaces and translators until a direct-memory object is reached. The direct-memory object is returned in the target field and the offset into this object corresponding to addr is returned in the offs field.

The call to lookup fails if the specified range does not map continuously to a direct-memory object. A lookup failure is indicated by returning NULL in the target field.

The access argument is a bit field of at least one access_t value specifying what kind of accesses the memory user is interested in. All specified access types must reach the same direct-memory object and range for the lookup to succeed. If the memory space, for example, redirects reads and writes to different memory ranges or direct-memory objects, a lookup would fail if access specified both read and write. Note that the actual access permissions needed to access the real data must be requested from the direct-memory object using the request method of the direct_memory interface. The access argument is only used to locate the direct-memory object.

The return field access contains at least the access bits requested used in the lookup request, but may as an optimization contain a superset, indicating that the lookup result is valid for this superset. However, there is no guarantee that this optimization takes place.

Once a direct-memory object has been found, the direct_memory interface can be used to obtain a direct pointer to the contents of the direct-memory object.

The tracers and breakpoints fields in the return value contain information about installed tracers and breakpoints that intersect the range. Examples of tracers are timing models and snoop objects. In order to trigger breakpoints and invoke any tracers, the memory user should perform memory operations using the memory_space interface. Only breakpoints and tracers that overlap (binary and) with the provided access argument need to be considered.

typedef struct {
        conf_object_t     *target;
        uint64             offs;

        access_t           breakpoints;      // conflicting breakpoints
        access_t           tracers;          // conflicting tracers
        access_t           access;           // handle valid for access
} direct_memory_lookup_t;

SIM_INTERFACE(direct_memory_lookup) {
        direct_memory_lookup_t (*lookup)(conf_object_t *NOTNULL obj,
                                         conf_object_t *requester,
                                         physical_address_t addr,
                                         unsigned size,
                                         access_t access);
};

#define DIRECT_MEMORY_LOOKUP_INTERFACE "direct_memory_lookup"

The direct_memory_lookup and direct_memory interfaces replace the memory_page interface of Simics 4.8.

4.2 Direct Memory Interface

The direct_memory interface is implemented by objects that model memory, such as RAM and ROM objects. These are called direct-memory objects. A user of the interface is called a memory user and is typically a processor that wants to do fast accesses to memory. The direct-memory object corresponding to a particular physical address can be obtained using the lookup method of the direct_memory_lookup interface. See the documentation for the direct_memory_lookup interface for more information.

A memory user using the direct_memory interface must implement the direct_memory_update interface.

The get_handle method is used by a memory user to create or retrieve a handle to the memory region starting at offset offs with size size. The handle is typically used later on to request access permissions and to retrieve a direct pointer to the region. The handle returned by get_handle is private to the memory user specified in the requester parameter.

If get_handle is invoked multiple times for the same range, and with identical requester and subsystem arguments, then the same handle will be returned each time, assuming the original handle is still valid. Note that the original handle is only returned if the range matches exactly. A single memory user can obtain multiple distinct handles for the same memory range by using different values for the subsystem parameter.

For RAM and ROM, offs and size must specify a region which does not intersect a naturally aligned 8192 byte boundary, or the request will fail with a NULL return value. Other direct-memory objects might have different requirements.

The request method is used to request a host pointer to simulated memory. This pointer can be used to carry out fast memory operations without having to involve the Simics API. The handle argument is the handle obtained using get_handle.

Both the permission argument and the inhibit argument are access_t type bit fields. The permission argument is used to specify what kind of memory operations the memory user will perform. For example, if a memory user wants to read memory, the permission argument must include the Sim_Access_Read value. The inhibit argument specifies what other memory users are not allowed to do. For example, if inhibit is set to Sim_Access_Write other memory users are not allowed to write to the memory range. This protection mechanism can be used to create caches of simulated memory, request exclusive permissions to a memory range in order to carry out atomic operations, and similar. When a memory user is requesting permission to a memory range that another memory user has protected with conflicting inhibit bits, the direct-memory object will inform the other memory user of the lost permissions and protection through the direct_memory_update interface. A user can lose both the permission and protection for a memory range in this way. When this happens, a memory user may re-request permissions and inhibit protection.

Note: if a memory user has multiple handles which overlaps, then each handle is considered to be a distinct memory user. For example, if a memory user holds two handles, and requests write inhibit on one of them, then write permission will be revoked from the second handle (if such permission had been granted).

The request method returns a direct_memory_t value with information about the retrieved permissions and inhibit bits. These bits can be a super set of the bits that actually were requested. The returned data pointer can be used to access the memory range. Accesses are valid from the data pointer and up to the end of the range, i.e., addresses up to data pointer + size - 1, where size is the size valid for the handle. A call to request always succeeds and the corresponding memory range is valid until the permissions or the handle are revoked by the direct_memory_update interface. Note that the data pointer may change each time request is called (with the same handle) since Simics may move simulated memory. If the pointer changes, then the old pointer must not be used.

With set_user_data, a memory user can associate a user-defined pointer with a specific handle. The pointer can be retrieved using the get_user_data method, which takes a handle as an argument.

A memory user can use the release function to notify the direct-memory object when it is no longer interested in the memory region corresponding to handle. The handle is invalid and must not be used for anything after being released.

The ack method is used by a memory user to inform the direct-memory object that it has given up the corresponding permission and inhibit rights for a memory range when called by a method in the direct_memory_update interface.

Permissions can be revoked from all memory users by invoking the revoke method. The permission parameter specifies the permissions which will be revoked from all memory users. Similarly, inhibit specifies the inhibit privileges which will be revoked. For instance, calling revoke with permission set to Sim_Access_Write will ensure that nobody has write permissions to the direct-memory object.

typedef granted_mem_t *direct_memory_handle_t;

typedef struct {
#ifndef PYWRAP
        uint8                 *data;
#endif
        access_t               permission;
        access_t               inhibit;
} direct_memory_t;

typedef uint64 direct_memory_ack_id_t;

SIM_INTERFACE(direct_memory) {
        direct_memory_handle_t (*get_handle)(conf_object_t *NOTNULL obj,
                                             conf_object_t *NOTNULL requester,
                                             uint64 subsystem,
                                             uint64 offs,
                                             unsigned size);
        direct_memory_t (*request)(conf_object_t *NOTNULL obj,
                                   direct_memory_handle_t handle,
                                   access_t permission,
                                   access_t inhibit);
        void (*revoke)(conf_object_t *NOTNULL obj,
                       access_t access,
                       access_t permission,
                       access_t inhibit);
#ifndef PYWRAP
        void *(*get_user_data)(conf_object_t *NOTNULL obj,
                                    direct_memory_handle_t handle);
        void (*set_user_data)(conf_object_t *NOTNULL obj,
                              direct_memory_handle_t handle,
                              void *user_data);
#endif
        void (*release)(conf_object_t *NOTNULL obj,
                        direct_memory_handle_t handle);
        void (*ack)(conf_object_t *NOTNULL obj,
                    direct_memory_ack_id_t id);
};

#define DIRECT_MEMORY_INTERFACE "direct_memory"

4.3 Direct Memory Flush Interface

The direct_memory_flush interface is implemented by objects that model memory, such as RAM and ROM objects, and is used for flushing granted rights and for managing access rights.

The revoke method revokes granted access, permissions and inhibit rights from memory-user regions intersecting [base, base + size).

The set_access_bits method grants access rights access for the region [base, base + size) to the memory user requester. If requester is NULL, then access rights are granted to all memory users. If the set succeeds, true is returned, otherwise false.

SIM_INTERFACE(direct_memory_flush) {
        void (*revoke)(conf_object_t *NOTNULL obj,
                       uint64 base, uint64 size,
                       access_t access, access_t perm, access_t inhibit);
        bool (*set_access_bits)(conf_object_t *NOTNULL obj,
                                conf_object_t *requester,
                                uint64 base, uint64 size,
                                access_t access);
};
#define DIRECT_MEMORY_FLUSH_INTERFACE "direct_memory_flush"

4.4 Direct Memory Update Interface

The direct_memory_update interface must be implemented by memory-user objects that use the direct_memory interface.

The direct_memory_update interface replaces the memory_page_update interface from Simics 4.8.

Accesses to memory are controlled by a handle that the memory-user object requests by calling the get_handle method of the direct_memory interface. The object implementing the direct_memory interface through which the handle was established is passed to the functions in direct_memory_update as target.

If the release method is called, the corresponding handle and all the permissions and inhibit protections are recalled. The memory-user object must stop using the handle and associated data pointers and then call the acknowledge method ack in the direct_memory interface from which the handle was granted.

A call to the update_permission method revokes earlier requested rights for a handle. The lost_access argument recalls rights to use the handle for the given access bits. This means that the handle needs to be re-fetched (by a call to the lookup method of the direct_memory_lookup interface followed by a call to the get_handle method of the direct_memory interface) before the handle can be used again for the particular access. This typically happens if a new breakpoint is inserted or a remap of the memory system is done. In case of a remap it is possible that the same handle will never be returned again which means that any user data associated with the handle should be reclaimed.

The lost_permission and the lost_inhibit arguments describe which permission rights and inhibit protection that are revoked. However, in contrast to the lost_access, the handle is still valid and can be used to re-request permission rights and inhibit protection.

A call to the conflicting_access method informs a memory-user object that a conflicting memory operation will be performed. Hence, if the memory-user object have some protected representation of memory (such as decoded instructions in an internal cache), that representation of memory has to be flushed (or written back to memory in case of dirty data). Note however that the memory-user object does not lose any permission rights or any inhibit protection.

There is no mechanism for locking simulated memory in host memory.

All methods in this interface receive a direct_memory_ack_id_t value as an argument. The ack method of the direct_memory interface must be called with this id as an argument when the corresponding operation has been carried out. The ack method may be called after the direct_memory_update interface function has returned, which allows for queueing of update requests. This may be valuable if multiple simulator threads are used.

An exception to the allowed queueing of update requests is for update requests that are received while calling request in the direct_memory interface. Such requests must be handled immediately with ack being called before the return of the direct_memory_update interface function. This requirement avoids a dead-lock that would otherwise happen if request would wait for ack before returning, but ack is queued to be handled at some time after request has returned.

SIM_INTERFACE(direct_memory_update) {
        void (*release)(conf_object_t *NOTNULL obj,
                        conf_object_t *NOTNULL target,
                        direct_memory_handle_t handle,
                        direct_memory_ack_id_t id);
        void (*update_permission)(conf_object_t *NOTNULL obj,
                                  conf_object_t *NOTNULL target,
                                  direct_memory_handle_t handle,
                                  access_t lost_access,
                                  access_t lost_permission,
                                  access_t lost_inhibit,
                                  direct_memory_ack_id_t id);
        void (*conflicting_access)(conf_object_t *NOTNULL obj,
                                   conf_object_t *NOTNULL target,
                                   direct_memory_handle_t handle,
                                   access_t conflicting_permission,
                                   direct_memory_ack_id_t id);
};

#define DIRECT_MEMORY_UPDATE_INTERFACE "direct_memory_update"

5 Controlling and Inspecting Processor State

The processor state can be controlled and inspected using commands or GUI controls. This chapter describe the interfaces, haps, and breakpoint functionality required for supporting this.

5.1 Commands

Several generic Simics commands automatically works by just implementing standard interfaces. How the interfaces map onto objects depend on how the model is designed, for example the type of models described in Chapter 2.

5.2 Interfaces

This section lists the interfaces that can be implemented by the processor model to enable certain Simics features. They are not required, but implementing them will allow user defined processor models to support the same generic feature set as Simics standard processor models. If you intend to plug your model into an existing Simics-provided platform, then many of these interfaces are actually required for such a platform to function.

Processor Info Interface

The processor_info_v2 interface is implemented by processors models. The interface has processor generic functions that are architecture independent.

The disassemble function returns the disassemble string for an instruction at address with opcode according to instruction_data. The instruction_data is an attr_value_t value of data type with the bytes of the opcode. The bytes are in the same order as they are stored in memory. For VLIW architectures, sub_operation is used to select which sub-operation to disassemble. The sub-operations start at zero, and a request for the entire unit including all sub-operations is encoded with sub-operation -1. A request for a sub-operation that is not present (for example when sub-operation is neither 0 nor -1 for non-VLIW architectures) results in the integer part of the return tuple being set to zero. If successful, the function should return a tuple with the size of the instruction in bytes and the disassembly string. The disassembly string should be allocated with MM_MALLOC or similar and is to be freed by the caller. If more bytes are needed, then the function should indicate that by returning a negative number in the tuple where the absolute value of the number is the required number of bytes. The string should be NULL if more bytes are needed. The implementor of processor_info_v2 is allowed to request one additional byte at a time until enough bytes are passed to determine what the instruction is. Illegal instructions should still result in a valid returned tuple, where the integer part will be used by the disassemble command to skip that many bytes before disassembling the next instruction. The address can be used to display absolute destinations of program counter relative branches.

The set_program_counter function sets the program counter in the processor. The get_program_counter function returns the current program counter.

The logical_to_physical function translates a logical address to a physical address of the type defined by access_type. The function returns a physical_block_t struct with valid bit and the address. The address is valid when the valid bit is not 0. The logical_to_physical function also returns block_start and block_end. The start and end of a block has the same logical to physical transformation as the translated address. The range is inclusive, so block_end should be the address of the last byte of the block. This information can be used to figure out how often the logical_to_physical function needs to be called. An implementation would typically return the page start and end here, but it is free to return any power of 2 sized block as long as it includes the translated address.

The current operating mode of the processor is returned with get_processor_mode.

The processor can be enabled or disabled with the enable_processor or disable_processor functions. The functions should return 0 if the processor changed from enabled to disabled or from disabled to enabled, and 1 if the processor did not change state. The current state is returned by the get_enabled function. Enabled or disabled here refers to the state that the user of the model has put the processor into. In particular, it is independent of the power mode of the processor. A processor that has powered down does not count as disabled in this sense, nor does the enable_processor wake up a processor that is in a power-saving sleep state.

The endianness of the processor is returned by the get_endian function.

The physical memory object is returned by the get_physical_memory function. The object returned by get_physical_memory is used to set breakpoints by the global break command, and to read and write physical memory through set, get, load-binary, load-file, and the default implementation of disassemble. The object returned implements the memory_space and breakpoint interfaces. The memory_space interface for the returned object is only be used in inquiry mode corresponding to actions by the simulator itself rather than by the simulated software. An implementation may return NULL from this method, which will lead to the command listed above not being supported when such a processor is selected.

The get_logical_address_width function returns the number of logical/virtual address bits and the get_physical_address_width function returns the number of physical address bits.

The processor architecture is returned by calling the architecture function. The architecture should be one of arm, mips32, mips64, ppc32, ppc64, sparc-v8, sparc-v9, x86, x86-64, or something else if none of the listed is a good match.

All functions in the interface are optional. Each function can be set to NULL if it is not supported.

SIM_INTERFACE(processor_info_v2) {
        tuple_int_string_t (*disassemble)(conf_object_t *obj,
                                          generic_address_t address,
                                          attr_value_t instruction_data,
                                          int sub_operation);
        void (*set_program_counter)(conf_object_t *obj,
                                    logical_address_t pc);
        logical_address_t (*get_program_counter)(conf_object_t *obj);
        physical_block_t (*logical_to_physical)(conf_object_t *obj,
                                                logical_address_t address,
                                                access_t access_type);
        processor_mode_t (*get_processor_mode)(conf_object_t *obj);
        int (*enable_processor)(conf_object_t *obj);
        int (*disable_processor)(conf_object_t *obj);
        int (*get_enabled)(conf_object_t *obj);

        cpu_endian_t (*get_endian)(conf_object_t *obj);
        conf_object_t *(*get_physical_memory)(conf_object_t *obj);

        int (*get_logical_address_width)(conf_object_t *obj);
        int (*get_physical_address_width)(conf_object_t *obj);

        const char *(*architecture)(conf_object_t *obj);
};

#define PROCESSOR_INFO_V2_INTERFACE "processor_info_v2"

Note that the original version of this interface (processor_info) must also be implemented. The only difference between the two interfaces is that the original version lacks the get_processor_mode function.

Processor CLI Interface

Some commands and features in the CLI use the processor_cli interface. Those commands will have limited functionality if the interface is not fully implemented.

The first argument to each function is the object to act on. This object should implement both the processor_info interface and the processor_cli interface.

The get_disassembly function is used for the disassemble command as well as to disassemble the next instruction to be executed, when control is returned to the CLI prompt. For most architectures, get_disassembly can be set to NULL, in which case the command will use other interfaces to provide a generic disassembly. The get_disassembly function should return a tuple with the length of the instruction in bytes and the disassembly string. The addr_prefix parameter selects the address type of the address parameter, whether it is a physical address ("p"), a linear address ("l") or a virtual address ("v"), just as returned from get_address_prefix. The address parameter is the program counter for the instruction to disassemble. If print_cpu is non-zero, then the name of the processor should be included first in the disassembly line. If mnemonic is not NULL, then it should be output instead of the instruction disassemble. The mnemonic is used to print exception or interrupt information as returned by the get_pending_exception_string function.

get_pregs returns the string to output in the CLI for the print-processor-registers command. The all parameter is a boolean corresponding to the -all switch to the print-processor-registers command.

The diff_regs function is used by the stepi command when the -r flag is used. The diff_regs function returns a list of register names, where each register in that list will be read through the int_register interface before and after an instruction.

When returning to the CLI prompt, information about the next instruction or step to execute is printed. Normally, that is the disassemble of the instruction at the current program counter. The get_pending_exception_string function is called before the disassembly to find out if the next step will not be an instruction, but rather a taken exception or interrupt. The function should inspect the given cpu (an object implementing processor_info and processor_cli) and return NULL if the next step will be the execution of the instruction at the current program counter. If the next step will instead be the handling of an exception or interrupt, then a string saying that should be returned.

The get_address_prefix function returns a string with the default address prefix for memory related commands. Simics defines the generic prefixes "v" for virtual addresses, "l" for linear addresses, and "p" for physical addresses. The default if get_address_prefix is NULL is "v" for virtual addresses.

translate_to_physical translates an address to a physical address. If translate_to_physical is NULL, then the only allowed address prefixes are "v" (virtual) and "p" (physical), and the logical_to_physical function in the processor_info interface will be used to translate virtual addresses.

SIM_INTERFACE(processor_cli) {
	tuple_int_string_t (*get_disassembly)(conf_object_t *obj,
                                              const char *addr_prefix,
                                              generic_address_t address,
                                              bool print_cpu,
                                              const char *mnemonic);
	char *(*get_pregs)(conf_object_t *cpu,
                           bool all);
	attr_value_t (*get_diff_regs)(conf_object_t *obj);
	char *(*get_pending_exception_string)(conf_object_t *obj);
	char *(*get_address_prefix)(conf_object_t *obj);
	physical_block_t (*translate_to_physical)(conf_object_t *obj,
                                                  const char *prefix,
                                                  generic_address_t address);
};

#define PROCESSOR_CLI_INTERFACE "processor_cli"

Processor GUI Interface

The processor_gui interface is implemented by processors that support displays in the Simics native GUI. It is only registered to indicate support for the displays, and does not contain any actual functionality.

SIM_INTERFACE(processor_gui) {
        void (*dummy)(conf_object_t *obj);
};

#define PROCESSOR_GUI_INTERFACE "processor_gui"

Step Interface

The step interface is typically implemented by processors, but can be implemented by other objects as well. Its purpose is to handle step events using a queue.

The current number of steps for the queue is returned when calling get_step_count.

The post_step function will schedule an event that will occur after steps (which must be nonnegative) counted from local current step at queue. An event previously posted can be removed by calling cancel_step. The cancel_step function takes a function pred as argument which is called when a matching event is found. The event is only removed if pred returns 1. The find_next_step takes the same arguments as cancel_step but only returns the number of cycles before the event will occur. The evclass is the event class, obj is the object posting the event, and user_data is pointer to data used as a parameter when calling the callback function defined in the evclass. If no matching event was found, find_next_step returns −1.

The events method returns a list of all pending events in expiration order. Each element is a four-element list containing the event object, the event class name, the expiration time counted in steps as an integer and the event description as given by the event class describe method, or nil for events whose event class do not define that method.

The advance function will increment the number of steps for the queue, decrementing the number of steps to the first event to the value defined by steps. The number of steps remaining to the next event is returned. It is an error to advance beyond the next pending event, so the return value is never negative.

The implementor of the step interface can use any checkpoint representation. The name field in the event class data structure is unique, and the attribute setter function for checkpoint restore can use VT_get_event_class to get the event class structure corresponding to an event class name.

SIM_INTERFACE(step) {
        pc_step_t (*get_step_count)(conf_object_t *NOTNULL queue);
        void (*post_step)(
                conf_object_t *NOTNULL queue,
                event_class_t *NOTNULL evclass,
                conf_object_t *NOTNULL obj,
                pc_step_t steps,
                lang_void *user_data);
        void (*cancel_step)(
                conf_object_t *NOTNULL queue,
                event_class_t *NOTNULL evclass,
                conf_object_t *NOTNULL obj,
                int (*pred)(lang_void *data, lang_void *match_data),
                lang_void *match_data);
        pc_step_t (*find_next_step)(
                conf_object_t *NOTNULL queue,
                event_class_t *NOTNULL evclass,
                conf_object_t *NOTNULL obj,
                int (*pred)(lang_void *data, lang_void *match_data),
                lang_void *match_data);

        attr_value_t (*events)(conf_object_t *NOTNULL obj);

        pc_step_t (*advance)(conf_object_t *queue, pc_step_t steps);
};

#define STEP_INTERFACE "step"

Step-Cycle-Ratio Interface

The step_cycle_ratio interface is implemented by processors that support a changeable ratio between steps and cycles. The set-step-rate command uses this interface to set the ratio between steps and cycles.

The set_ratio sets the ratio between steps and cycles. Note that the introduction of stall cycles can skew the ratio. The get_ratio simply returns the current ratio.

The cycles and step arguments must be in the range [1..128] and cycles must be a power of two. Implementers of this interface may choose to ignore other values of cycles and step and may log an error.

typedef struct {
        uint32 steps;
        uint32 cycles;
} step_cycle_ratio_t;

SIM_INTERFACE(step_cycle_ratio) {
        step_cycle_ratio_t (*get_ratio)(conf_object_t *obj);
        void (*set_ratio)(conf_object_t *obj, uint32 steps, uint32 cycles);
};

#define STEP_CYCLE_RATIO_INTERFACE "step_cycle_ratio"

Stall Interface

The stall interface can be implemented by objects that also implement the cycle and step interfaces. The stall interface controls the addition of extra cycles between steps.

The get_stall_cycles function returns the remaining number of stall cycles. The object will advance that number of cycles before starting with the next step.

The set_stall_cycles function is used to change the number of stall cycles before the next step. It is legal to first call this function with a large value for cycles and then at a later point reduce the cycle count is resume execution earlier than indicated by the first call.

The get_total_stall_cycles returns the total accumulated number of stall cycles.

SIM_INTERFACE(stall) {
        cycles_t (*get_stall_cycles)(conf_object_t *obj);
        void (*set_stall_cycles)(conf_object_t *obj, cycles_t cycles);
        cycles_t (*get_total_stall_cycles)(conf_object_t *obj);
};
#define STALL_INTERFACE "stall"

Register Interface

The int_register interface is used for access to registers in a processor. It can be used to access any kind of integer register, not only the "normal" registers. This includes all kinds of control registers, hidden registers and anything else that might be useful to access as a register. The only limitation is that the register value should be representable as a 64-bit unsigned integer.

This interface can be implemented by other classes than processors, but it is likely to be found mostly in processors.

Registers are identified by a number, and there are two functions to translate from register names to register numbers and back. The translation need not be one-to-one, which means that one register can have several names. A register name can, however, only translate to a single register number.

Often, registers are grouped in register banks, where registers in the bank are numbered from 0 up. Registers in a bank should have consecutive numbers (unless their numbering is very sparse). This allows a user to deduce register numbers by calling get_number for the first register only. The first register numbers should be used for the general-purpose integer registers, if possible (so that integer register rN has number N).

Using this interface to read or write registers does not cause any side effects, such as triggering interrupts or signalling haps.

get_number translates a register name to its number. Returns -1 if the register does not exist.

get_name translates a register number to its canonical name.

read reads a register value.

write writes a new register value.

all_registers returns a list of all register numbers that can be used for this object.

register_info returns information about a single register. The information return depends on the info parameter.

Sim_RegInfo_Catchable
Return 1 if Core_Control_Register_Write and Core_Control_Register_Read are triggered when this register is written or read.
Return 0 otherwise.

typedef enum {
        Sim_RegInfo_Catchable
} ireg_info_t;

SIM_INTERFACE(int_register) {
        int (*get_number)(conf_object_t *NOTNULL obj,
                          const char *NOTNULL name);
        const char *(*get_name)(conf_object_t *NOTNULL obj, int reg);
        uint64 (*read)(conf_object_t *NOTNULL obj, int reg);
        void (*write)(conf_object_t *NOTNULL obj, int reg, uint64 val);
        attr_value_t (*all_registers)(conf_object_t *NOTNULL obj);
        int (*register_info)(conf_object_t *NOTNULL obj, int reg,
                             ireg_info_t info);
};

#define INT_REGISTER_INTERFACE "int_register"

Decoder Interface

SIM_INTERFACE(decoder) {
        void (*register_decoder)(conf_object_t *obj, 
                                 decoder_t *NOTNULL decoder);
        void (*unregister_decoder)(conf_object_t *obj, 
                                   decoder_t *NOTNULL decoder);
};

The decoder interface is implemented by processors that allows connecting user decoders. This allows a user to implement the semantics of instructions that are not available in the standard Simics model or change the semantics of instructions implemented by Simics. This interface replaces SIM_register_arch_decoder and SIM_unregister_arch_decoder functions.

The register_decoder function adds a decoder and unregister_decoder removes a decoder.

The decoder is installed/removed for every object of the same class as the obj argument which must be the same object from which the interface was fetched.

When Simics decodes an instruction, it will first see if any instruction decoders are registered for the current CPU class. For any decoders it finds, Simics will let it try to decode the instruction. The decoders are called in order, starting with the last registered decoder, and if one decoder accepts the instruction, the rest of the decoders will not be called.

The decoder is specified by the decoder_t data structure that the user supplies:

typedef struct {
        void *user_data;
        int (*NOTNULL decode)(uint8 *code,
                              int valid_bytes,
                              conf_object_t *cpu,
                              instruction_info_t *ii,
                              void *user_data);
        tuple_int_string_t (*NOTNULL disassemble)(uint8 *code,
                                                  int valid_bytes,
                                                  conf_object_t *cpu,
                                                  void *user_data);
        int (*NOTNULL flush)(instruction_info_t *ii,
                             void *user_data);
} decoder_t;

The decode function is called to decode an instruction pointed to by code. The first byte corresponds to the lowest address of the instruction in the simulated memory. valid_bytes tells how many bytes can be read. The CPU is given in the cpu parameter. When the decoder has successfully decoded an instruction, it should set the ii_ServiceRoutine, the ii_Arg, and the ii_Type members of the ii structure (see below), and returns the number of bytes used in the decoding. If it does not apply to the given instruction, it should return zero. If the decoder needs more data than valid_bytes it should return a negative number corresponding to the total number of bytes it will need to continue the decoding. The underlying architecture limits the number of bytes that can be requested, e.g. no more than 4 bytes can be requested on most RISC architectures. Simics will call the decoder again when more bytes are available. This process is repeated until the decoder accepts or rejects the instruction. A decoder should never request more data than it needs. For example, if an instructions can be rejected by looking at the first byte, the decoder should never ask for more bytes.

The instruction_info_t is defined as follows:

typedef struct instruction_info {
        service_routine_t  ii_ServiceRoutine;
        uint64             ii_Arg;
        unsigned int       ii_Type;
        lang_void         *ii_UserData;
        logical_address_t  ii_LogicalAddress;
        physical_address_t ii_PhysicalAddress;
} instruction_info_t;

ii_ServiceRoutine is a pointer to a function that will be called by Simics every time the instruction is executed. It has the following prototype:

typedef exception_type_t (*service_routine_t)(conf_object_t *cpu, 
                                              uint64 arg,
                                              lang_void *user_data);

The service routine function should return an exception when it is finished to signal its status. If no exception occurs Sim_PE_No_Exception should be returned.

See exception_type_t in src/include/simics/base/memory.h for the different exceptions available.

A special return value, Sim_PE_Default_Semantics, can be returned; this signals Simics to run the default semantics for the instruction. This is useful if the semantics of an instruction should be changed but the user routine does not want to handle it all the time.

Note that in a shared memory multiprocessor, the CPU used in decoding may differ from the CPU that executes the instruction, since the decoded instructions may be cached.

ii_Arg is the argument arg that will be passed on to the service routine function. Op code bit-fields for the instruction such as register numbers or intermediate values can be stored here. The ii_UserData field can also be used to pass information to the service routine if more data is needed.

ii_Type is either UD_IT_SEQUENTIAL or UD_IT_CONTROL_FLOW. A sequential type means that the instruction does not perform any branches and the update of the program counter(s) is handled by Simics. In a control flow instruction on the other hand it is up to the user to set the program counter(s).

ii_LogicalAddress and ii_PhysicalAddress holds the logical and physical addresses of the instruction to be decoded.

The disassemble function is called to disassemble an instruction. It uses the same code, valid_bytes, and cpu parameters as the decode function. If the disassembly is valid, then the string part of the returned tuple_int_string_t struct should be a MALLOCed string with the disassembly and the integer part should be its length in bytes. The caller is responsible for freeing the disassembly string. The string member should be NULL and the integer part should be zero if the disassembly is not valid. If the disassemble function needs more data than valid_bytes it should return a negative number in the integer part in the same way as the decode function, and set the string part to NULL.

The flush function is called to free any memory allocated when decoding an instruction and any user data associated with the instruction. It should return zero if it does not recognize the instruction, and non-zero if it has accepted it. Usually, the way to recognize if a decoded instruction is the right one to flush is to compare ii->ii_ServiceRoutine with the function that was set in the decode function. Note that the cpu parameter is the processor that caused the flush. It is more or less an arbitrary processor and should be ignored.

In addition to the function pointers, the decoder_t structure contains a user_data pointer that is passed to all the functions. This can be used for passing any data to the decoder functions.

Exception Interface

The exception interface is used together with the Core_Exception hap to enable inspection abilities for triggered exceptions.

The exception interface is used to translate exception numbers, as received by the Core_Exception hap, to names, and vice versa.

The get_number function returns the number associated with an exception name, or -1 if the no exception with the given name exist. The get_name returns the name associated with an exception number. The get_source function is only used on X86 targets and returns the source for an exception, as an exception number can be raised from different sources. The all_exceptions function returns a list of all exceptions numbers.

The exception numbers are architecturally defined, while their names are defined by the model.

SIM_INTERFACE(exception) {
        int (*get_number)(conf_object_t *NOTNULL obj,
                          const char *NOTNULL name);
        const char *(*get_name)(conf_object_t *NOTNULL obj, int exc);
        int (*get_source)(conf_object_t *NOTNULL obj, int exc);
        attr_value_t (*all_exceptions)(conf_object_t *NOTNULL obj);
};

#define EXCEPTION_INTERFACE "exception"

Context Handler Interface

Note: This interface is not supported, and may change in the future.
Get and set current context. The set_current_context function returns zero if the passed object is not of the context class, otherwise one is returned.

SIM_INTERFACE(context_handler) {
        conf_object_t *(*get_current_context)(conf_object_t *obj);
        int (*set_current_context)(conf_object_t *obj, conf_object_t *ctx);
};

#define CONTEXT_HANDLER_INTERFACE "context_handler"

Exec Trace Interface

The exec_trace interface is implemented by processor models that support tracing. A trace listener registers itself with the register_tracer call. The tracer callback will be called by the processor model when each instruction is just about to be executed, passing the tracer_data as passed to the register_tracer function in addition to information about the instruction that is executed. Invoke unregister_tracer with the same two pointers to deregister the listener.

typedef void (*instruction_trace_callback_t)(lang_void *tracer_data,
                                             conf_object_t *cpu,
                                             linear_address_t la,
                                             logical_address_t va,
                                             physical_address_t pa,
                                             byte_string_t opcode);

The pa parameter to the callback will always be valid, but some CPU architectures may not support la or va. The la argument is typically only valid for x86 CPUs. Lastly, the opcode of the instruction is passed in opcode. The opcode is passed without endian conversion, meaning that byte X in opcode corresponds to the byte at pa + X.

SIM_INTERFACE(exec_trace) {
        void (*register_tracer)(conf_object_t *NOTNULL cpu_obj,
                                instruction_trace_callback_t tracer,
                                lang_void *tracer_data);
        void (*unregister_tracer)(conf_object_t *NOTNULL cpu_obj,
                                  instruction_trace_callback_t tracer,
                                  lang_void *tracer_data);
};

#define EXEC_TRACE_INTERFACE "exec_trace"

Opcode Info Interface

The opcode_info interface is implemented by processors that need to communicate information about the encoding of instructions to the GUI.

The get_opcode_length function returns information about instruction encoding in the current operating mode of the processor. The min_alignment field indicates the smallest allowed alignment of instructions, typically 4 for regular RISC architectures. The max_length field specifies the maximum instruction length in bytes. The avg_length is an approximation of the average instruction size.

typedef struct {
        int min_alignment;
        int max_length;
        int avg_length;
} opcode_length_info_t;

SIM_INTERFACE(opcode_info) {
        opcode_length_info_t (*get_opcode_length_info)(conf_object_t *obj);
};

#define OPCODE_INFO_INTERFACE "opcode_info"

Virtual Data Breakpoint Interface

Add and remove virtual-address (and, on x86, linear-address) read and write breakpoints. On every read access that intersects a read breakpoint's interval, the registered callback function is called with the object that initiated the read, and the address and size of the read. (The interval includes both endpoints; first must be less than or equal to last.) Write breakpoints work exactly the same, except that the callback is given the actual value being written, not just its size.

The callback is called before the read or write has taken place, but may not intervene. If one or more breakpoint callbacks stop the simulation, the current instruction is completed before the stop takes effect. If more than one breakpoint is triggered by the same read or write, the implementation may call their callbacks in any order.

On x86, the Virtual_Breakpoint_Flag_Linear flag causes the breakpoint to use linear rather than virtual addresses. (Adding a breakpoint with unsupported flags is illegal.)

Note: This interface is preliminary and may change without prior notice.
typedef enum {
        Virtual_Breakpoint_Flag_Linear = 1
} virtual_breakpoint_flags_t;

SIM_INTERFACE(virtual_data_breakpoint) {
        virtual_data_bp_handle_t *NOTNULL (*add_read)(
                conf_object_t *NOTNULL obj,
                generic_address_t first, generic_address_t last,
                void (*NOTNULL callback)(
                        cbdata_call_t data, conf_object_t *NOTNULL initiator,
                        generic_address_t address, unsigned size),
                cbdata_register_t data, uint32 flags);
        virtual_data_bp_handle_t *NOTNULL (*add_write)(
                conf_object_t *NOTNULL obj,
                generic_address_t first, generic_address_t last,
                void (*NOTNULL callback)(
                        cbdata_call_t data, conf_object_t *NOTNULL initiator,
                        generic_address_t address, bytes_t value),
                cbdata_register_t data, uint32 flags);
        void (*remove)(conf_object_t *NOTNULL obj,
                       virtual_data_bp_handle_t *NOTNULL bp_handle);
};
#define VIRTUAL_DATA_BREAKPOINT_INTERFACE "virtual_data_breakpoint"

Virtual Instruction Breakpoint Interface

Add and remove virtual-address (and, on x86, linear-address) instruction breakpoints. Every time the processor executes an instruction that intersects the breakpoint's interval, the callback function is called with the processor, and the address and size of the instruction. (The interval includes both endpoints; first must be less than or equal to last.)

The callback is called before the instruction is executed. If one or more breakpoint callbacks stop the simulation, the stop takes effect before the instruction is run. (This means that once the simulation starts again, the same breakpoints will trigger immediately again. The callback can use VT_step_stamp to detect re-triggering.) If more than one breakpoint is triggered by the same instruction, the implementation may call their callbacks in any order.

If the filter function is non-null and returns false, the callback is not called. The filter function is supplied with the instruction opcode (the raw bytes of the instruction) and a processor (which may not be the same processor that the breakpoint is set on, but is guaranteed to be of the same class). The filter may base its decision only on the opcode bytes and the string obtained by asking the processor to disassemble the instruction; this allows the implementation to cache the result and omit future calls to the filter function where the opcode and disassembly string would be the same.

On x86, the Virtual_Breakpoint_Flag_Linear flag causes the breakpoint to use linear rather than virtual addresses. Calling with unsupported flags is illegal.

Note: This interface is preliminary and may change without prior notice.
typedef enum {
        Virtual_Breakpoint_Flag_Linear = 1
} virtual_breakpoint_flags_t;

SIM_INTERFACE(virtual_instruction_breakpoint) {
        virtual_instr_bp_handle_t *NOTNULL (*add)(
                conf_object_t *NOTNULL obj,
                generic_address_t first, generic_address_t last,
                bool (*filter)(cbdata_call_t filter_data,
                               conf_object_t *NOTNULL cpu, bytes_t opcode),
                cbdata_register_t filter_data,
                void (*NOTNULL callback)(
                        cbdata_call_t callback_data, conf_object_t *NOTNULL cpu,
                        generic_address_t address, unsigned size),
                cbdata_register_t callback_data, uint32 flags);
        void (*remove)(conf_object_t *NOTNULL obj,
                       virtual_instr_bp_handle_t *NOTNULL bp_handle);
};
#define VIRTUAL_INSTRUCTION_BREAKPOINT_INTERFACE \
        "virtual_instruction_breakpoint"

Describe Registers Interface

This interface is used by the Simics debugger to get certain information from a processor.

The first_child function returns the first description in the sequence of child descriptions of parent or NULL if parent has no children. Groups can have both registers and groups as children, registers can only have fields as children and fields cannot have any children. If parent is NULL, return the first description in the sequence of top-level descriptions.

Use next_description to deallocate the previous description and return the next description in the sequence or NULL if there are no more descriptions in the current sequence.

The free_description function is used to free the description without returning the next one in the sequence.

The first_named_value function returns the first named value in the sequence of named values for parent or NULL if there are no named values for parent. Only fields and registers can have named values.

Use next_named_value to deallocate the previous named value and return the next named value or NULL if there are no more named values in this sequence.

Use free_named_value to free the named value without returning the next one in the sequence.

The get and set functions are used to get and set the value of the register. To set the value pass in a bytes_t for the value. The value passed in must be long enough to contain the full value of the register. If the bytes_t is too long it will be truncated. To get the value pass in a buffer_t which is long enough to contain the register's value. The value is encoded in little endian byte order.

typedef enum {
        Description_Type_Group,

        Description_Type_Int_Reg,
        Description_Type_Float_Reg,
        Description_Type_Fields_Reg,

        Description_Type_Int_Field,
        Description_Type_Float_Field,
} description_type_t;

typedef enum {
        Reg_Role_None, /* No special role for the register. */
        Reg_Role_Program_Counter /* The register is the program counter. */
} reg_role_t;

typedef enum {
        Reg_Bitorder_Little_Endian,
        Reg_Bitorder_Big_Endian
} reg_bitorder_t;

typedef struct {
        const char *name;
        const char *description;
        const bytes_t value; /* Little endian byte order */
} named_value_t;

typedef struct {
        /* Common fields */
        description_type_t type;
        const char *name;
        const char *description;

        /* Register and field fields */
        int16 dwarf_id;            /* id used by dwarf for this register
                                      or -1 if no such id is defined. This
                                      is ABI specific, but the CPU will
                                      give the ids for the most common ABI
                                      for that architecture. */
        reg_bitorder_t bitorder;   /* Bitorder convention used in the
                                      documentation for this register or
                                      field. */
        reg_role_t role;           /* Role of this register in the ABI/HW. */
        bool memory_mapped;        /* True if the register is memory mapped. */
        uint64 offset;             /* Offset into the bank for memory mapped
                                      registers. */
        bool catchable;            /* True if Core_Control_Register_Write and
                                      Core_Control_Register_Read are triggered
                                      when this register is written or read. */
        int msb, lsb;              /* Most and least significant bit of the
                                      register or field. Always given in le
                                      bitorder. For groups msb == -1 and
                                      lsb == 0. */
        int regsize;               /* Number of bits in the register, or the
                                      register this field is a part of. */
        int reg_id;                /* For registers and fields the id to pass
                                      to the get and set methods to access the
                                      register's value. Fields have the same
                                      reg_id as the register they are a part
                                      of. Not valid for groups.*/
} description_t;

SIM_INTERFACE(describe_registers) {
        const description_t *(*first_child)(
                conf_object_t *NOTNULL obj, const description_t *parent);
        const description_t *(*next_description)(
                conf_object_t *NOTNULL obj, const description_t *prev);
        void (*free_description)(conf_object_t *NOTNULL obj,
                                 const description_t *desc);
        const named_value_t *(*first_named_value)(
                conf_object_t *NOTNULL obj, const description_t *parent);
        const named_value_t *(*next_named_value)(
                conf_object_t *NOTNULL obj, const named_value_t *prev);
        void (*free_named_value)(conf_object_t *NOTNULL obj,
                                 const named_value_t *nv);
        void (*get)(conf_object_t *NOTNULL obj, int reg_id, buffer_t dest);
        void (*set)(conf_object_t *NOTNULL obj, int reg_id, bytes_t value);
};

#define DESCRIBE_REGISTERS_INTERFACE "describe_registers"

5.3 Using Haps

A hap is an event or occurrence in Simics with some specific semantic meaning, either related to the target or to the internals of the simulator. Haps can for instance be used by the cosimulator to trigger events on control register reads or writes.

The cosimulator calls one of the SIM_c_hap_occurred... functions to trigger a hap. The Simics Reference Manual lists standard haps to trigger. The Simics User's Guide document describes how to get a callback when a hap is triggered.

To be fully compatible with regular Simics CPU models, a cosimulator should trigger the following haps:

Details about the haps can be found in the Simics Reference Manual, or online with the list-haps command. The sample-risc module in [simics]/src/extensions/sample-risc/ contains example code how to raise Core_Control_Register_Read and Core_Control_Register_Write haps.

The cosimulator should not trigger the following haps as they are triggered by the Simics core:

6 Breakpoints

Breakpoints in Simics are set on physical, linear, or virtual address for read, write, or execute accesses. Linear address breakpoints are usually only supported by x86 processors. Figure 5 shows how to carry out a memory access to support breakpoints.

Figure 5. Memory access path with breakpoint support and direct access to memory.

Simics keeps an updated list of virtual breakpoints for a given virtual context in a context object. A processor object must make sure that all accesses are checked against the list and trigger a breakpoint when there is a hit.

The processor object may cache the list internally for efficiency. A processor that does not optimize the breakpoint checking internally can request that the context checks and triggers breakpoints. It does that by calling the trigger_breakpoint function in the breakpoint_trigger interface with the handle set to BREAKPOINT_HANDLE_ANY.

There are two ways that the list of virtual address breakpoints can change and the processor must then refresh or invalidate its breakpoint information cache:

Breakpoint changes can be listened for either using the Core_Breakpoint_Change hap, or by registering callbacks through the breakpoint_change interface.

The active context is changed through the set_context function in the context_handler interface.

The direct_memory_lookup interface will return information flagging if breakpoints or tracers are present on a returned lookup. If breakpoints or tracers are flagged and overlap with the access type the direct memory user wants to perform, then the access should be performed using the memory_space interface. When accessing memory through the memory_space interface, physical address breakpoints will be handled by the memory-space and do not need to be handled in the cosimulator. Unlike for virtual breakpoints, the cosimulator does not need to listen for new or changed physical breakpoints. Any change to a physical breakpoint on a requested page will result in the page being invalidated through the direct_memory_update interface.

6.1 Breakpoint Query Interface

Objects of the context and memory-space classes implement the breakpoint_query_v2 interface. It is used by processors to check for breakpoints.

Implementors of this interface should use SIM_register_compatible_interfaces after SIM_register_interface to also register earlier versions of the interface.

The get_breakpoints function returns a set of breakpoints that intersect the range given in start and end, including both start and end in the range. Only breakpoints set on access types with bits set in read_write_execute will be returned.

When information from get_breakpoints has been processed, the breakpoints array in the breakpoint_set_t structure should be freed with MM_FREE.

SIM_INTERFACE(breakpoint_query_v2) {
        breakpoint_set_t (*get_breakpoints)(conf_object_t *obj,
                                            access_t read_write_execute,
                                            generic_address_t start,
                                            generic_address_t end);
};

#define BREAKPOINT_QUERY_V2_INTERFACE "breakpoint_query_v2"

6.2 Breakpoint Change Interface

The breakpoint_change interface is implemented by objects that wish to get notified when breakpoints are added, removed, or changed in some way.

The object that wants to get notified of changes to breakpoints in a another object would use the simple_dispatcher in the breakpoint_change port of that other object. The other object will then lookup the breakpoint_change interface in the object to notify and use that when breakpoint updates occur.

The breakpoint_change port is implemented by objects of the cell class, and a listener that registers on a cell with get notified on any breakpoint changes in that cell.

SIM_INTERFACE(breakpoint_change) {
        void (*breakpoint_added)(conf_object_t *obj,
                                 conf_object_t *bp_obj,
                                 breakpoint_handle_t handle);
        void (*breakpoint_removed)(conf_object_t *obj,
                                   conf_object_t *bp_obj,
                                   breakpoint_handle_t handle);
};

#define BREAKPOINT_CHANGE_INTERFACE "breakpoint_change"

6.3 Breakpoint Trigger Interface

Objects implementing the breakpoint interface typically also implement the breakpoint_trigger interface. Processors call the trigger_breakpoint function to signal that a memory access intersects a breakpoint. The trigger_breakpoint function will raise the appropriate haps and the currently scheduled execute object may receive a stop call during the call to trigger_breakpoint.

The obj argument is the object that holds the breakpoint, typically a memory space or a context object. initiator_obj is the source of the memory operation that triggers the breakpoint. The handle argument identifies the breakpoint to trigger. A value of BREAKPOINT_HANDLE_ANY can be passed as handle, in which case all the breakpoints that match the given address, size, and access will be triggered. The address, size, and access arguments specify information about the access that triggers the breakpoint. The data argument points to a buffer where the data for the access is kept.

If a stop is received during a call to trigger_breakpoint, then it is recommended that any software visible actions carried out after the breakpoint are logged. That could for example be to notify the user if the entire instruction that triggers a breakpoint will complete, and that the instruction will then not be re-executed when the simulation restarts.

SIM_INTERFACE(breakpoint_trigger) {
        void (*trigger_breakpoint)(conf_object_t *obj,
                                   conf_object_t *initiator_obj,
                                   breakpoint_handle_t handle,
                                   generic_address_t address,
                                   generic_address_t size,
                                   access_t access,
                                   uint8 *data);
};

#define BREAKPOINT_TRIGGER_INTERFACE "breakpoint_trigger"

7 Simics Environment Interaction

How a processor model fits into the Simics environment is the topic of this Chapter.

Simics API
Although the processor model does not directly expose any part of the Simics API, many API calls make use of the interfaces described in this document.
Command Line Interface
Some commands are bound to interfaces, and a processor model that implements a set of interfaces therefore automatically gets a set of commands defined. See Section 5.1 for more information about such commands. The processor model can implement additional commands beyond the pre-defined commands.
Debug Functionality
Low-level debug functionality that can be supported by a processor model is breakpoints (see Chapter 6) and the context_handler interface. The context_handler interface is used by a context switcher object to change target software context based on information from an operating system awareness module.
Checkpointing
Just like any other model, a processor model needs to properly save and restore all simulated state through attributes to support checkpointing.
Reverse Execution
Reverse execution is built upon the checkpointing system. Any deterministic model that records all external stimuli through recorder objects and fully supports checkpointing will automatically support reverse execution.
System Model Devices
Device models interact with the processor model through interfaces and can be reused as is when using a new processor model. Naturally, modifications need to be made if a different system is to be simulated. Close interaction with the system model builder is typically needed when changing the CPU model. For some system models, information about this is covered in the associated target guide.
System Model Components
Components are usually built to abstract away the low-level configuration of a model. Components for a user defined processor model can usually be derived from a Simics-provided component with only minor changes needed to bridge low-level differences in attributes and configuration.
Operating System Awareness
The Simics environment provides support for application debugging and analysis through operating system awareness modules that define process tracker objects. The operating system awareness models need to be modified to support new architectures, new revisions of operating systems, and also for completely new operating systems. When using an existing operating system awareness module with a user defined processor model, the naming of registers in the int_register interface needs to consistent.
Software Development Environment Awareness
Software development environments such as WindRiver Workbench are supported on multiple levels, including symbolic debugging capabilities and integration with vendor IDEs. The software development environment awareness support in Simics can be used as-is with a user defined processor model.

8 Simics API Functions

This section lists Simics API functions that a processor model may need to use beyond what is described in other parts of this document. Simics API SIM_register_clock is described in API Reference Manual.

NAME
VT_check_async_events — check for asynchronous events
SYNOPSIS
bool
VT_check_async_events(void);

DESCRIPTION
Check if any async events are pending and if so, make the current executing object exit in order for them to be handled. Return nonzero if there are pending events.

The Simics platform will check for asynchronous events between invocations of the run method in the execute. If significant time passes between such invocations, which will typically be the case unless there are multiple simulated CPUs that are switched between, then the implementor of execute needs to either call this method or exit pre-maturely from run to allow asynchronous events to be handled. If neither is performed, then interactive performance will suffer as user input would be waiting significant time before being processed.

If only the async event state is requested, the faster VT_async_events_pending should be used instead.

EXECUTION CONTEXT
Cell Context

NAME
VT_new_code_block — inform platform of new code block
SYNOPSIS
void
VT_new_code_block(void *start, size_t len);

DESCRIPTION
This function informs the Simics platform that there is a len bytes block at start that can be executed from. The call should be made before the passed code area is executed from. This information is used by Simics to communicate with systems like Pin or Valgrind that do not automatically detect run-time generated code. If modifications are done to a block, then a new call to this function is needed to inform the platform of that.
EXECUTION CONTEXT
Cell Context

NAME
VT_set_object_clock — set clock association for object
SYNOPSIS
void
VT_set_object_clock(conf_object_t *NOTNULL obj, conf_object_t *clock);

DESCRIPTION
The simulation infrastructure keeps track of clock associations for every object. For most objects, this is initialized through the queue attribute.

An object that is itself a clock needs to call this function to initialize the clock coupling to itself. The call to initialize the coupling should be made from the init method registered for the class in its class_info_t (or init_object if registered using class_data_t).

EXECUTION CONTEXT
Global Context

NAME
VT_clock_frequency_about_to_change — inform before clock frequency change
SYNOPSIS
void
VT_clock_frequency_about_to_change(conf_object_t *obj);

DESCRIPTION
To be called by a clock prior to a clock frequency change, with events and cycles reflecting the frequency before the change.
EXECUTION CONTEXT
Cell Context

NAME
VT_clock_frequency_change — inform after clock frequency change
SYNOPSIS
void
VT_clock_frequency_change(conf_object_t *obj, uint64 cycles_per_second);

DESCRIPTION
To be called by a clock when the clock frequency has changed, with events and cycles reflecting the frequency after the change.
EXECUTION CONTEXT
Cell Context

NAME
VT_stop_event_processing — stop event processing
SYNOPSIS
void
VT_stop_event_processing(conf_object_t *clock);

DESCRIPTION
This method is to be called by a clock when it has received a stop request through the execute interface and the clock is about to return from the run method in the execute interface.

In the future, the simulation framework may keep track of this thus relieving the clock from the duty of having to call this method.

EXECUTION CONTEXT
Cell Context

9 Miscellaneous Remarks