4.5 Raw Code Coverage Format

The raw format files allow any user to create code coverage reports, or to run automated analysis on the data, in cases where the HTML report is not the best match.

The raw format file can contain all information required to create a coverage report, such as the HTML report. It may also contain partial data, such as information about available memory mappings and executed instructions, which can later be extended so that the file contains all required information.

The raw format contains Python data structures that have been marshalled using the Python pickle library. The following description uses a Python-like syntax. Quotation marks around a word mean that the string has exactly that name. An item without quotation marks is a short description of what it represents; a longer description follows further down. Items within <> notation describe the type of the element. An asterisk at the end of a list means zero or more of the previous element, while a plus at the end means one or more of the previous element.


      {"version"<string>: version<int>,
       "features"<string>: {feature<string>: feature_specific_field<unknown>},
       "errors"<string>: [[code<int>, message<string>]*],
       "unknown"<string>: {address<int>: count<int>},
       "mappings"<string>: [
           {"map"<string>: {"symbol_file"<string>: file<string>,
                             "address"<string>: address<int>, "size"<string>: size<int>,
                             "file_offset"<string>: offset<int>,
                             "file_size"<string>: size<int>,
                             "relocation"<string>: relocation<int>,
                             "section"<string>: section<string>},
            "covered"<string>: {address<int>: count<int>},
            "branches"<string>: {address<int>: {"taken"<string>: count<int>,
                                                 "not_taken"<string>: count<int>}},
            "file_table"<string>: {file_id<string>: source_file<string>},
            "functions"<string>: {address<int>: {"name"<string>: name<string>,
                                                  "size"<string>: size<int>}},
            "data_labels"<string>: {address<int>: {"name"<string>: name<string>}},
            "info"<string>: [{"address"<string>: address<int>, "op"<string>: [op<int>+],
                               "mnemonic"<string>: mnemonic<string>,
                               "format"<string>: format<string>,
                               "executable_lines"<string>: {line<int>: True<bool>},
                               "file_id"<string>: file_id<string>}*],
            "src_info"<string>: {file_id<string>:
                                 {line<int>: [[start_address<int>, end_address<int>]*]}},
            "removed_data"<string>: {address<int>: {"size"<string>: size<int>}},
            "cpu_classes"<string>: [cpu_class<string>*],
            "disassembly_class"<string>: cpu_class<string>,
            "errors"<string>: [[code<int>, message<string>]*]}*],
       "cpu_classes"<string>: [cpu_class<string>*],
       "unknown_mappings"<string>: [
           {"map"<string>: {"address"<string>: address<int>,
                            "size"<string>: size<int>},
            "covered"<string>: {address<int>: count<int>}}*]}
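
Because the file is a pickled Python dictionary, it can be read back with the standard pickle module. The following is a minimal sketch, not part of the code coverage tool itself: load_raw_coverage and summarize are hypothetical helper names, and optional keys are read defensively because a raw file may hold only partial data. Note that unpickling can execute arbitrary code, so only raw files from a trusted source should be loaded this way.

```python
import pickle


def load_raw_coverage(path):
    """Unpickle a raw coverage file and return its top-level dictionary."""
    # pickle.load can run arbitrary code; only open files you trust.
    with open(path, "rb") as f:
        report = pickle.load(f)
    if not isinstance(report, dict) or "version" not in report:
        raise ValueError("not a raw coverage report: missing 'version'")
    return report


def summarize(report):
    """Return a few counts from a report (hypothetical helper)."""
    mappings = report.get("mappings", [])
    return {
        "version": report["version"],
        "mappings": len(mappings),
        # Number of distinct executed addresses across all known mappings.
        "covered_addresses": sum(len(m.get("covered", {})) for m in mappings),
        # Executed addresses that had no known mapping at execution time.
        "unknown_addresses": len(report.get("unknown", {})),
    }
```

Calling summarize(load_raw_coverage(path)) on a raw file then gives a quick overview before any further analysis; the path is whatever file name the raw report was saved under.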

version - The version of the code coverage raw format that was used to create the raw file. The version number is increased only when a change to the format either extends the previous version or breaks it (which should be very rare).
features - A dictionary that maps a feature to its feature_specific_field information. Available features are:
- access_count - This feature is used to count how many times an instruction has been executed. The feature_specific_field is a bool, where True indicates that the feature is enabled.
- branch_coverage - This feature is used to collect branch coverage at instruction level. The feature_specific_field is a bool, where True indicates that the feature is enabled.
errors - A list with zero or more entries, containing an error code and the matching error message for any errors that are not bound to a specific mapping.
unknown - A dictionary that maps each executed address that had no known mapping at execution time to a count of how many times it was executed. The count is only valid if the access_count feature is enabled; otherwise it is always one for each executed instruction.
mappings - This entry contains a list of all mappings that were known to the code coverage system while collecting data. The following entries are available:
- map - A key containing a unique description of the mapping. It has the following identifiers:
  - symbol_file - Gives the location of the module backing up the mapping.
  - address - Specifies the loaded address of the mapping.
  - size - Specifies the size of the mapping.
  - file_offset - Specifies the offset into the file where this segment or section can be found.
  - file_size - Specifies the size of the entry in the file.
  - relocation - Specifies the relocation address.
  - section - Optional field, included when only one section of a symbol file is used as mapping.
- covered - This field works just like the unknown field, except that it is per module and contains all executed addresses that belonged to this mapping while executing.
- branches - Keeps track of all branch instructions and how many times the branch at a given address was taken or not taken. Only available if the branch_coverage feature was enabled when collecting data.
- file_table - Maps file_id to a full source file path.
- functions - Maps each function's start address to its name and size.
- data_labels - Non-executable symbols in executable sections, containing the symbol's address and name.
- info - Contains a list, with one element per disassembled address, where each entry has the following members:
  - address - The address of the disassembled instruction
  - op - A list of opcodes making up the instruction
  - mnemonic - The mnemonic of the instruction
  - format - Instruction format, currently only present for ARM. Tells if the instruction has arm, thumb or aarch64 format.
  - file_id - A reference to the source file in the file_table. Only exists if source info has been added and exists for this address
  - executable_lines - Describes which source lines the instruction belongs to. Only exists if source info has been added and exists for this address
The raw data contains either the info field or src_info, never both.
- src_info - A dictionary with information about executable lines and addresses for source files. Each item in the dictionary is a source file, where the key corresponds to a file_id in file_table.
The value of that item is another dictionary, in which each key is an executable source line of that file and the value is a list of address ranges that correspond to that source line. Each address range in this list is a list of two elements, start_address and end_address, where the latter is inclusive.

A source line can be considered executed if any of the addresses inside its address ranges is found in the covered entry.

The raw data contains either the src_info field or info, never both.
- cpu_classes - Optional field. Keeps track of which CPU classes have been run in this specific mapping when collecting coverage.
- disassembly_class - Optional field. Specifies which processor class has been used to disassemble this mapping. This field is only present when disassembling was done using a processor interface, not when a disassemble module was used.
- removed_data - Address ranges in the mapping that have been removed because they were considered to be data. The dictionary has the start address as key and a dictionary holding the size of the removed region as value. The name of the removed symbol should be retrievable from data_labels using the start address.
- errors - A per module error list, containing an error code and a message.
cpu_classes - Optional field. Keeps track of which CPU classes have been used to collect coverage.
unknown_mappings - This entry contains a list of all mappings that were known to the code coverage system while collecting data, but did not have a symbol file name. The following entries are available:
- map - A key containing the location of the unknown mapping. It has the following identifiers:
  - address - Specifies the loaded address of the mapping.
  - size - Specifies the size of the mapping.
- covered - This field works just like the unknown field, except that it is per module and contains all executed addresses that belonged to this mapping while executing.
