Emulating Compiler Behavior

When CBI processes a file, it tries to obey all of the arguments that it can see in the compilation database. Unfortunately, compilers often have implicit behaviors that are not reflected on the command line, such as their default include paths or the macros that identify the compiler version.

If we believe (or already know!) that these behaviors will impact the divergence calculation for a code base, we can use a configuration file to instruct CBI to append additional options when emulating certain compilers.

Attention

If you encounter a situation that CBI does not support and that cannot be described by our existing configuration files, please open an issue.

Motivating Example

The foo.cpp files in our sample code base contain a specialization that we have ignored so far: a preprocessor conditional that selects one of two lines based on the value of the __GNUC__ macro:

// Copyright (c) 2024 Intel Corporation
// SPDX-License-Identifier: 0BSD
#include <cstdio>

void foo() {
#if __GNUC__ >= 13
    printf("Using a feature that is only available in GCC 13 and later.\n");
#else
    printf("Running the rest of foo() on the CPU.\n");
#endif
}

This macro is defined automatically by all GNU compilers and is set to the compiler’s major version. For example, gcc version 13.0.0 would set __GNUC__ to 13. Checking the values of macros like this one can be useful when specializing code paths to work around bugs in specific compilers, or to make use of functionality that is only available in newer compiler versions.
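The preprocessor makes this choice by itself. The outcome depends on which compiler processes the file, not on anything written in the compile command. The sketch below mirrors the conditional in foo.cpp. The names gnuc_major, selected_branch and check_branch_selection are hypothetical helpers for illustration only. The sketch relies on a standard preprocessor rule: inside #if, an identifier that is not a defined macro evaluates to 0. A compiler that does not define __GNUC__ therefore always takes the #else branch.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: the value the preprocessor sees for __GNUC__.
// Inside #if, an identifier that is not a defined macro evaluates to 0,
// so this returns 0 for compilers that do not define __GNUC__.
int gnuc_major() {
#if defined(__GNUC__)
    return __GNUC__;
#else
    return 0;
#endif
}

// Hypothetical helper mirroring the conditional in foo.cpp: reports which
// of the two printf() lines this compiler would keep.
std::string selected_branch() {
#if __GNUC__ >= 13
    return "GCC 13 feature";
#else
    return "fallback";
#endif
}

// Sanity check: the branch taken follows directly from the macro's value.
void check_branch_selection() {
    const bool is_gcc13_or_later = gnuc_major() >= 13;
    assert(is_gcc13_or_later == (selected_branch() == "GCC 13 feature"));
}
```

Built with GCC 13 or later, selected_branch() returns the first branch. Built with an older GCC, or with a compiler that reports a lower __GNUC__ or none at all, it returns the fallback.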

Let’s take another look at the compilation database for the CPU build, whose last entry is for this file:

[
{
  "directory": "/home/username/src/build-cpu",
  "command": "/usr/bin/c++ -o CMakeFiles/tutorial.dir/main.cpp.o -c /home/username/src/main.cpp",
  "file": "/home/username/src/main.cpp"
},
{
  "directory": "/home/username/src/build-cpu",
  "command": "/usr/bin/c++ -o CMakeFiles/tutorial.dir/third-party/library.cpp.o -c /home/username/src/third-party/library.cpp",
  "file": "/home/username/src/third-party/library.cpp"
},
{
  "directory": "/home/username/src/build-cpu",
  "command": "/usr/bin/c++ -o CMakeFiles/tutorial.dir/cpu/foo.cpp.o -c /home/username/src/cpu/foo.cpp",
  "file": "/home/username/src/cpu/foo.cpp"
}
]

CBI can see that the compiler used for foo.cpp is called /usr/bin/c++. However, the command line does not say which compiler (or which version of it) that path refers to, so there is not enough information to decide what the value of __GNUC__ should be.
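The compilation database identifies the compiler only by the executable path at the start of "command". The sketch below shows how a name could be derived from that path. It assumes that only the final path component (c++ rather than /usr/bin/c++) is used to identify the compiler. compiler_name is a hypothetical helper, not part of CBI's API, and shell quoting is ignored.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: extracts the compiler's name from a compilation
// database "command" string by taking the first whitespace-separated token
// and stripping any leading directories. Shell quoting is not handled.
std::string compiler_name(const std::string& command) {
    assert(!command.empty() && "a compile command must name a compiler");
    std::istringstream tokens(command);
    std::string executable;
    tokens >> executable;  // first token is the compiler executable
    const std::string::size_type slash = executable.find_last_of('/');
    return slash == std::string::npos ? executable : executable.substr(slash + 1);
}
```

Applied to the foo.cpp entry above, this yields c++. The name alone still says nothing about which compiler version it is.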

Defining Behaviors

codebasin searches for a file called .cbi/config and uses the information in that file to determine implicit compiler behavior. Each compiler definition is a TOML table of the form shown below:

[compiler.name]
options = [
  "option",
  "option"
]
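Conceptually, each table maps a compiler name to a list of extra options. The sketch below shows one way to apply such a mapping when emulating a compile command. It assumes that the configured options are appended after the command's own arguments, as described earlier, and that the compiler is matched by the last component of its path. CompilerOptions, split_command and emulated_arguments are hypothetical names, and reading the TOML file itself is omitted.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical in-memory form of the [compiler.<name>] tables:
// compiler name -> options to append when emulating that compiler.
using CompilerOptions = std::map<std::string, std::vector<std::string>>;

// Splits a compile command on whitespace (sketch only: no shell quoting).
std::vector<std::string> split_command(const std::string& command) {
    std::istringstream tokens(command);
    std::vector<std::string> args;
    for (std::string arg; tokens >> arg;) {
        args.push_back(arg);
    }
    return args;
}

// Returns the arguments to emulate: the original command-line arguments,
// followed by any options configured for this compiler's name.
std::vector<std::string> emulated_arguments(const std::string& command,
                                            const CompilerOptions& config) {
    std::vector<std::string> args = split_command(command);
    assert(!args.empty() && "a compile command must name a compiler");

    // Identify the compiler by the last component of its path.
    // If there is no '/', npos + 1 wraps to 0 and the whole token is used.
    const std::string& executable = args.front();
    const std::string name = executable.substr(executable.find_last_of('/') + 1);

    const auto match = config.find(name);
    if (match != config.end()) {
        args.insert(args.end(), match->second.begin(), match->second.end());
    }
    return args;
}
```

Only appending to the original arguments means that everything already on the command line is still honored. Compilers without a table are emulated exactly as written.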

In our example, we would like to define __GNUC__ for the c++ compiler, so we can add the following compiler definition:

[compiler."c++"]
options = [
  "-D__GNUC__=13",
]

Important

The quotes around “c++” are necessary because + is not allowed in a TOML bare key. They are not needed for names such as gcc or clang, but they are needed for other names containing +, such as g++ or clang++.
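TOML bare keys may contain only ASCII letters, ASCII digits, underscores, and dashes. Any other name must be quoted. The sketch below applies that rule when writing the header of a compiler definition. compiler_table_header is a hypothetical helper, not something CBI provides, and it does not handle names that would need escaping.

```cpp
#include <cassert>
#include <string>

// Returns true if `key` can be written as a TOML bare key, i.e. it is
// non-empty and uses only ASCII letters, ASCII digits, '_' and '-'.
bool is_bare_key(const std::string& key) {
    if (key.empty()) return false;
    for (const char c : key) {
        const bool allowed = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
                             (c >= '0' && c <= '9') || c == '_' || c == '-';
        if (!allowed) return false;
    }
    return true;
}

// Hypothetical helper: builds the table header for a compiler definition,
// quoting the name only when TOML requires it. Names containing '"' or '\'
// would also need escaping, which this sketch rejects instead.
std::string compiler_table_header(const std::string& name) {
    assert(name.find_first_of("\"\\") == std::string::npos);
    const std::string key = is_bare_key(name) ? name : "\"" + name + "\"";
    return "[compiler." + key + "]";
}
```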

With the __GNUC__ macro set, the two lines of code that were previously considered “unused” are assigned to platforms, and the output of codebasin becomes:

-----------------------
Platform Set LOC % LOC
-----------------------
       {cpu}   8 29.63
       {gpu}   8 29.63
  {cpu, gpu}  11 40.74
-----------------------
Code Divergence: 0.59
Unused Code (%): 0.00
Total SLOC: 27
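The divergence value can be checked by hand. Here we assume that code divergence is the average pairwise Jaccard distance between the sets of lines used by each platform, which is consistent with the numbers above:

- The CPU uses 8 + 11 = 19 lines and the GPU uses 8 + 11 = 19 lines.
- The two platforms share 11 lines and together cover 27.
- The distance is therefore 1 - 11/27 = 16/27 ≈ 0.59.

The sketch below performs the same calculation for any "platform set → LOC" table. code_divergence is a hypothetical helper, not CBI's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

using PlatformSet = std::set<std::string>;

// Hypothetical helper: average pairwise Jaccard distance between platforms,
// computed from a table mapping each platform set to its line count.
// Lines used by no platform (an empty platform set) never contribute.
double code_divergence(const std::map<PlatformSet, int>& table) {
    // Collect every platform that appears in the table.
    PlatformSet all;
    for (const auto& [platforms, loc] : table) {
        all.insert(platforms.begin(), platforms.end());
    }
    const std::vector<std::string> names(all.begin(), all.end());
    assert(names.size() >= 2 && "divergence needs at least two platforms");

    double total = 0.0;
    int pairs = 0;
    for (std::size_t i = 0; i < names.size(); ++i) {
        for (std::size_t j = i + 1; j < names.size(); ++j) {
            int shared = 0;    // lines used by both platforms
            int combined = 0;  // lines used by either platform
            for (const auto& [platforms, loc] : table) {
                const bool uses_i = platforms.count(names[i]) > 0;
                const bool uses_j = platforms.count(names[j]) > 0;
                if (uses_i && uses_j) shared += loc;
                if (uses_i || uses_j) combined += loc;
            }
            // Jaccard distance = 1 - |intersection| / |union|.
            total += combined == 0 ? 0.0 : 1.0 - static_cast<double>(shared) / combined;
            ++pairs;
        }
    }
    return total / pairs;
}
```

For the table above, this returns 16/27, which codebasin reports as 0.59.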