Understanding Specialization

The goal of CBI is to help developers reason about how a code base uses specialization to adapt to the capabilities and requirements of the different platforms it supports. By measuring specialization, we can reason about its impact on maintenance effort.

Platforms

The definition of platform used by CBI was first introduced in “Implications of a Metric for Performance Portability”, and is shared with the P3 Analysis Library:

A collection of software and hardware on which an application may run a problem.

This definition is deliberately flexible, so a platform can represent any execution environment for which code may be specialized. A platform could be a compiler, an operating system, a microarchitecture, or some combination of these.

Specialization

There are many forms of specialization. What they all have in common is that each specialization point acts as a branch: different code is executed on different platforms depending on a set of conditions. These conditions express a platform’s capabilities, properties of the input problem, or both.

The simplest form of specialization point is a run-time branch, which is easy to express but can incur run-time overheads and prevent compiler optimizations. Compile-time specialization avoids these issues, and in practice much specialization is performed using preprocessor tools or some form of metaprogramming.
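To make the idea of specialization points as branches concrete, the following is a minimal sketch, not CBI’s own implementation or API. It assumes a hypothetical model in which a platform is a mapping of named properties and each source line carries an optional condition (standing in for something like an `#ifdef` guard); the names `Platform`, `lines_for_platform` and `source` are invented for illustration. The result is the set of lines each platform would compile.

```python
from typing import Callable, List, Mapping, Optional, Set, Tuple

# Hypothetical model: a platform is a mapping of named properties.
Platform = Mapping[str, object]

# Each source line is paired with the condition under which it is compiled;
# None marks a line that every platform uses.
Condition = Optional[Callable[[Platform], bool]]
Line = Tuple[int, Condition]


def lines_for_platform(source: List[Line], platform: Platform) -> Set[int]:
    """Return the numbers of the lines that `platform` would compile."""
    return {
        number
        for number, condition in source
        if condition is None or condition(platform)
    }


# A toy file with a single specialization point, roughly:
#   1: shared setup
#   2: GPU-only kernel   (e.g. inside #ifdef USE_GPU)
#   3: CPU-only loop     (the matching #else branch)
#   4: shared teardown
source: List[Line] = [
    (1, None),
    (2, lambda p: bool(p["gpu"])),
    (3, lambda p: not p["gpu"]),
    (4, None),
]

gpu_platform = {"compiler": "clang", "gpu": True}
cpu_platform = {"compiler": "gcc", "gpu": False}

print(sorted(lines_for_platform(source, gpu_platform)))  # [1, 2, 4]
print(sorted(lines_for_platform(source, cpu_platform)))  # [1, 3, 4]
```

Both platforms share lines 1 and 4, and the specialization point decides which of lines 2 and 3 each one uses. Whether the branch is resolved at run time or at compile time, the effect on which code a platform depends on is the same.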

Code Divergence

Code divergence is a metric proposed by Harrell and Kitson in “Effective Performance Portability”. It uses the Jaccard distance to measure the distance between the source code required by two different platforms.

For a given set of platforms, $H$, the code divergence $CD$ of an application $a$ solving problem $p$ is an average of pairwise distances:

$$
CD(a, p, H) = \binom{|H|}{2}^{-1} \sum_{\{i, j\} \in H \times H} d_{i, j}(a, p)
$$

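where $d_{i, j}(a, p)$ represents the distance between the source code required by platforms $i$ and $j$ for application $a$ to solve problem $p$. The sum runs over every unordered pair of distinct platforms in $H$; there are $\binom{|H|}{2}$ such pairs, so $CD$ is the mean pairwise distance.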
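The averaging step can be sketched in a few lines of Python. This is an illustrative sketch, not CBI’s implementation: the function name `code_divergence` and the toy distance table are hypothetical, and the pairwise distance is passed in as a callable so that the normalization by $\binom{|H|}{2}$ is shown on its own. It assumes the platforms in the sequence are distinct.

```python
from itertools import combinations
from math import comb
from typing import Callable, Hashable, Sequence


def code_divergence(
    platforms: Sequence[Hashable],
    distance: Callable[[Hashable, Hashable], float],
) -> float:
    """Mean of distance(i, j) over all unordered pairs of distinct platforms."""
    if len(platforms) < 2:
        raise ValueError("code divergence needs at least two platforms")
    # combinations() yields each unordered pair {i, j} exactly once.
    total = sum(distance(i, j) for i, j in combinations(platforms, 2))
    # Normalize by the number of pairs, binom(|H|, 2).
    return total / comb(len(platforms), 2)


# Toy pairwise distances for three hypothetical platforms A, B and C.
toy_distances = {
    frozenset({"A", "B"}): 0.0,
    frozenset({"A", "C"}): 0.5,
    frozenset({"B", "C"}): 1.0,
}

cd = code_divergence(["A", "B", "C"], lambda i, j: toy_distances[frozenset({i, j})])
print(cd)  # 0.5
```

Using `itertools.combinations` rather than iterating over the full product $H \times H$ avoids counting each pair twice and skips the trivial $i = j$ pairs, which matches the $\binom{|H|}{2}$ denominator.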

The distance is calculated as:

$$
d_{i, j}(a, p) = 1 - \frac{|c_{i}(a, p) \cap c_{j}(a, p)|}{|c_{i}(a, p) \cup c_{j}(a, p)|}
$$

where $c_{i}$ and $c_{j}$ are the lines of code required to compile application $a$ and solve problem $p$ using platforms $i$ and $j$. A distance of 0 means that all code is shared between the two platforms, whereas a distance of 1 means that no code is shared.
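The distance formula translates directly into set operations. The sketch below is a hypothetical helper, `jaccard_distance`, that treats each $c_i$ as a Python set of line identifiers. The formula is undefined when both sets are empty; returning 0 in that case is an assumption made here, not something the definition specifies.

```python
def jaccard_distance(c_i: set, c_j: set) -> float:
    """Jaccard distance between the sets of lines used by two platforms."""
    union = c_i | c_j
    if not union:
        # Assumption: two empty line sets count as identical (distance 0),
        # since the formula is undefined when the union is empty.
        return 0.0
    return 1.0 - len(c_i & c_j) / len(union)


# Hypothetical line sets for two platforms that share lines 1 and 4.
print(jaccard_distance({1, 2, 4}, {1, 3, 4}))  # 0.5
print(jaccard_distance({1, 2}, {1, 2}))        # 0.0  (all code shared)
print(jaccard_distance({1}, {2}))              # 1.0  (no code shared)
```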

Note

It is sometimes useful to talk about code convergence instead, which is simply 1 minus the code divergence.
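As a purely illustrative worked example (the line sets are invented, not measured from any real code base), suppose three platforms require the lines $c_1 = \{1, 2, 3\}$, $c_2 = \{1, 2, 4\}$ and $c_3 = \{1, 2\}$. Dropping the $(a, p)$ arguments for brevity:

$$
\begin{aligned}
d_{1,2} &= 1 - \frac{|\{1, 2\}|}{|\{1, 2, 3, 4\}|} = 1 - \frac{2}{4} = \frac{1}{2} \\
d_{1,3} &= 1 - \frac{|\{1, 2\}|}{|\{1, 2, 3\}|} = 1 - \frac{2}{3} = \frac{1}{3} \\
d_{2,3} &= 1 - \frac{|\{1, 2\}|}{|\{1, 2, 4\}|} = 1 - \frac{2}{3} = \frac{1}{3} \\
CD &= \binom{3}{2}^{-1} \left( \frac{1}{2} + \frac{1}{3} + \frac{1}{3} \right) = \frac{1}{3} \cdot \frac{7}{6} = \frac{7}{18} \approx 0.39 \\
1 - CD &= \frac{11}{18} \approx 0.61
\end{aligned}
$$

Every pair shares lines 1 and 2, but platforms 1 and 2 are the furthest apart because each requires a line the other does not.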