Understanding Specialization
The goal of CBI is to help developers reason about how a code base uses specialization to adapt to the capabilities and requirements of the different platforms it supports. By measuring specialization, we can reason about its impact on maintenance effort.
Platforms
The definition of platform used by CBI was first introduced in “Implications of a Metric for Performance Portability”, and is shared with the P3 Analysis Library:
A collection of software and hardware on which an application may run a problem.
This definition is deliberately very flexible, so a platform can represent any execution environment for which code may be specialized. A platform could be a compiler, an operating system, a micro-architecture or some combination of these options.
Specialization
There are many forms of specialization. What they all have in common is that they introduce specialization points that act as branches: different code is executed on different platforms based on some set of conditions. These conditions express a platform’s capabilities, properties of the input problem, or both.
The simplest form of specialization point is a run-time branch, which is easily expressed but can incur run-time overheads and prevent compiler optimizations. Compile-time specialization avoids these issues, and in practice a lot of specialization is performed using preprocessor tools or with some kind of metaprogramming.
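As a minimal sketch of a run-time specialization point, the Python function below branches on one platform capability and one property of the input problem. The names `choose_kernel`, `has_gpu`, `problem_size` and the threshold of 1024 are hypothetical, chosen only for illustration; they are not part of CBI.

```python
def choose_kernel(has_gpu: bool, problem_size: int, gpu_threshold: int = 1024) -> str:
    """Pick a code path at run time (hypothetical example)."""
    # Specialization point: the branch taken depends on a platform
    # capability (has_gpu) and on the input problem (problem_size).
    if has_gpu and problem_size >= gpu_threshold:
        return "gpu_kernel"
    return "cpu_kernel"


# The same source runs everywhere; only the branch taken differs.
selected_on_gpu_platform = choose_kernel(has_gpu=True, problem_size=4096)
selected_on_cpu_platform = choose_kernel(has_gpu=False, problem_size=4096)
```

Both branches live in the same source here, so every platform shares all of the code. A compile-time equivalent (for example, a preprocessor guard in C or C++) would instead remove the unused branch from the code a given platform compiles.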
Code Divergence
Code divergence is a metric proposed by Harrell and Kitson in “Effective Performance Portability”, which uses the Jaccard distance to measure the distance between two source codes.
For a given set of platforms, \(H\), the code divergence \(CD\) of an application \(a\) solving problem \(p\) is an average of pairwise distances:
\[CD(a, p, H) = \binom{|H|}{2}^{-1} \sum_{\{i, j\} \in \binom{H}{2}} d_{i, j}(a, p)\]
where \(d_{i, j}(a, p)\) represents the distance between the source code required by platforms \(i\) and \(j\) for application \(a\) to solve problem \(p\).
The distance is calculated as:
\[d_{i, j}(a, p) = 1 - \frac{|c_i \cap c_j|}{|c_i \cup c_j|}\]
where \(c_i\) and \(c_j\) are the lines of code required to compile application \(a\) and solve problem \(p\) using platforms \(i\) and \(j\). A distance of 0 means that all code is shared between the two platforms, whereas a distance of 1 means that no code is shared.
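The definitions above can be sketched in a few lines of Python. This is an illustrative sketch, not CBI's implementation: each platform's code is assumed to be a set of line identifiers, the function names `jaccard_distance` and `code_divergence` are hypothetical, and the handling of empty inputs is an assumption made so the functions are total.

```python
from itertools import combinations


def jaccard_distance(c_i: set, c_j: set) -> float:
    """Jaccard distance between the lines used by two platforms."""
    union = c_i | c_j
    if not union:
        # Assumption: two empty sets of lines are treated as identical.
        return 0.0
    return 1.0 - len(c_i & c_j) / len(union)


def code_divergence(lines_by_platform: dict) -> float:
    """Average pairwise Jaccard distance over all unordered platform pairs."""
    pairs = list(combinations(lines_by_platform, 2))
    if not pairs:
        # Assumption: with fewer than two platforms there is no divergence.
        return 0.0
    total = sum(
        jaccard_distance(lines_by_platform[i], lines_by_platform[j])
        for i, j in pairs
    )
    return total / len(pairs)


# Hypothetical data: each set holds the identifiers of the lines a platform needs.
lines_by_platform = {
    "cpu": {1, 2, 3, 4},
    "gpu": {1, 2, 5, 6},
    "fpga": {1, 2, 3, 4},
}
divergence = code_divergence(lines_by_platform)
```

For these hypothetical sets the pairwise distances are 2/3 (cpu, gpu), 0 (cpu, fpga) and 2/3 (gpu, fpga), so the code divergence is their average, 4/9.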
Note
It is sometimes useful to talk about code convergence instead, which is simply the code divergence subtracted from 1.