Understanding Specialization

The goal of CBI is to help developers reason about how a code base uses specialization to adapt to the capabilities and requirements of the different platforms it supports. By measuring specialization, we can reason about its impact on maintenance effort.

Platforms

The definition of platform used by CBI was first introduced in “Implications of a Metric for Performance Portability”, and is shared with the P3 Analysis Library:

A collection of software and hardware on which an application may run a problem.

This definition is deliberately very flexible, so a platform can represent any execution environment for which code may be specialized. A platform could be a compiler, an operating system, a micro-architecture or some combination of these options.

Specialization

There are many forms of specialization. What they all have in common is that each specialization point acts as a branch: different code is executed on different platforms based on some set of conditions. These conditions express a platform's capabilities, properties of the input problem, or both.

The simplest form of specialization point is a run-time branch, which is easy to express but can incur run-time overheads and prevent compiler optimizations. Compile-time specialization avoids these issues, and in practice much specialization is performed using preprocessor tools or some form of metaprogramming.
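To make compile-time specialization concrete, the following minimal Python sketch models a single preprocessor specialization point in a small C function and reports which lines each platform would compile. It assumes a toy preprocessor that understands only `#ifdef`, `#ifndef`, `#else` and `#endif`; it is an illustration, not how CBI itself processes source files. The helper `lines_for_platform`, the macro `USE_GPU` and the function `launch_gpu_kernel` are all hypothetical.

```python
def lines_for_platform(source: str, defines: set[str]) -> set[int]:
    """Return the 1-based numbers of the code lines kept by a toy preprocessor.

    Hypothetical helper: only #ifdef, #ifndef, #else and #endif are handled,
    directive lines are not counted as code, and blank lines are ignored.
    """
    kept = set()
    # active[-1] records whether the innermost enclosing branch is compiled.
    active = [True]
    for number, line in enumerate(source.splitlines(), start=1):
        tokens = line.split()
        directive = tokens[0] if tokens else ""
        if directive == "#ifdef":
            active.append(active[-1] and tokens[1] in defines)
        elif directive == "#ifndef":
            active.append(active[-1] and tokens[1] not in defines)
        elif directive == "#else":
            taken = active.pop()
            # The #else branch is compiled only if its parent is compiled
            # and the preceding #ifdef/#ifndef branch was not.
            active.append(active[-1] and not taken)
        elif directive == "#endif":
            active.pop()
        elif active[-1] and line.strip():
            kept.add(number)
    return kept


# A hypothetical C function with one preprocessor specialization point.
SOURCE = """\
void scale(float *x, int n) {
#ifdef USE_GPU
  launch_gpu_kernel(x, n);
#else
  for (int i = 0; i < n; ++i)
    x[i] *= 2.0f;
#endif
}
"""

cpu_lines = lines_for_platform(SOURCE, set())
gpu_lines = lines_for_platform(SOURCE, {"USE_GPU"})
print(sorted(cpu_lines))  # [1, 5, 6, 8]
print(sorted(gpu_lines))  # [1, 3, 8]
```

The two platforms share lines 1 and 8 and compile different code everywhere else. Per-platform line sets like these play the role of $c_i$ in the code divergence metric defined later in this section.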

Code Divergence

Code divergence is a metric proposed by Harrell and Kitson in "Effective Performance Portability", which uses the Jaccard distance to measure the distance between the source code required by two platforms.

For a given set of platforms, $H$, the code divergence $CD$ of an application $a$ solving problem $p$ is the average of the pairwise distances:

$$CD(a, p, H) = \binom{|H|}{2}^{-1} \sum_{\{i,j\} \subset H \times H} d_{i,j}(a, p)$$

where $d_{i,j}(a, p)$ represents the distance between the source code required by platforms $i$ and $j$ for application $a$ to solve problem $p$.
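For example, with three platforms, $H = \{1, 2, 3\}$, there are $\binom{3}{2} = 3$ unordered pairs, so the normalizing factor is $1/3$ and the code divergence reduces to the mean of three pairwise distances:

$$CD(a, p, \{1, 2, 3\}) = \frac{d_{1,2}(a, p) + d_{1,3}(a, p) + d_{2,3}(a, p)}{3}$$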

The distance is calculated as:

$$d_{i,j}(a, p) = 1 - \frac{|c_i(a, p) \cap c_j(a, p)|}{|c_i(a, p) \cup c_j(a, p)|}$$

where $c_i$ and $c_j$ are the sets of lines of code required to compile application $a$ and solve problem $p$ using platforms $i$ and $j$, respectively. A distance of 0 means that all code is shared between the two platforms, whereas a distance of 1 means that no code is shared.
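The distance formula is the Jaccard distance between two sets of lines, which a minimal Python sketch can compute with built-in set operations. The function name `code_distance` and the line sets below are hypothetical, and treating two empty sets as identical (distance 0) is an assumption, since the formula is undefined when the union is empty.

```python
def code_distance(c_i: set[int], c_j: set[int]) -> float:
    """Jaccard distance between the line sets required by two platforms."""
    union = c_i | c_j
    if not union:
        # Assumption: two empty line sets are treated as identical.
        return 0.0
    return 1.0 - len(c_i & c_j) / len(union)


# Hypothetical line sets for two platforms.
c_cpu = {1, 5, 6, 8}
c_gpu = {1, 3, 8}

print(code_distance(c_cpu, c_gpu))    # 1 - 2/5 = 0.6: two of five lines shared
print(code_distance(c_cpu, c_cpu))    # 0.0: all code shared
print(code_distance({1, 2}, {3, 4}))  # 1.0: no code shared
```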

Note

It is sometimes useful to talk about code convergence instead, which is simply the code divergence subtracted from 1.
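Putting the two formulas together, a minimal Python sketch computes code divergence as the mean Jaccard distance over all $\binom{|H|}{2}$ unordered platform pairs (generated with `itertools.combinations`) and code convergence as 1 minus that value. The platform names, line sets and function names are hypothetical and are not part of CBI's API.

```python
from itertools import combinations


def code_distance(c_i: set[int], c_j: set[int]) -> float:
    """Jaccard distance between the line sets required by two platforms."""
    union = c_i | c_j
    if not union:
        # Assumption: two empty line sets are treated as identical.
        return 0.0
    return 1.0 - len(c_i & c_j) / len(union)


def code_divergence(lines_by_platform: dict[str, set[int]]) -> float:
    """Average pairwise distance over every unordered pair of platforms."""
    pairs = list(combinations(lines_by_platform.values(), 2))
    if not pairs:
        raise ValueError("code divergence needs at least two platforms")
    return sum(code_distance(c_i, c_j) for c_i, c_j in pairs) / len(pairs)


def code_convergence(lines_by_platform: dict[str, set[int]]) -> float:
    """Code convergence: the code divergence subtracted from 1."""
    return 1.0 - code_divergence(lines_by_platform)


# Hypothetical line sets for three platforms.
platforms = {
    "cpu": {1, 2, 3, 4},
    "gpu": {1, 2, 5},
    "fpga": {1, 6},
}

# Pairwise distances are 0.6 (cpu, gpu), 0.8 (cpu, fpga) and 0.75 (gpu, fpga).
print(round(code_divergence(platforms), 4))   # 0.7167
print(round(code_convergence(platforms), 4))  # 0.2833
```

Using unordered pairs from `combinations` mirrors the $\binom{|H|}{2}^{-1}$ normalization, so each pair of platforms contributes exactly once.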