p3analysis.metrics package

Module contents

p3analysis.metrics.application_efficiency(df, foms='lower')

Calculate application efficiency.

Application efficiency represents achieved performance relative to the best-known performance of any other application solving the same problem on the same platform. This function determines the “best-known” performance for each combination of “platform” and “problem” in the input DataFrame.

Calculated values will lie in the range \([0, 1]\).

Parameters:
  • df (DataFrame) – A pandas DataFrame storing performance data. The following columns are required: “problem”, “platform”, “application”, “fom”.

  • foms (string) – The interpretation of the figure of merit: “lower” if lower values are better, and “higher” if higher values are better.

Returns:

A new pandas DataFrame storing the application efficiency values calculated from the performance data provided in df.

Return type:

DataFrame

Raises:
  • ValueError – If any of the required columns are missing from df, or if foms is neither “lower” nor “higher”.

  • TypeError – If any value in the “fom” column of df is a non-numeric value.
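
The definition above can be illustrated with a small standard-library sketch. It is not the p3analysis implementation: the helper name `application_efficiency_sketch`, the list-of-dicts input, and the output key "app eff" (borrowed from the column name that pp accepts) are assumptions made for illustration. It assumes that for “lower” the efficiency is the best figure of merit divided by the achieved one, and the inverse ratio for “higher”, so that the best performer on each platform scores 1.

```python
from collections import defaultdict


def application_efficiency_sketch(rows, foms="lower"):
    """Illustrative re-derivation of application efficiency.

    `rows` is a list of dicts with the keys "problem", "platform",
    "application" and "fom". Hypothetical helper, not the library code.
    """
    if foms not in ("lower", "higher"):
        raise ValueError("foms must be 'lower' or 'higher'")

    # Find the best-known figure of merit per (problem, platform) pair.
    pick_best = min if foms == "lower" else max
    groups = defaultdict(list)
    for row in rows:
        groups[(row["problem"], row["platform"])].append(row["fom"])
    best = {key: pick_best(values) for key, values in groups.items()}

    # Use the ratio that gives the best performer an efficiency of 1.0.
    result = []
    for row in rows:
        b = best[(row["problem"], row["platform"])]
        eff = b / row["fom"] if foms == "lower" else row["fom"] / b
        result.append({**row, "app eff": eff})
    return result


# Made-up runtimes (lower is better) for two applications on two platforms.
rows = [
    {"problem": "p", "platform": "A", "application": "x", "fom": 2.0},
    {"problem": "p", "platform": "A", "application": "y", "fom": 4.0},
    {"problem": "p", "platform": "B", "application": "x", "fom": 10.0},
    {"problem": "p", "platform": "B", "application": "y", "fom": 5.0},
]
effs = application_efficiency_sketch(rows)
for row in effs:
    print(row["platform"], row["application"], row["app eff"])
```

In this made-up data, application x is the best performer on platform A and application y is the best on platform B, so each scores 1.0 there and 0.5 on the other platform.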

p3analysis.metrics.divergence(df, cov=None)

Calculate code divergence.

Code divergence is calculated as proposed by Harrell and Kitson in “Effective Performance Portability”, using the Jaccard distance to measure the distance between two source codes.

For a given set of platforms, \(H\), the code divergence \(CD\) of an application \(a\) solving problem \(p\) is an average of pairwise distances:

\[CD(a, p, H) = \binom{|H|}{2}^{-1} \sum_{\{i, j\} \in H \times H} {d_{i, j}(a, p)}\]

where \(d_{i, j}(a, p)\) represents the distance between the source code required by platforms \(i\) and \(j\) for application \(a\) to solve problem \(p\).

The distance is calculated as:

\[d_{i, j}(a, p) = 1 - \frac{|c_i(a, p) \cap c_j(a, p)|} {|c_i(a, p) \cup c_j(a, p)|}\]

where \(c_i\) and \(c_j\) are the sets of lines of code required to compile application \(a\) and solve problem \(p\) using platforms \(i\) and \(j\), respectively. A distance of 0 means that all code is shared between the two platforms, whereas a distance of 1 means that no code is shared.

Parameters:
  • df (DataFrame) –

    A pandas DataFrame storing performance data. The following columns are required: “problem”, “platform”, “application”.

    If cov is None, a “coverage” column is required. Values of the “coverage” column must be coverage traces adhering to the P3 Analysis Library coverage schema. Otherwise, a “coverage_key” column is required.

  • cov (DataFrame, optional) –

    A pandas DataFrame storing coverage data. The following columns are required: “coverage_key”, “coverage”.

    Values of the “coverage” column must be coverage traces adhering to the P3 Analysis Library coverage schema.

Returns:

A new pandas DataFrame storing the code divergence values calculated from the configuration and coverage data provided.

Return type:

DataFrame

Raises:
  • ValueError – If any of the required columns are missing, or if any coverage string fails to validate against the P3 coverage schema.

  • TypeError – If any value in the “coverage” column is not a JSON string.
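
The two formulas in this entry can be checked by hand with a short standard-library sketch. It is only an illustration under stated assumptions: the helpers `jaccard_distance` and `code_divergence_sketch` are hypothetical, each platform's code is modelled as a plain set of (file, line number) pairs rather than a coverage trace in the P3 coverage schema, and the handling of empty inputs is a choice made here, not documented library behavior.

```python
from itertools import combinations


def jaccard_distance(c_i, c_j):
    """Distance d_{i,j}: 1 - |intersection| / |union| of two line sets."""
    union = c_i | c_j
    if not union:
        # Assumption: two empty sets share everything, so the distance is 0.
        return 0.0
    return 1.0 - len(c_i & c_j) / len(union)


def code_divergence_sketch(lines_by_platform):
    """Average the pairwise distances over all unordered platform pairs."""
    pairs = list(combinations(sorted(lines_by_platform), 2))
    if not pairs:
        # Assumption: with fewer than two platforms nothing can diverge.
        return 0.0
    total = sum(
        jaccard_distance(lines_by_platform[i], lines_by_platform[j])
        for i, j in pairs
    )
    return total / len(pairs)


# Hypothetical data: each line of code is a (file, line number) pair.
lines_by_platform = {
    "cpu": {("main.cpp", 1), ("main.cpp", 2), ("cpu.cpp", 1)},
    "gpu": {("main.cpp", 1), ("main.cpp", 2), ("gpu.cpp", 1)},
    "fpga": {("main.cpp", 1), ("main.cpp", 2), ("gpu.cpp", 1)},
}
cd = code_divergence_sketch(lines_by_platform)
print(cd)
```

Here the cpu set shares two of four distinct lines with each of the other two platforms (distance 0.5 each), while gpu and fpga use identical code (distance 0), so the average over the three pairs is 1/3.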

p3analysis.metrics.pp(df)

Calculate performance portability from architectural and/or application efficiency.

Performance portability is calculated as proposed by Pennycook, Sewall and Lee in “A Metric for Performance Portability”. For a given set of platforms, \(H\), the performance portability \(PP\) of an application \(a\) solving problem \(p\) is:

\[PP(a, p, H) = \begin{cases} \dfrac{|H|}{\sum_{i \in H} \dfrac{1}{e_i(a,p)}} & \text{if } i \text{ is supported } \forall i \in H \\ 0 & \text{otherwise} \end{cases}\]

where \(e_i(a,p)\) is the performance efficiency of application \(a\) solving problem \(p\) on platform \(i\).

Parameters:

df (DataFrame) – A pandas DataFrame storing performance data. The following columns are always required: “problem”, “platform”, “application”. At least one of the following two columns is required: “arch eff”, “app eff”.

Returns:

A new pandas DataFrame storing the performance portability values calculated from the architectural efficiency and/or application efficiency data provided in df.

Return type:

DataFrame

Raises:
  • ValueError – If any of the required columns are missing from df, or if any (application, platform) pair has multiple efficiency values. The pp calculation expects one efficiency value per platform for each application.

  • TypeError – If any of the values in the efficiency column(s) are non-numeric.
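
The formula in this entry is the harmonic mean of the per-platform efficiencies, with a result of 0 as soon as one platform is unsupported. A minimal standard-library sketch follows. The helper `pp_sketch` is hypothetical, and it assumes that an unsupported platform is recorded as an efficiency of 0 or None; how the library itself marks unsupported platforms is not stated here.

```python
def pp_sketch(efficiencies):
    """Harmonic mean of per-platform efficiencies, or 0 if any is unsupported.

    `efficiencies` maps a platform name to an efficiency in [0, 1].
    Assumption: an unsupported platform is recorded as 0 or None.
    """
    values = list(efficiencies.values())
    if not values:
        # Assumption: an empty platform set yields 0 rather than an error.
        return 0.0
    if any(e is None or e == 0 for e in values):
        return 0.0
    return len(values) / sum(1.0 / e for e in values)


# Supported on both platforms: 2 / (1/1.0 + 1/0.5) = 2/3.
pp_supported = pp_sketch({"A": 1.0, "B": 0.5})
# Platform B is unsupported, so the metric collapses to 0.
pp_unsupported = pp_sketch({"A": 1.0, "B": 0.0})
print(pp_supported, pp_unsupported)
```

Because the harmonic mean is dominated by its smallest terms, one poorly performing platform lowers the result far more than an arithmetic mean would.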