.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "examples/metrics/multiple_components.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_examples_metrics_multiple_components.py>`
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_examples_metrics_multiple_components.py:


Handling Software with Multiple Components
==========================================

Viewing applications as composites.

When working with very large and complex pieces of software, reporting
performance using a single number (e.g., total time-to-solution) obscures
details about the performance of different software components. Using such
totals during P3 analysis therefore prevents us from understanding how
different software components behave on different platforms.

Identifying which software components have poor P3 characteristics is necessary
to understand what action(s) we can take to improve the P3 characteristics of
a software package as a whole. Although accounting for multiple components can
make data collection and analysis slightly more complicated, the additional
insight it provides is very valuable.

.. tip::
    This approach can be readily applied to parallel software written using
    heterogeneous programming frameworks (e.g., CUDA, OpenCL, SYCL, Kokkos),
    where distinct kernels can be identified and profiled easily. For a
    real-life example of this approach in practice, see "`A
    Performance-Portable SYCL Implementation of CRK-HACC for Exascale
    <https://dl.acm.org/doi/10.1145/3624062.3624187>`_".

Data Preparation
----------------

To keep things simple, let's imagine that our software package consists of just
two components, and that each component has two different implementations that
can both be run on two different machines:

 .. list-table::
     :widths: 20 20 20 20
     :header-rows: 1

     * - component
       - implementation
       - machine
       - fom

     * - Component 1
       - Implementation 1
       - Cluster 1
       - 2.0

     * - Component 2
       - Implementation 1
       - Cluster 1
       - 5.0

     * - Component 1
       - Implementation 2
       - Cluster 1
       - 3.0

     * - Component 2
       - Implementation 2
       - Cluster 1
       - 4.0

     * - Component 1
       - Implementation 1
       - Cluster 2
       - 1.0

     * - Component 2
       - Implementation 1
       - Cluster 2
       - 2.5

     * - Component 1
       - Implementation 2
       - Cluster 2
       - 0.5

     * - Component 2
       - Implementation 2
       - Cluster 2
       - 3.0
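
The table above can be loaded into a pandas ``DataFrame`` along the following
lines. This is a minimal sketch: the name ``df`` matches the variable used in
the rest of this example, but the construction itself is an assumption, since
the snippets shown here do not include the code that loads the data. The later
snippets also appear to rely on ``import p3analysis`` and
``import matplotlib.pyplot as plt``, which are likewise not shown.

.. code-block:: Python

    import pandas as pd

    # One row per (component, implementation, machine) measurement.
    # "fom" is the figure of merit; here, a time where lower is better.
    rows = [
        ("Component 1", "Implementation 1", "Cluster 1", 2.0),
        ("Component 2", "Implementation 1", "Cluster 1", 5.0),
        ("Component 1", "Implementation 2", "Cluster 1", 3.0),
        ("Component 2", "Implementation 2", "Cluster 1", 4.0),
        ("Component 1", "Implementation 1", "Cluster 2", 1.0),
        ("Component 2", "Implementation 1", "Cluster 2", 2.5),
        ("Component 1", "Implementation 2", "Cluster 2", 0.5),
        ("Component 2", "Implementation 2", "Cluster 2", 3.0),
    ]
    df = pd.DataFrame(
        rows, columns=["component", "implementation", "machine", "fom"],
    )
    print(df)

In practice, data like this would more likely be read from a file of recorded
measurements than typed in by hand.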

Our first step is to project this data onto P3 definitions,
treating the functionality provided by each component as a
separate problem to be solved:

.. GENERATED FROM PYTHON SOURCE LINES 91-101

.. code-block:: Python


    proj = p3analysis.data.projection(
        df,
        problem=["component"],
        application=["implementation"],
        platform=["machine"],
    )
    print(proj)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

           problem       application   platform  fom
    0  Component 1  Implementation 1  Cluster 1  2.0
    1  Component 2  Implementation 1  Cluster 1  5.0
    2  Component 1  Implementation 2  Cluster 1  3.0
    3  Component 2  Implementation 2  Cluster 1  4.0
    4  Component 1  Implementation 1  Cluster 2  1.0
    5  Component 2  Implementation 1  Cluster 2  2.5
    6  Component 1  Implementation 2  Cluster 2  0.5
    7  Component 2  Implementation 2  Cluster 2  3.0


.. GENERATED FROM PYTHON SOURCE LINES 118-127

.. note::
    See ":ref:`Understanding Data Projection <understanding_projection>`" for
    more information about projection.

Application Efficiency per Component
------------------------------------

Having projected the performance data onto P3 definitions, we can now compute
the application efficiency for each component:

.. GENERATED FROM PYTHON SOURCE LINES 127-131

.. code-block:: Python


    effs = p3analysis.metrics.application_efficiency(proj)
    print(effs)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

           problem   platform       application  fom   app eff
    0  Component 1  Cluster 1  Implementation 1  2.0  1.000000
    1  Component 2  Cluster 1  Implementation 1  5.0  0.800000
    2  Component 1  Cluster 1  Implementation 2  3.0  0.666667
    3  Component 2  Cluster 1  Implementation 2  4.0  1.000000
    4  Component 1  Cluster 2  Implementation 1  1.0  0.500000
    5  Component 2  Cluster 2  Implementation 1  2.5  1.000000
    6  Component 1  Cluster 2  Implementation 2  0.5  1.000000
    7  Component 2  Cluster 2  Implementation 2  3.0  0.833333


.. GENERATED FROM PYTHON SOURCE LINES 132-139

.. note::
    See ":ref:`Working with Application Efficiency
    <working_with_app_efficiency>`" for more information about application
    efficiency.
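
The efficiencies printed above are consistent with dividing the best (lowest)
figure of merit observed for each component on each platform by the figure of
merit of each implementation. The sketch below reproduces that arithmetic in
plain Python. It assumes the figure of merit is a time (lower is better) and
is not the library's implementation; the names ``foms`` and ``app_eff`` are
purely illustrative.

.. code-block:: Python

    # Times (lower is better) copied from the projected data above,
    # keyed by (problem, platform) and then by application.
    foms = {
        ("Component 1", "Cluster 1"): {"Implementation 1": 2.0, "Implementation 2": 3.0},
        ("Component 2", "Cluster 1"): {"Implementation 1": 5.0, "Implementation 2": 4.0},
        ("Component 1", "Cluster 2"): {"Implementation 1": 1.0, "Implementation 2": 0.5},
        ("Component 2", "Cluster 2"): {"Implementation 1": 2.5, "Implementation 2": 3.0},
    }

    app_eff = {}
    for (problem, platform), times in foms.items():
        # Best-known time for this component on this platform
        best = min(times.values())
        for application, fom in times.items():
            # Efficiency relative to the best-known time
            app_eff[(problem, platform, application)] = best / fom

    for key, value in app_eff.items():
        print(key, round(value, 6))

For example, Implementation 1 takes 5.0 for Component 2 on Cluster 1 while the
best observed time is 4.0, giving an efficiency of 4.0 / 5.0 = 0.8.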

Plotting a graph for each platform separately is a good way to visualize and
compare the application efficiency of each component:

.. GENERATED FROM PYTHON SOURCE LINES 139-160

.. code-block:: Python


    cluster1 = effs[effs["platform"] == "Cluster 1"]
    pivot = cluster1.pivot(index="application", columns=["problem"])["app eff"]
    pivot.plot(
        kind="bar",
        xlabel="Implementation",
        ylabel="Application Efficiency",
        title="Cluster 1",
    )
    plt.savefig("cluster1_application_efficiency_bars.png")

    cluster2 = effs[effs["platform"] == "Cluster 2"]
    pivot = cluster2.pivot(index="application", columns=["problem"])["app eff"]
    pivot.plot(
        kind="bar",
        xlabel="Implementation",
        ylabel="Application Efficiency",
        title="Cluster 2",
    )
    plt.savefig("cluster2_application_efficiency_bars.png")


.. rst-class:: sphx-glr-horizontal


    *

      .. image-sg:: /examples/metrics/images/sphx_glr_multiple_components_001.png
         :alt: Cluster 1
         :srcset: /examples/metrics/images/sphx_glr_multiple_components_001.png
         :class: sphx-glr-multi-img

    *

      .. image-sg:: /examples/metrics/images/sphx_glr_multiple_components_002.png
         :alt: Cluster 2
         :srcset: /examples/metrics/images/sphx_glr_multiple_components_002.png
         :class: sphx-glr-multi-img


.. GENERATED FROM PYTHON SOURCE LINES 161-174

On Cluster 1, Implementation 1 delivers the best performance for Component 1,
but Implementation 2 delivers the best performance for Component 2. On
Cluster 2, that trend is reversed. Clearly, there is no single implementation
that delivers the best performance everywhere.

Overall Application Efficiency
------------------------------

Computing the application efficiency of the software package as a whole
requires a few more steps.

First, we need to compute the total time taken by each application on each
platform:

.. GENERATED FROM PYTHON SOURCE LINES 174-179

.. code-block:: Python


    package = proj.groupby(["platform", "application"], as_index=False)["fom"].sum()
    package["problem"] = "Package"
    print(package)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

        platform       application  fom  problem
    0  Cluster 1  Implementation 1  7.0  Package
    1  Cluster 1  Implementation 2  7.0  Package
    2  Cluster 2  Implementation 1  3.5  Package
    3  Cluster 2  Implementation 2  3.5  Package


.. GENERATED FROM PYTHON SOURCE LINES 180-181

Then, we can use this data to compute application efficiency, as below:

.. GENERATED FROM PYTHON SOURCE LINES 181-185

.. code-block:: Python


    effs = p3analysis.metrics.application_efficiency(package)
    print(effs)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

       problem   platform       application  fom  app eff
    0  Package  Cluster 1  Implementation 1  7.0      1.0
    1  Package  Cluster 1  Implementation 2  7.0      1.0
    2  Package  Cluster 2  Implementation 1  3.5      1.0
    3  Package  Cluster 2  Implementation 2  3.5      1.0


.. GENERATED FROM PYTHON SOURCE LINES 186-208

These latest results suggest that Implementation 1 and Implementation 2
are both achieving the best-known performance when running the package as a
whole. This isn't *strictly* incorrect, since the values of their combined
figure-of-merit *are* the same, but we know from our earlier per-component
analysis that better performance is possible.

Specifically, our per-component analysis shows us that an application that
could pick and choose the best implementation of different components for
different platforms would achieve better overall performance.

.. important::
    Combining component implementations in this way is purely hypothetical,
    and there may be very good reasons (e.g., incompatible data structures)
    why an application is unable to use certain combinations. Although
    removing such invalid combinations would result in a tighter upper
    bound, it is much simpler to leave them in place. Including all
    combinations may even identify potential opportunities to combine
    approaches that initially appeared incompatible (e.g., by writing
    routines to convert between data structures).

We can fold that observation into our P3 analysis by creating an entry in our
dataset that represents the results from a hypothetical application:

.. GENERATED FROM PYTHON SOURCE LINES 208-215

.. code-block:: Python


    hypothetical_components = proj.groupby(["problem", "platform"], as_index=False)[
        "fom"
    ].min()
    hypothetical_components["application"] = "Hypothetical"
    print(hypothetical_components)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

           problem   platform  fom   application
    0  Component 1  Cluster 1  2.0  Hypothetical
    1  Component 1  Cluster 2  0.5  Hypothetical
    2  Component 2  Cluster 1  4.0  Hypothetical
    3  Component 2  Cluster 2  2.5  Hypothetical


.. GENERATED FROM PYTHON SOURCE LINES 216-227

.. code-block:: Python


    # Calculate the combined figure of merit for both components
    hypothetical_package = hypothetical_components.groupby(
        ["platform", "application"], as_index=False,
    )["fom"].sum()
    hypothetical_package["problem"] = "Package"

    # Append the hypothetical package data to our previous results
    package = pd.concat([package, hypothetical_package], ignore_index=True)
    print(package)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

        platform       application  fom  problem
    0  Cluster 1  Implementation 1  7.0  Package
    1  Cluster 1  Implementation 2  7.0  Package
    2  Cluster 2  Implementation 1  3.5  Package
    3  Cluster 2  Implementation 2  3.5  Package
    4  Cluster 1      Hypothetical  6.0  Package
    5  Cluster 2      Hypothetical  3.0  Package


.. GENERATED FROM PYTHON SOURCE LINES 228-231

As expected, our new hypothetical application achieves better performance
by mixing and matching different implementations. And if we now re-compute
application efficiency with this data included:

.. GENERATED FROM PYTHON SOURCE LINES 231-235

.. code-block:: Python


    effs = p3analysis.metrics.application_efficiency(package)
    print(effs)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

       problem   platform       application  fom   app eff
    0  Package  Cluster 1  Implementation 1  7.0  0.857143
    1  Package  Cluster 1  Implementation 2  7.0  0.857143
    2  Package  Cluster 2  Implementation 1  3.5  0.857143
    3  Package  Cluster 2  Implementation 2  3.5  0.857143
    4  Package  Cluster 1      Hypothetical  6.0  1.000000
    5  Package  Cluster 2      Hypothetical  3.0  1.000000


.. GENERATED FROM PYTHON SOURCE LINES 236-248

... we see that the application efficiency of Implementation 1 and
Implementation 2 has been reduced accordingly. Including hypothetical
upper bounds on performance in our dataset can therefore be a simple and
effective way to improve the accuracy of our P3 analysis, even if a
true theoretical upper bound (i.e., from a performance model) is unknown.

.. note::
    The two implementations still have the *same* efficiency, even after
    introducing the hypothetical implementation. Per-component analysis is
    still required to understand how each component contributes to the
    overall efficiency, and to identify which component(s) should be improved
    on which platform(s).
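
One way to carry out that per-component follow-up is to measure how far each
implementation is from the best-known time for each component on each
platform. The sketch below is an illustration under stated assumptions rather
than part of the P3 Analysis Library: the ``excess`` column and the ``gaps``
variable are hypothetical names, the figure of merit is assumed to be a time
(lower is better), and the projected data is re-entered by hand so that the
snippet is self-contained.

.. code-block:: Python

    import pandas as pd

    # Projected data, re-entered so that this snippet runs on its own
    proj = pd.DataFrame({
        "problem": ["Component 1", "Component 2"] * 4,
        "application": (["Implementation 1"] * 2 + ["Implementation 2"] * 2) * 2,
        "platform": ["Cluster 1"] * 4 + ["Cluster 2"] * 4,
        "fom": [2.0, 5.0, 3.0, 4.0, 1.0, 2.5, 0.5, 3.0],
    })

    # Best (lowest) time observed for each component on each platform
    best = proj.groupby(["problem", "platform"])["fom"].transform("min")

    # Time each implementation spends above the best-known time
    proj["excess"] = proj["fom"] - best

    # Keep only the entries with room for improvement, largest gap first
    gaps = proj[proj["excess"] > 0].sort_values("excess", ascending=False)
    print(gaps)

With this data, each implementation's excess sums to 1.0 on Cluster 1 and 0.5
on Cluster 2, which matches the difference between its package total and the
hypothetical total on each platform.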

.. GENERATED FROM PYTHON SOURCE LINES 250-267

Further Analysis
----------------

Computing application efficiency is often simply the first step of a
more detailed P3 analysis.

The examples below show how we can use the visualization capabilities
of the P3 Analysis Library to compare the efficiency of different
applications running across the same platform set, or to gain insight
into how an application's efficiency relates to the code it uses on each
platform.

.. minigallery::
    :add-heading: Examples

    ../../examples/cascade/plot_simple_cascade.py
    ../../examples/navchart/plot_simple_navchart.py


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 0.244 seconds)


.. _sphx_glr_download_examples_metrics_multiple_components.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: multiple_components.ipynb <multiple_components.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: multiple_components.py <multiple_components.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: multiple_components.zip <multiple_components.zip>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_