Propagation of optimization levels used by front-end compiler to backend

In order to ease the process of debugging, there is a user requirement to compile different modules with different levels of optimization. This document proposes a compiler flow that will enable propagation of compiler options specified from front-end to the runtimes and eventually to the backend. Currently, only O0/O1/O2/O3 options are handled. Please note that this document only describes support for JIT path. AOT path support will be added later.

Background

When building an application with several source and object files, it is possible to specify the optimization parameters individually for each source file/object file (for each invocation of the DPCPP compiler). The SYCL runtime should pass the original optimization options (e.g. -O0 or -O2) used when building an object file to the device backend compiler. This will improve the debugging experience by selectively disabling/enabling optimizations for each source file, and therefore achieving better debuggability and better performance as needed.

The current behavior is that the optimization level option is captured at link time and converted into its backend-specific equivalent. This option is propagated to the backend. For example, If -O0 option is specified during link-time when using the OpenCL backend, the SYCL runtime will pass -cl-opt-disable option to the backend device compiler for all modules essentially disabling optimizations globally. Otherwise, if the -O0 option is not specified for linker, it will not pass -cl-opt-disable option at all, therefore making the kernels mostly undebuggable, regardless of the original front-end compiler options. Link-time capturing of optimization option is the essence of the current implementation and this leads to loss of information about the compile-time options. Proposed design aims to rectify this behavior.

Here is an example that demonstrates this pain point:

clang++ -c test_host.cpp -o test_host.o
clang++ -c -fsycl test_device_1.cpp -o test_device_1.o
clang++ -c -fsycl -g -O0 test_device_2.cpp -o test_device_2.o
clang++ -fsycl -g test_host.o test_device_1.o test_device_2.o -o test

In this scenario, the fat binary is ‘test’ and there are no compilation flags sent across to the backend compiler. Though the user wanted to have full debuggability with test_device_2.cpp module, some of the debuggability is lost.

Another scenario is shown below:

clang++ -c -g -O0 -fsycl test.cpp -o test.o
clang++ -g -fsycl test.o -o test

In this scenario, the fat binary is ‘test’ and there are no compilation flags sent across to the backend compiler. Though the user wanted to have full debuggability with test.cpp module, some of the debuggability is lost. The user was not able to set a breakpoint inside device code.

Requirements

In order to support module-level debuggability, the user will compile different module files with different levels of optimization. These optimization levels must be preserved and made use of during the backend compilation. The following is a key requirement for this feature.

  • If the user specifies -Ox as a front-end compile option for a particular module, this option must be converted to appropriate backend option and then propagated fo use during backend JIT compilation.

The following table specifies the appropriate backend options for level-zero and OpenCL backends.

Front-end option

L0 backend option

OpenCL backend option

-O0

-ze-opt-disable

-cl-opt-disable

-O1

-ze-opt-level=2

/* no option */

-O2

-ze-opt-level=2

/* no option */

-O3

-ze-opt-level=2

/* no option */

Proposed design

This chapter discusses changes required in various stages of the compilation pipeline.

Changes to the clang front-end

For each function in SYCL device code, we add a new function attribute that is named sycl-optlevel. Value of this attribute is set to the optimization level which was used to compile the overlying module.

Changes to the sycl-post-link tool

During device code split performed in the sycl-post-link tool, optimization level attribute sycl-optlevel is treated as an optional feature, i.e. device code split algorithm ensures that no kernels with different values of sycl-optlevel are bundled into the same device image. See also optional kernel features design document. The sycl-post-link tool also adds a new property into the SYCL/misc properties property set for each device code module. This entry will be used to store the optimization level. Name of this property is optLevel and the value is stored as a 32-bit integer. If there is a module where the user did not specify an optimization module, there is no new entry in the property set.

Changes to the SYCL runtime

In the SYCL runtime, the device image properties can be accessed to extract the associated optimization level. Once the optimization level is available, it is converted to its equivalent frontend option string (-O0, -O1, -O2, or -O3). This frontend option string is passed into a query that is made to the adapter to identify the correct backend option. This backend option is added to the existing list of compiler options and is sent to the backend.

Changes to the adapter

A new unified runtime API has been added. It takes the frontend option string as input in string format and returns ur_result_t. A string format is used for sending the frontend option so that this API can be used for querying other frontend options as well. The signature of this API is as follows:

ur_result_t urPlatformGetBackendOption(ur_platform_handle_t hPlatform,
                                       const char *pFrontendOption,
                                       const char **ppPlatformOption);

In the level-zero and OpenCL adapters, the table provided in the ‘Requirements’ section is used as a guide to identify the appropriate backend option. The option is returned in ppPlatformOption. For other adapters (HIP, cuda), empty string is returned. This API returns UR_RESULT_SUCCESS for valid inputs (frontend_option != “”). For invalid inputs, it returns UR_RESULT_ERROR_INVALID_VALUE.