Implementation design for “device_if” and “device_architecture”
This document describes the design for the DPC++ implementation of the sycl_ext_oneapi_device_if and sycl_ext_oneapi_device_architecture extensions.
Phased implementation
The implementation is divided into two phases. The first phase supports only sycl_ext_oneapi_device_architecture, and only in AOT mode. The second phase adds support for both extensions in both AOT and JIT modes.
Changes to compiler driver
Both phases require changes to the -fsycl-targets option that is recognized
by the compiler driver. The problem is that the current form of that option
does not identify a specific device name. As a reminder, the current command
line for AOT compilation on GPU looks like this:
$ clang++ -fsycl -fsycl-targets=spir64_gen -Xs "-device skl" ...
Notice that the -fsycl-targets option specifies only the generic name
spir64_gen, whereas the device name is passed directly to ocloc (the Intel
GPU AOT compiler) via -Xs "-device skl". Since the compiler driver merely
passes the -Xs options through to ocloc without understanding them, it does
not currently know the target device(s) of the AOT compilation.
To fix this, the -fsycl-targets option should be changed to accept the
following GPU device names in addition to the target names it currently
recognizes:
intel_gpu_bdw
intel_gpu_skl
intel_gpu_kbl
intel_gpu_cfl
intel_gpu_apl
intel_gpu_glk
intel_gpu_whl
intel_gpu_aml
intel_gpu_cml
intel_gpu_icllp
intel_gpu_tgllp
intel_gpu_rkl
intel_gpu_adl_s
intel_gpu_rpl_s
intel_gpu_adl_p
intel_gpu_adl_n
intel_gpu_dg1
intel_gpu_acm_g10
intel_gpu_acm_g11
intel_gpu_acm_g12
intel_gpu_pvc
intel_gpu_8_0_0 (alias for intel_gpu_bdw)
intel_gpu_9_0_9 (alias for intel_gpu_skl)
intel_gpu_9_1_9 (alias for intel_gpu_kbl)
intel_gpu_9_2_9 (alias for intel_gpu_cfl)
intel_gpu_9_3_0 (alias for intel_gpu_apl)
intel_gpu_9_4_0 (alias for intel_gpu_glk)
intel_gpu_9_5_0 (alias for intel_gpu_whl)
intel_gpu_9_6_0 (alias for intel_gpu_aml)
intel_gpu_9_7_0 (alias for intel_gpu_cml)
intel_gpu_11_0_0 (alias for intel_gpu_icllp)
intel_gpu_12_0_0 (alias for intel_gpu_tgllp)
intel_gpu_12_10_0 (alias for intel_gpu_dg1)
nvidia_gpu_sm_50
nvidia_gpu_sm_52
nvidia_gpu_sm_53
nvidia_gpu_sm_60
nvidia_gpu_sm_61
nvidia_gpu_sm_62
nvidia_gpu_sm_70
nvidia_gpu_sm_72
nvidia_gpu_sm_75
nvidia_gpu_sm_80
nvidia_gpu_sm_86
nvidia_gpu_sm_87
nvidia_gpu_sm_89
nvidia_gpu_sm_90
amd_gpu_gfx700
amd_gpu_gfx701
amd_gpu_gfx702
amd_gpu_gfx703
amd_gpu_gfx704
amd_gpu_gfx705
amd_gpu_gfx801
amd_gpu_gfx802
amd_gpu_gfx803
amd_gpu_gfx805
amd_gpu_gfx810
amd_gpu_gfx900
amd_gpu_gfx902
amd_gpu_gfx904
amd_gpu_gfx906
amd_gpu_gfx908
amd_gpu_gfx909
amd_gpu_gfx90a
amd_gpu_gfx90c
amd_gpu_gfx940
amd_gpu_gfx941
amd_gpu_gfx942
amd_gpu_gfx1010
amd_gpu_gfx1011
amd_gpu_gfx1012
amd_gpu_gfx1013
amd_gpu_gfx1030
amd_gpu_gfx1031
amd_gpu_gfx1032
amd_gpu_gfx1033
amd_gpu_gfx1034
amd_gpu_gfx1035
amd_gpu_gfx1036
amd_gpu_gfx1100
amd_gpu_gfx1101
amd_gpu_gfx1102
amd_gpu_gfx1103
amd_gpu_gfx1150
amd_gpu_gfx1151
amd_gpu_gfx1200
amd_gpu_gfx1201
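The numeric Intel GPU names in the list above are only aliases, so a driver has to map them to their code names before it can, for example, choose which macro to predefine. The C++17 sketch below shows one way to do that. The table name `kIntelGpuAliases` and the function `canonical_device_name` are hypothetical and are not taken from the DPC++ driver; only the alias pairs themselves come from the list.

```cpp
#include <array>
#include <string_view>
#include <utility>

// Hypothetical alias table: each numeric Intel GPU name and the code name it
// stands for, copied from the device list in this document.
inline constexpr std::array<std::pair<std::string_view, std::string_view>, 12>
    kIntelGpuAliases = {{
        {"intel_gpu_8_0_0", "intel_gpu_bdw"},
        {"intel_gpu_9_0_9", "intel_gpu_skl"},
        {"intel_gpu_9_1_9", "intel_gpu_kbl"},
        {"intel_gpu_9_2_9", "intel_gpu_cfl"},
        {"intel_gpu_9_3_0", "intel_gpu_apl"},
        {"intel_gpu_9_4_0", "intel_gpu_glk"},
        {"intel_gpu_9_5_0", "intel_gpu_whl"},
        {"intel_gpu_9_6_0", "intel_gpu_aml"},
        {"intel_gpu_9_7_0", "intel_gpu_cml"},
        {"intel_gpu_11_0_0", "intel_gpu_icllp"},
        {"intel_gpu_12_0_0", "intel_gpu_tgllp"},
        {"intel_gpu_12_10_0", "intel_gpu_dg1"},
    }};

// Returns the code name if "name" is a known alias; otherwise returns "name"
// unchanged (code names and all nvidia_gpu_* / amd_gpu_* names pass through).
constexpr std::string_view canonical_device_name(std::string_view name) {
  for (const auto& entry : kIntelGpuAliases) {
    if (entry.first == name) return entry.second;
  }
  return name;
}
```

Because the lookup is `constexpr`, the mapping can be checked at compile time, e.g. `canonical_device_name("intel_gpu_9_0_9")` yields `"intel_gpu_skl"`.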
The device names listed above may not be mixed with the existing target name
spir64_gen on the same command line. In addition, the user must not pass the
-device option to ocloc via -Xs or related command line options, because
the compiler driver passes this option to ocloc automatically. For example,
the AOT command line shown above becomes:
$ clang++ -fsycl -fsycl-targets=intel_gpu_skl ...
Note that in the first phase of implementation, only one of the GPU device names listed above may appear on the command line. As a result, the first phase supports AOT compilation in this new mode only for a single GPU device.
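The command-line rules above can be expressed as a small validation pass. This is a minimal sketch under stated assumptions: the -fsycl-targets value has already been split on commas, the -Xs strings have already been split into tokens, and the enum `target_error` and the functions `check_targets`, `is_gpu_device_name`, and `starts_with` are hypothetical names, not the driver's real code. Device names are recognized only by their vendor prefix.

```cpp
#include <initializer_list>
#include <string_view>

// Hypothetical diagnostics for the rules described above.
enum class target_error {
  none,
  mixed_with_spir64_gen,   // a GPU device name and spir64_gen together
  explicit_device_option,  // user forwarded "-device" to ocloc via -Xs
  multiple_gpu_devices,    // phase 1 allows only one GPU device name
};

constexpr bool starts_with(std::string_view s, std::string_view prefix) {
  return s.substr(0, prefix.size()) == prefix;
}

// Assumption: any name with one of these prefixes is a new GPU device name.
constexpr bool is_gpu_device_name(std::string_view t) {
  return starts_with(t, "intel_gpu_") || starts_with(t, "nvidia_gpu_") ||
         starts_with(t, "amd_gpu_");
}

// targets: the -fsycl-targets entries, already split on commas.
// xs_args: the tokens forwarded to ocloc via -Xs, already split on spaces.
constexpr target_error check_targets(
    std::initializer_list<std::string_view> targets,
    std::initializer_list<std::string_view> xs_args) {
  int gpu_devices = 0;
  bool has_spir64_gen = false;
  for (std::string_view t : targets) {
    if (t == "spir64_gen") has_spir64_gen = true;
    if (is_gpu_device_name(t)) ++gpu_devices;
  }
  if (gpu_devices > 0 && has_spir64_gen)
    return target_error::mixed_with_spir64_gen;
  if (gpu_devices > 0) {
    // The driver supplies "-device" itself, so the user must not.
    for (std::string_view a : xs_args)
      if (a == "-device") return target_error::explicit_device_option;
  }
  if (gpu_devices > 1) return target_error::multiple_gpu_devices;
  return target_error::none;
}
```

For example, `check_targets({"spir64_gen", "intel_gpu_skl"}, {})` reports `mixed_with_spir64_gen`, while the existing form `check_targets({"spir64_gen"}, {"-device", "skl"})` is still accepted. The `multiple_gpu_devices` check encodes only the phase 1 restriction.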
Phase 1
The first phase requires changes only to the compiler driver and to the device headers.
Compiler driver macro predefines
Most of the changes to the compiler driver are described above, but there are
a few small additional changes that are specific to phase 1. If the user
invokes the compiler driver with -fsycl-targets=<device> where <device> is
one of the GPU device names listed above, the compiler driver must predefine
one of the following corresponding C++ macro names:
__SYCL_TARGET_INTEL_GPU_BDW__
__SYCL_TARGET_INTEL_GPU_SKL__
__SYCL_TARGET_INTEL_GPU_KBL__
__SYCL_TARGET_INTEL_GPU_CFL__
__SYCL_TARGET_INTEL_GPU_APL__
__SYCL_TARGET_INTEL_GPU_GLK__
__SYCL_TARGET_INTEL_GPU_WHL__
__SYCL_TARGET_INTEL_GPU_AML__
__SYCL_TARGET_INTEL_GPU_CML__
__SYCL_TARGET_INTEL_GPU_ICLLP__
__SYCL_TARGET_INTEL_GPU_TGLLP__
__SYCL_TARGET_INTEL_GPU_RKL__
__SYCL_TARGET_INTEL_GPU_ADL_S__
__SYCL_TARGET_INTEL_GPU_RPL_S__
__SYCL_TARGET_INTEL_GPU_ADL_P__
__SYCL_TARGET_INTEL_GPU_ADL_N__
__SYCL_TARGET_INTEL_GPU_DG1__
__SYCL_TARGET_INTEL_GPU_ACM_G10__
__SYCL_TARGET_INTEL_GPU_ACM_G11__
__SYCL_TARGET_INTEL_GPU_ACM_G12__
__SYCL_TARGET_INTEL_GPU_PVC__
__SYCL_TARGET_NVIDIA_GPU_SM_50__
__SYCL_TARGET_NVIDIA_GPU_SM_52__
__SYCL_TARGET_NVIDIA_GPU_SM_53__
__SYCL_TARGET_NVIDIA_GPU_SM_60__
__SYCL_TARGET_NVIDIA_GPU_SM_61__
__SYCL_TARGET_NVIDIA_GPU_SM_62__
__SYCL_TARGET_NVIDIA_GPU_SM_70__
__SYCL_TARGET_NVIDIA_GPU_SM_72__
__SYCL_TARGET_NVIDIA_GPU_SM_75__
__SYCL_TARGET_NVIDIA_GPU_SM_80__
__SYCL_TARGET_NVIDIA_GPU_SM_86__
__SYCL_TARGET_NVIDIA_GPU_SM_87__
__SYCL_TARGET_NVIDIA_GPU_SM_89__
__SYCL_TARGET_NVIDIA_GPU_SM_90__
__SYCL_TARGET_NVIDIA_GPU_SM_90A__
__SYCL_TARGET_AMD_GPU_GFX700__
__SYCL_TARGET_AMD_GPU_GFX701__
__SYCL_TARGET_AMD_GPU_GFX702__
__SYCL_TARGET_AMD_GPU_GFX703__
__SYCL_TARGET_AMD_GPU_GFX704__
__SYCL_TARGET_AMD_GPU_GFX705__
__SYCL_TARGET_AMD_GPU_GFX801__
__SYCL_TARGET_AMD_GPU_GFX802__
__SYCL_TARGET_AMD_GPU_GFX803__
__SYCL_TARGET_AMD_GPU_GFX805__
__SYCL_TARGET_AMD_GPU_GFX810__
__SYCL_TARGET_AMD_GPU_GFX900__
__SYCL_TARGET_AMD_GPU_GFX902__
__SYCL_TARGET_AMD_GPU_GFX904__
__SYCL_TARGET_AMD_GPU_GFX906__
__SYCL_TARGET_AMD_GPU_GFX908__
__SYCL_TARGET_AMD_GPU_GFX909__
__SYCL_TARGET_AMD_GPU_GFX90A__
__SYCL_TARGET_AMD_GPU_GFX90C__
__SYCL_TARGET_AMD_GPU_GFX940__
__SYCL_TARGET_AMD_GPU_GFX941__
__SYCL_TARGET_AMD_GPU_GFX942__
__SYCL_TARGET_AMD_GPU_GFX1010__
__SYCL_TARGET_AMD_GPU_GFX1011__
__SYCL_TARGET_AMD_GPU_GFX1012__
__SYCL_TARGET_AMD_GPU_GFX1013__
__SYCL_TARGET_AMD_GPU_GFX1030__
__SYCL_TARGET_AMD_GPU_GFX1031__
__SYCL_TARGET_AMD_GPU_GFX1032__
__SYCL_TARGET_AMD_GPU_GFX1033__
__SYCL_TARGET_AMD_GPU_GFX1034__
__SYCL_TARGET_AMD_GPU_GFX1035__
__SYCL_TARGET_AMD_GPU_GFX1036__
__SYCL_TARGET_AMD_GPU_GFX1100__
__SYCL_TARGET_AMD_GPU_GFX1101__
__SYCL_TARGET_AMD_GPU_GFX1102__
__SYCL_TARGET_AMD_GPU_GFX1103__
__SYCL_TARGET_AMD_GPU_GFX1150__
__SYCL_TARGET_AMD_GPU_GFX1151__
__SYCL_TARGET_AMD_GPU_GFX1200__
__SYCL_TARGET_AMD_GPU_GFX1201__
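For the GPU names the mapping to a macro is mechanical: prefix `__SYCL_TARGET_`, upper-case the device name, and append `__`. The spir64_x86_64 target, described next, is the exception and maps to `__SYCL_TARGET_INTEL_X86_64__`. A sketch of that rule follows, assuming aliases such as intel_gpu_9_0_9 have already been resolved to their code names; `target_macro_for` is a hypothetical helper, not driver code. Because the device headers compare each macro against 1, defining it with a plain `-D<macro>` (which gives the value 1) would be sufficient.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <string_view>

// Hypothetical helper: derive the predefined macro name for a canonical
// device name (aliases are assumed to be resolved already).
std::string target_macro_for(std::string_view device) {
  assert(!device.empty() && "device name must not be empty");
  // spir64_x86_64 is the one target whose macro is not its upper-cased name.
  if (device == "spir64_x86_64") return "__SYCL_TARGET_INTEL_X86_64__";
  std::string macro = "__SYCL_TARGET_";
  for (char c : device)
    macro += static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
  macro += "__";
  return macro;
}
```

For instance, amd_gpu_gfx90a becomes `__SYCL_TARGET_AMD_GPU_GFX90A__`, matching the list above.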
If the user invokes the compiler driver with -fsycl-targets=spir64_x86_64,
the compiler driver must predefine the following C++ macro name:
__SYCL_TARGET_INTEL_X86_64__
These macros are an internal implementation detail, so they should not be documented to users, and user code should not make use of them.
Changes to the device headers
The device headers implement the sycl_ext_oneapi_device_architecture
extension using these predefined macros and leverage if constexpr to discard
statements in the “if” or “else” body when the device does not match one of the
listed architectures. The following code snippet illustrates the technique:
namespace sycl {
namespace ext::oneapi::experimental {
enum class architecture {
x86_64,
intel_gpu_bdw,
intel_gpu_skl,
intel_gpu_kbl
// ...
};
} // namespace ext::oneapi::experimental
namespace detail {
#ifndef __SYCL_TARGET_INTEL_X86_64__
#define __SYCL_TARGET_INTEL_X86_64__ 0
#endif
#ifndef __SYCL_TARGET_INTEL_GPU_BDW__
#define __SYCL_TARGET_INTEL_GPU_BDW__ 0
#endif
#ifndef __SYCL_TARGET_INTEL_GPU_SKL__
#define __SYCL_TARGET_INTEL_GPU_SKL__ 0
#endif
#ifndef __SYCL_TARGET_INTEL_GPU_KBL__
#define __SYCL_TARGET_INTEL_GPU_KBL__ 0
#endif
// ...
// This is true when the translation unit is compiled in AOT mode with target
// names that support the "if_architecture_is" feature. If an unsupported
// target name is specified via "-fsycl-targets", the associated invocation of
// the device compiler will set this variable to false, and that will trigger
// an error for code that uses "if_architecture_is".
static constexpr bool is_allowable_aot_mode =
(__SYCL_TARGET_INTEL_X86_64__ == 1) ||
(__SYCL_TARGET_INTEL_GPU_BDW__ == 1) ||
(__SYCL_TARGET_INTEL_GPU_SKL__ == 1) ||
(__SYCL_TARGET_INTEL_GPU_KBL__ == 1)
// ...
;
// One entry for each enumerator in "architecture" telling whether the AOT
// target matches that architecture.
static constexpr bool is_aot_for_architecture[] = {
(__SYCL_TARGET_INTEL_X86_64__ == 1),
(__SYCL_TARGET_INTEL_GPU_BDW__ == 1),
(__SYCL_TARGET_INTEL_GPU_SKL__ == 1),
(__SYCL_TARGET_INTEL_GPU_KBL__ == 1)
// ...
};
// Read the value of "is_allowable_aot_mode" via a template to defer triggering
// static_assert() until template instantiation time.
template<ext::oneapi::experimental::architecture... Archs>
constexpr static bool allowable_aot_mode() {
return is_allowable_aot_mode;
}
// Tells if the current device has one of the architectures in the parameter
// pack.
template<ext::oneapi::experimental::architecture... Archs>
constexpr static bool device_architecture_is() {
return (is_aot_for_architecture[static_cast<int>(Archs)] || ...);
}
// Helper object used to implement "else_if_architecture_is" and "otherwise".
// The "MakeCall" template parameter tells whether a previous clause in the
// "if-elseif-elseif ..." chain was true. When "MakeCall" is false, some
// previous clause was true, so none of the subsequent
// "else_if_architecture_is" or "otherwise" member functions should call the
// user's function.
template<bool MakeCall>
class if_architecture_helper {
public:
template<ext::oneapi::experimental::architecture ...Archs, typename T>
constexpr auto else_if_architecture_is(T fnTrue) {
if constexpr (MakeCall && device_architecture_is<Archs...>()) {
fnTrue();
return if_architecture_helper<false>{};
} else {
return if_architecture_helper<MakeCall>{};
}
}
template<typename T>
constexpr void otherwise(T fn) {
if constexpr (MakeCall) {
fn();
}
}
};
} // namespace detail
namespace ext::oneapi::experimental {
template<architecture ...Archs, typename T>
constexpr static auto if_architecture_is(T fnTrue) {
static_assert(detail::allowable_aot_mode<Archs...>(),
"The if_architecture_is function may only be used when AOT "
"compiling with '-fsycl-targets=spir64_x86_64', "
"'-fsycl-targets=intel_gpu_*', '-fsycl-targets=nvidia_gpu_*' or "
"'-fsycl-targets=amd_gpu_*'");
if constexpr (detail::device_architecture_is<Archs...>()) {
fnTrue();
return detail::if_architecture_helper<false>{};
} else {
return detail::if_architecture_helper<true>{};
}
}
} // namespace ext::oneapi::experimental
} // namespace sycl
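To see the if constexpr chain work outside of a SYCL compiler, the header technique above can be reduced to plain C++17. This is a standalone sketch, not the DPC++ header: it keeps only three architectures, drops the static_assert, and uses hypothetical `DEMO_TARGET_*` macros in place of the reserved `__SYCL_TARGET_*` names, defaulting to a build "for" intel_gpu_skl. `pick_branch` is a hypothetical caller that records which clause ran.

```cpp
#include <string_view>

// Hypothetical stand-ins for the driver-predefined macros. By default we
// pretend the translation unit was compiled for intel_gpu_skl.
#ifndef DEMO_TARGET_INTEL_GPU_BDW
#define DEMO_TARGET_INTEL_GPU_BDW 0
#endif
#ifndef DEMO_TARGET_INTEL_GPU_SKL
#define DEMO_TARGET_INTEL_GPU_SKL 1
#endif
#ifndef DEMO_TARGET_INTEL_GPU_KBL
#define DEMO_TARGET_INTEL_GPU_KBL 0
#endif

namespace demo {

enum class architecture { intel_gpu_bdw, intel_gpu_skl, intel_gpu_kbl };

// One entry per enumerator, in the same order as "architecture".
inline constexpr bool is_aot_for_architecture[] = {
    DEMO_TARGET_INTEL_GPU_BDW == 1,
    DEMO_TARGET_INTEL_GPU_SKL == 1,
    DEMO_TARGET_INTEL_GPU_KBL == 1,
};

template <architecture... Archs>
constexpr bool device_architecture_is() {
  return (is_aot_for_architecture[static_cast<int>(Archs)] || ...);
}

// MakeCall == false means an earlier clause already matched.
template <bool MakeCall>
struct if_architecture_helper {
  template <architecture... Archs, typename T>
  constexpr auto else_if_architecture_is(T fn) {
    if constexpr (MakeCall && device_architecture_is<Archs...>()) {
      fn();
      return if_architecture_helper<false>{};
    } else {
      return if_architecture_helper<MakeCall>{};
    }
  }

  template <typename T>
  constexpr void otherwise(T fn) {
    if constexpr (MakeCall) fn();
  }
};

template <architecture... Archs, typename T>
constexpr auto if_architecture_is(T fn) {
  if constexpr (device_architecture_is<Archs...>()) {
    fn();
    return if_architecture_helper<false>{};
  } else {
    return if_architecture_helper<true>{};
  }
}

}  // namespace demo

// Records which clause of the chain invoked its lambda.
constexpr std::string_view pick_branch() {
  using demo::architecture;
  std::string_view chosen = "none";
  demo::if_architecture_is<architecture::intel_gpu_bdw>([&] { chosen = "bdw"; })
      .else_if_architecture_is<architecture::intel_gpu_skl,
                               architecture::intel_gpu_kbl>(
          [&] { chosen = "skl-or-kbl"; })
      .otherwise([&] { chosen = "other"; });
  return chosen;
}
```

Only the lambda of the first matching clause is invoked; once a clause matches, the returned `if_architecture_helper<false>` turns every later `else_if_architecture_is` and `otherwise` into a no-op. Building with `-DDEMO_TARGET_INTEL_GPU_SKL=0` and no other demo macros makes the `otherwise` branch run instead.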
Analysis of error checking for unsupported AOT modes
The header file code presented above triggers a static_assert if the
if_architecture_is function is used in a translation unit that is compiled
for an unsupported target. The supported targets are spir64_x86_64 and
the new intel_gpu_*, nvidia_gpu_*, and amd_gpu_* GPU device names.
The error checking relies on the fact that the device compiler is invoked
separately for each target listed in -fsycl-targets. If any target is
unsupported, the associated device compilation computes
is_allowable_aot_mode as false, and this triggers the static_assert
in that device compilation.
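The per-target reasoning above can be modeled as a predicate over the -fsycl-targets list: a translation unit that calls if_architecture_is builds only if every per-target device compilation computes is_allowable_aot_mode as true. The sketch below assumes that a target is allowed exactly when it is spir64_x86_64 or starts with intel_gpu_, nvidia_gpu_, or amd_gpu_; the function names are hypothetical.

```cpp
#include <initializer_list>
#include <string_view>

constexpr bool starts_with(std::string_view s, std::string_view prefix) {
  return s.substr(0, prefix.size()) == prefix;
}

// Models the is_allowable_aot_mode value one device compilation would see.
constexpr bool is_allowable_aot_target(std::string_view target) {
  return target == "spir64_x86_64" || starts_with(target, "intel_gpu_") ||
         starts_with(target, "nvidia_gpu_") || starts_with(target, "amd_gpu_");
}

// True if no per-target device compilation would hit the static_assert.
constexpr bool every_device_compile_passes(
    std::initializer_list<std::string_view> targets) {
  for (std::string_view t : targets)
    if (!is_allowable_aot_target(t)) return false;
  return true;
}
```

With `-fsycl-targets=spir64,intel_gpu_skl`, for instance, the spir64 device compilation is the one that triggers the static_assert, even though the intel_gpu_skl compilation succeeds.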
Phase 2
TBD.