Implementation design for “device_if” and “device_architecture”

This document describes the design for the DPC++ implementation of the sycl_ext_oneapi_device_if and sycl_ext_oneapi_device_architecture extensions.

Phased implementation

The implementation is divided into two phases. In the first phase, we support only sycl_ext_oneapi_device_architecture and it is supported only in AOT mode. The second phase adds support for both extensions in both AOT and JIT modes.

Changes to compiler driver

Both phases require changes to the -fsycl-targets option that is recognized by the compiler driver. The problem is that the current form of that option does not identify a specific device name. As a reminder, the current command line for AOT compilation on GPU looks like this:

$ clang++ -fsycl -fsycl-targets=spir64_gen -Xs "-device skl" ...

Notice that -fsycl-targets option specifies only the generic name spir64_gen whereas the device name is passed directly to ocloc (the Intel GPU AOT compiler) via -Xs "-device skl". Since the compiler driver merely passes the -Xs options directly to the underlying ocloc without understanding them, the compiler driver does not currently know the target device(s) of the AOT compilation.

To fix this, the -fsycl-targets option should be changed to accept the following GPU device names in addition to the target names it currently recognizes:

  • intel_gpu_bdw

  • intel_gpu_skl

  • intel_gpu_kbl

  • intel_gpu_cfl

  • intel_gpu_apl

  • intel_gpu_glk

  • intel_gpu_whl

  • intel_gpu_aml

  • intel_gpu_cml

  • intel_gpu_icllp

  • intel_gpu_tgllp

  • intel_gpu_rkl

  • intel_gpu_adl_s

  • intel_gpu_rpl_s

  • intel_gpu_adl_p

  • intel_gpu_adl_n

  • intel_gpu_dg1

  • intel_gpu_acm_g10

  • intel_gpu_acm_g11

  • intel_gpu_acm_g12

  • intel_gpu_pvc

  • intel_gpu_8_0_0 (alias for intel_gpu_bdw)

  • intel_gpu_9_0_9 (alias for intel_gpu_skl)

  • intel_gpu_9_1_9 (alias for intel_gpu_kbl)

  • intel_gpu_9_2_9 (alias for intel_gpu_cfl)

  • intel_gpu_9_3_0 (alias for intel_gpu_apl)

  • intel_gpu_9_4_0 (alias for intel_gpu_glk)

  • intel_gpu_9_5_0 (alias for intel_gpu_whl)

  • intel_gpu_9_6_0 (alias for intel_gpu_aml)

  • intel_gpu_9_7_0 (alias for intel_gpu_cml)

  • intel_gpu_11_0_0 (alias for intel_gpu_icllp)

  • intel_gpu_12_0_0 (alias for intel_gpu_tgllp)

  • intel_gpu_12_10_0 (alias for intel_gpu_dg1)

  • nvidia_gpu_sm_50

  • nvidia_gpu_sm_52

  • nvidia_gpu_sm_53

  • nvidia_gpu_sm_60

  • nvidia_gpu_sm_61

  • nvidia_gpu_sm_62

  • nvidia_gpu_sm_70

  • nvidia_gpu_sm_72

  • nvidia_gpu_sm_75

  • nvidia_gpu_sm_80

  • nvidia_gpu_sm_86

  • nvidia_gpu_sm_87

  • nvidia_gpu_sm_89

  • nvidia_gpu_sm_90

  • amd_gpu_gfx700

  • amd_gpu_gfx701

  • amd_gpu_gfx702

  • amd_gpu_gfx703

  • amd_gpu_gfx704

  • amd_gpu_gfx705

  • amd_gpu_gfx801

  • amd_gpu_gfx802

  • amd_gpu_gfx803

  • amd_gpu_gfx805

  • amd_gpu_gfx810

  • amd_gpu_gfx900

  • amd_gpu_gfx902

  • amd_gpu_gfx904

  • amd_gpu_gfx906

  • amd_gpu_gfx908

  • amd_gpu_gfx909

  • amd_gpu_gfx90a

  • amd_gpu_gfx90c

  • amd_gpu_gfx940

  • amd_gpu_gfx941

  • amd_gpu_gfx942

  • amd_gpu_gfx1010

  • amd_gpu_gfx1011

  • amd_gpu_gfx1012

  • amd_gpu_gfx1013

  • amd_gpu_gfx1030

  • amd_gpu_gfx1031

  • amd_gpu_gfx1032

  • amd_gpu_gfx1033

  • amd_gpu_gfx1034

  • amd_gpu_gfx1035

  • amd_gpu_gfx1036

  • amd_gpu_gfx1100

  • amd_gpu_gfx1101

  • amd_gpu_gfx1102

  • amd_gpu_gfx1103

  • amd_gpu_gfx1150

  • amd_gpu_gfx1151

  • amd_gpu_gfx1200

  • amd_gpu_gfx1201

The above listed device names may not be mixed with the existing target name spir64_gen on the same command line. In addition, the user must not pass the -device option to ocloc via -Xs or related command line options because the compiler driver will pass this option to ocloc automatically.

Note that in the first phase of implementation, only one of the above listed GPU device names may appear on the command line. As a result, the first phase of implementation supports AOT compilation in this new mode only for a single GPU device.

Phase 1

The first phase requires changes only to the compiler driver and to the device headers.

Compiler driver macro predefines

Most of the changes to the compiler driver are described above, but there are a few small additional changes that are specific to phase 1. If the user invokes the compiler driver with -fsycl-targets=<device> where <device> is one of the GPU device names listed above, the compiler driver must predefine one of the following corresponding C++ macro names:

  • __SYCL_TARGET_INTEL_GPU_BDW__

  • __SYCL_TARGET_INTEL_GPU_SKL__

  • __SYCL_TARGET_INTEL_GPU_KBL__

  • __SYCL_TARGET_INTEL_GPU_CFL__

  • __SYCL_TARGET_INTEL_GPU_APL__

  • __SYCL_TARGET_INTEL_GPU_GLK__

  • __SYCL_TARGET_INTEL_GPU_WHL__

  • __SYCL_TARGET_INTEL_GPU_AML__

  • __SYCL_TARGET_INTEL_GPU_CML__

  • __SYCL_TARGET_INTEL_GPU_ICLLP__

  • __SYCL_TARGET_INTEL_GPU_TGLLP__

  • __SYCL_TARGET_INTEL_GPU_RKL__

  • __SYCL_TARGET_INTEL_GPU_ADL_S__

  • __SYCL_TARGET_INTEL_GPU_RPL_S__

  • __SYCL_TARGET_INTEL_GPU_ADL_P__

  • __SYCL_TARGET_INTEL_GPU_ADL_N__

  • __SYCL_TARGET_INTEL_GPU_DG1__

  • __SYCL_TARGET_INTEL_GPU_ACM_G10__

  • __SYCL_TARGET_INTEL_GPU_ACM_G11__

  • __SYCL_TARGET_INTEL_GPU_ACM_G12__

  • __SYCL_TARGET_INTEL_GPU_PVC__

  • __SYCL_TARGET_NVIDIA_GPU_SM_50__

  • __SYCL_TARGET_NVIDIA_GPU_SM_52__

  • __SYCL_TARGET_NVIDIA_GPU_SM_53__

  • __SYCL_TARGET_NVIDIA_GPU_SM_60__

  • __SYCL_TARGET_NVIDIA_GPU_SM_61__

  • __SYCL_TARGET_NVIDIA_GPU_SM_62__

  • __SYCL_TARGET_NVIDIA_GPU_SM_70__

  • __SYCL_TARGET_NVIDIA_GPU_SM_72__

  • __SYCL_TARGET_NVIDIA_GPU_SM_75__

  • __SYCL_TARGET_NVIDIA_GPU_SM_80__

  • __SYCL_TARGET_NVIDIA_GPU_SM_86__

  • __SYCL_TARGET_NVIDIA_GPU_SM_87__

  • __SYCL_TARGET_NVIDIA_GPU_SM_89__

  • __SYCL_TARGET_NVIDIA_GPU_SM_90__

  • __SYCL_TARGET_NVIDIA_GPU_SM_90A__

  • __SYCL_TARGET_AMD_GPU_GFX700__

  • __SYCL_TARGET_AMD_GPU_GFX701__

  • __SYCL_TARGET_AMD_GPU_GFX702__

  • __SYCL_TARGET_AMD_GPU_GFX703__

  • __SYCL_TARGET_AMD_GPU_GFX704__

  • __SYCL_TARGET_AMD_GPU_GFX705__

  • __SYCL_TARGET_AMD_GPU_GFX801__

  • __SYCL_TARGET_AMD_GPU_GFX802__

  • __SYCL_TARGET_AMD_GPU_GFX803__

  • __SYCL_TARGET_AMD_GPU_GFX805__

  • __SYCL_TARGET_AMD_GPU_GFX810__

  • __SYCL_TARGET_AMD_GPU_GFX900__

  • __SYCL_TARGET_AMD_GPU_GFX902__

  • __SYCL_TARGET_AMD_GPU_GFX904__

  • __SYCL_TARGET_AMD_GPU_GFX906__

  • __SYCL_TARGET_AMD_GPU_GFX908__

  • __SYCL_TARGET_AMD_GPU_GFX909__

  • __SYCL_TARGET_AMD_GPU_GFX90A__

  • __SYCL_TARGET_AMD_GPU_GFX90C__

  • __SYCL_TARGET_AMD_GPU_GFX940__

  • __SYCL_TARGET_AMD_GPU_GFX941__

  • __SYCL_TARGET_AMD_GPU_GFX942__

  • __SYCL_TARGET_AMD_GPU_GFX1010__

  • __SYCL_TARGET_AMD_GPU_GFX1011__

  • __SYCL_TARGET_AMD_GPU_GFX1012__

  • __SYCL_TARGET_AMD_GPU_GFX1013__

  • __SYCL_TARGET_AMD_GPU_GFX1030__

  • __SYCL_TARGET_AMD_GPU_GFX1031__

  • __SYCL_TARGET_AMD_GPU_GFX1032__

  • __SYCL_TARGET_AMD_GPU_GFX1033__

  • __SYCL_TARGET_AMD_GPU_GFX1034__

  • __SYCL_TARGET_AMD_GPU_GFX1035__

  • __SYCL_TARGET_AMD_GPU_GFX1036__

  • __SYCL_TARGET_AMD_GPU_GFX1100__

  • __SYCL_TARGET_AMD_GPU_GFX1101__

  • __SYCL_TARGET_AMD_GPU_GFX1102__

  • __SYCL_TARGET_AMD_GPU_GFX1103__

  • __SYCL_TARGET_AMD_GPU_GFX1150__

  • __SYCL_TARGET_AMD_GPU_GFX1151__

  • __SYCL_TARGET_AMD_GPU_GFX1200__

  • __SYCL_TARGET_AMD_GPU_GFX1201__

If the user invokes the compiler driver with -fsycl-targets=spir64_x86_64, the compiler driver must predefine the following C++ macro name:

  • __SYCL_TARGET_INTEL_X86_64__

These macros are an internal implementation detail, so they should not be documented to users, and user code should not make use of them.

Changes to the device headers

The device headers implement the sycl_ext_oneapi_device_architecture extension using these predefined macros and leverage if constexpr to discard statements in the “if” or “else” body when the device does not match one of the listed architectures. The following code snippet illustrates the technique:

namespace sycl {
namespace ext::oneapi::experimental {

enum class architecture {
  x86_64,
  intel_gpu_bdw,
  intel_gpu_skl,
  intel_gpu_kbl
  // ...
};

} // namespace ext::oneapi::experimental

namespace detail {

#ifndef __SYCL_TARGET_INTEL_X86_64__
#define __SYCL_TARGET_INTEL_X86_64__ 0
#endif
#ifndef __SYCL_TARGET_INTEL_GPU_BDW__
#define __SYCL_TARGET_INTEL_GPU_BDW__ 0
#endif
#ifndef __SYCL_TARGET_INTEL_GPU_SKL__
#define __SYCL_TARGET_INTEL_GPU_SKL__ 0
#endif
#ifndef __SYCL_TARGET_INTEL_GPU_KBL__
#define __SYCL_TARGET_INTEL_GPU_KBL__ 0
#endif
// ...

// This is true when the translation unit is compiled in AOT mode with target
// names that supports the "if_architecture_is" features.  If an unsupported
// target name is specified via "-fsycl-targets", the associated invocation of
// the device compiler will set this variable to false, and that will trigger
// an error for code that uses "if_architecture_is".
static constexpr bool is_allowable_aot_mode =
  (__SYCL_TARGET_INTEL_X86_64__ == 1) ||
  (__SYCL_TARGET_INTEL_GPU_BDW__ == 1) ||
  (__SYCL_TARGET_INTEL_GPU_SKL__ == 1) ||
  (__SYCL_TARGET_INTEL_GPU_KBL__ == 1)
  // ...
  ;

// One entry for each enumerator in "architecture" telling whether the AOT
// target matches that architecture.
static constexpr bool is_aot_for_architecture[] = {
  (__SYCL_TARGET_INTEL_X86_64__ == 1),
  (__SYCL_TARGET_INTEL_GPU_BDW__ == 1),
  (__SYCL_TARGET_INTEL_GPU_SKL__ == 1),
  (__SYCL_TARGET_INTEL_GPU_KBL__ == 1)
  // ...
};

// Read the value of "is_allowable_aot_mode" via a template to defer triggering
// static_assert() until template instantiation time.
template<ext::oneapi::experimental::architecture... Archs>
constexpr static bool allowable_aot_mode() {
  return is_allowable_aot_mode;
}

// Tells if the current device has one of the architectures in the parameter
// pack.
template<ext::oneapi::experimental::architecture... Archs>
constexpr static bool device_architecture_is() {
  return (is_aot_for_architecture[static_cast<int>(Archs)] || ...);
}

// Helper object used to implement "else_if_architecture_is" and "otherwise".
// The "MakeCall" template parameter tells whether a previous clause in the
// "if-elseif-elseif ..." chain was true.  When "MakeCall" is false, some
// previous clause was true, so none of the subsequent
// "else_if_architecture_is" or "otherwise" member functions should call the
// user's function.
template<bool MakeCall>
class if_architecture_helper {
 public:
  template<ext::oneapi::experimental::architecture ...Archs, typename T>
  constexpr auto else_if_architecture_is(T fnTrue) {
    if constexpr (MakeCall && device_architecture_is<Archs...>()) {
      fnTrue();
      return if_architecture_helper<false>{};
    } else {
      return if_architecture_helper<MakeCall>{};
    }
  }

  template<typename T>
  constexpr void otherwise(T fn) {
    if constexpr (MakeCall) {
      fn();
    }
  }
};

} // namespace detail

namespace ext::oneapi::experimental {

template<architecture ...Archs, typename T>
constexpr static auto if_architecture_is(T fnTrue) {
  static_assert(detail::allowable_aot_mode<Archs...>(),
    "The if_architecture_is function may only be used when AOT "
    "compiling with '-fsycl-targets=spir64_x86_64' or "
    "'-fsycl-targets=intel_gpu_*'");
  if constexpr (detail::device_architecture_is<Archs...>()) {
    fnTrue();
    return detail::if_architecture_helper<false>{};
  } else {
    return detail::if_architecture_helper<true>{};
  }
}

} // namespace ext::oneapi::experimental
} // namespace sycl

Analysis of error checking for unsupported AOT modes

The header file code presented above triggers a static_assert if the if_architecture_is function is used in a translation unit that is compiled for an unsupported target. The supported targets are spir64_x86_64, the new intel_gpu_*, nvidia_gpu_* and amd_gpu_* GPU device names.

The error checking relies on the fact that the device compiler is invoked separately for each target listed in -fsycl-target. If any target is unsupported, the associated device compilation will compute is_allowable_aot_mode as false, and this will trigger the static_assert in that compilation phase.

Phase 2

TBD.