This is the list of SYCL-specific options supported by the compiler, along with some examples.
Options marked as [DEPRECATED] will be removed in a future update. Options marked as [EXPERIMENTAL] are intended for limited use cases and are not recommended for production environments.
General enabling option for SYCL compilation and linking mode. The list of targets can be specified with `-fsycl-targets`. This is the fundamental option for any SYCL compilation; all other SYCL-specific options require it.
Enables ahead-of-time (AOT) compilation for the specified device targets. T is a compiler target triple string representing a target device architecture. You can specify more than one target as a comma-separated list. The default just-in-time (JIT) compilation target can be added to the list to produce a combination of AOT and JIT code in the resulting fat binary.
The following triples are supported by default:
* spir64 - this is the default generic SPIR-V target;
* spir64_x86_64 - generate code ahead of time for x86_64 CPUs;
* spir64_fpga - generate code ahead of time for Intel FPGA;
* spir64_gen - generate code ahead of time for Intel Processor Graphics;
Full target triples can also be used:
* spir64-unknown-unknown, spir64_x86_64-unknown-unknown, spir64_fpga-unknown-unknown, spir64_gen-unknown-unknown
Available in special build configuration:
* nvptx64-nvidia-cuda - generate code ahead of time for CUDA target;
* native_cpu - allows running SYCL applications without an additional backend (note that this feature is WIP and experimental, and currently overrides all other specified SYCL targets when enabled).
Special target values specific to Intel, NVIDIA and AMD Processor Graphics support are accepted, providing a streamlined interface for AOT. Only one of these values at a time is supported.
* intel_gpu_pvc - Ponte Vecchio Intel graphics architecture
* intel_gpu_acm_g12, intel_gpu_dg2_g12 - Alchemist G12 Intel graphics architecture
* intel_gpu_acm_g11, intel_gpu_dg2_g11 - Alchemist G11 Intel graphics architecture
* intel_gpu_acm_g10, intel_gpu_dg2_g10 - Alchemist G10 Intel graphics architecture
* intel_gpu_dg1, intel_gpu_12_10_0 - DG1 Intel graphics architecture
* intel_gpu_adl_n - Alder Lake N Intel graphics architecture
* intel_gpu_adl_p - Alder Lake P Intel graphics architecture
* intel_gpu_rpl_s - Raptor Lake Intel graphics architecture (equal to intel_gpu_adl_s)
* intel_gpu_adl_s - Alder Lake S Intel graphics architecture
* intel_gpu_rkl - Rocket Lake Intel graphics architecture
* intel_gpu_tgllp, intel_gpu_12_0_0 - Tiger Lake Intel graphics architecture
* intel_gpu_jsl - Jasper Lake Intel graphics architecture (equal to intel_gpu_ehl)
* intel_gpu_ehl - Elkhart Lake Intel graphics architecture
* intel_gpu_icllp, intel_gpu_11_0_0 - Ice Lake Intel graphics architecture
* intel_gpu_cml, intel_gpu_9_7_0 - Comet Lake Intel graphics architecture
* intel_gpu_aml, intel_gpu_9_6_0 - Amber Lake Intel graphics architecture
* intel_gpu_whl, intel_gpu_9_5_0 - Whiskey Lake Intel graphics architecture
* intel_gpu_glk, intel_gpu_9_4_0 - Gemini Lake Intel graphics architecture
* intel_gpu_bxt - Broxton Intel graphics architecture (equal to intel_gpu_apl)
* intel_gpu_apl, intel_gpu_9_3_0 - Apollo Lake Intel graphics architecture
* intel_gpu_cfl, intel_gpu_9_2_9 - Coffee Lake Intel graphics architecture
* intel_gpu_kbl, intel_gpu_9_1_9 - Kaby Lake Intel graphics architecture
* intel_gpu_skl, intel_gpu_9_0_9 - Skylake Intel graphics architecture
* intel_gpu_bdw, intel_gpu_8_0_0 - Broadwell Intel graphics architecture
* nvidia_gpu_sm_50 - NVIDIA Maxwell architecture (compute capability 5.0)
* nvidia_gpu_sm_52 - NVIDIA Maxwell architecture (compute capability 5.2)
* nvidia_gpu_sm_53 - NVIDIA Maxwell architecture (compute capability 5.3)
* nvidia_gpu_sm_60 - NVIDIA Pascal architecture (compute capability 6.0)
* nvidia_gpu_sm_61 - NVIDIA Pascal architecture (compute capability 6.1)
* nvidia_gpu_sm_62 - NVIDIA Pascal architecture (compute capability 6.2)
* nvidia_gpu_sm_70 - NVIDIA Volta architecture (compute capability 7.0)
* nvidia_gpu_sm_72 - NVIDIA Volta architecture (compute capability 7.2)
* nvidia_gpu_sm_75 - NVIDIA Turing architecture (compute capability 7.5)
* nvidia_gpu_sm_80 - NVIDIA Ampere architecture (compute capability 8.0)
* nvidia_gpu_sm_86 - NVIDIA Ampere architecture (compute capability 8.6)
* nvidia_gpu_sm_87 - NVIDIA Jetson/Drive AGX Orin architecture
* nvidia_gpu_sm_89 - NVIDIA Ada Lovelace architecture
* nvidia_gpu_sm_90 - NVIDIA Hopper architecture
* amd_gpu_gfx700 - AMD GCN GFX7 (Sea Islands (CI)) architecture
* amd_gpu_gfx701 - AMD GCN GFX7 (Sea Islands (CI)) architecture
* amd_gpu_gfx702 - AMD GCN GFX7 (Sea Islands (CI)) architecture
* amd_gpu_gfx801 - AMD GCN GFX8 (Volcanic Islands (VI)) architecture
* amd_gpu_gfx802 - AMD GCN GFX8 (Volcanic Islands (VI)) architecture
* amd_gpu_gfx803 - AMD GCN GFX8 (Volcanic Islands (VI)) architecture
* amd_gpu_gfx805 - AMD GCN GFX8 (Volcanic Islands (VI)) architecture
* amd_gpu_gfx810 - AMD GCN GFX8 (Volcanic Islands (VI)) architecture
* amd_gpu_gfx900 - AMD GCN GFX9 (Vega) architecture
* amd_gpu_gfx902 - AMD GCN GFX9 (Vega) architecture
* amd_gpu_gfx904 - AMD GCN GFX9 (Vega) architecture
* amd_gpu_gfx906 - AMD GCN GFX9 (Vega) architecture
* amd_gpu_gfx908 - AMD GCN GFX9 (Vega) architecture
* amd_gpu_gfx90a - AMD GCN GFX9 (Vega) architecture
* amd_gpu_gfx1010 - AMD GCN GFX10.1 (RDNA 1) architecture
* amd_gpu_gfx1011 - AMD GCN GFX10.1 (RDNA 1) architecture
* amd_gpu_gfx1012 - AMD GCN GFX10.1 (RDNA 1) architecture
* amd_gpu_gfx1013 - AMD GCN GFX10.1 (RDNA 1) architecture
* amd_gpu_gfx1030 - AMD GCN GFX10.3 (RDNA 2) architecture
* amd_gpu_gfx1031 - AMD GCN GFX10.3 (RDNA 2) architecture
* amd_gpu_gfx1032 - AMD GCN GFX10.3 (RDNA 2) architecture
* amd_gpu_gfx1034 - AMD GCN GFX10.3 (RDNA 2) architecture
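As a rough illustration of mixing AOT and JIT code in one fat binary, the dry-run sketch below lists the generic `spir64` target next to `spir64_x86_64` in `-fsycl-targets`. The file names `sycl-app.cpp` and `sycl-app` are placeholders. The command is only printed, not executed, because running it assumes a DPC++ build of `clang++` with an x86_64 AOT backend installed.

```shell
# Dry run: build the command string and print it instead of compiling.
# spir64        -> generic SPIR-V target, compiled just in time at run time
# spir64_x86_64 -> code generated ahead of time for x86_64 CPUs
TARGETS="spir64,spir64_x86_64"
CMD="clang++ -fsycl -fsycl-targets=${TARGETS} sycl-app.cpp -o sycl-app"
echo "$CMD"
# To actually compile (assumes a DPC++ toolchain is installed): eval "$CMD"
```

Keeping the target list in a variable makes it easy to switch between a pure JIT build (`spir64` alone) and a combined build without touching the rest of the command.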
SYCL language standard to compile for. Possible values:
* 121 - SYCL 1.2.1 [DEPRECATED]
* 2020 - SYCL 2020
This option does not guarantee compliance with the selected standard, but it changes the behavior of some compiler features. It is under development and not recommended for use in production environments. Default value is 2020.
Enables/Disables unnamed SYCL lambda kernels support. The default value depends on the SYCL language standard: it is enabled by default for SYCL 2020, and disabled for SYCL 1.2.1.
This option was used to enable/disable the SYCL explicit SIMD extension. It is no longer used.
Enables (or disables) the intermediate representation optimization pipeline before translation to SPIR-V. Has effect only if optimizations are turned on by standard compiler options (-O1 or higher). Enabled by default.
Enables (or disables) the LLVM IR dead argument elimination pass, which removes unused arguments of kernel functions before translation to SPIR-V. Currently has effect only on spir64* targets. Enabled by default.
Assume/Do not assume that SYCL ID queries fit within MAX_INT. When enabled, the compiler assumes that the following values fit within MAX_INT:
* id class get() member function and operator[]
* item class get_id() member function and operator[]
* nd_item class get_global_id()/get_global_linear_id() member functions
Enabled by default.
Enables/Disables inlining of the kernel lambda operator into the compiler generated entry point function. This flag does not apply to ESIMD kernels. Disabled when optimizations are disabled (-O0 or equivalent). Enabled otherwise.
Sets the inline threshold for device compilation to <n>. Note that this option only affects the behaviour of the DPC++ compiler, not target-specific compilers (e.g. OpenCL/Level Zero/Nvidia/AMD target compilers) which may or may not perform additional inlining. Default value is 225.
Target toolchain options
Pass "options" to the backend of the target device compiler, specified by triple T. The backend of the device compiler generates target machine code from the intermediate representation. This option can be used to tune code generation for a specific target. The "options" are used during AOT compilation. For JIT compilation, "options" are saved in the fat binary and used when the code is JIT-compiled at runtime. -Xs is a shortcut to pass "options" to all backends specified via the '-fsycl-targets' option (or the default one).
Pass "options" to the frontend of the target device compiler, specified by triple T. This option can be used to control intermediate representation generation during offline or online compilation.
Pass "options" to the device code linker when linking multiple device object modules. T is the specific target device triple.
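A minimal dry-run sketch of the `-Xs` shortcut follows. It forwards one quoted option string to the backend of the `spir64_gen` target. The option string `-device pvc` is an assumption about what the Intel GPU offline compiler accepts and is not taken from this page; check the backend's own help output for the options it supports. File names are placeholders.

```shell
# Dry run: print the command rather than invoking the compiler.
TARGET="spir64_gen"          # AOT target for Intel Processor Graphics
BACKEND_OPTS="-device pvc"   # assumed backend option, shown for illustration only
# The backend options are passed as ONE quoted argument after -Xs.
CMD="clang++ -fsycl -fsycl-targets=${TARGET} -Xs \"${BACKEND_OPTS}\" sycl-app.cpp -o sycl-app"
echo "$CMD"
# To actually compile (assumes a DPC++ toolchain with the GPU AOT backend): eval "$CMD"
```

The inner quotes matter: without them the shell would split the option string, and only its first word would be treated as the argument of `-Xs`.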
Intel FPGA specific options
Perform ahead-of-time compilation for Intel FPGA. It sets the target to FPGA and turns on the debug options that are needed to generate FPGA reports. It is a functionally equivalent shortcut for `-fsycl-targets=spir64_fpga -g -MMD` on Linux and `-fsycl-targets=spir64_fpga -Zi -MMD` on Windows.
Controls the FPGA target binary output format. Same as -fsycl-link, but the optional output can be one of the following:
* early - generate HTML reports and an intermediate object file that avoids a full Quartus compile. Usually takes minutes to generate. The link can later be resumed from this point using -fsycl-link=image.
* image - generate a bitstream which is ready to be linked and used on an FPGA board. Usually takes hours to generate.
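The two output kinds above suggest a two-stage flow, sketched here as a dry run. The archive names `sycl-app-early.a` and `sycl-app-image.a`, and the idea of feeding the `early` output back in as the input of the `image` step, are assumptions made for illustration; the text above only states that the link can be resumed with `-fsycl-link=image`.

```shell
# Dry run: print both stages instead of running them.
COMMON="clang++ -fsycl -fsycl-targets=spir64_fpga"
# Stage 1 (minutes): reports plus an intermediate object, no full Quartus compile.
EARLY="${COMMON} -fsycl-link=early sycl-app.cpp -o sycl-app-early.a"
# Stage 2 (hours): resume from the early output and generate the bitstream.
IMAGE="${COMMON} -fsycl-link=image sycl-app-early.a -o sycl-app-image.a"
echo "$EARLY"
echo "$IMAGE"
```

Splitting the build this way lets the reports from the first stage be reviewed before committing to the long second stage.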
Speed up FPGA backend compilation if the device code in <binary> is unchanged. If it is safe to do so, the compiler will re-use the device binary embedded within it. This can be used to minimize or avoid long Quartus compile times for FPGA targets when the device code is unchanged.
Compile only the device part of the code and ignore the host part.
Emit SYCL device code in LLVM-IR bitcode format. When disabled, SPIR-V is emitted. Enabled by default.
Specify format of device code stored in the resulting object. The <arg> can be one of the following: "spirv" - SPIR-V is emitted, "llvmir" - LLVM-IR bitcode format is emitted (default).
Emit help information from device compiler backend. Backend can be one of the following: "x86_64", "fpga", "gen", or "all". Specifying "all" is the same as specifying -fsycl-help with no argument and emits help for all backends.
Informs the compiler driver that the host compilation step, performed as part of the overall compilation flow, will be carried out by the compiler <arg>. It is expected that <arg> is the compiler to be called, either by name (in which case the PATH will be used to discover it) or as a fully qualified directory with the compiler to invoke. This option is only useful when -fsycl is provided on the command line.
Passes the space-separated, quoted "opts" string as option arguments to the compiler specified with the -fsycl-host-compiler=<arg> option. It is expected that the options used here are compatible with the compiler specified via -fsycl-host-compiler=<arg>. NOTE: Using -fsycl-host-compiler-options to pass any kind of phase-limiting option (e.g. -c, -E, -S) may interfere with the expected output set during the host compilation. Doing so is considered undefined behavior.
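The sketch below shows how the two host-compiler options might be combined, again as a dry run. Using `g++` as the host compiler and `-O2 -Wall` as its options are illustrative assumptions, not recommendations from this page; the options deliberately avoid phase-limiting flags such as -c, -E or -S.

```shell
# Dry run: print the command rather than invoking the compiler.
HOST_CXX="g++"          # assumed to be discoverable via PATH
HOST_OPTS="-O2 -Wall"   # must be options the chosen host compiler understands
# The host options travel as ONE quoted string.
CMD="clang++ -fsycl -fsycl-host-compiler=${HOST_CXX} -fsycl-host-compiler-options=\"${HOST_OPTS}\" sycl-app.cpp -o sycl-app"
echo "$CMD"
# To actually compile (assumes a DPC++ toolchain and g++ are installed): eval "$CMD"
```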
Enables use of the correctly rounded `sycl::sqrt` function as defined by IEEE 754. Without this flag, the default precision requirement for `sycl::sqrt` is 3 ULP. NOTE: This flag is currently only supported with the CUDA and HIP targets.
Enforces stateless memory access and enables the automatic conversion of "stateful" memory access via SYCL accessors to "stateless" within ESIMD (Explicit SIMD) kernels. -fsycl-esimd-force-stateless-mem disables the intrinsics and methods accepting SYCL accessors or "surface-index" which cannot be automatically converted to their "stateless" equivalents. -fno-sycl-esimd-force-stateless-mem tells the compiler not to enforce the use of stateless memory accesses. This is the default behavior.
NOTE: "Stateful" access is access that uses a SYCL accessor or a pair of "surface-index" + 32-bit byte-offset and uses specific memory access data port messages to read/write/fetch. "Stateless" memory access uses a memory location represented by a virtual memory address pointer, such as a USM pointer. "Stateless" memory access may be beneficial as it does not have the limit of 4 GB per surface. Also, some Intel GPUs or GPU run-times/drivers may support only "stateless" memory accesses.
Instructs the target backend to reduce compilation time, potentially at the cost of runtime performance. Currently only supported on Intel GPUs.
Exposes exported symbols in a generated target library to allow for visibility to other modules. NOTE: This flag is only supported for spir64_gen AOT targets.
Example: SYCL device code compilation
To invoke the SYCL device compiler, set the `-fsycl-device-only` flag:
$ clang++ -fsycl-device-only sycl-app.cpp -o sycl-app.bc
By default, the output format for SYCL device code is LLVM bitcode.
`-fno-sycl-use-bitcode` can be used to emit device code in SPIR-V format:
$ clang++ -fsycl-device-only -fno-sycl-use-bitcode sycl-app.cpp -o sycl-app.spv