sycl_ext_oneapi_private_alloca
¶
SYCL does not allow dynamically sized private memory allocations. The
sycl_ext_oneapi_private_alloca
extension is a way to lift this restriction by allowing memory allocations whose
length is given by a SYCL specialization constant.
Use-case example¶
#include <sycl/sycl.hpp>
constexpr sycl::specialization_id<std::size_t> size(1);
namespace syclex = sycl::ext::oneapi::experimental;
class Kernel;
SYCL_EXTERNAL void impl(const float *in, float *out,
sycl::span<float> ptr);
void run(sycl::queue q, const float *in, float *out, size_t n) {
q.submit([&](sycl::handler &h) {
h.set_specialization_constant<size>(n);
h.parallel_for<Kernel>(n, [=](sycl::id<1> i, sycl::kernel_handler kh) {
sycl::span<float> tmp{
syclex::private_alloca<float, size, sycl::access::decorated::no>(kh).get_raw(),
kh.get_specialization_constant<size>()};
impl(in, out, tmp);
});
});
}
In this use-case, every work-item allocates a private memory region capable of
hosting kh.get_specialization_constant<size>()
elements of type float
. This
might be used for performance improvements in some algorithms, e.g., needing
more than one iteration on a sub-sequence of the input. However, as the length
of this sub-sequence is an input argument, this can only be achieved via
sycl_ext_oneapi_private_alloca
. Combining
sycl::handler::set_specialization_constant<size>
and calling private_alloca
,
we can get a dynamically sized memory allocation. Note size
is guaranteed to
be constant during kernel execution.
Design¶
sycl_ext_oneapi_private_alloca
is currently only supported on targets with
native spec constants support when AOT compilation is not used.
For non-SPIR-V targets, aspect checking is used to check
sycl_ext_oneapi_private_alloca
support. In case of AOT compilation, a
compile-time error is produced as this kind of checks cannot be performed at
runtime via aspects.
Usage in SYCL host code¶
Calling either of the functions defined in this extension in host code results
in a synchronous exception with the errc::feature_not_supported
error code.
#ifdef __SYCL_DEVICE_ONLY__
...
#else
template <typename ElementType, auto &SizeSpecName,
access::decorated DecorateAddress>
private_ptr<ElementType, DecorateAddress> private_alloca(kernel_handler &kh) {
throw sycl::exception(sycl::errc::feature_not_supported,
"Images are not supported by this device.");
}
#endif
New __builtin_intel_sycl_alloca
and __builtin_intel_sycl_alloca_with_align
builtins¶
private_alloca
and aligned_private_alloca
APIs are defined as builtin
aliases of __builtin_intel_sycl_alloca
and
__builtin_intel_sycl_alloca_with_align
respectively using the
clang::builtin_alias
attribute. This way, calls to these functions are handled
as calls to the builtins in the frontend. These builtins cannot be called
directly, only via their aliases defined in the SYCL headers.
template <typename ElementType, auto &SizeSpecName,
access::decorated DecorateAddress>
__SYCL_BUILTIN_ALIAS(__builtin_intel_sycl_alloca)
[[__sycl_detail__::__uses_aspects__(aspect::ext_oneapi_private_alloca)]] private_ptr<
ElementType, DecorateAddress> private_alloca(kernel_handler &kh);
template <typename ElementType, std::size_t Alignment,
auto &SizeSpecName, access::decorated DecorateAddress>
__SYCL_BUILTIN_ALIAS(__builtin_intel_sycl_alloca_with_align)
[[__sycl_detail__::__uses_aspects__(aspect::ext_oneapi_private_alloca)]] private_ptr<
ElementType, DecorateAddress> aligned_private_alloca(kernel_handler &kh);
As builtins cannot be passed template arguments,
Sema::CheckIntelSYCLAllocaBuiltinFunctionCall
and
CodeGenFunction::EmitIntelSYCLAllocaBuiltin
do not use the builtin for
checking or code generation. Instead, the original function declaration is
queried and used.
Following __builtin_alloca_with_align
, Alignment
must be lower than
std::numeric_limits<int32_t>::max() / 8
.
Note using clang::builtin_alias
required modification to code checking this
attribute to add SYCL to the list of contexts in which you can use
it. Implementation of this extension upstream would need to port these changes
upstream or modify code generation and sema of private_alloca
and
aligned_private_alloca
APIs.
llvm.sycl.alloca.*
intrinsic¶
The builtins mentioned in the previous section are represented via the new
llvm.sycl.alloca
intrinsic in code generation. This intrinsic receives three
arguments encoding the specialization constant used as array length,
corresponding to the arguments received by builtins implementing SYCL 2020
specialization constants; a type hint argument encoding the allocation element
type, and the required alignment, which must be an immediate argument. Note
sycl-post-link
usage of __spirv_SpecConstant
to represent specialization
constant queries is preserved, as we can reuse most of the handling of
specialization constant builtins defined in the corresponding design
document.
declare ptr @llvm.sycl.alloca.p0.p4.p4.p4.f32(ptr addrspace(4), ptr addrspace(4), ptr addrspace(4), float, i64)
declare ptr @llvm.sycl.alloca.p0.p4.p4.p4.f64(ptr addrspace(4), ptr addrspace(4), ptr addrspace(4), double, i64)
The alignment argument is set to alignof(ElementType)
for private_alloca
;
Alignment
is used for aligned_private_alloca
.
The private_alloca
call in the use-case above is compiled to the following
LLVM IR:
@_ZL4size = internal addrspace(1) constant %"class.sycl::_V1::specialization_id" { i64 1 }, align 8
@__usid_str.1 = private unnamed_addr constant [31 x i8] c"uid2c9b8e1a387f5dba____ZL4size\00", align 1
...
%alloca.i = tail call ptr @llvm.sycl.alloca.p0.p4.p4.p4.f32(ptr addrspace(4) addrspacecast (ptr @__usid_str.1 to ptr addrspace(4)), ptr addrspace(4) addrspacecast (ptr addrspace(1) @_ZL4size to ptr addrspace(4)), ptr addrspace(4) null, float 0.000000e+00, i64 4)
Note: the third argument is set to null
for now. That argument will be set
to the RTBuffer
for the specialization constants, i.e., a pointer to the
corresponding member of the input sycl::kernel_handler
. This change will be
needed to support this extension in non-SPIR-V targets, i.e., in targets with
emulated specialization constants.
llvm.sycl.alloca
is handled in sycl-post-link
.
sycl-post-link
¶
sycl-post-link
was modified to handle the intrinsic above in addition to SYCL
2020 specialization constants builtins. These are replaced by an alloca
instruction of the element type given by the intrinsic type hint and the size
given by the input specialization constant. Code used to obtain a specialization
constant value is reused, as the intrinsic receives the same three arguments as
the regular builtins.
This way, the -spec-const
pass (using the native
option) transforms the LLVM
IR code above into:
%size = call i64 @_Z20__spirv_SpecConstantix(i32 0, i64 1)
%alloca.i = alloca float, i64 %size, align 4
The builtin in conjunction with the alloca
instruction are handled by the
LLVM-SPIR-V translator to generate valid SPIR-V code.
LLVM-SPIR-V translator¶
The LLVM-SPIR-V translator can handle alloca
instructions as the one above
generating only standard SPIR-V operations. SPIR-V array types can have a
specialization constant length, so the running example is translated to SPIR-V
as follows:
Decorate %size SpecId 0
%sizety = OpTypeInt 64 0
%floatty = OpTypeFloat 32
%size = SpecConstant %sizety 1
%arrty = OpTypeArray %floatty %size
%floatptrty = OpTypePointer Function %floatty
%arrptrty = OpTypePointer Function %arrty
%genfloatptrty = OpTypePointer Generic %floatty
...
%alloca = OpVariable %arrptrty Function
%bitcast = OpBitcast %floatptrty %alloca
When passed a specialization constant as size, a single alloca
instruction is
represented as an OpVariable
operation of an array type of specialization
constant size and an OpBitcast
operation to cast the variable to the required
pointer type, i.e., to a pointer type of the array element type.