Calling convention
A function argument corresponds to one or more arguments in the generated OpenCL-C code.
Scalar types
A scalar argument always corresponds to a single scalar argument in the OpenCL-C code.
| Argument type | OpenCL C type | Kernel argument type |
|---|---|---|
| i1 | bool | n/a [1] |
| i8 | char | cl_char |
| i16 | short | cl_short |
| i32 | int | cl_int |
| i64 | long | cl_long |
| index | long | cl_long |
| f32 | float | cl_float |
| f64 | double | cl_double |
For example,
func @scalar_example(%a: i16) {}
leads to
kernel void scalar_example(short a) {}
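The table above can also be read as a lookup from the host side. The following is a minimal sketch in C, not part of the compiler: the names `scalar_mapping`, `scalar_table`, and `lookup_scalar` are hypothetical, and the byte sizes are the fixed widths of the corresponding `cl_*` host types, i.e. the value a host program would typically pass as the size when setting such a kernel argument.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One row of the scalar table (hypothetical helper type). */
typedef struct {
    const char *ir_type;       /* argument type in the function signature */
    const char *opencl_c_type; /* type in the generated OpenCL-C code */
    const char *host_type;     /* kernel argument type, NULL where the table says n/a */
    size_t host_size;          /* size of the host type in bytes, 0 where n/a */
} scalar_mapping;

static const scalar_mapping scalar_table[] = {
    {"i1", "bool", NULL, 0},
    {"i8", "char", "cl_char", 1},
    {"i16", "short", "cl_short", 2},
    {"i32", "int", "cl_int", 4},
    {"i64", "long", "cl_long", 8},
    {"index", "long", "cl_long", 8},
    {"f32", "float", "cl_float", 4},
    {"f64", "double", "cl_double", 8},
};

/* Return the table row for ir_type, or NULL if the type is not in the table. */
const scalar_mapping *lookup_scalar(const char *ir_type) {
    assert(ir_type != NULL);
    size_t n = sizeof(scalar_table) / sizeof(scalar_table[0]);
    for (size_t i = 0; i < n; ++i) {
        if (strcmp(scalar_table[i].ir_type, ir_type) == 0) {
            return &scalar_table[i];
        }
    }
    return NULL;
}
```

For the `scalar_example` kernel, `lookup_scalar("i16")` yields the row with OpenCL C type `short` and host type `cl_short`, so the host would pass a 2-byte value for argument `a`.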
Footnotes
Memref types
A memref argument might require multiple arguments in the OpenCL-C code. The first argument in the OpenCL kernel is a global pointer to the underlying scalar type. One further argument then follows for every ‘?’ in the memref’s shape or stride, ordered from left to right.
For example,
func @memref_example1(%a: memref<f32x5x10>) {}
func @memref_example2(%a: memref<f64x5x?,strided<1,5>>) {}
func @memref_example3(%a: memref<i64x5x?x6,strided<1,7,?>>) {}
func @memref_example4(%a: memref<i64x5x?x6>) {}
leads to
kernel void memref_example1(global float* a) {}
kernel void memref_example2(global double* a, long a_shape1) {}
kernel void memref_example3(global long* a, long a_shape1, long a_stride2) {}
kernel void memref_example4(global long* a, long a_shape1, long a_stride2) {}
Note that memref_example3 and memref_example4 have the same signature, because memref<i64x5x?x6> has the canonical stride strided<1,5,?>.
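The rule can be sketched as a small C routine that derives the parameter list from a shape and a stride. This is an illustration only, not the compiler's implementation: `DYNAMIC`, `canonical_stride`, and `memref_params` are hypothetical names, and the canonical stride formula (first stride 1, each following stride the product of the previous stride and the previous shape entry, dynamic if either factor is dynamic) is inferred from the note above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sentinel standing in for '?' in a shape or stride. */
#define DYNAMIC INT64_MIN

/* Fill stride with the canonical (packed) stride of the given shape. */
void canonical_stride(const int64_t *shape, int64_t *stride, int dim) {
    for (int i = 0; i < dim; ++i) {
        if (i == 0) {
            stride[i] = 1;
        } else if (stride[i - 1] == DYNAMIC || shape[i - 1] == DYNAMIC) {
            stride[i] = DYNAMIC; /* a dynamic factor makes the product dynamic */
        } else {
            stride[i] = stride[i - 1] * shape[i - 1];
        }
    }
}

/* Append ", long <name>_<kind><i>" for every dynamic entry of mode. */
static int append_dynamic(char *out, size_t cap, size_t *len, const char *name,
                          const char *kind, const int64_t *mode, int dim) {
    for (int i = 0; i < dim; ++i) {
        if (mode[i] != DYNAMIC) continue;
        int n = snprintf(out + *len, cap - *len, ", long %s_%s%d", name, kind, i);
        if (n < 0 || (size_t)n >= cap - *len) return -1; /* buffer too small */
        *len += (size_t)n;
    }
    return 0;
}

/* Write the kernel parameters for one memref argument into out.
   Returns the string length, or -1 if out is too small. */
int memref_params(const char *ctype, const char *name, const int64_t *shape,
                  const int64_t *stride, int dim, char *out, size_t cap) {
    assert(out != NULL && cap > 0);
    int n = snprintf(out, cap, "global %s* %s", ctype, name);
    if (n < 0 || (size_t)n >= cap) return -1;
    size_t len = (size_t)n;
    /* Shape entries come first, then stride entries, each left to right. */
    if (append_dynamic(out, cap, &len, name, "shape", shape, dim) != 0) return -1;
    if (append_dynamic(out, cap, &len, name, "stride", stride, dim) != 0) return -1;
    return (int)len;
}
```

With shape `{5, DYNAMIC, 6}`, both the explicit stride `{1, 7, DYNAMIC}` of memref_example3 and the canonical stride computed for memref_example4 produce the same parameter list, which is why the two kernels share a signature.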
Group types
A group argument might require multiple arguments in the OpenCL-C code. The first argument in the OpenCL kernel is a global pointer to a global pointer to the underlying scalar type of the memref. A global pointer argument then follows for every ‘?’ in the memref’s shape or stride, ordered from left to right. If a dynamic offset is given, the offset is the last argument.
For example,
func @group_example1(%a: group<memref<i16x5x6>>) {}
func @group_example2(%a: group<memref<i32x5x?x6>>) {}
func @group_example3(%a: group<memref<f32x?>, offset: ?>) {}
leads to
kernel void group_example1(global short*global* a) {}
kernel void group_example2(global int*global* a, global long* a_shape1, global long* a_stride2) {}
kernel void group_example3(global float*global* a, global long* a_shape0, long a_offset) {}
Note that a_shape0, a_shape1, and a_stride2 must contain at least as many values as the group size. That is, if a is accessed with load %a[%id] : group<memref<i32x5x?x6>>, then *(a_shape1 + id) and *(a_stride2 + id) must not lead to out-of-bounds memory access. The same holds for *(a_shape0 + id) in group_example3.
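To make the per-entry arrays concrete, here is a plain-C stand-in for what a kernel with the signature of group_example2 reads for one group entry. It is a sketch under assumptions, not generated code: `memref_view`, `load_group_entry`, and `load_element` are hypothetical names, the `group_size` parameter exists only to check the bound (the kernel signature has no such argument), and the element address is assumed to be the sum of index times stride over all modes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* What one entry of group<memref<i32x5x?x6>> resolves to (hypothetical type). */
typedef struct {
    int32_t *data;   /* a[id] */
    int64_t shape1;  /* a_shape1[id] */
    int64_t stride2; /* a_stride2[id] */
} memref_view;

/* Read entry id of the group. Every array must hold at least group_size values,
   otherwise the reads below are out of bounds. */
memref_view load_group_entry(int32_t *const *a, const int64_t *a_shape1,
                             const int64_t *a_stride2, size_t group_size,
                             size_t id) {
    assert(id < group_size);
    memref_view v = {a[id], a_shape1[id], a_stride2[id]};
    return v;
}

/* Element (i, j, k) of a 5x?x6 memref with stride <1, 5, stride2>. */
int32_t load_element(memref_view v, int64_t i, int64_t j, int64_t k) {
    assert(0 <= i && i < 5 && 0 <= j && j < v.shape1 && 0 <= k && k < 6);
    return v.data[i * 1 + j * 5 + k * v.stride2];
}
```

On the host, this corresponds to one pointer array plus one `long` array per dynamic mode, each with one value per memref in the group. A group of two memrefs with shapes 5x2x6 and 5x3x6, for instance, would use `a_shape1 = {2, 3}` and `a_stride2 = {10, 15}`.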