DPC++ Runtime
Runtime libraries for oneAPI DPC++
elem_type_traits.hpp
Go to the documentation of this file.
1 //==------------ - elem_type_traits.hpp - DPC++ Explicit SIMD API ----------==//
2 //
3 // Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
4 // See https://llvm.org/LICENSE.txt for license information.
5 // SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
6 //
7 //===----------------------------------------------------------------------===//
8 // This header provides basic infrastructure to support non-standard C++ types
9 // as simd element types. This non-standard element types are usually structs or
10 // classes (example: sycl::half).
11 // Terms:
12 // - "wrapper type" - a non-standard element type
13 // - "raw type" - the real types used to represent real storage type of the data
14 // bits wrapped by the corresponding wrapper structure/class
15 // By design, user program never uses the raw types, so they are not exposed at
16 // user level.
17 //
18 // The main reasons why the infrastructure is needed are:
19 // - attempt to create a clang vector with wrapper element type
20 // vector_type_t<WrapperT, N> will result in compilation error
21 // - C++ operations on WrapperT are usually supported by the Intel GPU hardware
22 // (which is the main reason of supporting them in ESIMD) and need to be
23 // mapped to efficient hardware code sequences.
24 //
25 // To make a wrapper type appear as first-class element type, the following
26 // major components must be available/implemented for the type:
27 // 1) Storage ("raw") type must be defined. The raw type must be bit-castable to
28 // the wrapper type and thus must have the same bit size and alignment
29 // requirements.
30 // 2) "Nearest enclosing" standard C++ type must be defined. This is a standard
31 // C++ type which can represent values of the wrapper type. The enclosing type
32 // can be used as a fall-back type for default implementations of operations
33 // on the wrapper type
34 // 3) Type conversion intrinsics between the bit representation of a wrapper
35 // type value and the equivalent enclosing C++ type value
36 // 4) The above three are enough to emulate any wrapper type, as all operations
37 // can be performed on the enclosing type values, converting from raw to
38 // enclosing before the operation and converting back from enclosing to raw
39 // after the operation. But this would be inefficient in some cases - when
40 // enclosing C++ type does not match the raw type, as H/W usually supports
41 // many operations directly on the raw type (which is bit representation of
42 // the wrapper type). So mapping to efficient H/W operations must be defined.
43 // For example, for SYCL half type efficient mapping primitive operations to
44 // Intel GPU harware is as easy as "unwrapping" sycl::half value, which yields
45 // "_Float16" natively supported by the device compiler and hardware, then
46 // using standard C++, operations such as '+', on _Float16 values. For other
47 // types like bfloat16 this will require mapping to appropriate intrinsics.
48 // 5) The type must be marked as wrapper type explicitly, for the API to behave
49 // correctly.
50 // Important note: some of these components might have different definition for
51 // the same wrapper type depending on host vs device compilation. E.g. for SYCL
52 // half the raw type is uint16_t on host and _Float16 on device.
53 //
54 // - The mechanism to define components 1) and 2) for a new wrapper type is to
55 // provide a specialization of the `element_type_traits` structure for this
56 // type.
57 // - Component 3) is provided via implementing specializations of the following
58 // intrinsics:
59 // * __esimd_wrapper_type_bitcast_to/__esimd_wrapper_type_bitcast_from (should
60 // not be necessary with C++ 20 where there is a standard bitcast operation)
61 // to bitcast between the raw and the wrapper types.
62 // * __esimd_convertvector_to/__esimd_convertvector_from to type-convert
63 // between clang vectors of the wrapper type (bit-represented with the raw
64 // type) and clang vectors the the enclosing std type values.
65 // - Component 4) is provided via:
66 // * (primitive operations) Specializations of the
67 // __esimd_binary_op
68 // __esimd_unary_op
69 // __esimd_cmp_op
70 // __esimd_vector_binary_op
71 // __esimd_vector_unary_op
72 // __esimd_vector_cmp_op
73 // intrinsics. If the `use_native_cpp_ops` element type trait is true, then
74 // implementing those intrinsics is not necessary and std C++ operations
75 // will be used.
76 // * (math operations) Overloading std math functions for the new wrapper
77 // type.
78 // - Component 5) is provided via adding the new type to the list of types in
79 // `is_wrapper_elem_type_v` meta function.
80 //===----------------------------------------------------------------------===//
81 
82 #pragma once
83 
85 
86 #include <CL/sycl/half_type.hpp>
87 
89 
91 namespace __ESIMD_DNS {
92 
93 // Primitive C++ operations supported by simd objects and templated upon by some
94 // of the functions/classes.
95 
96 enum class BinOp {
97  add,
98  sub,
99  mul,
100  div,
101  rem,
102  shl,
103  shr,
104  bit_or,
105  bit_and,
106  bit_xor,
107  log_or,
108  log_and
109 };
110 
111 enum class CmpOp { lt, lte, gte, gt, eq, ne };
112 
113 enum class UnaryOp { minus, plus, bit_not, log_not };
114 
115 // If given type is a special "wrapper" element type.
116 template <class T>
117 static inline constexpr bool is_wrapper_elem_type_v =
118  std::is_same_v<T, sycl::half>;
119 
120 template <class T>
121 static inline constexpr bool is_valid_simd_elem_type_v =
122  (is_vectorizable_v<T> || is_wrapper_elem_type_v<T>);
123 
124 struct invalid_raw_element_type;
125 
126 // Default (unusable) definition of the element type traits.
127 template <class T, class SFINAE> struct element_type_traits {
128  // The raw element type of the underlying clang vector used as a
129  // storage.
130  using RawT = invalid_raw_element_type;
131  // A starndard C++ type which this one can be converted to/from.
132  // The conversions are usually H/W-supported, and the C++ type can
133  // represent the entire range of values of this type.
134  using EnclosingCppT = void;
135  // Whether a value or clang vector value the raw element type can be used
136  // directly as operand to std C++ operations.
137  static inline constexpr bool use_native_cpp_ops = true;
138  // W/A for MSVC compiler problems which thinks
139  // std::is_floating_point_v<_Float16> is false; so require new element types
140  // implementations to state "is floating point" trait explicitly
141  static inline constexpr bool is_floating_point = false;
142 };
143 
144 // Element type traits specialization for C++ standard element type.
145 template <class T>
146 struct element_type_traits<T, std::enable_if_t<is_vectorizable_v<T>>> {
147  using RawT = T;
148  using EnclosingCppT = T;
149  static inline constexpr bool use_native_cpp_ops = true;
150  static inline constexpr bool is_floating_point = std::is_floating_point_v<T>;
151 };
152 
153 // --- Type conversions
154 
155 // Low-level conversion functions to and from a wrapper element type.
156 // Must be implemented for each supported
157 // <wrapper element type, C++ std type pair>.
158 
159 // These are default implementations for wrapper types with native cpp
160 // operations support for their corresponding raw type.
161 template <class WrapperTy, class StdTy, int N>
162 ESIMD_INLINE vector_type_t<__raw_t<WrapperTy>, N>
163 __esimd_convertvector_to(vector_type_t<StdTy, N> Val)
164 #ifdef __SYCL_DEVICE_ONLY__
165  ; // needs to be implemented for WrapperTy's for which
166  // element_type_traits<WrapperTy>::use_native_cpp_ops is false.
167 #else
168 {
169  // TODO implement for host
170  __ESIMD_UNSUPPORTED_ON_HOST;
171 }
172 #endif // __SYCL_DEVICE_ONLY__
173 
174 template <class WrapperTy, class StdTy, int N>
175 ESIMD_INLINE vector_type_t<StdTy, N>
176 __esimd_convertvector_from(vector_type_t<__raw_t<WrapperTy>, N> Val)
177 #ifdef __SYCL_DEVICE_ONLY__
178  ; // needs to be implemented for WrapperTy's for which
179  // element_type_traits<WrapperTy>::use_native_cpp_ops is false.
180 #else
181 {
182  // TODO implement for host
183  __ESIMD_UNSUPPORTED_ON_HOST;
184 }
185 #endif // __SYCL_DEVICE_ONLY__
186 
187 // TODO should be replaced by std::bit_cast once C++20 is supported.
188 template <class WrapperTy>
189 WrapperTy __esimd_wrapper_type_bitcast_to(__raw_t<WrapperTy> Val);
190 template <class WrapperTy>
191 __raw_t<WrapperTy> __esimd_wrapper_type_bitcast_from(WrapperTy Val);
192 
193 template <class WrapperTy, class StdTy> struct wrapper_type_converter {
194  using RawTy = __raw_t<WrapperTy>;
195 
196  template <int N>
197  ESIMD_INLINE static vector_type_t<RawTy, N>
198  to_vector(vector_type_t<StdTy, N> Val) {
199  if constexpr (element_type_traits<WrapperTy>::use_native_cpp_ops) {
200  return __builtin_convertvector(Val, vector_type_t<RawTy, N>);
201  } else {
202  return __esimd_convertvector_to<WrapperTy, StdTy, N>(Val);
203  }
204  }
205 
206  template <int N>
207  ESIMD_INLINE static vector_type_t<StdTy, N>
208  from_vector(vector_type_t<RawTy, N> Val) {
209  if constexpr (element_type_traits<WrapperTy>::use_native_cpp_ops) {
210  return __builtin_convertvector(Val, vector_type_t<StdTy, N>);
211  } else {
212  return __esimd_convertvector_from<WrapperTy, StdTy, N>(Val);
213  }
214  }
215 };
216 
217 // Converts a raw representation of a simd vector with element type
218 // SrcWrapperTy to a raw representation of a simd vector with element type
219 // DstWrapperTy.
220 template <class DstWrapperTy, class SrcWrapperTy, int N,
221  class DstRawVecTy = vector_type_t<__raw_t<DstWrapperTy>, N>,
222  class SrcRawVecTy = vector_type_t<__raw_t<SrcWrapperTy>, N>>
223 ESIMD_INLINE DstRawVecTy convert_vector(SrcRawVecTy Val) {
224  if constexpr (std::is_same_v<SrcWrapperTy, DstWrapperTy>) {
225  return Val;
226  } else if constexpr (!is_wrapper_elem_type_v<SrcWrapperTy> &&
227  !is_wrapper_elem_type_v<DstWrapperTy>) {
228  return __builtin_convertvector(Val, DstRawVecTy);
229  } else {
230  // The chain of conversions (some can be no-op if types match):
231  // SrcRawVecTy (of SrcWrapperTy)
232  // | step A [wrapper_type_converter<SrcWrapperTy, SrcStdT>]::from_vector
233  // v
234  // SrcStdT
235  // | step B [__builtin_convertvector]
236  // v
237  // DstStdT
238  // | step C [wrapper_type_converter<DstWrapperTy, DstStdT>]::to_vector
239  // v
240  // DstRawVecTy (of DstWrapperTy)
241  //
242  using DstStdT = typename element_type_traits<DstWrapperTy>::EnclosingCppT;
243  using SrcStdT = typename element_type_traits<SrcWrapperTy>::EnclosingCppT;
244  using SrcConv = wrapper_type_converter<SrcWrapperTy, SrcStdT>;
245  using DstConv = wrapper_type_converter<DstWrapperTy, DstStdT>;
246  using DstStdVecT = vector_type_t<DstStdT, N>;
247  using SrcStdVecT = vector_type_t<SrcStdT, N>;
248  SrcStdVecT TmpSrcVal;
249 
250  if constexpr (std::is_same_v<SrcStdT, SrcWrapperTy>) {
251  TmpSrcVal = std::move(Val);
252  } else {
253  TmpSrcVal = SrcConv::template from_vector<N>(Val); // step A
254  }
255  if constexpr (std::is_same_v<SrcStdT, DstWrapperTy>) {
256  return TmpSrcVal;
257  } else {
258  DstStdVecT TmpDstVal;
259 
260  if constexpr (std::is_same_v<SrcStdT, DstStdVecT>) {
261  TmpDstVal = std::move(TmpSrcVal);
262  } else {
263  TmpDstVal = __builtin_convertvector(TmpSrcVal, DstStdVecT); // step B
264  }
265  if constexpr (std::is_same_v<DstStdT, DstWrapperTy>) {
266  return TmpDstVal;
267  } else {
268  return DstConv::template to_vector<N>(TmpDstVal); // step C
269  }
270  }
271  }
272 }
273 
274 template <class Ty> ESIMD_INLINE __raw_t<Ty> bitcast_to_raw_type(Ty Val) {
275  if constexpr (!is_wrapper_elem_type_v<Ty>) {
276  return Val;
277  } else {
278  return __esimd_wrapper_type_bitcast_from<Ty>(Val);
279  }
280 }
281 
282 template <class Ty> ESIMD_INLINE Ty bitcast_to_wrapper_type(__raw_t<Ty> Val) {
283  if constexpr (!is_wrapper_elem_type_v<Ty>) {
284  return Val;
285  } else {
286  return __esimd_wrapper_type_bitcast_to<Ty>(Val);
287  }
288 }
289 
290 // Converts a scalar value from given source type to destination type. Both
291 // types can be non-std element types, in which case additional non-C++
292 // conversions happen if the types are different.
293 // NOTE: this is not symmetric with convert_vector, which inputs and outputs
294 // raw (storage) vector types.
295 template <class DstWrapperTy, class SrcWrapperTy,
296  class DstRawTy = __raw_t<DstWrapperTy>,
297  class SrcRawTy = __raw_t<SrcWrapperTy>>
298 ESIMD_INLINE DstWrapperTy convert_scalar(SrcWrapperTy Val) {
299  if constexpr (std::is_same_v<SrcWrapperTy, DstWrapperTy>) {
300  return Val;
301  } else if constexpr (!is_wrapper_elem_type_v<SrcWrapperTy> &&
302  !is_wrapper_elem_type_v<DstWrapperTy>) {
303  return static_cast<DstRawTy>(Val);
304  } else {
305  vector_type_t<SrcRawTy, 1> V0 = bitcast_to_raw_type<SrcWrapperTy>(Val);
306  vector_type_t<DstRawTy, 1> V1 =
307  convert_vector<DstWrapperTy, SrcWrapperTy, 1>(V0);
308  return bitcast_to_wrapper_type<DstWrapperTy>(V1[0]);
309  }
310 }
311 
312 template <BinOp Op, class T> T binary_op_default_impl(T X, T Y) {
313  T Res{};
314  if constexpr (Op == BinOp::add)
315  Res = X + Y;
316  else if constexpr (Op == BinOp::sub)
317  Res = X - Y;
318  else if constexpr (Op == BinOp::mul)
319  Res = X * Y;
320  else if constexpr (Op == BinOp::div)
321  Res = X / Y;
322  else if constexpr (Op == BinOp::rem)
323  Res = X % Y;
324  else if constexpr (Op == BinOp::shl)
325  Res = X << Y;
326  else if constexpr (Op == BinOp::shr)
327  Res = X >> Y;
328  else if constexpr (Op == BinOp::bit_or)
329  Res = X | Y;
330  else if constexpr (Op == BinOp::bit_and)
331  Res = X & Y;
332  else if constexpr (Op == BinOp::bit_xor)
333  Res = X ^ Y;
334  else if constexpr (Op == BinOp::log_or)
335  Res = X || Y;
336  else if constexpr (Op == BinOp::log_and)
337  Res = X && Y;
338  return Res;
339 }
340 
341 template <CmpOp Op, class T> auto comparison_op_default_impl(T X, T Y) {
342  decltype(X < Y) Res{};
343  if constexpr (Op == CmpOp::lt)
344  Res = X < Y;
345  else if constexpr (Op == CmpOp::lte)
346  Res = X <= Y;
347  else if constexpr (Op == CmpOp::eq)
348  Res = X == Y;
349  else if constexpr (Op == CmpOp::ne)
350  Res = X != Y;
351  else if constexpr (Op == CmpOp::gte)
352  Res = X >= Y;
353  else if constexpr (Op == CmpOp::gt)
354  Res = X > Y;
355  return Res;
356 }
357 
358 template <UnaryOp Op, class T> auto unary_op_default_impl(T X) {
359  if constexpr (Op == UnaryOp::minus)
360  return -X;
361  else if constexpr (Op == UnaryOp::plus)
362  return +X;
363  else if constexpr (Op == UnaryOp::bit_not)
364  return ~X;
365  else if constexpr (Op == UnaryOp::log_not)
366  return !X;
367 }
368 
369 template <class ElemT, int N> struct __hlp {
370  using RawElemT = __raw_t<ElemT>;
371  using RawVecT = vector_type_t<RawElemT, N>;
372  using BinopT = decltype(std::declval<RawVecT>() + std::declval<RawVecT>());
373  using CmpT = decltype(std::declval<RawVecT>() < std::declval<RawVecT>());
374 };
375 
376 template <class Hlp> using __re_t = typename Hlp::RawElemT;
377 template <class Hlp> using __rv_t = typename Hlp::RawVecT;
378 template <class Hlp> using __cmp_t = typename Hlp::CmpT;
379 
380 // --- Scalar versions of binary operations
381 
382 template <BinOp Op, class T> ESIMD_INLINE T __esimd_binary_op(T X, T Y);
383 
384 template <BinOp Op, class T,
385  class = std::enable_if_t<is_valid_simd_elem_type_v<T>>>
386 ESIMD_INLINE T binary_op_default(T X, T Y) {
387  static_assert(element_type_traits<T>::use_native_cpp_ops);
388  using T1 = __raw_t<T>;
389  T1 X1 = bitcast_to_raw_type(X);
390  T1 Y1 = bitcast_to_raw_type(Y);
391  T1 Res = binary_op_default_impl<Op>(X1, Y1);
392  return bitcast_to_wrapper_type<T>(Res);
393 }
394 
395 // Default (inefficient) implementation of a scalar binary operation, which
396 // involves conversion to an std C++ type, performing the op and converting
397 // back.
398 template <BinOp Op, class T> ESIMD_INLINE T __esimd_binary_op(T X, T Y) {
399  using T1 = typename element_type_traits<T>::EnclosingCppT;
400  T1 X1 = convert_scalar<T1, T>(X);
401  T1 Y1 = convert_scalar<T1, T>(Y);
402  return convert_scalar<T>(binary_op_default<Op, T1>(X1, Y1));
403 }
404 
405 template <BinOp Op, class T,
406  class = std::enable_if_t<is_valid_simd_elem_type_v<T>>>
407 ESIMD_INLINE T binary_op(T X, T Y) {
408  if constexpr (element_type_traits<T>::use_native_cpp_ops) {
409  return binary_op_default<Op>(X, Y);
410  } else {
411  return __esimd_binary_op<Op>(X, Y);
412  }
413 }
414 
415 // --- Vector versions of binary operations
416 
417 template <BinOp Op, class ElemT, int N, class RawVecT = __rv_t<__hlp<ElemT, N>>>
418 ESIMD_INLINE RawVecT vector_binary_op_default(RawVecT X, RawVecT Y) {
419  static_assert(element_type_traits<ElemT>::use_native_cpp_ops);
420  return binary_op_default_impl<Op, RawVecT>(X, Y);
421 }
422 
423 // Default (inefficient) implementation of a vector binary operation, which
424 // involves conversion to an std C++ type, performing the op and converting
425 // back.
426 template <BinOp Op, class ElemT, int N, class RawVecT = __rv_t<__hlp<ElemT, N>>>
427 ESIMD_INLINE RawVecT __esimd_vector_binary_op(RawVecT X, RawVecT Y) {
428  using T1 = typename element_type_traits<ElemT>::EnclosingCppT;
429  using VecT1 = vector_type_t<T1, N>;
430  VecT1 X1 = convert_vector<T1, ElemT, N>(X);
431  VecT1 Y1 = convert_vector<T1, ElemT, N>(Y);
432  return convert_vector<ElemT, T1, N>(
433  vector_binary_op_default<Op, T1, N>(X1, Y1));
434 }
435 
436 template <BinOp Op, class ElemT, int N, class RawVecT = __rv_t<__hlp<ElemT, N>>>
437 ESIMD_INLINE RawVecT vector_binary_op(RawVecT X, RawVecT Y) {
438  if constexpr (element_type_traits<ElemT>::use_native_cpp_ops) {
439  return vector_binary_op_default<Op, ElemT, N>(X, Y);
440  } else {
441  return __esimd_vector_binary_op<Op, ElemT, N>(X, Y);
442  }
443 }
444 
445 // --- Scalar versions of unary operations
446 
447 template <UnaryOp Op, class T> ESIMD_INLINE T __esimd_unary_op(T X);
448 
449 template <UnaryOp Op, class T,
450  class = std::enable_if_t<is_valid_simd_elem_type_v<T>>>
451 ESIMD_INLINE T unary_op_default(T X) {
452  static_assert(element_type_traits<T>::use_native_cpp_ops);
453  using T1 = __raw_t<T>;
454  T1 X1 = bitcast_to_raw_type(X);
455  T1 Res = unary_op_default_impl<Op>(X1);
456  return bitcast_to_wrapper_type<T>(Res);
457 }
458 
459 // Default (inefficient) implementation of a scalar unary operation, which
460 // involves conversion to an std C++ type, performing the op and converting
461 // back.
462 template <UnaryOp Op, class T> ESIMD_INLINE T __esimd_unary_op(T X) {
463  using T1 = typename element_type_traits<T>::EnclosingCppT;
464  T1 X1 = convert_scalar<T1, T>(X);
465  return convert_scalar<T>(unary_op_default<Op, T1>(X1));
466 }
467 
468 template <UnaryOp Op, class T,
469  class = std::enable_if_t<is_valid_simd_elem_type_v<T>>>
470 ESIMD_INLINE T unary_op(T X) {
471  if constexpr (element_type_traits<T>::use_native_cpp_ops) {
472  return unary_op_default<Op>(X);
473  } else {
474  return __esimd_unary_op<Op>(X);
475  }
476 }
477 
478 // --- Vector versions of unary operations
479 
480 template <UnaryOp Op, class ElemT, int N,
481  class RawVecT = __rv_t<__hlp<ElemT, N>>>
482 ESIMD_INLINE RawVecT vector_unary_op_default(RawVecT X) {
483  static_assert(element_type_traits<ElemT>::use_native_cpp_ops);
484  return unary_op_default_impl<Op, RawVecT>(X);
485 }
486 
487 // Default (inefficient) implementation of a vector unary operation, which
488 // involves conversion to an std C++ type, performing the op and converting
489 // back.
490 template <UnaryOp Op, class ElemT, int N,
491  class RawVecT = __rv_t<__hlp<ElemT, N>>>
492 ESIMD_INLINE RawVecT __esimd_vector_unary_op(RawVecT X) {
493  using T1 = typename element_type_traits<ElemT>::EnclosingCppT;
494  using VecT1 = vector_type_t<T1, N>;
495  VecT1 X1 = convert_vector<T1, ElemT, N>(X);
496  return convert_vector<ElemT, T1, N>(vector_unary_op_default<Op, T1, N>(X1));
497 }
498 
499 template <UnaryOp Op, class ElemT, int N,
500  class RawVecT = __rv_t<__hlp<ElemT, N>>>
501 ESIMD_INLINE RawVecT vector_unary_op(RawVecT X) {
502  if constexpr (element_type_traits<ElemT>::use_native_cpp_ops) {
503  return vector_unary_op_default<Op, ElemT, N>(X);
504  } else {
505  return __esimd_vector_unary_op<Op, ElemT, N>(X);
506  }
507 }
508 
509 // --- Vector versions of comparison operations
510 
511 template <CmpOp Op, class ElemT, int N, class H = __hlp<ElemT, N>,
512  class RetT = __cmp_t<H>, class RawVecT = __rv_t<H>>
513 ESIMD_INLINE RetT vector_comparison_op_default(RawVecT X, RawVecT Y) {
514  static_assert(element_type_traits<ElemT>::use_native_cpp_ops);
515  return comparison_op_default_impl<Op, RawVecT>(X, Y);
516 }
517 
518 // Default (inefficient) implementation of a vector comparison operation, which
519 // involves conversion to an std C++ type, performing the op and converting
520 // back.
521 template <CmpOp Op, class ElemT, int N, class H = __hlp<ElemT, N>,
522  class RetT = __cmp_t<H>, class RawVecT = __rv_t<H>>
523 ESIMD_INLINE RetT __esimd_vector_comparison_op(RawVecT X, RawVecT Y) {
524  using T1 = typename element_type_traits<ElemT>::EnclosingCppT;
525  using VecT1 = vector_type_t<T1, N>;
526  VecT1 X1 = convert_vector<T1, ElemT, N>(X);
527  VecT1 Y1 = convert_vector<T1, ElemT, N>(Y);
528  return convert_vector<element_type_t<RetT>, T1, N>(
529  vector_comparison_op_default<Op, T1, N>(X1, Y1));
530 }
531 
532 template <CmpOp Op, class ElemT, int N, class H = __hlp<ElemT, N>,
533  class RetT = __cmp_t<H>, class RawVecT = __rv_t<H>>
534 ESIMD_INLINE RetT vector_comparison_op(RawVecT X, RawVecT Y) {
535  if constexpr (element_type_traits<ElemT>::use_native_cpp_ops) {
536  return vector_comparison_op_default<Op, ElemT, N>(X, Y);
537  } else {
538  return __esimd_vector_comparison_op<Op, ElemT, N>(X, Y);
539  }
540 }
541 
542 // Proxy class to access bit representation of a wrapper type both on host and
543 // device.
544 // TODO add this functionality to sycl type implementation? With C++20,
545 // std::bit_cast should be a good replacement.
546 class WrapperElementTypeProxy {
547 public:
548  template <class T = sycl::half>
549  static inline __raw_t<T> bitcast_from_half(T Val) {
550 #ifdef __SYCL_DEVICE_ONLY__
551  return Val.Data;
552 #else
553  return Val.Data.Buf;
554 #endif // __SYCL_DEVICE_ONLY__
555  }
556 
557  template <class T = sycl::half>
558  static inline T bitcast_to_half(__raw_t<T> Bits) {
559 #ifndef __SYCL_DEVICE_ONLY__
560  return sycl::half{Bits};
561 #else
562  sycl::half Res;
563  Res.Data = Bits;
564  return Res;
565 #endif // __SYCL_DEVICE_ONLY__
566  }
567 };
568 
569 // "Generic" version of std::is_floating_point_v which returns "true" also for
570 // the wrapper floating-point types such as sycl::half.
571 template <typename T>
572 static inline constexpr bool is_generic_floating_point_v =
573  element_type_traits<T>::is_floating_point;
574 
575 // Get computation type of a binary operator given its operand types:
576 // - if both types are arithmetic - return CPP's "common real type" of the
577 // computation (matches C++)
578 // - if both types are simd types, they must be of the same length N,
579 // and the returned type is simd<T, N>, where N is the "common real type" of
580 // the element type of the operands (diverges from clang)
581 // - otherwise, one type is simd and another is arithmetic - the simd type is
582 // returned (matches clang)
583 
584 struct invalid_computation_type;
585 
586 template <class T1, class T2, class SFINAE = void> struct computation_type {
587  using type = invalid_computation_type;
588 };
589 
590 template <class T1, class T2>
591 struct computation_type<T1, T2,
592  std::enable_if_t<is_valid_simd_elem_type_v<T1> &&
593  is_valid_simd_elem_type_v<T2>>> {
594 private:
595  template <class T> using tr = element_type_traits<T>;
596  template <class T>
597  using native_t =
598  std::conditional_t<tr<T>::use_native_cpp_ops, typename tr<T>::RawT,
599  typename tr<T>::EnclosingCppT>;
600  static inline constexpr bool is_wr1 = is_wrapper_elem_type_v<T1>;
601  static inline constexpr bool is_wr2 = is_wrapper_elem_type_v<T2>;
602  static inline constexpr bool is_fp1 = is_generic_floating_point_v<T1>;
603  static inline constexpr bool is_fp2 = is_generic_floating_point_v<T2>;
604 
605 public:
606  using type = std::conditional_t<
607  !is_wr1 && !is_wr2,
608  // T1 and T2 are both std C++ types - use std C++ type promotion
609  decltype(std::declval<T1>() + std::declval<T2>()),
611  std::is_same_v<T1, T2>,
612  // Types are the same wrapper type - return any
613  T1,
614  std::conditional_t<is_fp1 != is_fp2,
615  // One of the types is floating-point - return it
616  // (e.g. computation_type<int, sycl::half> will
617  // yield sycl::half)
618  std::conditional_t<is_fp1, T1, T2>,
619  // both are either floating point or integral -
620  // return result of C++ promotion of the native
621  // types
622  decltype(std::declval<native_t<T1>>() +
623  std::declval<native_t<T2>>())>>>;
624 };
625 
626 template <class T1, class T2>
627 struct computation_type<
628  T1, T2,
629  std::enable_if_t<is_simd_like_type_v<T1> || is_simd_like_type_v<T2>>> {
630 private:
631  using Ty1 = element_type_t<T1>;
632  using Ty2 = element_type_t<T2>;
633  using EltTy = typename computation_type<Ty1, Ty2>::type;
634  static constexpr int N1 = is_simd_like_type_v<T1> ? T1::length : 0;
635  static constexpr int N2 = is_simd_like_type_v<T2> ? T2::length : 0;
636  static_assert((N1 == N2) || ((N1 & N2) == 0), "size mismatch");
637  static constexpr int N = N1 ? N1 : N2;
638 
639 public:
640  using type = simd<EltTy, N1>;
641 };
642 
643 template <class T1, class T2 = T1>
644 using computation_type_t =
645  typename computation_type<remove_cvref_t<T1>, remove_cvref_t<T2>>::type;
646 
648 // sycl::half traits
650 
651 template <class T>
652 struct element_type_traits<T, std::enable_if_t<std::is_same_v<T, sycl::half>>> {
653  // Can't use sycl::detail::half_impl::StorageT as RawT for both host and
654  // device as it still maps to struct on/ host (even though the struct is a
655  // trivial wrapper around uint16_t), and for ESIMD we need a type which can be
656  // an element of clang vector.
657 #ifdef __SYCL_DEVICE_ONLY__
659  // On device, _Float16 is native Cpp type, so it is the enclosing C++ type
660  using EnclosingCppT = RawT;
661  // On device, operations on half are translated to operations on _Float16,
662  // which is natively supported by the device compiler
663  static inline constexpr bool use_native_cpp_ops = true;
664 #else
665  using RawT = uint16_t;
666  using EnclosingCppT = float;
667  // On host, we can't use native Cpp '+', '-' etc. over uint16_t to emulate the
668  // operations on half type.
669  static inline constexpr bool use_native_cpp_ops = false;
670 #endif // __SYCL_DEVICE_ONLY__
671 
672  static inline constexpr bool is_floating_point = true;
673 };
674 
675 using half_raw = __raw_t<sycl::half>;
676 
677 template <>
678 ESIMD_INLINE sycl::half
679 __esimd_wrapper_type_bitcast_to<sycl::half>(half_raw Val) {
680  return WrapperElementTypeProxy::bitcast_to_half(Val);
681 }
682 
683 template <>
684 ESIMD_INLINE half_raw
685 __esimd_wrapper_type_bitcast_from<sycl::half>(sycl::half Val) {
686  return WrapperElementTypeProxy::bitcast_from_half(Val);
687 }
688 
689 template <>
690 struct is_esimd_arithmetic_type<__raw_t<sycl::half>, void> : std::true_type {};
691 
692 // Misc
693 inline std::ostream &operator<<(std::ostream &O, sycl::half const &rhs) {
694  O << static_cast<float>(rhs);
695  return O;
696 }
697 
698 inline std::istream &operator>>(std::istream &I, sycl::half &rhs) {
699  float ValFloat = 0.0f;
700  I >> ValFloat;
701  rhs = ValFloat;
702  return I;
703 }
704 
705 // The only other place which needs to be updated to support a new type is
706 // the is_wrapper_elem_type_v meta function.
707 
709 // sycl::bfloat16 traits
711 // TODO
712 
713 } // namespace __ESIMD_DNS
714 } // __SYCL_INLINE_NAMESPACE(cl)
715 
cl::__ESIMD_ENS::shr
__ESIMD_API std::enable_if_t< std::is_integral< T0 >::value &&std::is_integral< T1 >::value &&std::is_integral< U >::value, __ESIMD_NS::simd< T0, SZ > > shr(__ESIMD_NS::simd< T1, SZ > src0, U src1, Sat sat={})
Shift right operation (vector version)
Definition: math.hpp:114
cl::sycl::bit_xor
std::bit_xor< T > bit_xor
Definition: functional.hpp:22
T
cl::sycl::detail::device_global_map::add
void add(const void *DeviceGlobalPtr, const char *UniqueId)
Definition: device_global_map.cpp:16
cl::sycl::operator<<
std::ostream & operator<<(std::ostream &Out, backend be)
Definition: backend_types.hpp:46
sycl
Definition: invoke_simd.hpp:68
operator>>
std::istream & operator>>(std::istream &I, cl::sycl::half &rhs)
Definition: half_type.hpp:702
cl::__ESIMD_NS::atomic_op::sub
@ sub
Subtraction: *addr = *addr - src0.
cl::sycl::bit_or
std::bit_or< T > bit_or
Definition: functional.hpp:21
cl::sycl::half
cl::sycl::detail::half_impl::half half
Definition: aliases.hpp:76
cl::sycl::length
float length(T p) __NOEXC
Definition: builtins.hpp:1032
cl::sycl::detail::conditional_t
typename std::conditional< B, T, F >::type conditional_t
Definition: stl_type_traits.hpp:27
cl::__ESIMD_ENS::shl
__ESIMD_API std::enable_if_t< std::is_integral< T0 >::value &&std::is_integral< T1 >::value &&std::is_integral< U >::value, __ESIMD_NS::simd< T0, SZ > > shl(__ESIMD_NS::simd< T1, SZ > src0, U src1, Sat sat={})
Shift left operation (vector version)
Definition: math.hpp:40
cl
We provide new interfaces for matrix muliply in this patch:
Definition: access.hpp:13
cl::sycl::bit_and
std::bit_and< T > bit_and
Definition: functional.hpp:20
types.hpp
cl::__ESIMD_ENS::div
__ESIMD_API std::enable_if_t< std::is_integral< T >::value &&std::is_integral< U >::value, __ESIMD_NS::simd< T, SZ > > div(__ESIMD_NS::simd< T, SZ > &remainder, __ESIMD_NS::simd< T, SZ > src0, U src1)
Integral division with a vector dividend and a scalar divisor.
Definition: math.hpp:541
simd
Definition: simd.hpp:1027
std
Definition: accessor.hpp:2616
half_type.hpp
cl::sycl::detail::half_impl::StorageT
detail::host_half_impl::half_v2 StorageT
Definition: half_type.hpp:301
cl::sycl::plus
std::plus< T > plus
Definition: functional.hpp:18
cl::sycl::detail::enable_if_t
typename std::enable_if< B, T >::type enable_if_t
Definition: stl_type_traits.hpp:24
__SYCL_INLINE_NAMESPACE
#define __SYCL_INLINE_NAMESPACE(X)
Definition: defines_elementary.hpp:12