DPC++ Runtime
Runtime libraries for oneAPI DPC++
Memory access API.

ESIMD APIs to access memory via accessors, USM pointers, perform per-element atomic operations. More...

Collaboration diagram for Memory access API.:

Modules

 Atomic memory access.
 Memory access functions which perform per-lane atomic update using given operation. "Per-lane" means that the atomicity guarantees of a vector atomic operation are the same as of N independent scalar atomic operations per lane (N is number of lanes).
 
 Shared local memory access functions.
 
 LSC-specific memory access APIs.
 This group combines types and functions specific to LSC, which is available in Intel GPUs starting from PVC and ACM.
 
 Named barrier APIs.
 
 HW thread .
 

Namespaces

 sycl::_V1::ext::intel::esimd::detail
 

Enumerations

enum  sycl::_V1::ext::intel::esimd::fence_mask : uint8_t {
  sycl::_V1::ext::intel::esimd::global_coherent_fence = 0x1, sycl::_V1::ext::intel::esimd::l3_flush_instructions = 0x2, sycl::_V1::ext::intel::esimd::l3_flush_texture_data = 0x4, sycl::_V1::ext::intel::esimd::l3_flush_constant_data = 0x8,
  sycl::_V1::ext::intel::esimd::l3_flush_rw_data = 0x10, sycl::_V1::ext::intel::esimd::local_barrier = 0x20, sycl::_V1::ext::intel::esimd::l1_flush_ro_data = 0x40, sycl::_V1::ext::intel::esimd::sw_barrier = 0x80
}
 Represetns a bit mask to control behavior of esimd::fence. More...
 

Functions

template<typename AccessorTy >
__ESIMD_API SurfaceIndex sycl::_V1::ext::intel::esimd::get_surface_index (AccessorTy acc)
 Get surface index corresponding to a SYCL accessor. More...
 
template<typename Tx , int N, typename Toffset >
__ESIMD_API simd< Tx, N > sycl::_V1::ext::intel::esimd::gather (const Tx *p, simd< Toffset, N > offsets, simd_mask< N > mask=1)
 Loads ("gathers") elements from different memory locations and returns a vector of them. More...
 
template<typename Tx , int N, typename Toffset , typename RegionTy = region1d_t<Toffset, N, 1>>
__ESIMD_API simd< Tx, N > sycl::_V1::ext::intel::esimd::gather (const Tx *p, simd_view< Toffset, RegionTy > offsets, simd_mask< N > mask=1)
 A variation of gather API with offsets represented as simd_view object. More...
 
template<typename Tx , int N, typename Toffset >
__ESIMD_API std::enable_if_t< std::is_integral_v< Toffset >, simd< Tx, N > > sycl::_V1::ext::intel::esimd::gather (const Tx *p, Toffset offset, simd_mask< N > mask=1)
 A variation of gather API with offsets represented as scalar. More...
 
template<typename Tx , int N, typename Toffset >
__ESIMD_API void sycl::_V1::ext::intel::esimd::scatter (Tx *p, simd< Toffset, N > offsets, simd< Tx, N > vals, simd_mask< N > mask=1)
 Writes ("scatters") elements of the input vector to different memory locations. More...
 
template<typename Tx , int N, typename Toffset , typename RegionTy = region1d_t<Toffset, N, 1>>
__ESIMD_API void sycl::_V1::ext::intel::esimd::scatter (Tx *p, simd_view< Toffset, RegionTy > offsets, simd< Tx, N > vals, simd_mask< N > mask=1)
 A variation of scatter API with offsets represented as simd_view object. More...
 
template<typename Tx , int N, typename Toffset >
__ESIMD_API std::enable_if_t< std::is_integral_v< Toffset > &&N==1 > sycl::_V1::ext::intel::esimd::scatter (Tx *p, Toffset offset, simd< Tx, N > vals, simd_mask< N > mask=1)
 A variation of scatter API with offsets represented as scalar. More...
 
template<typename Tx , int N, typename Flags = overaligned_tag<detail::OperandSize::OWORD>>
__ESIMD_API std::enable_if_t< is_simd_flag_type_v< Flags >, simd< Tx, N > > sycl::_V1::ext::intel::esimd::block_load (const Tx *addr, Flags={})
 Loads a contiguous block of memory from the given memory address addr and returns the loaded data as a vector. More...
 
template<typename Tx , int N, typename AccessorTy , typename Flags = vector_aligned_tag, typename = std::enable_if_t< is_simd_flag_type_v<Flags> && sycl::detail::acc_properties::is_accessor_v<AccessorTy> && !sycl::detail::acc_properties::is_local_accessor_v<AccessorTy>>, class T = detail::__raw_t<Tx>>
__ESIMD_API simd< Tx, N > sycl::_V1::ext::intel::esimd::block_load (AccessorTy acc, uint32_t offset, Flags={})
 Loads a contiguous block of memory from given accessor and offset and returns the loaded data as a vector. More...
 
template<typename Tx , int N, typename Flags = overaligned_tag<detail::OperandSize::OWORD>>
__ESIMD_API __ESIMD_API std::enable_if_t< is_simd_flag_type_v< Flags > > sycl::_V1::ext::intel::esimd::block_store (Tx *addr, simd< Tx, N > vals, Flags={})
 Stores elements of the vector vals to a contiguous block of memory at the given address addr. More...
 
template<typename Tx , int N, typename AccessorTy , class T = detail::__raw_t<Tx>>
__ESIMD_API std::enable_if_t< sycl::detail::acc_properties::is_accessor_v< AccessorTy > &&!sycl::detail::acc_properties::is_local_accessor_v< AccessorTy > > sycl::_V1::ext::intel::esimd::block_store (AccessorTy acc, uint32_t offset, simd< Tx, N > vals)
 Stores elements of a vector to a contiguous block of memory represented by an accessor and an offset within this accessor. More...
 
template<typename T , int N, typename AccessorTy >
__ESIMD_API std::enable_if_t<(sizeof(T)<=4) &&(N==1||N==8||N==16||N==32) &&sycl::detail::acc_properties::is_accessor_v< AccessorTy > &&!sycl::detail::acc_properties::is_local_accessor_v< AccessorTy >, simd< T, N > > sycl::_V1::ext::intel::esimd::gather (AccessorTy acc, simd< uint32_t, N > offsets, uint32_t glob_offset=0, simd_mask< N > mask=1)
 
template<typename T , int N, typename AccessorTy >
__ESIMD_API std::enable_if_t<(sizeof(T)<=4) &&(N==1||N==8||N==16||N==32) &&sycl::detail::acc_properties::is_accessor_v< AccessorTy > &&!sycl::detail::acc_properties::is_local_accessor_v< AccessorTy > > sycl::_V1::ext::intel::esimd::scatter (AccessorTy acc, simd< uint32_t, N > offsets, simd< T, N > vals, uint32_t glob_offset=0, simd_mask< N > mask=1)
 
template<typename T , typename AccessorTy >
__ESIMD_API T sycl::_V1::ext::intel::esimd::scalar_load (AccessorTy acc, uint32_t offset)
 Load a scalar value from an accessor. More...
 
template<typename T , typename AccessorTy >
__ESIMD_API void sycl::_V1::ext::intel::esimd::scalar_store (AccessorTy acc, uint32_t offset, T val)
 Store a scalar value into an accessor. More...
 
template<rgba_channel_mask RGBAMask = rgba_channel_mask::ABGR, typename T , int N, typename Toffset >
__ESIMD_API simd< T, N *get_num_channels_enabled(RGBAMask)> sycl::_V1::ext::intel::esimd::gather_rgba (const T *p, simd< Toffset, N > offsets, simd_mask< N > mask=1)
 
template<rgba_channel_mask RGBAMask = rgba_channel_mask::ABGR, typename T , int N, typename Toffset , typename RegionTy = region1d_t<Toffset, N, 1>>
__ESIMD_API simd< T, N *get_num_channels_enabled(RGBAMask)> sycl::_V1::ext::intel::esimd::gather_rgba (const T *p, simd_view< Toffset, RegionTy > offsets, simd_mask< N > mask=1)
 A variation of gather_rgba API with offsets represented as simd_view object. More...
 
template<rgba_channel_mask RGBAMask = rgba_channel_mask::ABGR, typename T , int N, typename Toffset >
__ESIMD_API std::enable_if_t< std::is_integral_v< Toffset >, simd< T, N *get_num_channels_enabled(RGBAMask)> > sycl::_V1::ext::intel::esimd::gather_rgba (const T *p, Toffset offset, simd_mask< N > mask=1)
 A variation of gather_rgba API with offsets represented as scalar. More...
 
template<rgba_channel_mask RGBAMask = rgba_channel_mask::ABGR, typename T , int N, typename Toffset >
__ESIMD_API void sycl::_V1::ext::intel::esimd::scatter_rgba (T *p, simd< Toffset, N > offsets, simd< T, N *get_num_channels_enabled(RGBAMask)> vals, simd_mask< N > mask=1)
 
template<rgba_channel_mask RGBAMask = rgba_channel_mask::ABGR, typename T , int N, typename Toffset , typename RegionTy = region1d_t<Toffset, N, 1>>
__ESIMD_API void sycl::_V1::ext::intel::esimd::scatter_rgba (T *p, simd_view< Toffset, RegionTy > offsets, simd< T, N *get_num_channels_enabled(RGBAMask)> vals, simd_mask< N > mask=1)
 A variation of scatter_rgba API with offsets represented as simd_view object. More...
 
template<rgba_channel_mask RGBAMask = rgba_channel_mask::ABGR, typename T , int N, typename Toffset >
__ESIMD_API std::enable_if_t< std::is_integral_v< Toffset > &&N==1 > sycl::_V1::ext::intel::esimd::scatter_rgba (T *p, Toffset offset, simd< T, N *get_num_channels_enabled(RGBAMask)> vals, simd_mask< N > mask=1)
 A variation of scatter_rgba API with offsets represented as scalar. More...
 
template<typename T , int N, rgba_channel_mask RGBAMask>
__ESIMD_API std::enable_if_t<(N==8||N==16||N==32) &&sizeof(T)==4 > sycl::_V1::ext::intel::esimd::scatter_rgba (T *p, simd< uint32_t, N > offsets, simd< T, N *get_num_channels_enabled(RGBAMask)> vals, simd_mask< N > mask=1)
 
template<rgba_channel_mask RGBAMask = rgba_channel_mask::ABGR, typename AccessorT , int N, typename T = typename AccessorT::value_type>
__ESIMD_API std::enable_if_t<((N==8||N==16||N==32) &&sizeof(T)==4 &&!std::is_pointer_v< AccessorT > &&!sycl::detail::acc_properties::is_local_accessor_v< AccessorT >), simd< T, N *get_num_channels_enabled(RGBAMask)> > sycl::_V1::ext::intel::esimd::gather_rgba (AccessorT acc, simd< uint32_t, N > offsets, uint32_t global_offset=0, simd_mask< N > mask=1)
 Gather and transpose pixels from the given memory locations defined by the base specified by acc, the global offset global_offset and a vector of offsets offsets. More...
 
template<rgba_channel_mask RGBAMask = rgba_channel_mask::ABGR, typename AccessorT , int N, typename T = typename AccessorT::value_type>
__ESIMD_API std::enable_if_t<(N==8||N==16||N==32) &&sizeof(T)==4 &&!std::is_pointer_v< AccessorT > &&!sycl::detail::acc_properties::is_local_accessor_v< AccessorT > > sycl::_V1::ext::intel::esimd::scatter_rgba (AccessorT acc, simd< uint32_t, N > offsets, simd< T, N *get_num_channels_enabled(RGBAMask)> vals, uint32_t global_offset=0, simd_mask< N > mask=1)
 Gather data from the memory addressed by accessor acc, offset common for all loaded elements global_offset and per-element offsets offsets, and return it as simd vector. More...
 
template<uint8_t cntl>
__ESIMD_API void sycl::_V1::ext::intel::esimd::fence ()
 esimd::fence sets the memory read/write order. More...
 
__ESIMD_API void sycl::_V1::ext::intel::esimd::fence (fence_mask cntl)
 
__ESIMD_API void sycl::_V1::ext::intel::esimd::barrier ()
 Generic work-group barrier. More...
 
template<typename T , int m, int N, typename AccessorTy , unsigned plane = 0>
__ESIMD_API simd< T, m *N > sycl::_V1::ext::intel::esimd::media_block_load (AccessorTy acc, unsigned x, unsigned y)
 Media block load. More...
 
template<typename T , int m, int N, typename AccessorTy , unsigned plane = 0>
__ESIMD_API void sycl::_V1::ext::intel::esimd::media_block_store (AccessorTy acc, unsigned x, unsigned y, simd< T, m *N > vals)
 Media block store. More...
 
template<typename Tx , int N, typename AccessorTy , typename Flags = overaligned_tag<detail::OperandSize::OWORD>>
__ESIMD_API std::enable_if_t< sycl::detail::acc_properties::is_local_accessor_v< AccessorTy > &&is_simd_flag_type_v< Flags >, simd< Tx, N > > sycl::_V1::ext::intel::esimd::block_load (AccessorTy acc, uint32_t offset, Flags={})
 Loads a contiguous block of SLM memory referenced by the given local-accessor acc and byte-offset offset, then returns the loaded data as a simd object. More...
 
template<typename Tx , int N, typename AccessorTy , typename Flags = overaligned_tag<detail::OperandSize::OWORD>>
__ESIMD_API std::enable_if_t< sycl::detail::acc_properties::is_local_accessor_v< AccessorTy > &&is_simd_flag_type_v< Flags > > sycl::_V1::ext::intel::esimd::block_store (AccessorTy acc, uint32_t offset, simd< Tx, N > vals, Flags={})
 Variant of block_store that uses local accessor as a parameter. More...
 
template<split_barrier_action flag>
__ESIMD_API void sycl::_V1::ext::intel::experimental::esimd::split_barrier ()
 Generic work-group split barrier. More...
 
__ESIMD_API void sycl::_V1::ext::intel::experimental::esimd::split_barrier (split_barrier_action flag)
 

Detailed Description

ESIMD APIs to access memory via accessors, USM pointers, perform per-element atomic operations.

Enumeration Type Documentation

◆ fence_mask

Represetns a bit mask to control behavior of esimd::fence.

Enum elements define semantics of the bits in the mask.

Enumerator
global_coherent_fence 

“Commit enable” - wait for fence to complete before continuing.

l3_flush_instructions 

Flush the instruction cache.

l3_flush_texture_data 

Flush sampler (texture) cache.

l3_flush_constant_data 

Flush constant cache.

l3_flush_rw_data 

Flush constant cache.

local_barrier 

Issue SLM memory barrier only. If not set, the memory barrier is global.

l1_flush_ro_data 

Flush L1 read - only data cache.

sw_barrier 

Creates a software (compiler) barrier, which does not generate any instruction and only prevents instruction scheduler from reordering instructions across this barrier at compile time.

Definition at line 2012 of file memory.hpp.

Function Documentation

◆ barrier()

__ESIMD_API void sycl::_V1::ext::intel::esimd::barrier ( )

Generic work-group barrier.

Performs barrier synchronization for all threads within the same thread group. The barrier instruction causes the executing thread to wait until all threads in the same thread group have executed the barrier instruction. Memory ordering is also guaranteed by this instruction. The behavior is undefined if this instruction is executed in divergent control flow.

Definition at line 2049 of file memory.hpp.

References sycl::_V1::ext::intel::esimd::global_coherent_fence, and sycl::_V1::ext::intel::esimd::local_barrier.

Referenced by sycl::_V1::ext::intel::experimental::esimd::named_barrier_signal().

◆ block_load() [1/3]

template<typename Tx , int N, typename AccessorTy , typename Flags = vector_aligned_tag, typename = std::enable_if_t< is_simd_flag_type_v<Flags> && sycl::detail::acc_properties::is_accessor_v<AccessorTy> && !sycl::detail::acc_properties::is_local_accessor_v<AccessorTy>>, class T = detail::__raw_t<Tx>>
__ESIMD_API simd<Tx, N> sycl::_V1::ext::intel::esimd::block_load ( AccessorTy  acc,
uint32_t  offset,
Flags  = {} 
)

Loads a contiguous block of memory from given accessor and offset and returns the loaded data as a vector.

Actual code generated depends on the alignment parameter.

Template Parameters
TxElement type.
NNumber of elements to load, N * sizeof(Tx) must be 1, 2, 4 or 8 owords long.
AccessorTyAccessor type (auto-deduced).
FlagsThe alignment specifier type tag. Auto-deduced from the Flags parameter. If it is less than 16, then slower unaligned access is generated, otherwise the access is aligned.
Parameters
accThe accessor.
offsetThe offset to load from in bytes.
FlagsSpecifies the alignment.
Returns
A vector of loaded elements.

Definition at line 313 of file memory.hpp.

◆ block_load() [2/3]

template<typename Tx , int N, typename AccessorTy , typename Flags = overaligned_tag<detail::OperandSize::OWORD>>
__ESIMD_API std::enable_if_t< sycl::detail::acc_properties::is_local_accessor_v<AccessorTy> && is_simd_flag_type_v<Flags>, simd<Tx, N> > sycl::_V1::ext::intel::esimd::block_load ( AccessorTy  acc,
uint32_t  offset,
Flags  = {} 
)

Loads a contiguous block of SLM memory referenced by the given local-accessor acc and byte-offset offset, then returns the loaded data as a simd object.

The generated code depends on the combination {T, N, Flags}. Providing flags specifying the alignment of 16-bytes or more produces more efficient code. If the alignment is smaller than 16-bytes, then less efficient gather is generated. If the loaded vector is too long for 1 flat-load GPU instruction, then a series of flat-loads and/or gathers may be generated.

Template Parameters
TxElement type.
NNumber of elements to load.
AccessorTyAccessor type (auto-deduced).
FlagsThe alignment specifier type tag.
Parameters
accThe local accessor.
offsetThe offset to load from in bytes.
FlagsSpecifies the alignment.
Returns
A vector of loaded elements.

Definition at line 2165 of file memory.hpp.

◆ block_load() [3/3]

template<typename Tx , int N, typename Flags = overaligned_tag<detail::OperandSize::OWORD>>
__ESIMD_API std::enable_if_t<is_simd_flag_type_v<Flags>, simd<Tx, N> > sycl::_V1::ext::intel::esimd::block_load ( const Tx *  addr,
Flags  = {} 
)

Loads a contiguous block of memory from the given memory address addr and returns the loaded data as a vector.

The generated code depends on the combination {T, N, Flags}. Providing flags specifying the alignment of 16-bytes or more produces more efficient code. If the alignment is smaller than 16-bytes, then less efficient gather is generated. If the loaded vector is too long for 1 flat-load GPU instruction, then a series of flat-loads and/or gathers may be generated.

Template Parameters
TxElement type.
NNumber of elements to load.
FlagsThe alignment specifier type tag.
Parameters
addrThe address to load from.
FlagsSpecifies the alignment.
Returns
A vector of loaded elements.

Definition at line 283 of file memory.hpp.

◆ block_store() [1/3]

template<typename Tx , int N, typename AccessorTy , class T = detail::__raw_t<Tx>>
__ESIMD_API std::enable_if_t< sycl::detail::acc_properties::is_accessor_v<AccessorTy> && !sycl::detail::acc_properties::is_local_accessor_v<AccessorTy> > sycl::_V1::ext::intel::esimd::block_store ( AccessorTy  acc,
uint32_t  offset,
simd< Tx, N >  vals 
)

Stores elements of a vector to a contiguous block of memory represented by an accessor and an offset within this accessor.

Template Parameters
TxElement type.
NNumber of elements to store, N * sizeof(Tx) must be 1, 2, 4 or 8 owords long.
AccessorTyAccessor type (auto-deduced).
Parameters
accThe accessor to store to.
offsetThe offset to store at. It is in bytes and must be a multiple of 16.
valsThe vector to store.

Definition at line 386 of file memory.hpp.

References sycl::_V1::ext::intel::esimd::detail::isPowerOf2().

◆ block_store() [2/3]

template<typename Tx , int N, typename AccessorTy , typename Flags = overaligned_tag<detail::OperandSize::OWORD>>
__ESIMD_API std::enable_if_t< sycl::detail::acc_properties::is_local_accessor_v<AccessorTy> && is_simd_flag_type_v<Flags> > sycl::_V1::ext::intel::esimd::block_store ( AccessorTy  acc,
uint32_t  offset,
simd< Tx, N >  vals,
Flags  = {} 
)

Variant of block_store that uses local accessor as a parameter.

Stores elements of the vector vals to a contiguous block of SLM memory represented by the given local accessor and the byte-offset offset. The generated code depends on the combination {T, N, Flags}. Providing flags specifying the alignment of 16-bytes or more produces more efficient code. If the alignment is smaller than 16-bytes, then less efficient scatter is generated. If the stored vector is too long for 1 flat-store GPU instruction, then a series of flat-store and/or scatters may be generated.

Template Parameters
TxElement type.
NNumber of elements to store.
AccessorTyAccessor type (auto-deduced).
Parameters
accThe local accessor to store to.
offsetThe byte-offset to store at.
valsThe vector to store.
FlagsSpecifies the alignment.

Definition at line 2192 of file memory.hpp.

◆ block_store() [3/3]

template<typename Tx , int N, typename Flags = overaligned_tag<detail::OperandSize::OWORD>>
__ESIMD_API __ESIMD_API std::enable_if_t<is_simd_flag_type_v<Flags> > sycl::_V1::ext::intel::esimd::block_store ( Tx *  addr,
simd< Tx, N >  vals,
Flags  = {} 
)

Stores elements of the vector vals to a contiguous block of memory at the given address addr.

The generated code depends on the combination {T, N, Flags}. Providing flags specifying the alignment of 16-bytes or more produces more efficient code. If the alignment is smaller than 16-bytes, then less efficient scatter is generated. If the stored vector is too long for 1 flat-store GPU instruction, then a series of flat-store and/or scatters may be generated.

Template Parameters
TxElement type.
NNumber of elements to store.
FlagsThe alignment specifier type tag.
Parameters
addrThe memory address to store at.
valsThe vector to store.
FlagsSpecifies the alignment.

Definition at line 362 of file memory.hpp.

◆ fence() [1/2]

template<uint8_t cntl>
__ESIMD_API void sycl::_V1::ext::intel::esimd::fence ( )

esimd::fence sets the memory read/write order.

Template Parameters
cntlA bitmask composed from fence_mask bits.

Definition at line 2036 of file memory.hpp.

◆ fence() [2/2]

__ESIMD_API void sycl::_V1::ext::intel::esimd::fence ( fence_mask  cntl)

Definition at line 2039 of file memory.hpp.

◆ gather() [1/4]

template<typename T , int N, typename AccessorTy >
__ESIMD_API std::enable_if_t< (sizeof(T) <= 4) && (N == 1 || N == 8 || N == 16 || N == 32) && sycl::detail::acc_properties::is_accessor_v<AccessorTy> && !sycl::detail::acc_properties::is_local_accessor_v<AccessorTy>, simd<T, N> > sycl::_V1::ext::intel::esimd::gather ( AccessorTy  acc,
simd< uint32_t, N >  offsets,
uint32_t  glob_offset = 0,
simd_mask< N >  mask = 1 
)

Accessor-based gather.

Collects elements located at given offsets in an accessor and returns them as a single simd object. An element can be a 1, 2 or 4-byte value.

Template Parameters
TElement type; can only be a 1,2,4-byte integer, sycl::half or float.
NThe number of vector elements. Can be 1, 8, 16 or 32.
AccessorTyThe accessor type.
Parameters
accThe accessor to gather from.
offsetsPer-element offsets in bytes.
glob_offsetOffset in bytes added to each individual element's offset to compute actual memory access offset for that element.
maskMemory access mask. Elements with zero corresponding mask's predicate are not accessed, their values in the resulting vector are undefined.

Definition at line 528 of file memory.hpp.

◆ gather() [2/4]

template<typename Tx , int N, typename Toffset >
__ESIMD_API simd<Tx, N> sycl::_V1::ext::intel::esimd::gather ( const Tx *  p,
simd< Toffset, N >  offsets,
simd_mask< N >  mask = 1 
)

Loads ("gathers") elements from different memory locations and returns a vector of them.

Each memory location is base address plus an offset - a value of the corresponding element in the input offset vector. Access to any element's memory location can be disabled via the input vector of predicates (mask).

Template Parameters
TxElement type, must be of size 4 or less.
NNumber of elements to read; can be 1, 2, 4, 8, 16 or 32.
Parameters
pThe base address.
offsetsthe vector of 32-bit or 64-bit offsets in bytes. For each lane i, ((byte*)p + offsets[i]) must be element size aligned.
maskThe access mask, defaults to all 1s.
Returns
A vector of elements read. Elements in masked out lanes are undefined.

Definition at line 125 of file memory.hpp.

References sycl::_V1::ext::intel::esimd::detail::isPowerOf2().

◆ gather() [3/4]

template<typename Tx , int N, typename Toffset , typename RegionTy = region1d_t<Toffset, N, 1>>
__ESIMD_API simd<Tx, N> sycl::_V1::ext::intel::esimd::gather ( const Tx *  p,
simd_view< Toffset, RegionTy >  offsets,
simd_mask< N >  mask = 1 
)

A variation of gather API with offsets represented as simd_view object.

Template Parameters
TxElement type, must be of size 4 or less.
NNumber of elements to read; can be 1, 2, 4, 8, 16 or 32.
Parameters
pThe base address.
offsetsthe simd_view of 32-bit or 64-bit offsets in bytes. For each lane i, ((byte*)p + offsets[i]) must be element size aligned.
maskThe access mask, defaults to all 1s.
Returns
A vector of elements read. Elements in masked out lanes are undefined.

Definition at line 165 of file memory.hpp.

◆ gather() [4/4]

template<typename Tx , int N, typename Toffset >
__ESIMD_API std::enable_if_t<std::is_integral_v<Toffset>, simd<Tx, N> > sycl::_V1::ext::intel::esimd::gather ( const Tx *  p,
Toffset  offset,
simd_mask< N >  mask = 1 
)

A variation of gather API with offsets represented as scalar.

Template Parameters
TxElement type, must be of size 4 or less.
NNumber of elements to read; can be 1, 2, 4, 8, 16 or 32.
Parameters
pThe base address.
offsetthe scalar 32-bit or 64-bit offset in bytes. ((byte*)p + offset) must be element size aligned.
maskThe access mask, defaults to all 1s.
Returns
A vector of elements read. Elements in masked out lanes are undefined.

Definition at line 185 of file memory.hpp.

◆ gather_rgba() [1/4]

template<rgba_channel_mask RGBAMask = rgba_channel_mask::ABGR, typename AccessorT , int N, typename T = typename AccessorT::value_type>
__ESIMD_API std::enable_if_t< ((N == 8 || N == 16 || N == 32) && sizeof(T) == 4 && !std::is_pointer_v<AccessorT> && !sycl::detail::acc_properties::is_local_accessor_v<AccessorT>), simd<T, N * get_num_channels_enabled(RGBAMask)> > sycl::_V1::ext::intel::esimd::gather_rgba ( AccessorT  acc,
simd< uint32_t, N >  offsets,
uint32_t  global_offset = 0,
simd_mask< N >  mask = 1 
)

Gather and transpose pixels from the given memory locations defined by the base specified by acc, the global offset global_offset and a vector of offsets offsets.

Up to 4 32-bit data elements may be accessed at each address depending on the channel mask RGBAMask. Each pixel's address must be 4-byte aligned. For usage examples, see usm_gather_rgba above, the only difference would be the usage of an accessor instead of a usm pointer.

Template Parameters
RGBAMaskA pixel's channel mask.
AccessorTThe accessor type for the memory to be loaded/gathered. The returned vector elements must match the accessor data type. The loaded elements must be 4 bytes in size.
NNumber of pixels to access (matches the size of the offsets vector). Must be 8, 16 or 32.
Parameters
accThe accessor representing memory address of the access.
offsetsByte offsets of the pixels relative to the base pointer.
global_offsetByte offset of the pixels relative to the base pointer.
maskMemory access mask. Pixels with zero corresponding mask's predicate are not accessed. Their values in the resulting vector are undefined.
Returns
Read data - up to N*4 values of type Tx.

Definition at line 884 of file memory.hpp.

References simd_mask< _Tp, _Abi >::data(), and sycl::_V1::ext::intel::esimd::get_surface_index().

◆ gather_rgba() [2/4]

template<rgba_channel_mask RGBAMask = rgba_channel_mask::ABGR, typename T , int N, typename Toffset >
__ESIMD_API simd<T, N * get_num_channels_enabled(RGBAMask)> sycl::_V1::ext::intel::esimd::gather_rgba ( const T *  p,
simd< Toffset, N >  offsets,
simd_mask< N >  mask = 1 
)

Gather and transpose pixels from given memory locations defined by the base pointer p and offsets. Up to 4 32-bit data elements may be accessed at each address depending on the channel mask Mask template parameter. Each pixel's address must be 4 byte aligned. As an example, let's assume we want to read n pixels at address addr, skipping G and B channels. Each channel is a 32-bit float and the pixel data at given address in memory is:

R1 G1 B1 A1 R2 G2 B2 A2 ... Rn Gn Bn An

Then this can be achieved by using

simd<uint32_t, n> byte_offsets(0, 4*4 /* byte size of a single pixel */);
auto x = gather_rgba<float, n, rgba_channel_mask::AR>(addr, byte_offsets);

Returned x will contain 2*n float elements:

R1 R2 ... Rn A1 A2 ... An
Template Parameters
TElement type of the returned vector. Must be 4 bytes in size.
NNumber of pixels to access (matches the size of the offsets vector). Must be 8, 16 or 32.
MaskA pixel's channel mask.
Parameters
pThe USM base pointer representing memory address of the access.
offsetsvector of byte offsets of the pixels relative to the base pointer.
maskMemory access mask. Pixels with zero corresponding mask's predicate are not accessed. Their values in the resulting vector are undefined.
Returns
Read data - up to N*4 values of type Tx.

Definition at line 690 of file memory.hpp.

References simd_mask< _Tp, _Abi >::data().

◆ gather_rgba() [3/4]

template<rgba_channel_mask RGBAMask = rgba_channel_mask::ABGR, typename T , int N, typename Toffset , typename RegionTy = region1d_t<Toffset, N, 1>>
__ESIMD_API simd<T, N * get_num_channels_enabled(RGBAMask)> sycl::_V1::ext::intel::esimd::gather_rgba ( const T *  p,
simd_view< Toffset, RegionTy >  offsets,
simd_mask< N >  mask = 1 
)

A variation of gather_rgba API with offsets represented as simd_view object.

Template Parameters
TElement type of the returned vector. Must be 4 bytes in size.
NNumber of pixels to access (matches the size of the offsets vector). Must be 8, 16 or 32.
MaskA pixel's channel mask.
Parameters
pThe USM base pointer representing memory address of the access.
offsetssimd_view of byte offsets of the pixels relative to the base pointer.
maskMemory access mask. Pixels with zero corresponding mask's predicate are not accessed. Their values in the resulting vector are undefined.
Returns
Read data - up to N*4 values of type Tx.

Definition at line 720 of file memory.hpp.

◆ gather_rgba() [4/4]

template<rgba_channel_mask RGBAMask = rgba_channel_mask::ABGR, typename T , int N, typename Toffset >
__ESIMD_API std::enable_if_t<std::is_integral_v<Toffset>, simd<T, N * get_num_channels_enabled(RGBAMask)> > sycl::_V1::ext::intel::esimd::gather_rgba ( const T *  p,
Toffset  offset,
simd_mask< N >  mask = 1 
)

A variation of gather_rgba API with offsets represented as scalar.

Template Parameters
TElement type of the returned vector. Must be 4 bytes in size.
NNumber of pixels to access (matches the size of the offsets vector). Must be 8, 16 or 32.
MaskA pixel's channel mask.
Parameters
pThe USM base pointer representing memory address of the access.
offsetscalar byte offsets of the pixels relative to the base pointer.
maskMemory access mask. Pixels with zero corresponding mask's predicate are not accessed. Their values in the resulting vector are undefined.
Returns
Read data - up to N*4 values of type Tx.

Definition at line 744 of file memory.hpp.

◆ get_surface_index()

◆ media_block_load()

template<typename T , int m, int N, typename AccessorTy , unsigned plane = 0>
__ESIMD_API simd<T, m * N> sycl::_V1::ext::intel::esimd::media_block_load ( AccessorTy  acc,
unsigned  x,
unsigned  y 
)

Media block load.

Template Parameters
Tis the element data type.
mis the height of the 2D block.
Nis the width of the 2D block.
AccessorTyis type of the SYCL accessor.
planeis planar surface index.
Parameters
accis the SYCL accessor.
xis X-coordinate of the left upper rectangle corner in BYTES.
yis Y-coordinate of the left upper rectangle corner in ROWS.
Returns
the linearized 2D block data read from surface.

Definition at line 2071 of file memory.hpp.

References sycl::_V1::ext::intel::esimd::get_surface_index().

◆ media_block_store()

template<typename T , int m, int N, typename AccessorTy , unsigned plane = 0>
__ESIMD_API void sycl::_V1::ext::intel::esimd::media_block_store ( AccessorTy  acc,
unsigned  x,
unsigned  y,
simd< T, m *N >  vals 
)

Media block store.

Template Parameters
Tis the element data type.
mis the height of the 2D block.
Nis the width of the 2D block.
isAccessorTy type of the SYCL accessor.
planeis planar surface index.
Parameters
accis the SYCL accessor.
xis X-coordinate of the left upper rectangle corner in BYTES.
yis Y-coordinate of the left upper rectangle corner in ROWS.
valsis the linearized 2D block data to be written to surface.

Definition at line 2112 of file memory.hpp.

References sycl::_V1::ext::intel::esimd::get_surface_index().

◆ scalar_load()

template<typename T , typename AccessorTy >
__ESIMD_API T sycl::_V1::ext::intel::esimd::scalar_load ( AccessorTy  acc,
uint32_t  offset 
)

Load a scalar value from an accessor.

Template Parameters
TType of the value.
AccessorTyType of the accessor.
Parameters
accAccessor to load from.
offsetOffset in bytes.
Returns
The loaded value.

Definition at line 623 of file memory.hpp.

◆ scalar_store()

template<typename T , typename AccessorTy >
__ESIMD_API void sycl::_V1::ext::intel::esimd::scalar_store ( AccessorTy  acc,
uint32_t  offset,
val 
)

Store a scalar value into an accessor.

Template Parameters
TType of the value.
AccessorTyType of the accessor.
Parameters
accAccessor to store to.
offsetOffset in bytes.
valThe stored value.

Definition at line 643 of file memory.hpp.

◆ scatter() [1/4]

template<typename T , int N, typename AccessorTy >
__ESIMD_API std::enable_if_t< (sizeof(T) <= 4) && (N == 1 || N == 8 || N == 16 || N == 32) && sycl::detail::acc_properties::is_accessor_v<AccessorTy> && !sycl::detail::acc_properties::is_local_accessor_v<AccessorTy> > sycl::_V1::ext::intel::esimd::scatter ( AccessorTy  acc,
simd< uint32_t, N >  offsets,
simd< T, N >  vals,
uint32_t  glob_offset = 0,
simd_mask< N >  mask = 1 
)

Accessor-based scatter.

Writes elements of a simd object into an accessor at given offsets. An element can be a 1, 2 or 4-byte value.

Template Parameters
TElement type; can only be a 1,2,4-byte integer, sycl::half or float.
NThe number of vector elements. Can be 1, 8, 16 or 32.
AccessorTyThe accessor type.
Parameters
accThe accessor to scatter to.
offsetsPer-element offsets in bytes.
valsValues to write.
glob_offsetOffset in bytes added to each individual element's offset to compute actual memory access offset for that element.
maskMemory access mask. Elements with zero corresponding mask's predicate are not accessed.

Definition at line 581 of file memory.hpp.

◆ scatter() [2/4]

template<typename Tx , int N, typename Toffset >
__ESIMD_API void sycl::_V1::ext::intel::esimd::scatter ( Tx *  p,
simd< Toffset, N >  offsets,
simd< Tx, N >  vals,
simd_mask< N >  mask = 1 
)

Writes ("scatters") elements of the input vector to different memory locations.

Each memory location is base address plus an offset - a value of the corresponding element in the input offset vector. Access to any element's memory location can be disabled via the input mask.

Template Parameters
TxElement type, must be of size 4 or less.
NNumber of elements to write; can be 1, 2, 4, 8, 16 or 32.
Parameters
pThe base address.
offsetsA vector of 32-bit or 64-bit offsets in bytes. For each lane i, ((byte*)p + offsets[i]) must be element size aligned.
valsThe vector to scatter.
maskThe access mask, defaults to all 1s.

Definition at line 203 of file memory.hpp.

References sycl::_V1::ext::intel::esimd::detail::isPowerOf2().

◆ scatter() [3/4]

template<typename Tx , int N, typename Toffset , typename RegionTy = region1d_t<Toffset, N, 1>>
__ESIMD_API void sycl::_V1::ext::intel::esimd::scatter ( Tx *  p,
simd_view< Toffset, RegionTy >  offsets,
simd< Tx, N >  vals,
simd_mask< N >  mask = 1 
)

A variation of scatter API with offsets represented as simd_view object.

Template Parameters
TxElement type, must be of size 4 or less.
NNumber of elements to write; can be 1, 2, 4, 8, 16 or 32.
Parameters
pThe base address.
offsetsA simd_view of 32-bit or 64-bit offsets in bytes. For each lane i, ((byte*)p + offsets[i]) must be element size aligned.
valsThe vector to scatter.
maskThe access mask, defaults to all 1s.

Definition at line 243 of file memory.hpp.

◆ scatter() [4/4]

template<typename Tx , int N, typename Toffset >
__ESIMD_API std::enable_if_t<std::is_integral_v<Toffset> && N == 1> sycl::_V1::ext::intel::esimd::scatter ( Tx *  p,
Toffset  offset,
simd< Tx, N >  vals,
simd_mask< N >  mask = 1 
)

A variation of scatter API with offsets represented as scalar.

Template Parameters
TxElement type, must be of size 4 or less.
NNumber of elements to write; can be 1, 2, 4, 8, 16 or 32.
Parameters
pThe base address.
offsetthe scalar 32-bit or 64-bit offset in bytes. ((byte*)p + offset) must be element size aligned.
valsThe vector to scatter.
maskThe access mask, defaults to all 1s.

Definition at line 261 of file memory.hpp.

◆ scatter_rgba() [1/5]

template<rgba_channel_mask RGBAMask = rgba_channel_mask::ABGR, typename AccessorT , int N, typename T = typename AccessorT::value_type>
__ESIMD_API std::enable_if_t< (N == 8 || N == 16 || N == 32) && sizeof(T) == 4 && !std::is_pointer_v<AccessorT> && !sycl::detail::acc_properties::is_local_accessor_v<AccessorT> > sycl::_V1::ext::intel::esimd::scatter_rgba ( AccessorT  acc,
simd< uint32_t, N >  offsets,
simd< T, N *get_num_channels_enabled(RGBAMask)>  vals,
uint32_t  global_offset = 0,
simd_mask< N >  mask = 1 
)

Gather data from the memory addressed by accessor acc, offset common for all loaded elements global_offset and per-element offsets offsets, and return it as simd vector.

See usm_gather_rgba for information about the operation semantics and parameter restrictions/interdependencies.

Template Parameters
RGBAMaskPixel's channel mask.
AccessorTThe accessor type for the memory to be stored/scattered. The returned vector elements must match the accessor data type. The loaded elements must be 4 bytes in size.
NThe number of elements to access.
Parameters
offsetsByte offsets of each element.
valsvalues to be written.
global_offsetByte offset of the pixels relative to the base pointer.
maskOperation mask. All-1 by default.

Definition at line 941 of file memory.hpp.

References simd_mask< _Tp, _Abi >::data(), and sycl::_V1::ext::intel::esimd::get_surface_index().

◆ scatter_rgba() [2/5]

template<rgba_channel_mask RGBAMask = rgba_channel_mask::ABGR, typename T , int N, typename Toffset >
__ESIMD_API void sycl::_V1::ext::intel::esimd::scatter_rgba ( T *  p,
simd< Toffset, N >  offsets,
simd< T, N *get_num_channels_enabled(RGBAMask)>  vals,
simd_mask< N >  mask = 1 
)

Transpose and scatter pixels to given memory locations defined by the base pointer p and offsets. Up to 4 32-bit data elements may be accessed at each address depending on the channel mask RGBAMask. Each pixel's address must be 4 byte aligned. This is basically an inverse operation for gather_rgba. Unlike gather_rgba, this function imposes restrictions on possible Mask template argument values. It can only be one of the following: ABGR, BGR, GR, R.

Template Parameters
TElement type of the returned vector. Must be 4 bytes in size.
NNumber of pixels to access (matches the size of the offsets vector). Must be 8, 16 or 32.
RGBAMaskA pixel's channel mask.
Parameters
pThe USM base pointer representing memory address of the access.
valsvalues to be written.
offsetsvector of byte offsets of the pixels relative to the base pointer.
maskMemory access mask. Pixels with zero corresponding mask's predicate are not accessed. Their values in the resulting vector are undefined.

Definition at line 781 of file memory.hpp.

References simd_mask< _Tp, _Abi >::data().

◆ scatter_rgba() [3/5]

template<typename T , int N, rgba_channel_mask RGBAMask>
__ESIMD_API std:: enable_if_t<(N == 8 || N == 16 || N == 32) && sizeof(T) == 4> sycl::_V1::ext::intel::esimd::scatter_rgba ( T *  p,
simd< uint32_t, N >  offsets,
simd< T, N *get_num_channels_enabled(RGBAMask)>  vals,
simd_mask< N >  mask = 1 
)

Definition at line 847 of file memory.hpp.

◆ scatter_rgba() [4/5]

template<rgba_channel_mask RGBAMask = rgba_channel_mask::ABGR, typename T , int N, typename Toffset , typename RegionTy = region1d_t<Toffset, N, 1>>
__ESIMD_API void sycl::_V1::ext::intel::esimd::scatter_rgba ( T *  p,
simd_view< Toffset, RegionTy >  offsets,
simd< T, N *get_num_channels_enabled(RGBAMask)>  vals,
simd_mask< N >  mask = 1 
)

A variation of scatter_rgba API with offsets represented as simd_view object.

Template Parameters
TElement type of the returned vector. Must be 4 bytes in size.
NNumber of pixels to access (matches the size of the offsets vector). Must be 8, 16 or 32.
RGBAMaskA pixel's channel mask.
Parameters
pThe USM base pointer representing memory address of the access.
valsvalues to be written.
offsetssimd_view of byte offsets of the pixels relative to the base pointer.
maskMemory access mask. Pixels with zero corresponding mask's predicate are not accessed. Their values in the resulting vector are undefined.

Definition at line 814 of file memory.hpp.

◆ scatter_rgba() [5/5]

template<rgba_channel_mask RGBAMask = rgba_channel_mask::ABGR, typename T , int N, typename Toffset >
__ESIMD_API std::enable_if_t<std::is_integral_v<Toffset> && N == 1> sycl::_V1::ext::intel::esimd::scatter_rgba ( T *  p,
Toffset  offset,
simd< T, N *get_num_channels_enabled(RGBAMask)>  vals,
simd_mask< N >  mask = 1 
)

A variation of scatter_rgba API with offsets represented as scalar.

Template Parameters
TElement type of the returned vector. Must be 4 bytes in size.
NNumber of pixels to access (matches the size of the offsets vector). Must be 8, 16 or 32.
RGBAMaskA pixel's channel mask.
Parameters
pThe USM base pointer representing memory address of the access.
valsvalues to be written.
offsetscalar byte offset of the pixels relative to the base pointer.
maskMemory access mask. Pixels with zero corresponding mask's predicate are not accessed. Their values in the resulting vector are undefined.

Definition at line 838 of file memory.hpp.

◆ split_barrier() [1/2]

template<split_barrier_action flag>
__ESIMD_API void sycl::_V1::ext::intel::experimental::esimd::split_barrier ( )

Generic work-group split barrier.

Template Parameters
flag- split barrier action.

Definition at line 28 of file memory.hpp.

◆ split_barrier() [2/2]

__ESIMD_API void sycl::_V1::ext::intel::experimental::esimd::split_barrier ( split_barrier_action  flag)

Definition at line 33 of file memory.hpp.

syclcompat::local_id::x
size_t x()
Definition: id_query.hpp:54
simd
Definition: simd.hpp:1030