DPC++ Runtime
Runtime libraries for oneAPI DPC++
syclcompat::experimental Namespace Reference

Namespaces

 detail
 

Classes

struct  memcpy_parameter
 Memory copy parameters for 2D/3D memory data.
 
class  logical_group
 The logical-group is a logical collection of some work-items within a work-group.
 
class  group_base
 The group_base dispatches each function call to the interface specific to the group type.
 
class  group
 Container type that can store any of the supported group_types.
 

Enumerations

enum  memcpy_direction {
  host_to_host, host_to_device, device_to_host, device_to_device,
  automatic
}
 
enum class  group_type { work_group , sub_group , logical_group , root_group }
 Supported group types.
 

Functions

template<auto F, int SubgroupSize, typename... Args>
std::enable_if_t< std::is_invocable_v< decltype(F), Args..., char * >, sycl::event > launch (sycl::nd_range< 3 > launch_range, std::size_t local_memory_size, sycl::queue queue, Args... args)
 
template<auto F, int SubgroupSize, int Dim, typename... Args>
std::enable_if_t< std::is_invocable_v< decltype(F), Args..., char * >, sycl::event > launch (sycl::nd_range< Dim > launch_range, std::size_t local_memory_size, Args... args)
 
template<auto F, int SubgroupSize, typename... Args>
std::enable_if_t< std::is_invocable_v< decltype(F), Args..., char * >, sycl::event > launch (::syclcompat::dim3 grid_dim, ::syclcompat::dim3 block_dim, std::size_t local_memory_size, Args... args)
 
template<auto F, int SubgroupSize, typename... Args>
std::enable_if_t< std::is_invocable_v< decltype(F), Args... >, sycl::event > launch (sycl::nd_range< 3 > launch_range, sycl::queue queue, Args... args)
 
template<auto F, int SubgroupSize, int Dim, typename... Args>
std::enable_if_t< std::is_invocable_v< decltype(F), Args... >, sycl::event > launch (sycl::nd_range< Dim > launch_range, Args... args)
 
template<auto F, int SubgroupSize, typename... Args>
std::enable_if_t< std::is_invocable_v< decltype(F), Args... >, sycl::event > launch (::syclcompat::dim3 grid_dim, ::syclcompat::dim3 block_dim, Args... args)
 
static void memcpy (const memcpy_parameter &param, sycl::queue q=get_default_queue())
 [UNSUPPORTED] Synchronously copies 2D/3D memory data specified by param.
 
static void memcpy_async (const memcpy_parameter &param, sycl::queue q=get_default_queue())
 [UNSUPPORTED] Asynchronously copies 2D/3D memory data specified by param.
 
template<typename T >
T select_from_sub_group (unsigned int member_mask, sycl::sub_group g, T x, int remote_local_id, int logical_sub_group_size=32)
 Masked version of select_from_sub_group, which executes a masked sub-group operation.
 
template<typename T >
T shift_sub_group_left (unsigned int member_mask, sycl::sub_group g, T x, unsigned int delta, int logical_sub_group_size=32)
 Masked version of shift_sub_group_left, which executes a masked sub-group operation.
 
template<typename T >
T shift_sub_group_right (unsigned int member_mask, sycl::sub_group g, T x, unsigned int delta, int logical_sub_group_size=32)
 Masked version of shift_sub_group_right, which executes a masked sub-group operation.
 
template<typename T >
T permute_sub_group_by_xor (unsigned int member_mask, sycl::sub_group g, T x, unsigned int mask, int logical_sub_group_size=32)
 Masked version of permute_sub_group_by_xor, which executes a masked sub-group operation.
 
template<int dimensions = 3>
void nd_range_barrier (const sycl::nd_item< dimensions > &item, sycl::atomic_ref< unsigned int, barrier_memory_order, sycl::memory_scope::device, sycl::access::address_space::global_space > &counter)
 Synchronizes work-items across all work-groups of a SYCL kernel.
 
template<>
void nd_range_barrier (const sycl::nd_item< 1 > &item, sycl::atomic_ref< unsigned int, barrier_memory_order, sycl::memory_scope::device, sycl::access::address_space::global_space > &counter)
 Synchronizes work-items across all work-groups of a SYCL kernel.
 
int calculate_max_active_wg_per_xecore (int *num_wg, int wg_size, int slm_size=0, int sg_size=32, bool used_barrier=false, bool used_large_grf=false)
 Computes the maximum number of active work-groups per Xe-Core, for occupancy calculation.
 
int calculate_max_potential_wg (int *num_wg, int *wg_size, int max_wg_size_for_device_code, int slm_size=0, int sg_size=32, bool used_barrier=false, bool used_large_grf=false)
 Computes the work-group count and work-group size that can potentially achieve the device's maximum occupancy, for occupancy calculation.
 

Variables

constexpr sycl::memory_order barrier_memory_order = sycl::memory_order::seq_cst
 

Enumeration Type Documentation

◆ group_type

Supported group types.

Enumerator
work_group 
sub_group 
logical_group 
root_group 

Definition at line 843 of file util.hpp.

◆ memcpy_direction

Enumerator
host_to_host 
host_to_device 
device_to_host 
device_to_device 
automatic 

Definition at line 80 of file memory.hpp.

Function Documentation

◆ calculate_max_active_wg_per_xecore()

int syclcompat::experimental::calculate_max_active_wg_per_xecore ( int *  num_wg,
int  wg_size,
int  slm_size = 0,
int  sg_size = 32,
bool  used_barrier = false,
bool  used_large_grf = false 
)
inline

This function is used for occupancy calculation: it computes the maximum number of active work-groups per Xe-Core.

See also: https://github.com/oneapi-src/oneAPI-samples/tree/master/Tools/GPU-Occupancy-Calculator

Parameters
[out] num_wg  Number of active work-groups.
[in] wg_size  Work-group size.
[in] slm_size  Shared local memory size.
[in] sg_size  Sub-group size.
[in] used_barrier  Whether a barrier is used.
[in] used_large_grf  Whether the large General Register File is used.
Returns
0 on success. If wg_size exceeds the maximum work-group size, the maximum work-group size is used instead of wg_size and -1 is returned.

Definition at line 752 of file util.hpp.

References syclcompat::get_current_device().

Referenced by calculate_max_potential_wg().

◆ calculate_max_potential_wg()

int syclcompat::experimental::calculate_max_potential_wg ( int *  num_wg,
int *  wg_size,
int  max_wg_size_for_device_code,
int  slm_size = 0,
int  sg_size = 32,
bool  used_barrier = false,
bool  used_large_grf = false 
)
inline

This function is used for occupancy calculation: it computes the work-group count and the work-group size that can potentially achieve the maximum occupancy of the device.

See also: https://github.com/oneapi-src/oneAPI-samples/tree/master/Tools/GPU-Occupancy-Calculator

Parameters
[out] num_wg  Number of work-groups.
[out] wg_size  Work-group size.
[in] max_wg_size_for_device_code  The maximum work-group size that the current device code can work with. Zero means no limit.
[in] slm_size  Shared local memory size.
[in] sg_size  Sub-group size.
[in] used_barrier  Whether a barrier is used.
[in] used_large_grf  Whether the large General Register File is used.
Returns
Returns 0.

Definition at line 817 of file util.hpp.

References calculate_max_active_wg_per_xecore(), syclcompat::get_current_device(), sycl::_V1::device::get_info(), and sycl::_V1::device::has().

◆ launch() [1/6]

template<auto F, int SubgroupSize, typename... Args>
std::enable_if_t<std::is_invocable_v<decltype(F), Args...>, sycl::event> syclcompat::experimental::launch ( ::syclcompat::dim3  grid_dim,
::syclcompat::dim3  block_dim,
Args...  args 
)

◆ launch() [2/6]

template<auto F, int SubgroupSize, typename... Args>
std::enable_if_t<std::is_invocable_v<decltype(F), Args..., char *>, sycl::event> syclcompat::experimental::launch ( ::syclcompat::dim3  grid_dim,
::syclcompat::dim3  block_dim,
std::size_t  local_memory_size,
Args...  args 
)

◆ launch() [3/6]

template<auto F, int SubgroupSize, typename... Args>
std::enable_if_t<std::is_invocable_v<decltype(F), Args..., char *>, sycl::event> syclcompat::experimental::launch ( sycl::nd_range< 3 >  launch_range,
std::size_t  local_memory_size,
sycl::queue  queue,
Args...  args 
)

Definition at line 38 of file launch_experimental.hpp.

References sycl::_V1::handler::parallel_for(), and sycl::_V1::queue::submit().

Referenced by launch().

◆ launch() [4/6]

template<auto F, int SubgroupSize, typename... Args>
std::enable_if_t<std::is_invocable_v<decltype(F), Args...>, sycl::event> syclcompat::experimental::launch ( sycl::nd_range< 3 >  launch_range,
sycl::queue  queue,
Args...  args 
)

◆ launch() [5/6]

template<auto F, int SubgroupSize, int Dim, typename... Args>
std::enable_if_t<std::is_invocable_v<decltype(F), Args...>, sycl::event> syclcompat::experimental::launch ( sycl::nd_range< Dim >  launch_range,
Args...  args 
)

◆ launch() [6/6]

template<auto F, int SubgroupSize, int Dim, typename... Args>
std::enable_if_t<std::is_invocable_v<decltype(F), Args..., char *>, sycl::event> syclcompat::experimental::launch ( sycl::nd_range< Dim >  launch_range,
std::size_t  local_memory_size,
Args...  args 
)

◆ memcpy()

static void syclcompat::experimental::memcpy ( const memcpy_parameter param,
sycl::queue  q = get_default_queue() 
)
inline static

[UNSUPPORTED] Synchronously copies 2D/3D memory data specified by param.

The function returns only after the copy has completed.

Parameters
param  Memory copy parameters.
q  Queue on which the copy task is executed.
Returns
no return value.

Definition at line 850 of file memory.hpp.

References syclcompat::experimental::detail::memcpy(), and sycl::_V1::ext::intel::experimental::esimd::wait().
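The 2D case that memcpy targets usually involves row-pitched buffers: each row is contiguous, but consecutive rows start a fixed number of bytes (the pitch) apart, so a plain linear copy would mix padding into the data. The sketch below is a host-only model of such a pitched 2D copy. copy_2d_pitched and its parameters are hypothetical illustrations and do not describe the fields of memcpy_parameter, which this page does not list.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical host-side helper (not the syclcompat API): copies a region of
// width_bytes x height between two row-pitched buffers, one row at a time,
// so that padding between rows is neither read nor overwritten.
inline void copy_2d_pitched(void *dst, std::size_t dst_pitch,
                            const void *src, std::size_t src_pitch,
                            std::size_t width_bytes, std::size_t height) {
  auto *d = static_cast<unsigned char *>(dst);
  const auto *s = static_cast<const unsigned char *>(src);
  for (std::size_t row = 0; row < height; ++row) {
    // Row `row` starts at row * pitch in each buffer.
    std::memcpy(d + row * dst_pitch, s + row * src_pitch, width_bytes);
  }
}
```

For example, copying a 2-byte-wide, 3-row region from a buffer with a 4-byte pitch into one with a 3-byte pitch moves bytes {0,1}, {4,5}, {8,9} and leaves the destination's third column untouched.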

◆ memcpy_async()

static void syclcompat::experimental::memcpy_async ( const memcpy_parameter param,
sycl::queue  q = get_default_queue() 
)
inline static

[UNSUPPORTED] Asynchronously copies 2D/3D memory data specified by param.

Returning from the function does NOT guarantee that the copy has completed.

Parameters
param  Memory copy parameters.
q  Queue on which the copy task is executed.
Returns
no return value.

Definition at line 861 of file memory.hpp.

References syclcompat::experimental::detail::memcpy().

◆ nd_range_barrier() [1/2]

template<>
void syclcompat::experimental::nd_range_barrier ( const sycl::nd_item< 1 > &  item,
sycl::atomic_ref< unsigned int, barrier_memory_order, sycl::memory_scope::device, sycl::access::address_space::global_space > &  counter 
)
inline

Synchronizes work-items across all work-groups of a SYCL kernel.

Parameters
[in] item  The nd_item of the calling work-item.
[in] counter  An atomic object in device memory that is accessible to work-items in all work-groups. Its initial value should be zero. Note: make sure that all work-items of all work-groups in the SYCL kernel can be scheduled concurrently on the device; otherwise the barrier can deadlock.

Definition at line 639 of file util.hpp.

References sycl::_V1::nd_item< Dimensions >::barrier(), sycl::_V1::nd_item< Dimensions >::get_group(), sycl::_V1::nd_item< Dimensions >::get_group_range(), and sycl::_V1::nd_item< Dimensions >::get_local_linear_id().

◆ nd_range_barrier() [2/2]

template<int dimensions = 3>
void syclcompat::experimental::nd_range_barrier ( const sycl::nd_item< dimensions > &  item,
sycl::atomic_ref< unsigned int, barrier_memory_order, sycl::memory_scope::device, sycl::access::address_space::global_space > &  counter 
)
inline

Synchronizes work-items across all work-groups of a SYCL kernel.

Parameters
[in] item  The nd_item of the calling work-item.
[in] counter  An atomic object in device memory that is accessible to work-items in all work-groups. Its initial value should be zero. Note: make sure that all work-items of all work-groups in the SYCL kernel can be scheduled concurrently on the device; otherwise the barrier can deadlock.

Definition at line 599 of file util.hpp.

References sycl::_V1::nd_item< Dimensions >::barrier(), sycl::_V1::nd_item< Dimensions >::get_group(), sycl::_V1::nd_item< Dimensions >::get_group_range(), and sycl::_V1::nd_item< Dimensions >::get_local_linear_id().
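To make the two requirements above concrete (a zero-initialized counter, and all work-groups resident at once), the sketch below models a counter-based grid barrier on the host, with one std::thread standing in for the leader work-item of each work-group. host_grid_barrier and run_barrier_demo are hypothetical names, and the arrival arithmetic is an assumption: group 0 adds 0x80000000 - (num_groups - 1) and every other group adds 1, so bit 31 of the counter flips exactly when the last group arrives. This is one known technique for such barriers, not a transcription of util.hpp; the References list suggests the device version additionally relies on nd_item::barrier(), get_group(), get_group_range(), and get_local_linear_id().

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical host model of one work-group leader arriving at the barrier.
// The counter must start at zero. Each round the increments sum to 0x80000000,
// so bit 31 flips once per completed round and the counter can be reused.
inline void host_grid_barrier(std::atomic<std::uint32_t> &counter,
                              unsigned group_id, unsigned num_groups) {
  std::uint32_t inc = (group_id == 0) ? 0x80000000u - (num_groups - 1) : 1u;
  std::uint32_t old_arrive = counter.fetch_add(inc);  // seq_cst by default
  // Spin until bit 31 differs from the value seen on arrival.
  while (((old_arrive ^ counter.load()) & 0x80000000u) == 0) {
    std::this_thread::yield();
  }
}

// Each thread publishes a phase-1 flag, waits at the barrier, then checks that
// it can see every other thread's flag. Returns true if all threads did.
inline bool run_barrier_demo(unsigned num_groups) {
  std::atomic<std::uint32_t> counter{0};
  std::vector<std::atomic<int>> phase1(num_groups);  // value-initialized to 0
  std::vector<int> seen_all(num_groups, 0);
  std::vector<std::thread> threads;
  for (unsigned g = 0; g < num_groups; ++g) {
    threads.emplace_back([&, g] {
      phase1[g].store(1);
      host_grid_barrier(counter, g, num_groups);
      int ok = 1;
      for (unsigned k = 0; k < num_groups; ++k) ok &= phase1[k].load();
      seen_all[g] = ok;
    });
  }
  for (auto &t : threads) t.join();
  for (int ok : seen_all)
    if (!ok) return false;
  return true;
}
```

The host model never deadlocks because the operating system can preempt a spinning thread and run the ones that have not arrived yet. A GPU generally cannot do that for work-groups that are not yet resident, which is why the Note asks that every work-group be schedulable at the same time.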

◆ permute_sub_group_by_xor()

template<typename T >
T syclcompat::experimental::permute_sub_group_by_xor ( unsigned int  member_mask,
sycl::sub_group  g,
T  x,
unsigned int  mask,
int  logical_sub_group_size = 32 
)

Masked version of permute_sub_group_by_xor, which executes a masked sub-group operation.

The parameter member_mask indicates which work-items participate in the call: if the n-th bit is set to 1, the work-item with id n participates. All work-items named in member_mask must execute the call with the same member_mask; otherwise, the result is undefined.
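As a host-side illustration of the member_mask bit convention only, the helpers below test whether a given local id participates and build a mask for a contiguous range of work-items. participates and make_range_mask are hypothetical names, not part of syclcompat; a 32-bit unsigned int is assumed, matching the default logical_sub_group_size of 32.

```cpp
#include <cassert>

// Hypothetical helper: true if bit local_id of member_mask is set, i.e. the
// work-item with that sub-group local id takes part in the masked call.
constexpr bool participates(unsigned int member_mask, unsigned int local_id) {
  return local_id < 32 && ((member_mask >> local_id) & 1u) != 0;
}

// Hypothetical helper: mask in which work-items [first, first + count)
// participate, clipped to 32 lanes.
constexpr unsigned int make_range_mask(unsigned int first, unsigned int count) {
  unsigned int mask = 0;
  for (unsigned int i = first; i < first + count && i < 32; ++i) {
    mask |= (1u << i);
  }
  return mask;
}
```

With make_range_mask(0, 16), only work-items 0 through 15 are named, so each of them must pass that same mask value when making the call.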

Template Parameters
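T  Input value type
Parameters
[in] member_mask  Mask of the work-items participating in the call
[in] g  Input sub_group
[in] x  Input value
[in] mask  XOR mask applied to the caller's local id within its logical sub-group to select the source work-item
[in] logical_sub_group_size  Size of the logical sub-groups that g is divided into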
Returns
The result

Definition at line 464 of file util.hpp.

References sycl::_V1::sub_group::get_local_linear_id(), and __spv::Scope::Subgroup.
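The sketch below is a host-side model of which value each work-item receives from an XOR permutation confined to its logical sub-group. It assumes every work-item participates (member_mask has all bits set), and that a partner offset falling outside the logical sub-group leaves the caller with its own x; model_permute_by_xor is a hypothetical name, and the model works on a std::vector of per-lane values rather than on a real sycl::sub_group.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host model: lanes[i] is the x held by sub-group local id i.
// Returns the value each work-item would receive.
template <typename T>
std::vector<T> model_permute_by_xor(const std::vector<T> &lanes, unsigned mask,
                                    unsigned logical_size = 32) {
  std::vector<T> out(lanes.size());
  for (std::size_t id = 0; id < lanes.size(); ++id) {
    // First lane of the caller's logical sub-group.
    std::size_t start = id / logical_size * logical_size;
    // Partner offset inside that logical sub-group.
    std::size_t target = (id % logical_size) ^ mask;
    // Out-of-range partners fall back to the caller's own value.
    std::size_t src = (target < logical_size && start + target < lanes.size())
                          ? start + target
                          : id;
    out[id] = lanes[src];
  }
  return out;
}
```

With logical_sub_group_size 4 and mask 1, neighbors swap within each group of four; with mask 4, every partner offset is out of range and each work-item keeps its own value.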

◆ select_from_sub_group()

template<typename T >
T syclcompat::experimental::select_from_sub_group ( unsigned int  member_mask,
sycl::sub_group  g,
T  x,
int  remote_local_id,
int  logical_sub_group_size = 32 
)

Masked version of select_from_sub_group, which executes a masked sub-group operation.

The parameter member_mask indicates which work-items participate in the call: if the n-th bit is set to 1, the work-item with id n participates. All work-items named in member_mask must execute the call with the same member_mask; otherwise, the result is undefined.

Template Parameters
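T  Input value type
Parameters
[in] member_mask  Mask of the work-items participating in the call
[in] g  Input sub_group
[in] x  Input value
[in] remote_local_id  Local id of the source work-item within the caller's logical sub-group
[in] logical_sub_group_size  Size of the logical sub-groups that g is divided into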
Returns
The result

Definition at line 325 of file util.hpp.

References sycl::_V1::sub_group::get_local_linear_id(), and __spv::Scope::Subgroup.
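The sketch below models, on the host, which value each work-item receives from select_from_sub_group when the sub-group is split into logical sub-groups. It assumes every work-item participates, that remote_local_id is non-negative, and that the source lane is remote_local_id taken modulo logical_sub_group_size within the caller's logical sub-group. model_select_from_sub_group is a hypothetical name; it operates on a std::vector of per-lane values, not on a real sycl::sub_group.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host model: lanes[i] is the x held by sub-group local id i.
template <typename T>
std::vector<T> model_select_from_sub_group(const std::vector<T> &lanes,
                                           unsigned remote_local_id,
                                           unsigned logical_size = 32) {
  std::vector<T> out(lanes.size());
  for (std::size_t id = 0; id < lanes.size(); ++id) {
    // First lane of the caller's logical sub-group.
    std::size_t start = id / logical_size * logical_size;
    // Source lane, wrapped into the logical sub-group.
    std::size_t src = start + remote_local_id % logical_size;
    // Host-side bounds guard for a partial final group (an assumption of this model).
    out[id] = src < lanes.size() ? lanes[src] : lanes[id];
  }
  return out;
}
```

With logical_sub_group_size 4 and remote_local_id 1, every work-item in lanes 0 to 3 reads lane 1, and every work-item in lanes 4 to 7 reads lane 5: a per-logical-group broadcast.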

◆ shift_sub_group_left()

template<typename T >
T syclcompat::experimental::shift_sub_group_left ( unsigned int  member_mask,
sycl::sub_group  g,
T  x,
unsigned int  delta,
int  logical_sub_group_size = 32 
)

Masked version of shift_sub_group_left, which executes a masked sub-group operation.

The parameter member_mask indicates which work-items participate in the call: if the n-th bit is set to 1, the work-item with id n participates. All work-items named in member_mask must execute the call with the same member_mask; otherwise, the result is undefined.

Template Parameters
T  Input value type
Parameters
[in] member_mask  Mask of the work-items participating in the call
[in] g  Input sub_group
[in] x  Input value
[in] delta  Number of work-items to shift by
[in] logical_sub_group_size  Size of the logical sub-groups that g is divided into
Returns
The result

Definition at line 370 of file util.hpp.

References sycl::_V1::sub_group::get_local_linear_id(), and __spv::Scope::Subgroup.
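The sketch below is a host-side model of shift_sub_group_left confined to logical sub-groups. It assumes every work-item participates and that a work-item whose source lane (its id plus delta) lies past the end of its logical sub-group keeps its own x. model_shift_left is a hypothetical name and operates on a std::vector of per-lane values.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host model: lanes[i] is the x held by sub-group local id i.
template <typename T>
std::vector<T> model_shift_left(const std::vector<T> &lanes, unsigned delta,
                                unsigned logical_size = 32) {
  std::vector<T> out(lanes.size());
  for (std::size_t id = 0; id < lanes.size(); ++id) {
    // One past the last lane of the caller's logical sub-group.
    std::size_t end = (id / logical_size + 1) * logical_size;
    std::size_t src = id + delta;
    // Sources past the end of the logical sub-group fall back to the caller's own x.
    out[id] = (src < end && src < lanes.size()) ? lanes[src] : lanes[id];
  }
  return out;
}
```

With logical_sub_group_size 4 and delta 1, lanes {0..7} become {1, 2, 3, 3, 5, 6, 7, 7}: the last work-item of each logical sub-group has no right-hand neighbor inside its group and keeps its own value.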

◆ shift_sub_group_right()

template<typename T >
T syclcompat::experimental::shift_sub_group_right ( unsigned int  member_mask,
sycl::sub_group  g,
T  x,
unsigned int  delta,
int  logical_sub_group_size = 32 
)

Masked version of shift_sub_group_right, which executes a masked sub-group operation.

The parameter member_mask indicates which work-items participate in the call: if the n-th bit is set to 1, the work-item with id n participates. All work-items named in member_mask must execute the call with the same member_mask; otherwise, the result is undefined.

Template Parameters
T  Input value type
Parameters
[in] member_mask  Mask of the work-items participating in the call
[in] g  Input sub_group
[in] x  Input value
[in] delta  Number of work-items to shift by
[in] logical_sub_group_size  Size of the logical sub-groups that g is divided into
Returns
The result

Definition at line 418 of file util.hpp.

References sycl::_V1::sub_group::get_local_linear_id(), and __spv::Scope::Subgroup.
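The sketch below is the mirror-image host model for shift_sub_group_right. It assumes every work-item participates and that a work-item fewer than delta lanes into its logical sub-group keeps its own x. model_shift_right is a hypothetical name and operates on a std::vector of per-lane values.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host model: lanes[i] is the x held by sub-group local id i.
template <typename T>
std::vector<T> model_shift_right(const std::vector<T> &lanes, unsigned delta,
                                 unsigned logical_size = 32) {
  std::vector<T> out(lanes.size());
  for (std::size_t id = 0; id < lanes.size(); ++id) {
    // First lane of the caller's logical sub-group.
    std::size_t start = id / logical_size * logical_size;
    // Only work-items at least delta lanes into their group have a source
    // lane (id - delta) inside the same logical sub-group.
    out[id] = (id - start >= delta) ? lanes[id - delta] : lanes[id];
  }
  return out;
}
```

A common use of this pattern is an inclusive scan within each logical sub-group: repeatedly shifting right by 1, 2, 4, ... and adding the received value only when the work-item is at least that many lanes into its group.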

Variable Documentation

◆ barrier_memory_order

constexpr sycl::memory_order syclcompat::experimental::barrier_memory_order = sycl::memory_order::seq_cst
constexpr

Definition at line 588 of file util.hpp.