Classes | |
class | logical_group |
A logical group is a logical collection of work-items within a work-group. | |
Functions | |
template<typename T > | |
T | select_from_sub_group (unsigned int member_mask, sycl::sub_group g, T x, int remote_local_id, int logical_sub_group_size=32) |
Masked version of select_from_sub_group, which executes a masked sub-group operation. | |
template<typename T > | |
T | shift_sub_group_left (unsigned int member_mask, sycl::sub_group g, T x, unsigned int delta, int logical_sub_group_size=32) |
Masked version of shift_sub_group_left, which executes a masked sub-group operation. | |
template<typename T > | |
T | shift_sub_group_right (unsigned int member_mask, sycl::sub_group g, T x, unsigned int delta, int logical_sub_group_size=32) |
Masked version of shift_sub_group_right, which executes a masked sub-group operation. | |
template<typename T > | |
T | permute_sub_group_by_xor (unsigned int member_mask, sycl::sub_group g, T x, unsigned int mask, int logical_sub_group_size=32) |
Masked version of permute_sub_group_by_xor, which executes a masked sub-group operation. | |
template<int dimensions = 3> | |
void | nd_range_barrier (sycl::nd_item< dimensions > item, sycl::atomic_ref< unsigned int, sycl::memory_order::acq_rel, sycl::memory_scope::device, sycl::access::address_space::global_space > &counter) |
Synchronizes work-items from all work-groups within a SYCL kernel. | |
template<> | |
void | nd_range_barrier (sycl::nd_item< 1 > item, sycl::atomic_ref< unsigned int, sycl::memory_order::acq_rel, sycl::memory_scope::device, sycl::access::address_space::global_space > &counter) |
Synchronizes work-items from all work-groups within a SYCL kernel (specialization for one-dimensional kernels). | |
int | calculate_max_active_wg_per_xecore (int *num_wg, int wg_size, int slm_size=0, int sg_size=32, bool used_barrier=false, bool used_large_grf=false) |
Computes the maximum number of active work-groups per Xe-Core, for occupancy calculation. | |
int | calculate_max_potential_wg (int *num_wg, int *wg_size, int max_wg_size_for_device_code, int slm_size=0, int sg_size=32, bool used_barrier=false, bool used_large_grf=false) |
Computes the number of work-groups and the work-group size that potentially achieve the maximum occupancy of the device, for occupancy calculation. | |
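int syclcompat::experimental::calculate_max_active_wg_per_xecore (int *num_wg, int wg_size, int slm_size=0, int sg_size=32, bool used_barrier=false, bool used_large_grf=false) | inline |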
Computes the maximum number of active work-groups per Xe-Core, for occupancy calculation.
See https://github.com/oneapi-src/oneAPI-samples/tree/master/Tools/GPU-Occupancy-Calculator
[out] | num_wg | Number of active work-groups. |
[in] | wg_size | Work-group size. |
[in] | slm_size | Shared local memory (SLM) size. |
[in] | sg_size | Sub-group size. |
[in] | used_barrier | Whether a barrier is used. |
[in] | used_large_grf | Whether the large General Register File (GRF) is used. |
Returns: If wg_size exceeds the maximum work-group size, the maximum work-group size is used instead of wg_size and the function returns -1.
Definition at line 655 of file util.hpp.
References syclcompat::get_current_device().
Referenced by calculate_max_potential_wg().
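The parameters above suggest which resources can bound occupancy: hardware threads, shared local memory, and barrier resources. The following is a minimal host-side sketch of such a calculation in plain C++. It is not the library's implementation. The struct `xe_core_limits`, the function `model_max_active_wg`, and every number passed to them are hypothetical; a real implementation would query the device. Halving the threads for large GRF and mapping one sub-group to one hardware thread are assumptions of this model.

```cpp
#include <algorithm>

// Hypothetical per-Xe-Core limits. These are placeholders for illustration,
// not values queried from a real device.
struct xe_core_limits {
  int hw_threads;     // hardware threads per Xe-Core
  int slm_bytes;      // shared local memory per Xe-Core
  int barrier_slots;  // work-groups that may use a barrier concurrently
  int max_wg_size;    // maximum work-group size
};

// Illustrative model. Returns -1 if wg_size had to be clamped, else 0.
inline int model_max_active_wg(const xe_core_limits &hw, int *num_wg,
                               int wg_size, int slm_size = 0, int sg_size = 32,
                               bool used_barrier = false,
                               bool used_large_grf = false) {
  int ret = 0;
  if (wg_size > hw.max_wg_size) {  // clamping described in the Returns text
    wg_size = hw.max_wg_size;
    ret = -1;
  }
  // Assumption: large GRF mode halves the usable hardware threads.
  int threads = used_large_grf ? hw.hw_threads / 2 : hw.hw_threads;
  // Assumption: each sub-group occupies one hardware thread.
  int threads_per_wg = (wg_size + sg_size - 1) / sg_size;
  int by_threads = threads / threads_per_wg;
  // Upper bound on resident work-groups, lowered when a barrier is used.
  int cap = used_barrier ? hw.barrier_slots : hw.hw_threads;
  // Work-groups that fit into shared local memory (no limit if unused).
  int by_slm = (slm_size == 0) ? cap : hw.slm_bytes / slm_size;
  *num_wg = std::min({by_threads, by_slm, cap});
  return ret;
}
```

In this model the result is the tightest of the three bounds, so raising `slm_size` or `wg_size` can only lower `num_wg`.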
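int syclcompat::experimental::calculate_max_potential_wg (int *num_wg, int *wg_size, int max_wg_size_for_device_code, int slm_size=0, int sg_size=32, bool used_barrier=false, bool used_large_grf=false) | inline |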
Computes the number of work-groups and the work-group size that potentially achieve the maximum occupancy of the device, for occupancy calculation.
See https://github.com/oneapi-src/oneAPI-samples/tree/master/Tools/GPU-Occupancy-Calculator
[out] | num_wg | Number of work-groups. |
[out] | wg_size | Work-group size. |
[in] | max_wg_size_for_device_code | The maximum work-group size that the current device code logic supports. Zero means no limitation. |
[in] | slm_size | Shared local memory (SLM) size. |
[in] | sg_size | Sub-group size. |
[in] | used_barrier | Whether a barrier is used. |
[in] | used_large_grf | Whether the large General Register File (GRF) is used. |
Definition at line 720 of file util.hpp.
References calculate_max_active_wg_per_xecore(), syclcompat::get_current_device(), sycl::_V1::device::get_info(), and sycl::_V1::device::has().
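The rule "zero means no limitation" for max_wg_size_for_device_code can be sketched as a small host-side helper. This is an illustration under assumptions, not the library's code: `pick_wg_size` and `scale_to_device` are hypothetical names, `device_max_wg_size` stands in for a value queried from the device, and scaling the per-Xe-Core count by the number of Xe-Cores is an assumed way to reach a device-wide count.

```cpp
// Hypothetical helper: choose the work-group size to evaluate.
// A limit of zero, or a limit at or above the device maximum, falls back
// to the device maximum.
inline int pick_wg_size(int max_wg_size_for_device_code,
                        int device_max_wg_size) {
  if (max_wg_size_for_device_code == 0 ||
      max_wg_size_for_device_code >= device_max_wg_size)
    return device_max_wg_size;
  return max_wg_size_for_device_code;
}

// Assumption: the device-wide work-group count is the per-Xe-Core count
// multiplied by the number of Xe-Cores.
inline int scale_to_device(int num_wg_per_xecore, int num_xecores) {
  return num_wg_per_xecore * num_xecores;
}
```

A caller would pass the chosen size on to the per-Xe-Core calculation and then scale the result.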
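template<> void syclcompat::experimental::nd_range_barrier (sycl::nd_item< 1 > item, sycl::atomic_ref< unsigned int, sycl::memory_order::acq_rel, sycl::memory_scope::device, sycl::access::address_space::global_space > &counter) | inline |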
Synchronizes work-items from all work-groups within a SYCL kernel.
[in] | item | Represents the calling work-item and its work-group. |
[in] | counter | An atomic object in device memory that can be accessed by work-items in all work-groups. The initial value of the counter must be zero. Note: Make sure that all work-items of all work-groups within the SYCL kernel can be actively scheduled on the device at the same time. |
Definition at line 544 of file util.hpp.
References sycl::_V1::nd_item< Dimensions >::barrier(), sycl::_V1::nd_item< Dimensions >::get_group(), sycl::_V1::nd_item< Dimensions >::get_group_range(), and sycl::_V1::nd_item< Dimensions >::get_local_linear_id().
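template<int dimensions = 3> void syclcompat::experimental::nd_range_barrier (sycl::nd_item< dimensions > item, sycl::atomic_ref< unsigned int, sycl::memory_order::acq_rel, sycl::memory_scope::device, sycl::access::address_space::global_space > &counter) | inline |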
Synchronizes work-items from all work-groups within a SYCL kernel.
[in] | item | Represents the calling work-item and its work-group. |
[in] | counter | An atomic object in device memory that can be accessed by work-items in all work-groups. The initial value of the counter must be zero. Note: Make sure that all work-items of all work-groups within the SYCL kernel can be actively scheduled on the device at the same time. |
Definition at line 504 of file util.hpp.
References sycl::_V1::nd_item< Dimensions >::barrier(), sycl::_V1::nd_item< Dimensions >::get_group(), sycl::_V1::nd_item< Dimensions >::get_group_range(), and sycl::_V1::nd_item< Dimensions >::get_local_linear_id().
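The two requirements on counter (it starts at zero, and all work-groups must be resident at once) follow from how a counter-based barrier works. The sketch below is a host analogue in plain C++ in which each std::thread stands in for one work-group. It is a simplified one-shot barrier written for illustration, not the library's implementation, and `run_counter_barrier_demo` is a hypothetical name.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Returns true if every "work-group" observed all pre-barrier writes
// after passing the barrier.
inline bool run_counter_barrier_demo(unsigned int num_groups) {
  std::atomic<unsigned int> counter{0};  // must start at zero
  std::vector<int> phase1(num_groups, 0);
  std::vector<int> ok(num_groups, 0);
  std::vector<std::thread> groups;
  for (unsigned int g = 0; g < num_groups; ++g) {
    groups.emplace_back([&, g] {
      phase1[g] = 1;  // work done before the barrier
      counter.fetch_add(1, std::memory_order_acq_rel);  // announce arrival
      // Spin until every group has arrived. The loop only terminates if
      // all groups get to run, which mirrors the scheduling requirement.
      while (counter.load(std::memory_order_acquire) < num_groups)
        std::this_thread::yield();
      int seen = 0;
      for (int v : phase1) seen += v;  // work done after the barrier
      ok[g] = (seen == static_cast<int>(num_groups)) ? 1 : 0;
    });
  }
  for (auto &t : groups) t.join();
  for (int v : ok)
    if (v == 0) return false;
  return true;
}
```

On a host, the operating system preempts threads, so the spin loop always makes progress. A device that cannot keep all work-groups resident offers no such guarantee: a waiting work-group could occupy the resources that a not-yet-started work-group needs, and the kernel would hang.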
template<typename T > T syclcompat::experimental::permute_sub_group_by_xor (unsigned int member_mask, sycl::sub_group g, T x, unsigned int mask, int logical_sub_group_size=32)
Masked version of permute_sub_group_by_xor, which executes a masked sub-group operation.
The member_mask parameter indicates which work-items participate in the call: the n-th bit is set to 1 if the work-item with id n participates. All work-items named in member_mask must execute the call with the same member_mask; otherwise the result is undefined.
T | Type of the input value |
[in] | member_mask | Bit mask of the participating work-items |
[in] | g | Input sub_group |
[in] | x | Input value |
[in] | mask | Mask that is XORed with the work-item id to select the source work-item |
[in] | logical_sub_group_size | Logical sub_group size |
Definition at line 449 of file util.hpp.
References sycl::_V1::sub_group::get_local_linear_id(), and __spv::Scope::Subgroup.
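The effect of a masked XOR permute can be illustrated with a host simulation over the values held by one sub-group. This is a sketch under assumptions, not the library's implementation: `simulate_permute_by_xor` is a hypothetical name, work-item i is assumed to read from work-item i XOR mask within its own logical sub-group, and a work-item whose target falls outside that logical sub-group is assumed to keep its own value. Work-items not named in member_mask are left unchanged here, whereas the real result for them is unspecified.

```cpp
#include <vector>

// in[i] is the value x held by work-item i (at most 32 work-items).
template <typename T>
std::vector<T> simulate_permute_by_xor(unsigned int member_mask,
                                       const std::vector<T> &in,
                                       unsigned int mask,
                                       int logical_sub_group_size = 32) {
  std::vector<T> out = in;
  const unsigned int size = static_cast<unsigned int>(logical_sub_group_size);
  for (unsigned int id = 0; id < in.size() && id < 32; ++id) {
    if (((member_mask >> id) & 1u) == 0) continue;  // not participating
    unsigned int start = id / size * size;  // first id of this logical sub-group
    unsigned int offset = (id % size) ^ mask;
    // Assumption: an out-of-range target yields the caller's own value.
    unsigned int src = (offset < size) ? start + offset : id;
    if (src < in.size()) out[id] = in[src];
  }
  return out;
}
```

With mask equal to 1, neighbouring work-items swap values, which is the first step of a butterfly-style reduction.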
template<typename T > T syclcompat::experimental::select_from_sub_group (unsigned int member_mask, sycl::sub_group g, T x, int remote_local_id, int logical_sub_group_size=32)
Masked version of select_from_sub_group, which executes a masked sub-group operation.
The member_mask parameter indicates which work-items participate in the call: the n-th bit is set to 1 if the work-item with id n participates. All work-items named in member_mask must execute the call with the same member_mask; otherwise the result is undefined.
T | Type of the input value |
[in] | member_mask | Bit mask of the participating work-items |
[in] | g | Input sub_group |
[in] | x | Input value |
[in] | remote_local_id | Id of the source work-item |
[in] | logical_sub_group_size | Logical sub_group size |
Definition at line 310 of file util.hpp.
References sycl::_V1::sub_group::get_local_linear_id(), and __spv::Scope::Subgroup.
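The member_mask encoding described above (bit n set means work-item n participates) can be sketched with a few host-side helpers. All three function names are hypothetical. The source-id rule in `select_source_id` is an assumption for illustration: remote_local_id is taken relative to the caller's logical sub-group and wraps modulo its size.

```cpp
#include <initializer_list>

// Build a member_mask with bit n set for every participating id n < 32.
inline unsigned int make_member_mask(std::initializer_list<unsigned int> ids) {
  unsigned int m = 0;
  for (unsigned int id : ids)
    if (id < 32) m |= 1u << id;
  return m;
}

// True if the work-item with the given id is named in member_mask.
inline bool participates(unsigned int member_mask, unsigned int id) {
  return id < 32 && ((member_mask >> id) & 1u) != 0;
}

// Assumed source-id rule: offset remote_local_id into the logical
// sub-group that contains the calling work-item.
inline unsigned int select_source_id(unsigned int id, int remote_local_id,
                                     int logical_sub_group_size = 32) {
  const unsigned int size = static_cast<unsigned int>(logical_sub_group_size);
  const unsigned int start = id / size * size;
  return start + static_cast<unsigned int>(remote_local_id) % size;
}
```

Because every participating work-item must pass the same member_mask, it is usually computed once, before any divergent branch, and reused for each masked call.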
template<typename T > T syclcompat::experimental::shift_sub_group_left (unsigned int member_mask, sycl::sub_group g, T x, unsigned int delta, int logical_sub_group_size=32)
Masked version of shift_sub_group_left, which executes a masked sub-group operation.
The member_mask parameter indicates which work-items participate in the call: the n-th bit is set to 1 if the work-item with id n participates. All work-items named in member_mask must execute the call with the same member_mask; otherwise the result is undefined.
T | Type of the input value |
[in] | member_mask | Bit mask of the participating work-items |
[in] | g | Input sub_group |
[in] | x | Input value |
[in] | delta | Shift distance, in work-items |
[in] | logical_sub_group_size | Logical sub_group size |
Definition at line 355 of file util.hpp.
References sycl::_V1::sub_group::get_local_linear_id(), and __spv::Scope::Subgroup.
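A left shift can be pictured as each work-item reading the value held delta positions to its right. The host simulation below is a sketch under assumptions, not the library's implementation: `simulate_shift_left` is a hypothetical name, and a work-item whose source would lie past the end of its logical sub-group is assumed to keep its own value.

```cpp
#include <vector>

// in[i] is the value x held by work-item i (at most 32 work-items).
template <typename T>
std::vector<T> simulate_shift_left(unsigned int member_mask,
                                   const std::vector<T> &in,
                                   unsigned int delta,
                                   int logical_sub_group_size = 32) {
  std::vector<T> out = in;
  const unsigned int size = static_cast<unsigned int>(logical_sub_group_size);
  for (unsigned int id = 0; id < in.size() && id < 32; ++id) {
    if (((member_mask >> id) & 1u) == 0) continue;  // not participating
    // One past the last id of the logical sub-group containing id.
    unsigned int end = (id / size + 1) * size;
    // Read from id + delta only if it stays inside the logical sub-group.
    if (delta < end - id && id + delta < in.size()) out[id] = in[id + delta];
  }
  return out;
}
```

With a logical sub-group size of 4, the eight values split into two independent groups, and the last work-item of each group keeps its own value.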
template<typename T > T syclcompat::experimental::shift_sub_group_right (unsigned int member_mask, sycl::sub_group g, T x, unsigned int delta, int logical_sub_group_size=32)
Masked version of shift_sub_group_right, which executes a masked sub-group operation.
The member_mask parameter indicates which work-items participate in the call: the n-th bit is set to 1 if the work-item with id n participates. All work-items named in member_mask must execute the call with the same member_mask; otherwise the result is undefined.
T | Type of the input value |
[in] | member_mask | Bit mask of the participating work-items |
[in] | g | Input sub_group |
[in] | x | Input value |
[in] | delta | Shift distance, in work-items |
[in] | logical_sub_group_size | Logical sub_group size |
Definition at line 403 of file util.hpp.
References sycl::_V1::sub_group::get_local_linear_id(), and __spv::Scope::Subgroup.
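A common use of a right shift is an inclusive prefix sum inside each logical sub-group. The host simulation below sketches that pattern in plain C++. It is an illustration under assumptions, not the library's implementation: both function names are hypothetical, and a work-item whose source would lie before the start of its logical sub-group is assumed to keep its own value.

```cpp
#include <vector>

// in[i] is the value x held by work-item i (at most 32 work-items).
template <typename T>
std::vector<T> simulate_shift_right(unsigned int member_mask,
                                    const std::vector<T> &in,
                                    unsigned int delta,
                                    int logical_sub_group_size = 32) {
  std::vector<T> out = in;
  const unsigned int size = static_cast<unsigned int>(logical_sub_group_size);
  for (unsigned int id = 0; id < in.size() && id < 32; ++id) {
    if (((member_mask >> id) & 1u) == 0) continue;  // not participating
    unsigned int lane = id % size;  // position inside the logical sub-group
    if (lane >= delta) out[id] = in[id - delta];
  }
  return out;
}

// Inclusive prefix sum per logical sub-group, built from repeated shifts
// with doubling distance. All work-items participate.
inline std::vector<int> simulate_prefix_sum(std::vector<int> x,
                                            int logical_sub_group_size) {
  const unsigned int size = static_cast<unsigned int>(logical_sub_group_size);
  for (unsigned int delta = 1; delta < size; delta *= 2) {
    std::vector<int> y =
        simulate_shift_right(0xFFFFFFFFu, x, delta, logical_sub_group_size);
    for (unsigned int id = 0; id < x.size(); ++id)
      if (id % size >= delta) x[id] += y[id];  // only add a real neighbour
  }
  return x;
}
```

The guard `id % size >= delta` matters: without it, a work-item near the start of its logical sub-group would add its own value a second time.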