Namespaces | |
detail | |
Classes | |
struct | memcpy_parameter |
Memory copy parameters for 2D/3D memory data. More... | |
class | logical_group |
The logical-group is a logical collection of some work-items within a work-group. More... | |
class | group_base |
The group_base will dispatch the function call to the specific interface based on the group type. More... | |
class | group |
Container type that can store supported group_types. More... | |
Enumerations | |
enum | memcpy_direction { host_to_host , host_to_device , device_to_host , device_to_device , automatic } |
enum class | group_type { work_group , sub_group , logical_group , root_group } |
Supported group types. More... | |
Functions | |
template<auto F, int SubgroupSize, typename... Args> | |
std::enable_if_t< std::is_invocable_v< decltype(F), Args..., char * >, sycl::event > | launch (sycl::nd_range< 3 > launch_range, std::size_t local_memory_size, sycl::queue queue, Args... args) |
template<auto F, int SubgroupSize, int Dim, typename... Args> | |
std::enable_if_t< std::is_invocable_v< decltype(F), Args..., char * >, sycl::event > | launch (sycl::nd_range< Dim > launch_range, std::size_t local_memory_size, Args... args) |
template<auto F, int SubgroupSize, typename... Args> | |
std::enable_if_t< std::is_invocable_v< decltype(F), Args..., char * >, sycl::event > | launch (::syclcompat::dim3 grid_dim, ::syclcompat::dim3 block_dim, std::size_t local_memory_size, Args... args) |
template<auto F, int SubgroupSize, typename... Args> | |
std::enable_if_t< std::is_invocable_v< decltype(F), Args... >, sycl::event > | launch (sycl::nd_range< 3 > launch_range, sycl::queue queue, Args... args) |
template<auto F, int SubgroupSize, int Dim, typename... Args> | |
std::enable_if_t< std::is_invocable_v< decltype(F), Args... >, sycl::event > | launch (sycl::nd_range< Dim > launch_range, Args... args) |
template<auto F, int SubgroupSize, typename... Args> | |
std::enable_if_t< std::is_invocable_v< decltype(F), Args... >, sycl::event > | launch (::syclcompat::dim3 grid_dim, ::syclcompat::dim3 block_dim, Args... args) |
static void | memcpy (const memcpy_parameter &param, sycl::queue q=get_default_queue()) |
[UNSUPPORTED] Synchronously copies 2D/3D memory data specified by param. More... | |
static void | memcpy_async (const memcpy_parameter &param, sycl::queue q=get_default_queue()) |
[UNSUPPORTED] Asynchronously copies 2D/3D memory data specified by param. More... | |
template<typename T > | |
T | select_from_sub_group (unsigned int member_mask, sycl::sub_group g, T x, int remote_local_id, int logical_sub_group_size=32) |
Masked version of select_from_sub_group, which executes a masked sub-group operation. More... | |
template<typename T > | |
T | shift_sub_group_left (unsigned int member_mask, sycl::sub_group g, T x, unsigned int delta, int logical_sub_group_size=32) |
Masked version of shift_sub_group_left, which executes a masked sub-group operation. More... | |
template<typename T > | |
T | shift_sub_group_right (unsigned int member_mask, sycl::sub_group g, T x, unsigned int delta, int logical_sub_group_size=32) |
Masked version of shift_sub_group_right, which executes a masked sub-group operation. More... | |
template<typename T > | |
T | permute_sub_group_by_xor (unsigned int member_mask, sycl::sub_group g, T x, unsigned int mask, int logical_sub_group_size=32) |
Masked version of permute_sub_group_by_xor, which executes a masked sub-group operation. More... | |
template<int dimensions = 3> | |
void | nd_range_barrier (const sycl::nd_item< dimensions > &item, sycl::atomic_ref< unsigned int, barrier_memory_order, sycl::memory_scope::device, sycl::access::address_space::global_space > &counter) |
Synchronize work items from all work groups within a SYCL kernel. More... | |
template<> | |
void | nd_range_barrier (const sycl::nd_item< 1 > &item, sycl::atomic_ref< unsigned int, barrier_memory_order, sycl::memory_scope::device, sycl::access::address_space::global_space > &counter) |
Synchronize work items from all work groups within a SYCL kernel. More... | |
int | calculate_max_active_wg_per_xecore (int *num_wg, int wg_size, int slm_size=0, int sg_size=32, bool used_barrier=false, bool used_large_grf=false) |
Computes the maximum number of active work-groups per Xe-Core, for use in occupancy calculation. More... | |
int | calculate_max_potential_wg (int *num_wg, int *wg_size, int max_wg_size_for_device_code, int slm_size=0, int sg_size=32, bool used_barrier=false, bool used_large_grf=false) |
Computes the work-group count and work-group size that potentially achieve the maximum occupancy of the device, for use in occupancy calculation. More... | |
Variables | |
constexpr sycl::memory_order | barrier_memory_order = sycl::memory_order::seq_cst |
enum syclcompat::experimental::memcpy_direction | strong |
Enumerator | |
---|---|
host_to_host | |
host_to_device | |
device_to_host | |
device_to_device | |
automatic |
Definition at line 80 of file memory.hpp.
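inline int syclcompat::experimental::calculate_max_active_wg_per_xecore (int *num_wg, int wg_size, int slm_size=0, int sg_size=32, bool used_barrier=false, bool used_large_grf=false)
Computes the maximum number of active work-groups per Xe-Core, for use in occupancy calculation.
See https://github.com/oneapi-src/oneAPI-samples/tree/master/Tools/GPU-Occupancy-Calculator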
[out] | num_wg | Active work-group number. |
[in] | wg_size | Work-group size. |
[in] | slm_size | Shared local memory size. |
[in] | sg_size | Sub-group size. |
[in] | used_barrier | Whether barrier is used. |
[in] | used_large_grf | Whether large General Register File is used. |
If wg_size exceeds the max work-group size, the max work-group size is used instead of wg_size and the function returns -1.
Definition at line 752 of file util.hpp.
References syclcompat::get_current_device().
Referenced by calculate_max_potential_wg().
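The wg_size contract described above can be illustrated with a small host-side stand-in. This is only a sketch: `checked_wg_size` and its `max_wg_size` parameter are hypothetical names, the real function is assumed to query the limit from the current device, and the return value of 0 in the non-clamped case is an assumption.

```cpp
#include <algorithm>

// Hypothetical stand-in, NOT the syclcompat implementation. It models only the
// documented wg_size contract. max_wg_size is an assumed device limit.
inline int checked_wg_size(int *wg_size_used, int wg_size, int max_wg_size) {
  // Clamp the requested size to the device limit.
  *wg_size_used = std::min(wg_size, max_wg_size);
  // -1 signals that clamping happened; returning 0 otherwise is an assumption.
  return wg_size > max_wg_size ? -1 : 0;
}
```

A caller that receives -1 should treat the reported work-group count as valid for the clamped size rather than for the size it requested.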
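inline int syclcompat::experimental::calculate_max_potential_wg (int *num_wg, int *wg_size, int max_wg_size_for_device_code, int slm_size=0, int sg_size=32, bool used_barrier=false, bool used_large_grf=false)
Computes the work-group count and work-group size that potentially achieve the maximum occupancy of the device, for use in occupancy calculation.
See https://github.com/oneapi-src/oneAPI-samples/tree/master/Tools/GPU-Occupancy-Calculator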
[out] | num_wg | Work-group number. |
[out] | wg_size | Work-group size. |
[in] | max_wg_size_for_device_code | The maximum working work-group size for current device code logic. Zero means no limitation. |
[in] | slm_size | Shared local memory size. |
[in] | sg_size | Sub-group size. |
[in] | used_barrier | Whether barrier is used. |
[in] | used_large_grf | Whether large General Register File is used. |
Definition at line 817 of file util.hpp.
References calculate_max_active_wg_per_xecore(), syclcompat::get_current_device(), sycl::_V1::device::get_info(), and sycl::_V1::device::has().
std::enable_if_t<std::is_invocable_v<decltype(F), Args...>, sycl::event> syclcompat::experimental::launch | ( | ::syclcompat::dim3 | grid_dim, |
::syclcompat::dim3 | block_dim, | ||
Args... | args | ||
) |
Definition at line 96 of file launch_experimental.hpp.
References syclcompat::get_default_queue(), launch(), and syclcompat::detail::transform_nd_range().
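The grid/block overloads take CUDA-style dimensions, which have to become an nd_range before submission. The sketch below shows one plausible mapping on the host. `Dim3`, `Range3` and `to_nd_range` are hypothetical illustration types, not `syclcompat::dim3` or `sycl::nd_range`. The reversed dimension order (x mapped to the last index) is an assumption about the conversion.

```cpp
#include <array>
#include <cstddef>

// Hypothetical host-side types for illustration only.
struct Dim3 { std::size_t x = 1, y = 1, z = 1; };
struct Range3 { std::array<std::size_t, 3> global, local; };

// Assumed mapping: global size = grid * block per dimension, with the
// dimension order reversed so that x lands in the last (fastest) index.
inline Range3 to_nd_range(Dim3 grid, Dim3 block) {
  return Range3{{grid.z * block.z, grid.y * block.y, grid.x * block.x},
                {block.z, block.y, block.x}};
}
```

Under these assumptions, a grid of (4, 2, 1) blocks with (32, 8, 1) threads each corresponds to a global range of {1, 16, 128} and a local range of {1, 8, 32}.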
std::enable_if_t<std::is_invocable_v<decltype(F), Args..., char *>, sycl::event> syclcompat::experimental::launch | ( | ::syclcompat::dim3 | grid_dim, |
::syclcompat::dim3 | block_dim, | ||
std::size_t | local_memory_size, | ||
Args... | args | ||
) |
Definition at line 62 of file launch_experimental.hpp.
References syclcompat::get_default_queue(), launch(), and syclcompat::detail::transform_nd_range().
std::enable_if_t<std::is_invocable_v<decltype(F), Args..., char *>, sycl::event> syclcompat::experimental::launch | ( | sycl::nd_range< 3 > | launch_range, |
std::size_t | local_memory_size, | ||
sycl::queue | queue, | ||
Args... | args | ||
) |
Definition at line 38 of file launch_experimental.hpp.
References sycl::_V1::handler::parallel_for(), and sycl::_V1::queue::submit().
Referenced by launch().
std::enable_if_t<std::is_invocable_v<decltype(F), Args...>, sycl::event> syclcompat::experimental::launch | ( | sycl::nd_range< 3 > | launch_range, |
sycl::queue | queue, | ||
Args... | args | ||
) |
Definition at line 76 of file launch_experimental.hpp.
References sycl::_V1::handler::parallel_for(), and sycl::_V1::queue::submit().
std::enable_if_t<std::is_invocable_v<decltype(F), Args...>, sycl::event> syclcompat::experimental::launch | ( | sycl::nd_range< Dim > | launch_range, |
Args... | args | ||
) |
Definition at line 88 of file launch_experimental.hpp.
References syclcompat::get_default_queue(), launch(), and syclcompat::detail::transform_nd_range().
std::enable_if_t<std::is_invocable_v<decltype(F), Args..., char *>, sycl::event> syclcompat::experimental::launch | ( | sycl::nd_range< Dim > | launch_range, |
std::size_t | local_memory_size, | ||
Args... | args | ||
) |
Definition at line 53 of file launch_experimental.hpp.
References syclcompat::get_default_queue(), launch(), and syclcompat::detail::transform_nd_range().
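The launch overloads are distinguished by their `std::enable_if_t<std::is_invocable_v<...>>` return types: the local-memory variants require that F accept a trailing `char *`. The following standard C++17 sketch reproduces that selection mechanism without SYCL. `which_overload`, `plain_kernel` and `local_mem_kernel` are hypothetical names used only for illustration.

```cpp
#include <cstddef>
#include <string>
#include <type_traits>

// Hypothetical kernels used only for illustration.
inline void plain_kernel(int) {}
inline void local_mem_kernel(int, char *) {}

// Viable only when F can be called with Args... followed by a char*
// (the local-memory pointer), like the local_memory_size overloads.
template <auto F, typename... Args>
std::enable_if_t<std::is_invocable_v<decltype(F), Args..., char *>, std::string>
which_overload(std::size_t /*local_memory_size*/, Args...) {
  return "with_local_memory";
}

// Viable only when F can be called with exactly Args....
template <auto F, typename... Args>
std::enable_if_t<std::is_invocable_v<decltype(F), Args...>, std::string>
which_overload(Args...) {
  return "without_local_memory";
}
```

Because the unsuitable overload is removed by SFINAE rather than by ranking, a call whose kernel matches neither form fails to compile instead of silently picking the wrong variant.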
inline static void syclcompat::experimental::memcpy (const memcpy_parameter &param, sycl::queue q=get_default_queue())
[UNSUPPORTED] Synchronously copies 2D/3D memory data specified by param.
The function returns after the copy is completed.
param | Memory copy parameters. |
q | Queue to execute the copy task. |
Definition at line 850 of file memory.hpp.
References syclcompat::experimental::detail::memcpy(), and sycl::_V1::ext::intel::experimental::esimd::wait().
inline static void syclcompat::experimental::memcpy_async (const memcpy_parameter &param, sycl::queue q=get_default_queue())
[UNSUPPORTED] Asynchronously copies 2D/3D memory data specified by param.
The function may return before the copy is completed.
param | Memory copy parameters. |
q | Queue to execute the copy task. |
Definition at line 861 of file memory.hpp.
References syclcompat::experimental::detail::memcpy().
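template<> inline void syclcompat::experimental::nd_range_barrier (const sycl::nd_item< 1 > &item, sycl::atomic_ref< unsigned int, barrier_memory_order, sycl::memory_scope::device, sycl::access::address_space::global_space > &counter)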
Synchronize work items from all work groups within a SYCL kernel.
[in] | item | Represents a work group. |
[in] | counter | An atomic object in device memory that can be accessed by work-items in all work-groups. The initial value of the counter must be zero. Note: make sure that all work-items of all work-groups within the SYCL kernel can be actively scheduled on the device at the same time. |
Definition at line 639 of file util.hpp.
References sycl::_V1::nd_item< Dimensions >::barrier(), sycl::_V1::nd_item< Dimensions >::get_group(), sycl::_V1::nd_item< Dimensions >::get_group_range(), and sycl::_V1::nd_item< Dimensions >::get_local_linear_id().
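template<int dimensions = 3> inline void syclcompat::experimental::nd_range_barrier (const sycl::nd_item< dimensions > &item, sycl::atomic_ref< unsigned int, barrier_memory_order, sycl::memory_scope::device, sycl::access::address_space::global_space > &counter)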
Synchronize work items from all work groups within a SYCL kernel.
[in] | item | Represents a work group. |
[in] | counter | An atomic object in device memory that can be accessed by work-items in all work-groups. The initial value of the counter must be zero. Note: make sure that all work-items of all work-groups within the SYCL kernel can be actively scheduled on the device at the same time. |
Definition at line 599 of file util.hpp.
References sycl::_V1::nd_item< Dimensions >::barrier(), sycl::_V1::nd_item< Dimensions >::get_group(), sycl::_V1::nd_item< Dimensions >::get_group_range(), and sycl::_V1::nd_item< Dimensions >::get_local_linear_id().
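The zero-initialized counter and the scheduling note can be motivated with a host-side analogy. In this minimal sketch each `std::thread` stands in for one work-group. It is a single-use counter barrier written for illustration, not the syclcompat implementation. `counter_barrier` and `run_counter_barrier` are hypothetical names.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Each caller announces its arrival, then spins until all have arrived.
// `counter` must start at zero, mirroring the documented precondition.
inline void counter_barrier(std::atomic<unsigned> &counter, unsigned num_groups) {
  counter.fetch_add(1, std::memory_order_seq_cst);
  // The spin only terminates if every participant can make progress
  // concurrently, which is why all groups must be schedulable at once.
  while (counter.load(std::memory_order_seq_cst) < num_groups) {
    std::this_thread::yield();
  }
}

// Returns true if every "group" observed all writes made before the barrier.
inline bool run_counter_barrier(unsigned num_groups) {
  std::atomic<unsigned> counter{0};
  std::vector<int> before(num_groups, 0);
  std::vector<int> seen(num_groups, 0);
  std::vector<std::thread> groups;
  for (unsigned i = 0; i < num_groups; ++i) {
    groups.emplace_back([&, i] {
      before[i] = 1;                         // work done before the barrier
      counter_barrier(counter, num_groups);  // wait for all groups
      int sum = 0;
      for (int v : before) sum += v;         // work done after the barrier
      seen[i] = sum;
    });
  }
  for (auto &t : groups) t.join();
  for (int s : seen) {
    if (s != static_cast<int>(num_groups)) return false;
  }
  return true;
}
```

If one participant could never start until another finished, the spin loop would never exit. On a device, that is the deadlock the scheduling note warns about.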
T syclcompat::experimental::permute_sub_group_by_xor | ( | unsigned int | member_mask, |
sycl::sub_group | g, | ||
T | x, | ||
unsigned int | mask, | ||
int | logical_sub_group_size = 32 |
||
) |
Masked version of permute_sub_group_by_xor, which executes a masked sub-group operation.
The member_mask parameter indicates which work-items participate in the call: bit n is set to 1 if and only if the work-item with id n participates. All work-items named in member_mask must execute the call with the same member_mask; otherwise the result is undefined.
T | Input value type |
[in] | member_mask | Mask of participating work-items |
[in] | g | Input sub_group |
[in] | x | Input value |
[in] | mask | XOR mask applied to the work-item id |
[in] | logical_sub_group_size | Input logical sub_group size |
Definition at line 464 of file util.hpp.
References sycl::_V1::sub_group::get_local_linear_id(), and __spv::Scope::Subgroup.
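A host-side model can make the data movement concrete. The sketch below assumes semantics analogous to a CUDA-style XOR shuffle within each logical sub-group. `model_permute_by_xor` is a hypothetical name, and leaving non-participating lanes unchanged is a modeling choice, not documented behavior.

```cpp
#include <cstddef>
#include <vector>

// values[i] is the value x held by lane i of one sub-group.
// Returns the value each lane would receive under the assumed semantics.
template <typename T>
std::vector<T> model_permute_by_xor(unsigned member_mask,
                                    const std::vector<T> &values, unsigned mask,
                                    int logical_sub_group_size = 32) {
  std::vector<T> out = values;
  const std::size_t lsg = static_cast<std::size_t>(logical_sub_group_size);
  for (std::size_t id = 0; id < values.size(); ++id) {
    const bool participates = id < 32 && ((member_mask >> id) & 1u);
    if (!participates) continue;  // modeling choice: non-participants keep x
    const std::size_t start = id / lsg * lsg;      // first lane of the logical group
    const std::size_t target = (id % lsg) ^ mask;  // partner offset in the group
    // A partner outside the logical group falls back to the lane's own value.
    const std::size_t src = target < lsg ? start + target : id;
    if (src < values.size()) out[id] = values[src];
  }
  return out;
}
```

Repeating the exchange with masks 4, 2 and 1 and adding the received value each time gives the usual butterfly reduction: after three steps every lane of an 8-wide group holds the group total.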
T syclcompat::experimental::select_from_sub_group | ( | unsigned int | member_mask, |
sycl::sub_group | g, | ||
T | x, | ||
int | remote_local_id, | ||
int | logical_sub_group_size = 32 |
||
) |
Masked version of select_from_sub_group, which executes a masked sub-group operation.
The member_mask parameter indicates which work-items participate in the call: bit n is set to 1 if and only if the work-item with id n participates. All work-items named in member_mask must execute the call with the same member_mask; otherwise the result is undefined.
T | Input value type |
[in] | member_mask | Mask of participating work-items |
[in] | g | Input sub_group |
[in] | x | Input value |
[in] | remote_local_id | Id of the source work-item |
[in] | logical_sub_group_size | Input logical sub_group size |
Definition at line 325 of file util.hpp.
References sycl::_V1::sub_group::get_local_linear_id(), and __spv::Scope::Subgroup.
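The roles of member_mask and remote_local_id can be shown with a host-side model. This sketch assumes the source lane is resolved relative to the start of each logical sub-group, as in a CUDA-style indexed shuffle. `model_select` is a hypothetical name, and leaving non-participating lanes unchanged is a modeling choice.

```cpp
#include <cstddef>
#include <vector>

// values[i] is the value x held by lane i of one sub-group.
// Returns the value each lane would receive under the assumed semantics.
template <typename T>
std::vector<T> model_select(unsigned member_mask, const std::vector<T> &values,
                            int remote_local_id,
                            int logical_sub_group_size = 32) {
  std::vector<T> out = values;
  const std::size_t lsg = static_cast<std::size_t>(logical_sub_group_size);
  for (std::size_t id = 0; id < values.size(); ++id) {
    // Bit `id` of member_mask says whether lane `id` takes part in the call.
    const bool participates = id < 32 && ((member_mask >> id) & 1u);
    if (!participates) continue;  // modeling choice: non-participants keep x
    const std::size_t start = id / lsg * lsg;  // first lane of the logical group
    const std::size_t src = start + static_cast<std::size_t>(remote_local_id) % lsg;
    if (src < values.size()) out[id] = values[src];
  }
  return out;
}
```

With a logical sub-group size of 4, eight lanes behave as two independent groups, so a remote_local_id of 1 broadcasts lane 1 to the first group and lane 5 to the second.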
T syclcompat::experimental::shift_sub_group_left | ( | unsigned int | member_mask, |
sycl::sub_group | g, | ||
T | x, | ||
unsigned int | delta, | ||
int | logical_sub_group_size = 32 |
||
) |
Masked version of shift_sub_group_left, which executes a masked sub-group operation.
The member_mask parameter indicates which work-items participate in the call: bit n is set to 1 if and only if the work-item with id n participates. All work-items named in member_mask must execute the call with the same member_mask; otherwise the result is undefined.
T | Input value type |
[in] | member_mask | Mask of participating work-items |
[in] | g | Input sub_group |
[in] | x | Input value |
[in] | delta | Number of positions to shift by |
[in] | logical_sub_group_size | Input logical sub_group size |
Definition at line 370 of file util.hpp.
References sycl::_V1::sub_group::get_local_linear_id(), and __spv::Scope::Subgroup.
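A host-side model of the left shift is sketched below. It assumes that each lane reads from the lane delta positions above it, and that a lane whose source would fall outside its logical sub-group keeps its own value, as in a CUDA-style shuffle-down. `model_shift_left` is a hypothetical name, and the treatment of non-participating lanes is a modeling choice.

```cpp
#include <cstddef>
#include <vector>

// values[i] is the value x held by lane i of one sub-group.
// Returns the value each lane would receive under the assumed semantics.
template <typename T>
std::vector<T> model_shift_left(unsigned member_mask,
                                const std::vector<T> &values, unsigned delta,
                                int logical_sub_group_size = 32) {
  std::vector<T> out = values;
  const std::size_t lsg = static_cast<std::size_t>(logical_sub_group_size);
  for (std::size_t id = 0; id < values.size(); ++id) {
    const bool participates = id < 32 && ((member_mask >> id) & 1u);
    if (!participates) continue;  // modeling choice: non-participants keep x
    const std::size_t end = (id / lsg + 1) * lsg;  // one past the logical group
    const std::size_t src = id + delta;
    // Lanes whose source lies past the group boundary keep their own value.
    if (src < end && src < values.size()) out[id] = values[src];
  }
  return out;
}
```

The boundary rule means the last delta lanes of each logical group are left unchanged, which matters when the shifted values feed a reduction.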
T syclcompat::experimental::shift_sub_group_right | ( | unsigned int | member_mask, |
sycl::sub_group | g, | ||
T | x, | ||
unsigned int | delta, | ||
int | logical_sub_group_size = 32 |
||
) |
Masked version of shift_sub_group_right, which executes a masked sub-group operation.
The member_mask parameter indicates which work-items participate in the call: bit n is set to 1 if and only if the work-item with id n participates. All work-items named in member_mask must execute the call with the same member_mask; otherwise the result is undefined.
T | Input value type |
[in] | member_mask | Mask of participating work-items |
[in] | g | Input sub_group |
[in] | x | Input value |
[in] | delta | Number of positions to shift by |
[in] | logical_sub_group_size | Input logical sub_group size |
Definition at line 418 of file util.hpp.
References sycl::_V1::sub_group::get_local_linear_id(), and __spv::Scope::Subgroup.
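The right shift mirrors the left shift. The host-side sketch below assumes that each lane reads from the lane delta positions below it, and that the first delta lanes of each logical sub-group keep their own value, as in a CUDA-style shuffle-up. `model_shift_right` is a hypothetical name, and the treatment of non-participating lanes is a modeling choice.

```cpp
#include <cstddef>
#include <vector>

// values[i] is the value x held by lane i of one sub-group.
// Returns the value each lane would receive under the assumed semantics.
template <typename T>
std::vector<T> model_shift_right(unsigned member_mask,
                                 const std::vector<T> &values, unsigned delta,
                                 int logical_sub_group_size = 32) {
  std::vector<T> out = values;
  const std::size_t lsg = static_cast<std::size_t>(logical_sub_group_size);
  for (std::size_t id = 0; id < values.size(); ++id) {
    const bool participates = id < 32 && ((member_mask >> id) & 1u);
    if (!participates) continue;  // modeling choice: non-participants keep x
    const std::size_t start = id / lsg * lsg;  // first lane of the logical group
    // Lanes closer than delta to the group start keep their own value.
    if (id - start >= delta) out[id] = values[id - delta];
  }
  return out;
}
```

An inclusive prefix sum is a typical use: shifting right by 1, 2, 4, ... and adding the received value only on lanes that actually received one yields a scan within each logical group.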
constexpr sycl::memory_order syclcompat::experimental::barrier_memory_order = sycl::memory_order::seq_cst