Detailed Description

template<typename T, uint32_t SZ, uint32_t N, reduce_op Op, uint32_t N_SG, bool is_all_reduce = true, gpu_arch arch_ = gpu_arch::Xe>
struct gpu::xetla::group::group_reduce_t< T, SZ, N, Op, N_SG, is_all_reduce, arch_ >

Performs a group-level reduction across the subgroups of a work-group.

Data is exchanged between subgroups through shared local memory (SLM).

Template Parameters

T	Data type of the elements being reduced.
SZ	Vector size per item.
N	Number of independent sets that one subgroup reduces in parallel.
Op	Reduction operation to apply.
N_SG	Number of subgroups that participate in this reduction.
is_all_reduce	Flag that enables all-reduce. If false, only the subgroup with sg_id 0 holds the updated result; otherwise all N_SG subgroups hold the updated result.
arch_	Target hardware generation.
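
The roles of the template parameters can be illustrated with a small host-side model. This is a minimal sketch under stated assumptions, not the XeTLA implementation: it runs on the CPU, uses a plain array in place of SLM, and assumes that each subgroup holds N sets of SZ values in set-major order and reduces each set to one value before the cross-subgroup step. The names model_reduce_op, model_apply and model_group_reduce are hypothetical and exist only for this illustration. What the non-zero subgroups hold when is_all_reduce is false is not specified above; the model simply leaves them with their own partial results.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>

// Hypothetical stand-in for the library's reduce_op; only two ops are modeled.
enum class model_reduce_op { sum, max };

template <model_reduce_op Op, typename T>
constexpr T model_apply(T a, T b) {
    if constexpr (Op == model_reduce_op::sum) {
        return a + b;
    } else {
        return std::max(a, b);
    }
}

// Host-side model of the parameter semantics (assumed, see lead-in).
// in[sg] holds N sets of SZ values for subgroup sg, set-major.
// The return value holds one N-wide result per subgroup.
template <typename T, uint32_t SZ, uint32_t N, model_reduce_op Op,
        uint32_t N_SG, bool is_all_reduce = true>
std::array<std::array<T, N>, N_SG> model_group_reduce(
        const std::array<std::array<T, N * SZ>, N_SG> &in) {
    // Step 1: every subgroup reduces its own SZ values per set and
    // "stores" the partial result; this array stands in for SLM.
    std::array<std::array<T, N>, N_SG> slm {};
    for (uint32_t sg = 0; sg < N_SG; ++sg) {
        for (uint32_t n = 0; n < N; ++n) {
            T acc = in[sg][n * SZ];
            for (uint32_t i = 1; i < SZ; ++i) {
                acc = model_apply<Op>(acc, in[sg][n * SZ + i]);
            }
            slm[sg][n] = acc;
        }
    }

    // Step 2: combine the partial results of all N_SG subgroups.
    std::array<T, N> total = slm[0];
    for (uint32_t sg = 1; sg < N_SG; ++sg) {
        for (uint32_t n = 0; n < N; ++n) {
            total[n] = model_apply<Op>(total[n], slm[sg][n]);
        }
    }

    // Step 3: with is_all_reduce every subgroup receives the total;
    // otherwise only subgroup 0 does, and the others keep their partials
    // (the latter is a modeling assumption).
    std::array<std::array<T, N>, N_SG> out = slm;
    for (uint32_t sg = 0; sg < N_SG; ++sg) {
        if (is_all_reduce || sg == 0) { out[sg] = total; }
    }
    return out;
}
```

For example, with SZ = 2, N = 2 and N_SG = 3, each subgroup passes four values and gets two results back. With is_all_reduce = true all three subgroups hold the same pair of totals; with is_all_reduce = false only the entry for subgroup 0 is the group-wide result.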