template<typename group_swizzle_policy_, int global_ratio_ = 1, int local_ratio_ = 1>
struct gpu::xetla::kernel::dispatch_policy_kslicing< group_swizzle_policy_, global_ratio_, local_ratio_ >
Kslicing GEMM_UNIVERSAL implementation.
A special GEMM_UNIVERSAL implementation that increases hardware occupancy by splitting the GEMM_UNIVERSAL task along the k dimension. It includes inter-group reduction (using global atomics) and intra-group reduction (using local memory for data exchange).
- Template Parameters
| group_swizzle_policy_ | Is the group swizzle policy. |
| global_ratio_ | Is the k dim split ratio between groups (default 1). |
| local_ratio_ | Is the k dim split ratio within a group (default 1). |
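
The two-level reduction described above can be illustrated with a minimal host-side sketch. It is not XeTLA code and runs sequentially on the CPU: `kslicing_dot` is a hypothetical helper that splits a dot product (one output element of a GEMM) along k, assuming `global_ratio` plays the role of the split between groups and `local_ratio` the split within a group. The per-group array stands in for local memory and the final accumulator stands in for the output updated by global atomics.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical illustration only, not part of XeTLA.
// Computes sum(a[i] * b[i]) with the k range cut into
// global_ratio * local_ratio slices and reduced in two stages.
template <int global_ratio = 1, int local_ratio = 1>
long long kslicing_dot(const std::vector<int>& a, const std::vector<int>& b) {
    static_assert(global_ratio >= 1 && local_ratio >= 1,
                  "split ratios must be positive");
    assert(a.size() == b.size());

    const std::size_t k = a.size();
    constexpr std::size_t slices =
        static_cast<std::size_t>(global_ratio) * local_ratio;
    const std::size_t chunk = (k + slices - 1) / slices;  // ceil(k / slices)

    long long global_acc = 0;  // stands in for the output updated atomically
    for (int g = 0; g < global_ratio; ++g) {
        // Stands in for the local-memory buffer shared inside one group.
        std::array<long long, local_ratio> local_partials{};

        for (int l = 0; l < local_ratio; ++l) {
            const std::size_t slice =
                static_cast<std::size_t>(g) * local_ratio + l;
            // Clamp so that trailing slices may be empty when k is small.
            const std::size_t begin = std::min(slice * chunk, k);
            const std::size_t end = std::min(begin + chunk, k);
            for (std::size_t i = begin; i < end; ++i) {
                local_partials[l] += static_cast<long long>(a[i]) * b[i];
            }
        }

        // Intra-group reduction: combine the partial sums of one group.
        long long group_sum = 0;
        for (long long partial : local_partials) {
            group_sum += partial;
        }

        // Inter-group reduction: on a GPU this would be an atomic add,
        // because several groups update the same output element.
        global_acc += group_sum;
    }
    return global_acc;
}
```

Calling `kslicing_dot<2, 3>(a, b)` returns the same value as `kslicing_dot<1, 1>(a, b)` for integer inputs; only the amount of independent work changes, which is the point of the split. With floating-point data the summation order differs between ratios, so results may differ by rounding.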