template<typename group_swizzle_policy_, int num_global_kslicing_ = 1, int num_local_kslicing_ = 1>
struct gpu::xetla::kernel::dispatch_policy_int4_dequantize_kslicing< group_swizzle_policy_, num_global_kslicing_, num_local_kslicing_ >
4-bit k-slicing GEMM implementation.
A special GEMM implementation that increases hardware occupancy by splitting the GEMM task along the k dimension. It includes inter-group reduction (using global atomics) and intra-group reduction (using local memory for data exchange).
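The two reduction levels can be illustrated with a minimal CPU-side sketch for a single output element. This is not XeTLA code: the function name `kslicing_dot` is hypothetical, a `std::vector` stands in for local memory, a `std::atomic` stands in for the global output element, and the even, ceiling-based slice split is an assumption made for illustration.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU model of k-slicing for one output element:
// c = sum over k of a[k] * b[k], with the k range split into
// num_global_kslicing * num_local_kslicing slices.
// Integer data is used so the result does not depend on summation order.
int kslicing_dot(const std::vector<int>& a, const std::vector<int>& b,
                 int num_global_kslicing, int num_local_kslicing) {
    assert(a.size() == b.size());
    assert(num_global_kslicing > 0 && num_local_kslicing > 0);

    const std::size_t k = a.size();
    const std::size_t num_slices =
        static_cast<std::size_t>(num_global_kslicing) *
        static_cast<std::size_t>(num_local_kslicing);
    // Assumed split: equal-length slices, rounded up; trailing slices may be empty.
    const std::size_t slice_len = (k + num_slices - 1) / num_slices;

    std::atomic<int> c{0};  // stands in for the output element in global memory

    for (int g = 0; g < num_global_kslicing; ++g) {
        // Per-group scratch buffer, standing in for local memory.
        std::vector<int> local(static_cast<std::size_t>(num_local_kslicing), 0);

        for (int l = 0; l < num_local_kslicing; ++l) {
            const std::size_t slice =
                static_cast<std::size_t>(g) * num_local_kslicing + l;
            const std::size_t begin = std::min(slice * slice_len, k);
            const std::size_t end = std::min(begin + slice_len, k);
            for (std::size_t i = begin; i < end; ++i) {
                local[static_cast<std::size_t>(l)] += a[i] * b[i];
            }
        }

        // Intra-group reduction: combine the partial sums held in "local memory".
        int group_sum = 0;
        for (int v : local) {
            group_sum += v;
        }

        // Inter-group reduction: accumulate into the output with an atomic add.
        c.fetch_add(group_sum, std::memory_order_relaxed);
    }
    return c.load();
}
```

In this sketch, any positive pair of split ratios yields the same result as an unsplit dot product; on a GPU the benefit would come from the slices running concurrently, which the sequential loops here do not model.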
- Note
- The difference from dispatch_policy_kslicing is the additional handling for 4-bit data.
- Template Parameters
| Parameter | Description |
| --- | --- |
| num_global_kslicing_ | Split ratio of the k dimension between groups. |
| num_local_kslicing_ | Split ratio of the k dimension within a group. |
| arch_tag_ | The hardware architecture. |
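As a rough illustration of what 4-bit dequantization involves, the sketch below unpacks two 4-bit values per byte and maps them to floats. Everything in it is an assumption for illustration: the function name `dequantize_int4`, the low-nibble-first packing, the unsigned value range, and the single per-tensor `scale` and `zero_point` are hypothetical and may not match the layout this policy actually expects.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical layout: each byte holds two unsigned 4-bit values,
// low nibble first. Dequantized value = (q - zero_point) * scale.
std::vector<float> dequantize_int4(const std::vector<std::uint8_t>& packed,
                                   float scale, int zero_point) {
    std::vector<float> out;
    out.reserve(packed.size() * 2);
    for (std::uint8_t byte : packed) {
        const int lo = byte & 0x0F;         // first value: low 4 bits
        const int hi = (byte >> 4) & 0x0F;  // second value: high 4 bits
        out.push_back(static_cast<float>(lo - zero_point) * scale);
        out.push_back(static_cast<float>(hi - zero_point) * scale);
    }
    return out;
}
```

With this assumed layout, a buffer of N bytes expands to 2N floating-point values before they enter the multiply-accumulate step.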