Offline generation#
OpenCL C source code for FFT kernels can be generated offline using the following function:
-
std::vector<std::string> bbfft::generate_fft_kernels(std::ostream &os, std::vector<configuration> const &cfgs, device_info const &info)#
Generate FFT kernel code for the given configurations and target device.
- Parameters:
os – Output stream (e.g. std::cout)
cfgs – configurations
info – Properties of target device
- Returns:
kernel names
Device info#
FFTs are specialized for the target device based on information stored in the following struct:
-
struct device_info#
Parameters of target device.
Public Functions
-
std::size_t min_subgroup_size() const#
Minimum subgroup size.
-
std::size_t max_subgroup_size() const#
Maximum subgroup size.
-
std::size_t register_space_min() const#
Min size of register file in bytes.
-
std::size_t register_space_max() const#
Max size of register file in bytes.
-
std::string to_string() const#
convert device info to string
-
bool operator==(device_info const &other) const#
equality check
-
bool operator!=(device_info const &other) const#
inequality check
Public Members
-
std::size_t max_work_group_size = 0#
maximum number of work items in work group
-
std::vector<std::size_t> subgroup_sizes#
supported sub group sizes
-
std::size_t local_memory_size = 0#
size of shared local memory
-
device_type type = device_type::gpu#
device type
-
std::ostream &bbfft::operator<<(std::ostream &os, device_type type)#
Convert device type to string.
- Parameters:
os – output stream
type – device type
- Returns:
Reference to os
-
std::ostream &bbfft::operator<<(std::ostream &os, device_info const &info)#
Convert device info to string.
- Parameters:
os – output stream
info – device info
- Returns:
Reference to os
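The relation between the subgroup accessors and the subgroup_sizes member can be pictured with a small stand-in type. This is a minimal sketch, not the library's implementation: device_info_sketch is a hypothetical name, and it assumes that the minimum and maximum are simply the smallest and largest entries of subgroup_sizes and that the list is non-empty.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the subgroup-related part of bbfft::device_info.
struct device_info_sketch {
    std::vector<std::size_t> subgroup_sizes; // supported sub group sizes

    // Assumption: the minimum is the smallest entry of subgroup_sizes.
    std::size_t min_subgroup_size() const {
        assert(!subgroup_sizes.empty());
        return *std::min_element(subgroup_sizes.begin(), subgroup_sizes.end());
    }

    // Assumption: the maximum is the largest entry of subgroup_sizes.
    std::size_t max_subgroup_size() const {
        assert(!subgroup_sizes.empty());
        return *std::max_element(subgroup_sizes.begin(), subgroup_sizes.end());
    }
};
```

With subgroup_sizes set to {8, 16, 32}, the sketch reports 8 as the minimum and 32 as the maximum.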
Enumerations#
Query device info in OpenCL#
-
auto bbfft::get_device_info(cl_device_id device) -> device_info#
Returns device info for device.
- Parameters:
device – device
- Returns:
device info
-
auto bbfft::get_device_id(cl_device_id device) -> uint64_t#
Return device id for device.
- Parameters:
device – device
- Returns:
device id
Query device info in Level Zero#
-
auto bbfft::get_device_info(ze_device_handle_t device) -> device_info#
Returns device info for device.
- Parameters:
device – device
- Returns:
device info
-
auto bbfft::get_device_id(ze_device_handle_t device) -> uint64_t#
Return device id for device.
- Parameters:
device – device
- Returns:
device id
Query device info in SYCL#
-
auto bbfft::get_device_info(::sycl::device device) -> device_info#
Returns device info for device.
- Parameters:
device – device
- Returns:
device info
-
auto bbfft::get_device_id(::sycl::device device) -> uint64_t#
Return device id for device.
- Parameters:
device – device
- Returns:
device id
Algorithms#
The bbfft::generate_fft_kernels() function automatically selects the algorithm used to generate the FFT kernel.
The functions in this section allow direct access to the generators of each algorithm.
Small batch FFT#
The “small batch FFT” is intended for FFT sizes up to about N=64 (the maximum size depends on the size of the register file of the device).
-
small_batch_configuration bbfft::configure_small_batch_fft(configuration const &cfg, device_info const &info)#
Configure small batch FFT algorithm.
- Parameters:
cfg – configuration
info – Properties of target device
- Returns:
small batch configuration
-
void bbfft::generate_small_batch_fft(std::ostream &os, small_batch_configuration const &cfg, std::string_view name = {})#
Generate OpenCL C code for small batch FFT algorithm.
- Parameters:
os – Output stream (e.g. std::cout)
cfg – small batch configuration
name – Override default kernel name
-
struct small_batch_configuration#
Configuration for small batch FFT.
- Attention
Do not set values directly but use configure_small_batch_fft
Public Functions
-
std::string identifier() const#
convert configuration to identification string
Public Members
-
int direction#
-1 or +1
-
std::size_t M#
M.
-
std::size_t Mb#
M block size (w.r.t. reshaped data)
-
std::size_t N#
Number of points in DFT.
-
std::size_t Kb#
K block size.
-
std::size_t sgs#
sub group size
-
transform_type type#
transform type (c2c, r2c, c2r)
-
std::array<std::size_t, 3u> istride#
stride of input tensor
-
std::array<std::size_t, 3u> ostride#
stride of output tensor
-
bool inplace_unsupported#
true if inplace not available
-
char const *load_function#
user provided load callback name
-
char const *store_function#
user provided store callback name
Two factor FFT#
The “two factor FFT” is intended for larger FFT sizes, up to the limit imposed by the size of the shared local memory.
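To get a feeling for the size limit, one can estimate how many points fit into shared local memory. The following is a back-of-the-envelope sketch and not the library's selection logic: max_points_in_slm and ffts_in_flight are hypothetical names, and the sketch assumes single-precision complex data and that each transform is held in local memory in full.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>

// Rough upper bound on the FFT length N such that `ffts_in_flight`
// single-precision complex signals of length N fit into local memory.
// `local_memory_size` is given in bytes (cf. device_info::local_memory_size).
std::size_t max_points_in_slm(std::size_t local_memory_size,
                              std::size_t ffts_in_flight = 1) {
    assert(ffts_in_flight > 0);
    return local_memory_size / (ffts_in_flight * sizeof(std::complex<float>));
}
```

For example, with 64 KiB of local memory and 8 bytes per point, the estimate is 8192 points for a single transform, and 2048 points per transform if four transforms share the memory.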
-
factor2_slm_configuration bbfft::configure_factor2_slm_fft(configuration const &cfg, device_info const &info)#
Configure two factor FFT algorithm.
- Parameters:
cfg – configuration
info – Properties of target device
- Returns:
two factor configuration
-
void bbfft::generate_factor2_slm_fft(std::ostream &os, factor2_slm_configuration const &cfg, std::string_view name = {})#
Generate OpenCL C code for two factor FFT algorithm.
- Parameters:
os – Output stream (e.g. std::cout)
cfg – two factor configuration
name – Override default kernel name
-
struct factor2_slm_configuration#
Configuration for two factor FFT.
- Attention
Do not set values directly but use configure_factor2_slm_fft
Public Functions
-
std::string identifier() const#
convert configuration to identification string
Public Members
-
int direction#
-1 or +1
-
std::size_t M#
M.
-
std::size_t Mb#
M block size.
-
std::size_t N#
Number of points in DFT.
-
std::vector<int> factorization#
Factorization scale_even_r2c(N)=N_1*N_2*…*N_L, where scale_even_r2c(N) = N/2 if mode == r2c and N%2 == 0 else N
-
std::size_t Nb#
Number of parallel FFTs in factor.
-
std::size_t Kb#
K block size.
-
std::size_t sgs#
sub group size
-
transform_type type#
transform type (c2c, r2c, c2r)
-
std::array<std::size_t, 3u> istride#
stride of input tensor
-
std::array<std::size_t, 3u> ostride#
stride of output tensor
-
bool inplace_unsupported#
true if inplace not available
-
char const *load_function#
user provided load callback name
-
char const *store_function#
user provided store callback name
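The formula given for the factorization member can be spelled out in code. The sketch below is illustrative only: is_valid_factorization and the is_r2c flag are hypothetical helpers that are not part of the documented API. It checks that the product of the factors equals scale_even_r2c(N), where N is halved for real-to-complex transforms of even length.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Length that is factorized: N/2 for r2c transforms with even N, else N.
std::size_t scale_even_r2c(std::size_t N, bool is_r2c) {
    return (is_r2c && N % 2 == 0) ? N / 2 : N;
}

// Returns true if N_1 * N_2 * ... * N_L equals scale_even_r2c(N).
bool is_valid_factorization(std::vector<int> const &factorization,
                            std::size_t N, bool is_r2c) {
    std::size_t product = 1;
    for (int f : factorization) {
        assert(f > 0); // factors are expected to be positive
        product *= static_cast<std::size_t>(f);
    }
    return product == scale_even_r2c(N, is_r2c);
}
```

Under these assumptions, {16, 16} is a valid factorization for a c2c transform with N = 256, whereas an r2c transform of the same length factorizes 128 instead, for example as {8, 16}.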