Offline generation#

OpenCL C source code for FFT kernels can be generated offline using the following function:

std::vector<std::string> bbfft::generate_fft_kernels(std::ostream &os, std::vector<configuration> const &cfgs, device_info const &info)#

Generate FFT kernel code for the given configurations and target device.

Parameters:
  • os – Output stream (e.g. std::cout)

  • cfgs – configurations

  • info – Properties of target device

Returns:

Names of the generated kernels

Device info#

FFTs are specialized for the target device based on information stored in the following struct:

struct device_info#

Parameters of target device.

Public Functions

std::size_t min_subgroup_size() const#

Minimum subgroup size.

std::size_t max_subgroup_size() const#

Maximum subgroup size.

std::size_t register_space_min() const#

Min size of register file in bytes.

std::size_t register_space_max() const#

Max size of register file in bytes.

std::string to_string() const#

convert device info to string

bool operator==(device_info const &other) const#

equality check

bool operator!=(device_info const &other) const#

inequality check

Public Members

std::size_t max_work_group_size = 0#

maximum number of work items in work group

std::vector<std::size_t> subgroup_sizes#

supported sub group sizes

std::size_t local_memory_size = 0#

size of shared local memory

device_type type = device_type::gpu#

device type

std::ostream &bbfft::operator<<(std::ostream &os, device_type type)#

Convert device type to string.

Parameters:
  • os – output stream

  • type – device type

Returns:

Reference to os

std::ostream &bbfft::operator<<(std::ostream &os, device_info const &info)#

Convert device info to string.

Parameters:
  • os – output stream

  • info – device info

Returns:

Reference to os
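When kernels are generated offline, the target device may not be available for a query, so the public members listed above would have to be filled in by hand. The following is a minimal sketch of that idea using a stand-in struct that mirrors the documented members. The names `device_info_sketch`, `device_type_sketch`, and `make_example_device_info` are hypothetical, the numeric values are made up, and deriving the minimum and maximum subgroup size from `subgroup_sizes` is an assumption about how those accessors behave, not a statement about the library's implementation.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Stand-in types for illustration only; these are NOT the bbfft definitions.
enum class device_type_sketch { gpu, cpu, custom };

struct device_info_sketch {
    std::size_t max_work_group_size = 0;       // maximum number of work items in work group
    std::vector<std::size_t> subgroup_sizes;   // supported sub group sizes
    std::size_t local_memory_size = 0;         // size of shared local memory in bytes
    device_type_sketch type = device_type_sketch::gpu;

    // Assumption: the minimum is the smallest entry of subgroup_sizes (0 if empty).
    std::size_t min_subgroup_size() const {
        auto it = std::min_element(subgroup_sizes.begin(), subgroup_sizes.end());
        return it != subgroup_sizes.end() ? *it : 0;
    }

    // Assumption: the maximum is the largest entry of subgroup_sizes (0 if empty).
    std::size_t max_subgroup_size() const {
        auto it = std::max_element(subgroup_sizes.begin(), subgroup_sizes.end());
        return it != subgroup_sizes.end() ? *it : 0;
    }
};

// Hypothetical values for a GPU-like device; replace them with the real
// properties of the device the kernels are generated for.
inline device_info_sketch make_example_device_info() {
    device_info_sketch info;
    info.max_work_group_size = 1024;
    info.subgroup_sizes = {16, 32};
    info.local_memory_size = 128 * 1024;
    info.type = device_type_sketch::gpu;
    return info;
}
```

With the real `device_info` struct the same pattern should apply: set the four public members and pass the object wherever a `device_info const &` is expected.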

Enumerations#

enum class bbfft::device_type#

Device type.

Values:

enumerator gpu#

GPU device.

enumerator cpu#

CPU device.

enumerator custom#

custom device

Query device info in OpenCL#

auto bbfft::get_device_info(cl_device_id device) -> device_info#

Returns device info for device.

Parameters:

device – device

Returns:

device_info

auto bbfft::get_device_id(cl_device_id device) -> uint64_t#

Return device id for device.

Parameters:

device – device

Returns:

device id

Query device info in Level Zero#

auto bbfft::get_device_info(ze_device_handle_t device) -> device_info#

Returns device info for device.

Parameters:

device – device

Returns:

device_info

auto bbfft::get_device_id(ze_device_handle_t device) -> uint64_t#

Return device id for device.

Parameters:

device – device

Returns:

device id

Query device info in SYCL#

auto bbfft::get_device_info(::sycl::device device) -> device_info#

Returns device info for device.

Parameters:

device – device

Returns:

device_info

auto bbfft::get_device_id(::sycl::device device) -> uint64_t#

Return device id for device.

Parameters:

device – device

Returns:

device id

Algorithms#

The bbfft::generate_fft_kernels() function automatically selects the algorithm to generate the FFT kernel. The functions in this section allow direct access to the generators of each algorithm.

Small batch FFT#

The “small batch FFT” is intended for FFT sizes up to about N=64 (the maximum size depends on the size of the register file of the device).

small_batch_configuration bbfft::configure_small_batch_fft(configuration const &cfg, device_info const &info)#

Configure small batch FFT algorithm.

Parameters:
  • cfg – configuration

  • info – Properties of target device

Returns:

small_batch_configuration

void bbfft::generate_small_batch_fft(std::ostream &os, small_batch_configuration const &cfg, std::string_view name = {})#

Generate OpenCL C code for small batch FFT algorithm.

Parameters:
  • os – Output stream (e.g. std::cout)

  • cfg – small batch configuration

  • name – Override default kernel name

struct small_batch_configuration#

Configuration for small batch FFT.

Attention

Do not set values directly but use configure_small_batch_fft

Public Functions

std::string identifier() const#

convert configuration to identification string

Public Members

int direction#

-1 or +1

std::size_t M#

M.

std::size_t Mb#

M block size (w.r.t. reshaped data)

std::size_t N#

Number of points in DFT.

std::size_t Kb#

K block size.

std::size_t sgs#

sub group size

precision fp#

floating-point precision

transform_type type#

transform type (c2c, r2c, c2r)

std::array<std::size_t, 3u> istride#

stride of input tensor

std::array<std::size_t, 3u> ostride#

stride of output tensor

bool inplace_unsupported#

true if inplace not available

char const *load_function#

user provided load callback name

char const *store_function#

user provided store callback name
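The `direction` member above takes the value -1 or +1. As a point of reference, the sketch below shows a naive O(N²) DFT in which this value selects the sign of the complex exponent. It assumes the common convention that -1 denotes the forward transform, exp(-2πi·jk/N), and +1 the unnormalized backward transform. The function `reference_dft` is hypothetical and only useful for checking the output of generated kernels on small inputs.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Naive reference DFT: y[k] = sum_j x[j] * exp(direction * 2*pi*i * j*k / N).
// direction is expected to be -1 or +1; no normalization is applied.
std::vector<std::complex<double>> reference_dft(std::vector<std::complex<double>> const &x,
                                                int direction) {
    const double pi = std::acos(-1.0);
    const std::size_t N = x.size();
    std::vector<std::complex<double>> y(N);
    for (std::size_t k = 0; k < N; ++k) {
        std::complex<double> sum(0.0, 0.0);
        for (std::size_t j = 0; j < N; ++j) {
            // Reduce j*k modulo N to keep the angle small and accurate.
            const double angle =
                direction * 2.0 * pi * static_cast<double>((j * k) % N) / static_cast<double>(N);
            sum += x[j] * std::polar(1.0, angle);
        }
        y[k] = sum;
    }
    return y;
}
```

Applying the function with -1 and then with +1 returns the input scaled by N, which gives a simple round-trip check.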

Two factor FFT#

The “two factor FFT” is intended for larger FFTs whose data still fits into the shared local memory of the device.
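To get a feeling for that limit, the following back-of-the-envelope sketch computes how many complex points fit into a given amount of shared local memory. It assumes that each point occupies two real numbers and that the whole signal must reside in local memory at once. The helper `max_points_in_slm` is hypothetical, and the bound the library actually applies may be lower, for example if several FFTs share one work group.

```cpp
#include <cstddef>

// Upper bound on the number of complex points that fit into shared local memory.
// local_memory_size: size of shared local memory in bytes
// bytes_per_real:    4 for single precision, 8 for double precision
std::size_t max_points_in_slm(std::size_t local_memory_size, std::size_t bytes_per_real) {
    if (bytes_per_real == 0) {
        return 0; // avoid division by zero on invalid input
    }
    return local_memory_size / (2 * bytes_per_real);
}
```

For example, 64 KiB of local memory would hold at most 8192 single-precision or 4096 double-precision complex points under these assumptions.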

factor2_slm_configuration bbfft::configure_factor2_slm_fft(configuration const &cfg, device_info const &info)#

Configure two factor FFT algorithm.

Parameters:
  • cfg – configuration

  • info – Properties of target device

Returns:

factor2_slm_configuration

void bbfft::generate_factor2_slm_fft(std::ostream &os, factor2_slm_configuration const &cfg, std::string_view name = {})#

Generate OpenCL C code for two factor FFT algorithm.

Parameters:
  • os – Output stream (e.g. std::cout)

  • cfg – two factor configuration

  • name – Override default kernel name

struct factor2_slm_configuration#

Configuration for two factor FFT.

Attention

Do not set values directly but use configure_factor2_slm_fft

Public Functions

std::string identifier() const#

convert configuration to identification string

Public Members

int direction#

-1 or +1

std::size_t M#

M.

std::size_t Mb#

M block size.

std::size_t N#

Number of points in DFT.

std::vector<int> factorization#

Factorization scale_even_r2c(N) = N_1*N_2*…*N_L, where scale_even_r2c(N) = N/2 if type == r2c and N%2 == 0, else N

std::size_t Nb#

Number of parallel FFTs in factor.

std::size_t Kb#

K block size.

std::size_t sgs#

sub group size

precision fp#

floating-point precision

transform_type type#

transform type (c2c, r2c, c2r)

std::array<std::size_t, 3u> istride#

stride of input tensor

std::array<std::size_t, 3u> ostride#

stride of output tensor

bool inplace_unsupported#

true if inplace not available

char const *load_function#

user provided load callback name

char const *store_function#

user provided store callback name
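The relation documented for the `factorization` member can be written out as a small check. The sketch below implements `scale_even_r2c` as described there (N/2 for an even-length r2c transform, N otherwise) and verifies that the product of the factors matches. The enum `transform_type_sketch` and the helper `is_valid_factorization` are hypothetical stand-ins and not part of the library.

```cpp
#include <cstddef>
#include <vector>

// Stand-in for the library's transform type; for illustration only.
enum class transform_type_sketch { c2c, r2c, c2r };

// N/2 for an r2c transform of even length, N otherwise.
std::size_t scale_even_r2c(std::size_t N, transform_type_sketch type) {
    return (type == transform_type_sketch::r2c && N % 2 == 0) ? N / 2 : N;
}

// Returns true if N_1 * N_2 * ... * N_L equals scale_even_r2c(N).
bool is_valid_factorization(std::vector<int> const &factorization, std::size_t N,
                            transform_type_sketch type) {
    std::size_t product = 1;
    for (int f : factorization) {
        if (f <= 0) {
            return false; // factors must be positive
        }
        product *= static_cast<std::size_t>(f);
    }
    return product == scale_even_r2c(N, type);
}
```

Under this reading, an r2c transform with N = 256 would be factorized as, for example, 8 × 16 = 128, whereas a c2c transform of the same length needs factors whose product is 256.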