Understanding data layouts#
The most important message comes first: Every data layout in the Double-Batched FFT Library is defined in column major order.
Strides#
Suppose we are given a
Let
such that x[linear_index]
gives the correct entry, where x
is the base address of the tensor’s data.
Here, x
might be either real or complex, that is, the linear_index
is taken
w.r.t. to the underlying data type.
Default strides may be overriden in the configuration
.
Warning
Default c2c#
Default c2c strides are
There is no distinction between in-place and out-of-place transforms.
Default r2c#
For r2c, the input tensor is real and the output tensor is complex. We only need to store half of the modes due to symmetry, therefore define
We need to distinguish between the default out-of-place layout and the default in-place layout.
Out-of-place#
For the out-of-place transform the input tensor uses the default packed format
For the output tensor we use the default packed format but truncate the first FFT mode:
In-place#
The input tensor is overwritten during the FFT, hence it needs enough space to store the output tensor. Therefore, the first FFT mode needs to be padded. Let
The strides are
Example: Let
Default c2r#
c2r is the converse of r2c, so we simply swap input and output strides.
Tensor indexer#
The tensor_indexer
is a helpful class to work with input tensors.
E.g. in one dimension for a r2c in-place transform we can use the following code:
std::size_t N_out = N / 2 + 1;
auto xi = tensor_indexer<std::size_t, 3, layout::col_major>({M, N, K}, {1, M, M * N_out});
auto x = malloc_device<T>(xi.size(), Q);
for (std::size_t k = 0; k < K; ++k) {
for (std::size_t n = 0; n < N; ++n) {
for (std::size_t m = 0; m < M; ++m) {
x[xi(m, n, k)] = ...; // Load data for entry (m, n, k)
}
}
}
Tip
The tensor_indexer
and configuration
strides are compatible.
For example, given the configuration cfg
, one can initialize xi
with
auto xi = tensor_indexer<std::size_t, 3, layout::col_major>(
fit_array<3>(cfg.shape), fit_array<3>(cfg.istride));