<< Chapter < Page | Chapter >> Page > |
These plans extract a vector loop to reduce a DFT problem to a problem of lower vector rank, which is then solved recursively. Anyof the vector loops of $\mathbf{V}$ could be extracted in this way, leading to a number of possible plans corresponding to different looporderings.
Formally, to solve $dft(\mathbf{N},\mathbf{V},\mathbf{I},\mathbf{O})$ , where $\mathbf{V}=\left\{(,n,,,\iota ,,,o,)\right\}\cup {\mathbf{V}}_{1}$ , FFTW generates a loop that, for all $k$ such that $0\le k<n$ , invokes a plan for $dft(\mathbf{N},{\mathbf{V}}_{1},\mathbf{I}+k\xb7\iota ,\mathbf{O}+k\xb7o)$ .
Indirect plans transform a DFT problem that requires some data shuffling (or discontiguous operation) into a problem that requires noshuffling plus a rank-0 problem that performs the shuffling.
Formally, to solve $dft(\mathbf{N},\mathbf{V},\mathbf{I},\mathbf{O})$ where $\left|\mathbf{N}\right|>0$ , FFTW generates a plan that first solves $dft(\left\{\right\},\mathbf{N}\cup \mathbf{V},\mathbf{I},\mathbf{O})$ , and then solves $dft(copy-o(\mathbf{N}),copy-o(\mathbf{V}),\mathbf{O},\mathbf{O})$ . Here we define $copy-o\left(t\right)$ to be the I/O tensor $\left\{(,n,,,o,,,o,),\mid ,(,n,,,\iota ,,,o,),\in ,t\right\}$ : that is, it replaces the input strides with the output strides. Thus, an indirect plan firstrearranges/copies the data to the output, then solves the problem in place.
As discussed in "Goals and Background of the FFTW Project" , it turns out to be surprisingly useful to be able to handle large prime $n$ (or large prime factors). Rader plans implement the algorithm from [link] to compute one-dimensional DFTs of prime size in $\Theta (nlogn)$ time. Bluestein plans implement Bluestein's “chirp-z” algorithm, which can also handle prime $n$ in $\Theta (nlogn)$ time [link] , [link] , [link] . Generic plans implement a naive $\Theta \left({n}^{2}\right)$ algorithm (useful for $n\lesssim 100$ ).
Although it may not be immediately apparent, the combination of the recursive rules in "The space of plans in FFTW" can produce a number of useful algorithms. To illustrate these compositions, we discuss three particular issues: depth- vs. breadth-first, loop reordering,and in-place transforms.
As discussed previously in sections "Review of the Cooley-Tukey FFT" and "Understanding FFTs with an ideal cache" , the same Cooley-Tukey decomposition can be executed in either traditionalbreadth-first order or in recursive depth-first order, where the latter has some theoretical cache advantages. FFTW is explicitlyrecursive, and thus it can naturally employ a depth-first order. Because its sub-problems contain a vector loop that can be executed ina variety of orders, however, FFTW can also employ breadth-first traversal. In particular, a 1d algorithm resembling thetraditional breadth-first Cooley-Tukey would result from applying "Cooley-Tukey plans" to completely factorize the problem size before applying the loop rule "Plans for higher vector ranks" to reduce the vector ranks, whereas depth-first traversal would result fromapplying the loop rule before factorizing each subtransform. These two possibilities are illustrated by an example in [link] .
Notification Switch
Would you like to follow the 'Fast fourier transforms' conversation and receive update notifications?