
(When implementing hard-coded base cases, there is another choice, because a loop of small transforms is always required. Is it better to implement a hard-coded FFT of size 64, for example, or an unrolled loop of four size-16 FFTs, both of which operate on the same amount of data? The former should be more efficient because it performs more computations with the same amount of data, thanks to the log n factor in the FFT's n log n complexity.)
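As a sketch of that choice, consider 64 points of data handled either by a loop of four hard-coded size-16 transforms or by one hard-coded size-64 transform. The codelet names fft16 and fft64 below are hypothetical; here they are stand-ins built on a naive DFT only so that the sketch is self-contained, whereas in practice they would be machine-generated straight-line code.

#include <complex.h>
#include <stddef.h>

/* Stand-in for a hard-coded codelet body (naive O(n^2) DFT, in place). */
static void naive_dft(double complex *x, size_t n)
{
    const double tau = 6.283185307179586;  /* 2*pi */
    double complex y[64];
    for (size_t k = 0; k < n; ++k) {
        y[k] = 0;
        for (size_t j = 0; j < n; ++j)
            y[k] += x[j] * cexp(-I * tau * (double)(j * k) / (double)n);
    }
    for (size_t k = 0; k < n; ++k)
        x[k] = y[k];
}

static void fft16(double complex *x) { naive_dft(x, 16); }
static void fft64(double complex *x) { naive_dft(x, 64); }

/* Alternative 1: an unrolled loop of four size-16 transforms on 64 points. */
static void base_case_loop(double complex *x)
{
    for (int i = 0; i < 4; ++i)
        fft16(x + 16 * i);
}

/* Alternative 2: a single size-64 transform on the same 64 points; it performs
 * more arithmetic per datum loaded (the log n factor), which is why the text
 * expects it to be the more efficient base case. */
static void base_case_single(double complex *x)
{
    fft64(x);
}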

In addition, there are many other techniques that FFTW employs to supplement the basic recursive strategy, mainly to address the fact that cache implementations strongly favor accessing consecutive data, thanks to cache lines, limited associativity, and direct mapping using low-order address bits (accessing data at power-of-two intervals in memory, which is distressingly common in FFTs, is thus especially prone to cache-line conflicts). Unfortunately, the known FFT algorithms inherently involve some non-consecutive access (whether mixed with the computation or in separate bit-reversal/transposition stages). There are many optimizations in FFTW to address this. For example, the data for several butterflies at a time can be copied to a small buffer before computing and then copied back, where the copies and computations involve more consecutive access than doing the computation directly in-place. Or, the input data for the subtransform can be copied from (discontiguous) input to (contiguous) output before performing the subtransform in-place (see "Indirect plans"), rather than performing the subtransform directly out-of-place (as in algorithm 1). Or, the order of loops can be interchanged in order to push the outermost loop from the first radix step [the ℓ₂ loop in [link]] down to the leaves, in order to make the input access more consecutive (see "Discussion"). Or, the twiddle factors can be computed using a smaller look-up table (fewer memory loads) at the cost of more arithmetic (see "Numerical Accuracy in FFTs"). The choice of whether to use any of these techniques, which come into play mainly for moderate n (2¹³ < n < 2²⁰), is made by the self-optimizing planner as described in the next section.
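As a concrete illustration of the first of these techniques, the following is a minimal sketch, not FFTW's actual code, of buffering a small group of radix-2 butterflies at a time: the strided, power-of-two-separated accesses are confined to simple gather/scatter copy loops, while the butterfly arithmetic operates on a small contiguous buffer. The function name, the buffer size, and the twiddle-factor layout are illustrative assumptions.

#include <complex.h>
#include <stddef.h>

#define BUFSZ 8   /* butterflies buffered per pass; an illustrative choice */

/* Perform n radix-2 butterflies on x, where the two inputs of butterfly j
 * are x[j] and x[j + stride] and w[j] is its twiddle factor. Assumes
 * stride >= n so that the two halves do not overlap. */
static void buffered_butterflies(double complex *x, size_t n,
                                 size_t stride, const double complex *w)
{
    double complex buf[2 * BUFSZ];

    for (size_t i = 0; i < n; i += BUFSZ) {
        size_t chunk = (n - i < BUFSZ) ? (n - i) : BUFSZ;

        /* Gather: the loads are strided, but the stores into buf are
         * consecutive. */
        for (size_t j = 0; j < chunk; ++j) {
            buf[2 * j]     = x[i + j];
            buf[2 * j + 1] = x[i + j + stride];
        }

        /* Compute entirely within the small contiguous buffer. */
        for (size_t j = 0; j < chunk; ++j) {
            double complex a = buf[2 * j];
            double complex b = w[i + j] * buf[2 * j + 1];
            buf[2 * j]     = a + b;
            buf[2 * j + 1] = a - b;
        }

        /* Scatter: consecutive loads from buf, strided stores back to x. */
        for (size_t j = 0; j < chunk; ++j) {
            x[i + j]          = buf[2 * j];
            x[i + j + stride] = buf[2 * j + 1];
        }
    }
}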

Adaptive composition of FFT algorithms

As alluded to several times already, FFTW implements a wide variety of FFT algorithms (mostly rearrangements of Cooley-Tukey) and selects the “best” algorithm for a given n automatically. In this section, we describe how such self-optimization is implemented, and especially how FFTW's algorithms are structured as a composition of algorithmic fragments. These techniques in FFTW are described in greater detail elsewhere [link], so here we will focus only on the essential ideas and the motivations behind them.

An FFT algorithm in FFTW is a composition of algorithmic steps called a plan. The algorithmic steps each solve a certain class of problems (either solving the problem directly or recursively breaking it into sub-problems of the same type). The choice of plan for a given problem is determined by a planner that selects a composition of steps, either by runtime measurements to pick the fastest algorithm, or by heuristics, or by loading a pre-computed plan. These three pieces (problems, algorithmic steps, and the planner) are discussed in the following subsections.
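From the user's point of view, this division of labor is visible in FFTW's standard interface: one asks the planner to solve a problem, obtaining a plan, and then executes that plan on the data. The following minimal sketch uses FFTW 3's complex one-dimensional DFT interface (error handling omitted); with FFTW_MEASURE the planner times candidate compositions of steps at runtime, whereas FFTW_ESTIMATE relies on heuristics.

#include <fftw3.h>

int main(void)
{
    const int n = 1024;
    fftw_complex *in  = fftw_malloc(sizeof(fftw_complex) * n);
    fftw_complex *out = fftw_malloc(sizeof(fftw_complex) * n);

    /* The planner selects and composes algorithmic steps for this
     * particular problem (size n, out-of-place, forward transform). */
    fftw_plan p = fftw_plan_dft_1d(n, in, out, FFTW_FORWARD, FFTW_MEASURE);

    /* ... fill in[] with data (after planning, since FFTW_MEASURE
     * overwrites the arrays while timing) ... */

    fftw_execute(p);   /* run the chosen composition of steps */

    fftw_destroy_plan(p);
    fftw_free(in);
    fftw_free(out);
    return 0;
}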

Source: OpenStax, Fast Fourier Transforms. OpenStax CNX. Nov 18, 2012. Download for free at http://cnx.org/content/col10550/1.22
