<< Chapter < Page Chapter >> Page >
Discussion of the considerations involved in high-performance FFT implementations, which center largely on memory access and other non-arithmetic concerns, as illustrated by a case study of the FFTW library.

by Steven G. Johnson (Department of Mathematics, Massachusetts Institute of Technology) and Matteo Frigo (Cilk Arts, Inc.)

Introduction

Although there are a wide range of fast Fourier transform (FFT) algorithms, involving a wealth of mathematics from number theory topolynomial algebras, the vast majority of FFT implementations in practice employ some variation on the Cooley-Tukeyalgorithm [link] . The Cooley-Tukey algorithm can be derived in two or three lines of elementary algebra. It can beimplemented almost as easily, especially if only power-of-two sizes are desired; numerous popular textbooks list short FFT subroutinesfor power-of-two sizes, written in the language du jour . The implementation of the Cooley-Tukey algorithm, at least, wouldtherefore seem to be a long-solved problem. In this chapter, however, we will argue that matters are not as straightforward as they mightappear.

For many years, the primary route to improving upon the Cooley-Tukey FFT seemed to be reductions in the count of arithmetic operations,which often dominated the execution time prior to the ubiquity of fast floating-point hardware (at least on non-embedded processors). Therefore, great effort was expended towardsfinding new algorithms with reduced arithmetic counts [link] , from Winograd's method to achieve Θ ( n ) multiplications We employ the standard asymptotic notation of O for asymptotic upper bounds, Θ for asymptotic tight bounds, and Ω for asymptotic lower bounds [link] . (at the cost of many more additions) [link] , [link] , [link] , [link] to the split-radix variant on Cooley-Tukey that long achieved the lowestknown total count of additions and multiplications for power-of-two sizes [link] , [link] , [link] , [link] , [link] (but was recently improved upon [link] , [link] ). The question of the minimum possible arithmetic count continues to be of fundamentaltheoretical interest—it is not even known whether better than Θ ( n log n ) complexity is possible, since Ω ( n log n ) lower bounds on the count of additions have only been proven subject to restrictive assumptions about thealgorithms [link] , [link] , [link] . Nevertheless, the difference in the number of arithmetic operations, forpower-of-two sizes n , between the 1965 radix-2 Cooley-Tukey algorithm ( 5 n log 2 n [link] ) and the currently lowest-known arithmetic count ( 34 9 n log 2 n [link] , [link] ) remains only about 25%.

log (n) base 2 on the horizontal axis, and FFTW speed/numerical recipes speed on the vertical axis. Two curves move horizontally from value 3 to value 18, where they begin increasing. One curve, labeled, without SSE, is lower in vertical value than the second curve, with SSE (SIMD instructions).
The ratio of speed (1/time) between a highly optimized FFT (FFTW 3.1.2 [link] , [link] ) and a typical textbook radix-2 implementation ( Numerical Recipes in C [link] ) on a 3 GHz Intel Core Duo with the Intel C compiler 9.1.043, for single-precisioncomplex-data DFTs of size n , plotted versus log 2 n . Top line (squares) shows FFTW with SSE SIMD instructions enabled, whichperform multiple arithmetic operations at once (see section); bottom line (circles) shows FFTW with SSE disabled, which thusrequires a similar number of arithmetic instructions to the textbook code. (This is not intended as a criticism of Numerical Recipes —simple radix-2 implementations are reasonable for pedagogy—but it illustrates the radical differences betweenstraightforward and optimized implementations of FFT algorithms, even with similar arithmetic costs.) For n 2 19 , the ratio increases because the textbook code becomes much slower (thishappens when the DFT size exceeds the level-2 cache).

Get Jobilize Job Search Mobile App in your pocket Now!

Get it on Google Play Download on the App Store Now




Source:  OpenStax, Fast fourier transforms. OpenStax CNX. Nov 18, 2012 Download for free at http://cnx.org/content/col10550/1.22
Google Play and the Google Play logo are trademarks of Google Inc.

Notification Switch

Would you like to follow the 'Fast fourier transforms' conversation and receive update notifications?

Ask