One might get the impression that a strict dichotomy divides cache-aware and cache-oblivious algorithms, but the two are not mutually exclusive in practice. Given an implementation of a cache-oblivious strategy, one can further optimize it for the cache characteristics of a particular machine in order to improve the constant factors. For example, one can tune the radices used, the transition point between the radix-$\sqrt{n}$ algorithm and the bounded-radix algorithm, or other algorithmic choices as described in "Memory strategies in FFTW". The advantage of starting cache-aware tuning with a cache-oblivious approach is that the starting point already exploits all levels of the cache to some extent, and one has reason to hope that good performance on one machine will be more portable to other architectures than for a purely cache-aware “blocking” approach. In practice, we have found this combination to be very successful with FFTW.

Memory strategies in FFTW

The recursive cache-oblivious strategies described above form a useful starting point, but FFTW supplements them with a number of additional tricks, and also exploits cache-obliviousness in less-obvious forms.

We currently find that the general radix-$\sqrt{n}$ algorithm is beneficial only when $n$ becomes very large, on the order of $2^{20} \approx 10^6$. In practice, this means that we use at most a single step of radix-$\sqrt{n}$ (two steps would only be used for $n \gtrsim 2^{40}$). The reason for this is that the implementation of radix $\sqrt{n}$ is less efficient than for a bounded radix: the latter has the advantage that an entire radix butterfly can be performed in hard-coded loop-free code within local variables/registers, including the necessary permutations and twiddle factors.
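To make the idea of a single large-radix step concrete, the following is a minimal Python sketch, not FFTW's implementation. It splits a size-n transform into √n sub-transforms of size √n, multiplies by twiddle factors, and then performs another √n sub-transforms. The function names `naive_dft` and `sqrt_radix_step` are hypothetical, the sketch assumes n is a perfect square, and the sub-transforms default to a naive O(n²) DFT, whereas a real implementation would recurse with a bounded radix.

```python
import cmath
import math


def naive_dft(x):
    """Reference O(n^2) forward DFT; stands in for the sub-transforms."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def sqrt_radix_step(x, sub_fft=naive_dft):
    """One step that factors n = n1 * n2 with n1 = n2 = sqrt(n).

    Assumption for this sketch: n is a perfect square.
    """
    n = len(x)
    n1 = math.isqrt(n)
    if n1 * n1 != n:
        raise ValueError("this sketch assumes n is a perfect square")
    n2 = n1

    # Step 1: n2 transforms of size n1 over the strided inputs x[j2 + j1*n2].
    cols = [sub_fft(x[j2::n2]) for j2 in range(n2)]

    # Step 2: multiply by the twiddle factors exp(-2*pi*i * j2 * k1 / n).
    for j2 in range(n2):
        for k1 in range(n1):
            cols[j2][k1] *= cmath.exp(-2j * cmath.pi * j2 * k1 / n)

    # Step 3: n1 transforms of size n2; output index is k1 + k2*n1.
    y = [0j] * n
    for k1 in range(n1):
        row = sub_fft([cols[j2][k1] for j2 in range(n2)])
        for k2 in range(n2):
            y[k1 + k2 * n1] = row[k2]
    return y
```

Because the radix here grows with n, the butterfly itself needs loops, which is the contrast with a bounded radix that the paragraph above draws.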

Thus, for more moderate $n$, FFTW uses depth-first recursion with a bounded radix, similar in spirit to the algorithm of [link] but with much larger radices (radix 32 is common) and base cases (size 32 or 64 is common) as produced by the code generator of "Generating Small FFT Kernels". The self-optimization described in "Adaptive Composition of FFT Algorithms" allows the choice of radix and the transition to the radix-$\sqrt{n}$ algorithm to be tuned in a cache-aware (but entirely automatic) fashion.
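As a rough illustration of depth-first recursion with a bounded radix, here is a small Python sketch. It is not FFTW's code: the name `fft_bounded_radix` and the default radix 4 and base-case size 8 are illustrative assumptions (far smaller than the radix 32 and base cases of 32 or 64 mentioned above), and a naive DFT stands in for the hard-coded base cases.

```python
import cmath


def naive_dft(x):
    """Reference O(n^2) forward DFT; stands in for a hard-coded base case."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def fft_bounded_radix(x, radix=4, base_size=8):
    """Decimation-in-time FFT with a fixed radix and a larger base case."""
    n = len(x)
    # Base case: small sizes (or sizes the radix does not divide).
    if n <= base_size or n % radix != 0:
        return naive_dft(x)

    m = n // radix
    # Depth-first: each sub-sequence is transformed completely before
    # the next one is started.
    subs = [fft_bounded_radix(x[r::radix], radix, base_size)
            for r in range(radix)]

    y = [0j] * n
    for k in range(m):
        # Apply twiddle factors exp(-2*pi*i * r * k / n) to the butterfly inputs.
        t = [subs[r][k] * cmath.exp(-2j * cmath.pi * r * k / n)
             for r in range(radix)]
        # Size-`radix` butterfly: a small DFT across the sub-transform outputs.
        for q in range(radix):
            y[k + q * m] = sum(
                t[r] * cmath.exp(-2j * cmath.pi * r * q / radix)
                for r in range(radix)
            )
    return y
```

In this sketch the butterfly is a generic loop; with a fixed radix it could instead be unrolled into straight-line code, which is what makes the bounded-radix case efficient.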

For small $n$ (including the radix butterflies and the base cases of the recursion), hard-coded FFTs (FFTW's codelets) are employed. However, this gives rise to an interesting problem: a codelet for (e.g.) $n = 64$ is 2000 lines long, with hundreds of variables and over 1000 arithmetic operations that can be executed in many orders, so what order should be chosen? The key problem here is the efficient use of the CPU registers, which essentially form a nearly ideal, fully associative cache. Normally, one relies on the compiler for all code scheduling and register allocation, but the compiler needs help with such long blocks of code (indeed, the general register-allocation problem is NP-complete). In particular, FFTW's generator knows more about the code than the compiler does: the generator knows it is an FFT, and therefore it can use an optimal cache-oblivious schedule (analogous to the radix-$\sqrt{n}$ algorithm) to order the code independent of the number of registers [link]. The compiler is then used only for local “cache-aware” tuning (both for register allocation and the CPU pipeline). One practical difficulty is that some “optimizing” compilers will tend to greatly re-order the code, destroying FFTW's optimal schedule. With GNU gcc, we circumvent this problem by using compiler flags that explicitly disable certain stages of the optimizer. As a practical matter, one consequence of this scheduler is that FFTW's machine-independent codelets are no slower than machine-specific codelets generated by an automated search and optimization over many possible codelet implementations, as performed by the SPIRAL project [link].
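To give a feel for what "hard-coded, loop-free" means, here is a hand-written size-4 forward DFT in Python. It is only an illustration and not output of FFTW's generator: the name `codelet_dft4` is hypothetical, and a real codelet is generated C code operating on separate real and imaginary parts. Even at this tiny size the additions can be ordered in several ways; at n = 64 the choice of order is the scheduling problem described above.

```python
def codelet_dft4(x0, x1, x2, x3):
    """Loop-free size-4 forward DFT using only local temporaries.

    The size-4 root of unity is exp(-2*pi*i/4) = -i, so no general
    twiddle multiplications are needed.
    """
    # First stage: two size-2 butterflies on the even and odd inputs.
    a = x0 + x2
    b = x0 - x2
    c = x1 + x3
    # Multiplying by -i only swaps real and imaginary parts (with a sign
    # change); it is written as a complex multiply here for brevity.
    d = -1j * (x1 - x3)

    # Second stage: combine the partial results.
    y0 = a + c
    y1 = b + d
    y2 = a - c
    y3 = b - d
    return y0, y1, y2, y3
```

For example, `codelet_dft4(1, 2, 3, 4)` evaluates the same sums as the DFT definition for the input sequence 1, 2, 3, 4, but without any loop or array indexing.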

Source:  OpenStax, Fast fourier transforms. OpenStax CNX. Nov 18, 2012 Download for free at http://cnx.org/content/col10550/1.22