<< Chapter < Page Chapter >> Page >
Memory access pattern of the straight line blocks of code in a VL-1 size-128 hard-coded leaf FFT with sorted radix-2/4 and size-8 body sub-transforms

[link] depicts the memory access patterns of a size-128 transform where the outermost body sub-transform has subsumed a smaller sub-transform to become a size-8 sub-transform. The columns from 33 onwards show the sub-transform accessing eight elements in the output data array (cf. [link] , which shows the memory access patterns of the same transform prior to doubling the radix of the outer sub-transform).

Optimizing the hierarchical structure

The largest transform that has been considered so far is size-128. As it stands, the hard-coded leaf approach begins to generate code of unwieldy proportions as the size of the transform tends towards tens of thousands or hundreds of thousands of points. This is due to the lists of statically elaborated body sub-transform calls, e.g., a size-262,144 transform contains a lengthy list of 7279 such calls.

While long lists of statically elaborated calls are one extreme, the other is to compute the body sub-transforms with a recursive program. The former option degrades performance for larger transforms, while the latter option curbs performance for smaller transforms. A compromise is to somehow compress blocks of statically elaborated sub-transform calls.

The approach presented here extracts the hierarchical structure from the sequence of body sub-transforms and emits a set of functions that are neither too small (as in the case of a recursive program) nor too large (as is the case with full static elaboration). This is accomplished by adapting the Sequitur algorithm  [link] , which builds a grammar of rules from a sequence of symbols, and enforces two basic constraints:

  1. no pair of adjacent symbols (referred to as a digram) appears more than once in the grammar;
  2. every rule is used more than once.

The resulting grammar is an efficient hierarchical representation of the original sequence. Additional constraints can be imposed to limit the maximum or minimum size of each rule, which enable the size of the resulting functions to be tuned to be not too small and not too large.

To build the grammar, each body sub-transform is represented by a symbol consisting of the size and offset of the sub-transform. The radix is discarded, because it can be inferred from the size. Here are several other details relevant to this particular application of Sequitur:

  • A digram of two sub-transforms is deemed to match another digram when the size of each sub-transform matches the size of the other digram's respective sub-transform and the relative offsets between sub-transforms within each digram match;
  • Sub-transform offsets are maintained to be always relative to the base of the containing rule – when a rule is constructed, the offsets of the symbols within that rule are adjusted to be relative to the base of the new rule, and when a rule is subsumed (due to violation of constraint 2: every rule must be used more than once), the offsets are recomputed to be relative to the subsuming rule.

Get Jobilize Job Search Mobile App in your pocket Now!

Get it on Google Play Download on the App Store Now




Source:  OpenStax, Computing the fast fourier transform on simd microprocessors. OpenStax CNX. Jul 15, 2012 Download for free at http://cnx.org/content/col11438/1.2
Google Play and the Google Play logo are trademarks of Google Inc.

Notification Switch

Would you like to follow the 'Computing the fast fourier transform on simd microprocessors' conversation and receive update notifications?

Ask