The number of additions depends on the order of the pre- and postweave operators. For example, in the length-15 WFTA in [link] , if the length-5 operations had been done first and last, there would have been six row-addition preweaves in the preweave operator rather than the five shown. It is difficult to illustrate the algorithm for three or more factors of N, but the ideas apply to any number of factors. Each length has an optimal ordering of the pre- and postweave operators that will minimize the number of additions.
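The ordering cost can be made concrete with a small enumeration. The sketch below assumes illustrative, not exact, addition counts and pre-weave expansion sizes for the length-3 and length-5 modules (all numbers here are hypothetical), and brute-forces both orderings of the pre-weave operators for N = 15. It shows why a module whose pre-weave expands the data is cheapest applied last:

```python
from itertools import permutations

# Hypothetical module data for N = 15: factor -> (input length n, length m
# after the pre-weave B expands the data, additions performed by B).
# Illustrative counts only, not the exact published module counts.
MODULES = {3: (3, 3, 4), 5: (5, 6, 16)}

def preweave_adds(order):
    """Total additions for applying the pre-weaves B_f in the given order.
    B_f costs adds(B_f) per length-n_f vector it transforms; the number of
    such vectors is the product of the current sizes of the other factors
    (expanded to m if already pre-weaved, still n otherwise)."""
    total = 0
    for j, f in enumerate(order):
        vectors = 1
        for l, g in enumerate(order):
            if l != j:
                vectors *= MODULES[g][1] if l < j else MODULES[g][0]
        total += MODULES[f][2] * vectors
    return total

best = min(permutations(MODULES), key=preweave_adds)
# Doing the length-3 pre-weave first is cheaper under these counts, because
# the length-5 pre-weave expands 5 -> 6 and enlarges everything after it.
```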
A program for the WFTA is not as simple as for the FFT or PFA because of the very characteristic that reduces the number of multiplications: the nesting. A simple two-factor example program is given in [link] , and a general program can be found in [link] , [link] . The same lengths are possible with the PFA and WFTA, and the same short DFT modules can be used; however, the multiplies in the modules must occur in one place for use in the WFTA.
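As a concrete illustration of the nesting (a numerical sketch, not the program referenced above), the following builds a length-6 transform from nested length-3 and length-2 modules. The Winograd factors for length 3 and the prime-factor index maps follow standard constructions but are written out here as assumptions; the point to notice is that every multiplication lands in the single diagonal `kron(D3, D2)` between the combined pre- and postweave additions:

```python
import numpy as np

# Winograd factorization of the 3-point DFT, F3 = C3 @ D3 @ B3, with the
# convention W3 = exp(-2j*pi/3); B3 and C3 contain only 0 and +-1
# (additions), and every multiplication sits in the diagonal D3.
c, s = np.cos(2 * np.pi / 3), np.sin(2 * np.pi / 3)
B3 = np.array([[1, 1, 1], [0, 1, 1], [0, -1, 1]], dtype=complex)
D3 = np.diag([1, c - 1, 1j * s])
C3 = np.array([[1, 0, 0], [1, 1, 1], [1, 1, -1]], dtype=complex)

# The 2-point DFT needs no multiplies: F2 = C2 @ D2 @ B2 trivially.
B2 = np.array([[1, 1], [1, -1]], dtype=complex)
D2 = np.eye(2, dtype=complex)
C2 = np.eye(2, dtype=complex)

N1, N2, N = 3, 2, 6
x = np.arange(N, dtype=complex)          # any test input

# Good's prime-factor index maps for N = 3 * 2 (CRT-based reordering).
in_map = [(N2 * n1 + N1 * n2) % N for n1 in range(N1) for n2 in range(N2)]
out_map = [(4 * k1 + 3 * k2) % N for k1 in range(N1) for k2 in range(N2)]

# Nested WFTA: combined pre-weave, ONE diagonal of multiplies, combined
# post-weave. kron(D3, D2) is diagonal, so all multiplications occur there.
pre = np.kron(B3, B2)                    # additions only
mult = np.kron(D3, D2)                   # the only multiplies
post = np.kron(C3, C2)                   # additions only

X = np.empty(N, dtype=complex)
X[out_map] = post @ (mult @ (pre @ x[in_map]))

assert np.allclose(X, np.fft.fft(x))     # matches the direct DFT
```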
In the previous section it was seen how using the permutation property of the elementary operators in the PFA allowed the nesting of the multiplications to reduce their number. It was also seen that a proper ordering of the operators could minimize the number of additions. These ideas have been extended in formulating a more general algorithm-optimization problem. If the DFT operator $F$ in [link] is expressed in a still more factored form obtained from Winograd’s Short DFT Algorithms: Equation 30 , a greater variety of orderings can be optimized. For example, if the $A$ operators each have two factors,
The DFT in [link] becomes
The operator notation is very helpful in understanding the central ideas, but may hide some important facts. It has been shown [link] , [link] that operators in different ${F}_{i}$ commute with each other, but the order of the operators within an ${F}_{i}$ cannot be changed. They represent the matrix multiplications in Winograd’s Short DFT Algorithms: Equation 30 or Winograd’s Short DFT Algorithms: Equation 8 which do not commute.
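This commutation rule can be checked directly with Kronecker products, since an operator belonging to factor $i$ acts as a Kronecker product with identities on the other factors. A small numpy sketch (the sizes and the random matrices standing in for the operator factors are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
A1a, A1b = rng.random((3, 3)), rng.random((3, 3))   # two factors inside F_1
A2 = rng.random((5, 5))                             # an operator inside F_2
I1, I2 = np.eye(3), np.eye(5)

# Operators belonging to different F_i commute: they act on different
# factors of the Kronecker structure.
lhs = np.kron(A1a, I2) @ np.kron(I1, A2)
rhs = np.kron(I1, A2) @ np.kron(A1a, I2)
assert np.allclose(lhs, rhs)             # both equal kron(A1a, A2)

# Operators inside the same F_i generally do not commute.
assert not np.allclose(np.kron(A1a @ A1b, I2), np.kron(A1b @ A1a, I2))
```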
This formulation allows a very large set of possible orderings; in fact, the number is so large that some automatic technique must be used to find the “best”. It is possible to set up a criterion of optimality that includes not only the number of multiplications but the number of additions as well. The effects of relative multiply-add times, data transfer times, CPU register and memory sizes, and other hardware characteristics can be included in the criterion. Dynamic programming can then be applied to derive an optimal algorithm for a particular application [link] . This is a very interesting idea, as there is no longer a single algorithm, but a class and an optimizing procedure. The challenge is to generate a broad enough class to result in a solution that is close to a global optimum and to have a practical scheme for finding the solution.
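A minimal sketch of the dynamic-programming idea, under a strongly simplified cost model of my own construction (pre-weave additions only, with hypothetical module counts; the real criterion in the literature is much richer): because the cost of applying the next pre-weave depends only on which factors have already been pre-weaved, not on the order they were done in, the best ordering can be found by a DP over subsets instead of enumerating every ordering.

```python
from functools import lru_cache

# Hypothetical module data: factor -> (input length n, expanded length m
# after the pre-weave B, additions performed by B). Illustrative counts only.
MODULES = {2: (2, 2, 2), 3: (3, 3, 4), 5: (5, 6, 16), 7: (7, 9, 36)}

def optimal_preweave_order(factors):
    """Subset DP: state = bitmask of factors already pre-weaved. Applying
    B_f next costs adds(B_f) times the product of the other factors'
    current sizes (expanded m if done, original n if not); that cost
    depends only on the SET of completed factors, so DP applies."""
    n = len(factors)

    @lru_cache(maxsize=None)
    def best(done):
        if done == (1 << n) - 1:
            return 0, ()
        options = []
        for i, f in enumerate(factors):
            if done >> i & 1:
                continue
            vectors = 1
            for j, g in enumerate(factors):
                if j != i:
                    vectors *= MODULES[g][1] if done >> j & 1 else MODULES[g][0]
            tail_cost, tail = best(done | 1 << i)
            options.append((MODULES[f][2] * vectors + tail_cost, (f,) + tail))
        return min(options)

    return best(0)

# For two factors this reproduces a brute-force enumeration:
# optimal_preweave_order((3, 5)) -> (68, (3, 5)) under these counts.
```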
Results obtained applying the dynamic programming method to the design of fairly long DFT algorithms gave algorithms that had fewer multiplications and additions than either a pure PFA or WFTA [link] . It seems that some nesting is desirable, but not total nesting for four or more factors. There are also some interesting possibilities in mixing the Cooley-Tukey with this formulation. Unfortunately, the twiddle factors are not the same for all rows and columns; therefore, operations cannot commute past a twiddle-factor operator. There are ways of breaking the total algorithm into horizontal paths and using different orderings along the different paths [link] , [link] . In a sense, this is what the split-radix FFT does with its twiddle factors when compared to a conventional Cooley-Tukey FFT.