
Performance

Figure: Performance of hard-coded leaf FFTs on a Macbook Air 4,2. Panels: single-precision, SSE (VL-2); double-precision, SSE (VL-1); single-precision, AVX (VL-4); double-precision, AVX (VL-2).

The figure above shows the results of a benchmark for transforms of size 256 through to 262,144 running on a Macbook Air 4,2. The speed of FFTW 3.3 running in estimate and patient modes is also shown for comparison.

For each size of transform, precision and vector length (i.e., either SSE or AVX), several configurations of hard-coded leaf FFT were generated: three configurations of leaf size (16, 32 and 64) and, if the transform was larger than 32,768, an additional configuration with size-16 leaves and streaming store instructions. Before running the benchmark, the library was calibrated and the fastest configuration selected (details of the calibration are described in "Calibration").
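The candidate set described above is small enough to enumerate directly. The following is a minimal sketch of that enumeration; the type `candidate_t` and the function `enumerate_candidates` are hypothetical names chosen for illustration and are not part of SFFT.

```c
#include <stddef.h>

/* Hypothetical description of one candidate configuration
   (an illustration only, not a type taken from SFFT). */
typedef struct {
    size_t leaf_size;      /* size of the hard-coded leaf FFT: 16, 32 or 64 */
    int streaming_stores;  /* non-zero if streaming store instructions are used */
} candidate_t;

/* Fill `out` (capacity of at least 4) with the candidates for a transform
   of size n, and return how many were written. */
size_t enumerate_candidates(size_t n, candidate_t out[4])
{
    static const size_t leaf_sizes[] = {16, 32, 64};
    size_t count = 0;

    /* Every transform gets one candidate per leaf size. */
    for (size_t i = 0; i < 3; i++) {
        out[count].leaf_size = leaf_sizes[i];
        out[count].streaming_stores = 0;
        count++;
    }

    /* Transforms larger than 32,768 get one extra candidate:
       size-16 leaves combined with streaming stores. */
    if (n > 32768) {
        out[count].leaf_size = 16;
        out[count].streaming_stores = 1;
        count++;
    }
    return count;
}
```

Under these assumptions a size-1024 transform yields three candidates and a size-65,536 transform yields four, which matches the "three or four configurations per transform" evaluated in the benchmark.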

For most sizes of transform, precision and vector length, SFFT is faster than FFTW running in patient mode. For the transforms with memory requirements that are approximately at the limits of the cache, FFTW running in patient mode is sometimes marginally faster than SFFT. Once the transforms exceed the size of the cache, SFFT is again the fastest.

It is important to note that FFTW running in patient mode evaluates a huge configuration space of parameters (and thus takes a long time to calibrate), while SFFT has, in this case, only evaluated either three or four configurations per transform.

In practice

SFFT is not itself an FFT library; the name refers to the elaboration program that reads a configuration file and generates the code for an FFT library. The code for the FFT library is then built as any other library would be.

Organization

As well as the generated code, there is infrastructure code which is common to all libraries generated by SFFT. This can be broadly categorized into three parts: initialization, dispatch and calibration.

Initialization

Before an application can compute an FFT with SFFT, it must initialize a plan for the specific size, precision and direction of FFT. The library may contain several FFTs and configurations that can compute the requested transform, and it chooses the fastest by timing each candidate. There are at most 8 candidates for any size of transform, a very small space compared to FFTW's exhaustive search of all possible FFT algorithms and configurations. The section "Results and discussion" describes an alternative to calibration, where machine learning is used with data collected from benchmarks to build a model that predicts performance.

After determining which implementation and parameters will be used, the initialization code allocates memory and populates any lookup tables that may be required. Before returning the plan to the application, a function pointer in the plan is updated to point to the FFT that has just been initialized.
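The selection step of initialization might look roughly like the sketch below: each candidate is timed, the fastest is kept, and a function pointer in the plan is set to it. All names here (`plan_t`, `kernel_fn`, `plan_init`, `kernel_a`, `kernel_b`) are hypothetical, the two kernels are stand-ins that only copy data, and lookup-table setup is omitted. The sketch uses `clock()` for brevity; a real calibration would likely need a higher-resolution timer.

```c
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical kernel signature: n complex points as 2*n interleaved doubles. */
typedef void (*kernel_fn)(const double *in, double *out, size_t n);

/* Hypothetical plan: the size and the kernel chosen at initialization. */
typedef struct {
    size_t n;
    kernel_fn transform;
} plan_t;

/* Stand-in kernels so the sketch compiles; neither computes an FFT. */
void kernel_a(const double *in, double *out, size_t n)
{
    for (size_t i = 0; i < 2 * n; i++) out[i] = in[i];
}

void kernel_b(const double *in, double *out, size_t n)
{
    for (size_t i = 2 * n; i-- > 0; ) out[i] = in[i];
}

/* Time `reps` back-to-back runs of one candidate, in seconds of CPU time. */
static double time_kernel(kernel_fn f, const double *in, double *out,
                          size_t n, int reps)
{
    clock_t start = clock();
    for (int r = 0; r < reps; r++) f(in, out, n);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Time every candidate, keep the fastest, and store it in a new plan.
   Returns NULL if there are no candidates or an allocation fails. */
plan_t *plan_init(size_t n, const kernel_fn *candidates, size_t count)
{
    if (count == 0) return NULL;

    double *in = calloc(2 * n, sizeof *in);
    double *out = calloc(2 * n, sizeof *out);
    plan_t *p = malloc(sizeof *p);
    if (!in || !out || !p) {
        free(in); free(out); free(p);
        return NULL;
    }

    size_t best = 0;
    double best_time = time_kernel(candidates[0], in, out, n, 100);
    for (size_t i = 1; i < count; i++) {
        double t = time_kernel(candidates[i], in, out, n, 100);
        if (t < best_time) { best_time = t; best = i; }
    }

    p->n = n;
    p->transform = candidates[best];  /* pointer later used to run the FFT */
    free(in);
    free(out);
    return p;
}
```

Because the plan stores only a pointer to the winning kernel, the cost of timing is paid once at initialization and not on every transform.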

Dispatch

Applications do not invoke any of the FFTs within SFFT directly. Rather they invoke a dispatch function on an initialized plan, which in turn transfers control to the correct FFT code within SFFT. The use of a dispatch function is purely a matter of convenience, so that users only need to deal with a few simple functions.
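A dispatch function of this kind can be very thin. The sketch below assumes a hypothetical plan layout in which initialization has already stored a function pointer; `plan_t`, `plan_execute` and `scale_by_two` are illustrative names, and `scale_by_two` is a stand-in that is not an FFT.

```c
#include <stddef.h>

typedef struct plan plan_t;

/* Hypothetical plan layout: `transform` is set during initialization. */
struct plan {
    size_t n;  /* number of complex points (2*n interleaved doubles) */
    void (*transform)(const plan_t *p, const double *in, double *out);
};

/* Stand-in for a generated FFT: doubles every value. NOT an FFT. */
void scale_by_two(const plan_t *p, const double *in, double *out)
{
    for (size_t i = 0; i < 2 * p->n; i++) out[i] = 2.0 * in[i];
}

/* The single entry point an application calls: it forwards to whichever
   function the plan was initialized with. */
void plan_execute(const plan_t *p, const double *in, double *out)
{
    p->transform(p, in, out);
}
```

The application only ever sees the plan and the execute function, so the set of generated FFTs behind the pointer can change without affecting calling code.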

Calibration

SFFT contains calibration code to measure the performance of the possible configurations of FFT on the target machine, of which there are at most 8 for each size of transform. Following calibration, the timing data is written to a file, which SFFT then uses to select the fastest FFT for a given problem running on that machine.
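The text does not specify the format of the calibration file. As a sketch only, the following assumes a plain-text file with one `size configuration seconds` record per line and non-negative configuration numbers; `timing_t`, `save_timings` and `fastest_config` are hypothetical names.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical timing record: one measurement of one configuration. */
typedef struct {
    unsigned long n;   /* transform size */
    int config;        /* configuration number, assumed >= 0 */
    double seconds;    /* measured time */
} timing_t;

/* Write all records to `path`, one per line. Returns 0 on success. */
int save_timings(const char *path, const timing_t *t, size_t count)
{
    FILE *f = fopen(path, "w");
    if (!f) return -1;
    for (size_t i = 0; i < count; i++)
        fprintf(f, "%lu %d %.9g\n", t[i].n, t[i].config, t[i].seconds);
    return fclose(f) == 0 ? 0 : -1;
}

/* Scan `path` and return the configuration with the smallest time for
   size n, or -1 if the file cannot be read or has no record for n. */
int fastest_config(const char *path, unsigned long n)
{
    FILE *f = fopen(path, "r");
    if (!f) return -1;

    timing_t t;
    int best = -1;
    double best_seconds = 0.0;
    while (fscanf(f, "%lu %d %lf", &t.n, &t.config, &t.seconds) == 3) {
        if (t.n == n && (best < 0 || t.seconds < best_seconds)) {
            best = t.config;
            best_seconds = t.seconds;
        }
    }
    fclose(f);
    return best;
}
```

With at most 8 records per transform size, a linear scan of the file is cheap, and the file can be reused across runs on the same machine.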

Usage

SFFT is used much like other FFT libraries:

  1. A plan for an FFT is initialized;
  2. Using the plan, an FFT is computed (this step may be repeated many times);
  3. The plan is destroyed.

The plan is initialized for a given size, precision and direction of transform, and may then be executed any number of times on any data. Any number of plans can be simultaneously created and used.

  /* Fragment: assumes <stdio.h>, <complex.h>, <xmmintrin.h> (for _mm_malloc)
     and the generated library's header have been included */
  int n = 1024;
  int i;

  /* 32-byte alignment is required for aligned AVX memory operations */
  double complex __attribute__ ((aligned(32))) *input, *output;
  input = _mm_malloc(n * sizeof(double complex), 32);
  output = _mm_malloc(n * sizeof(double complex), 32);

  for (i = 0; i < n; i++) input[i] = i;

  /* Initialize a plan for a forward, double-precision, size-n transform with AVX */
  sfft_plan_t *p = sfft_init(n, SFFT_FORWARD|SFFT_DOUBLE|SFFT_AVX);
  if (p) {
    /* Compute the FFT; this call may be repeated with the same plan */
    sfft_execute(p, input, output);

    for (i = 0; i < n; i++)
      printf("%d %f %f\n", i, creal(output[i]), cimag(output[i]));

    sfft_free(p);
  } else {
    printf("Plan unsupported\n");
  }
SFFT example usage

In the listing above, a size-1024 transform is computed on double-precision data with AVX enabled. The input and output arrays are allocated with _mm_malloc using 32-byte alignment, as is required for aligned AVX memory operations. The plan is initialized with sfft_init, used to compute an FFT with sfft_execute (provided the requested plan is supported), and finally freed with sfft_free.

Other optimizations

In addition to generating a general-purpose library that can be calibrated for a machine and application at runtime, there are several situations where the SFFT library can be specially optimized:

  1. If the machine and application are fixed, a one-time calibration can be performed and an optimized library containing only the fastest transforms specific to the application and machine is generated;
  2. If the application is fixed, an optimized library containing only the transforms specific to the application is generated (and the library is calibrated the first time it is used on each machine);
  3. If the machine is fixed, an optimized library containing only the transforms specific to the machine is generated (and an application can use any transform without calibration).

Source:  OpenStax, Computing the fast fourier transform on simd microprocessors. OpenStax CNX. Jul 15, 2012 Download for free at http://cnx.org/content/col11438/1.2