This module analyzes the effectiveness of reordering data structures in a filter bank implementation to achieve higher efficiency.

This module is a continuation of the previous module on POSIX thread implementations.

Implementation 3: partial reordering

An area for improvement over the previous p-thread implementations is to reorder some of the arrays that the filter uses to store data. We started by restructuring the output vector, splitting it into a two-dimensional array so that each thread has its own output vector. This separates each thread's data in memory, cutting down on cache poisoning, in which threads writing to nearby memory invalidate each other's cache lines. Note that this changes the structure of the final output, but the change is acceptable because the Open Ephys GUI can be threaded to read the filter output from several vectors at once.
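The following is a minimal sketch of this layout change; the names, channel counts, and buffer sizes are hypothetical and not taken from the module's actual code.

/* Sketch: splitting a shared output vector into per-thread output vectors.
 * Hypothetical names and sizes for illustration only.                      */
#include <stdlib.h>

#define NUM_THREADS   8
#define NUM_CHANNELS  64          /* channels divided evenly among threads  */
#define SAMPLES       30000       /* samples buffered per channel           */

/* Before: one shared output vector indexed by channel and sample:
 *   float output[NUM_CHANNELS * SAMPLES];
 * Threads writing to neighboring channels can touch the same cache lines.  */

/* After: a two-dimensional layout, one private output vector per thread.   */
float *thread_output[NUM_THREADS];

static void allocate_thread_outputs(void)
{
    int channels_per_thread = NUM_CHANNELS / NUM_THREADS;
    for (int t = 0; t < NUM_THREADS; t++) {
        /* Each thread owns a contiguous block, so its writes do not share
         * cache lines with another thread's output.                        */
        thread_output[t] = malloc((size_t)channels_per_thread * SAMPLES
                                  * sizeof(float));
    }
}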

We ran the same sequence of tests as before using this modified implementation and observed the following results:

Number of Threads Runtime (s)
Control (main thread) 50.106
1 49.302
2 78.888
4 89.939
8 35.283
16 71.337
32 109.112

Implementation 4: full reordering

Another area for significant improvement was to restructure the intermediate data vectors. These vectors were originally designed to store the intermediate filter values w1, w2, w3, and w4 (see the module on the Transposed Direct-Form II implementation) in separate arrays. All threads shared these arrays but wrote to different elements within them.
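As a rough illustration of that original layout (hypothetical names and sizes, reconstructed from the description above):

/* Original layout (sketch): four separate arrays shared by every thread,
 * each indexed by channel. Hypothetical names and sizes.                   */
#define NUM_CHANNELS 64

float w1[NUM_CHANNELS];   /* first intermediate filter value per channel   */
float w2[NUM_CHANNELS];
float w3[NUM_CHANNELS];
float w4[NUM_CHANNELS];

/* A thread updating channel c touches four widely separated locations:
 * w1[c], w2[c], w3[c], w4[c]. Neighboring channels handled by different
 * threads can also land in the same cache line of each array.              */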

We constructed an alternate structure in which all the intermediate filter values for each channel are located in adjacent memory locations within a single large vector, and each thread has its own intermediate-value vector for the channels it processes. This gives the intermediate values for each channel spatial locality; because those values tend to be used at the same time (temporal locality), cache hits occur frequently. Splitting the intermediate-value vector by thread also helps limit cache poisoning.
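A minimal sketch of the reordered layout follows; again, the names and sizes are assumptions made for illustration rather than the module's exact code.

/* Reordered layout (sketch): the four intermediate values for one channel
 * sit next to each other, and each thread owns a private vector covering
 * only the channels it processes. Hypothetical names and sizes.            */
#include <stdlib.h>

#define NUM_THREADS        8
#define NUM_CHANNELS       64
#define VALUES_PER_CHANNEL 4      /* w1, w2, w3, w4 */

float *thread_state[NUM_THREADS];

static void allocate_thread_state(void)
{
    int channels_per_thread = NUM_CHANNELS / NUM_THREADS;
    for (int t = 0; t < NUM_THREADS; t++) {
        thread_state[t] = calloc((size_t)channels_per_thread * VALUES_PER_CHANNEL,
                                 sizeof(float));
    }
}

/* Accessing channel c (local to thread t): w1..w4 occupy one 16-byte run,
 * so they usually fall in a single cache line and are fetched together.    */
static inline float *channel_state(int t, int local_channel)
{
    return &thread_state[t][local_channel * VALUES_PER_CHANNEL];
}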

Using the same sequence of tests to benchmark this implementation, we observed the following results:

Number of Threads Runtime (s)
Control (main thread) 71.166
1 57.156
2 52.639
4 48.939
8 33.589
16 51.543
32 110.716

Analysis of results


The obtained results show a very promising improvement, especially after implementing the full reordering scheme. With data reordering, the cache effects that hampered the previous POSIX implementations are significantly reduced. We cannot circumvent p-thread overhead, however, which explains why higher thread counts still perform poorly regardless of data reordering.

With clever data reordering, p-thread implementations of the filter bank can provide significant speed gains over comparable single-threaded methods. The fastest run achieved by these implementations finished in under 34 seconds (33.589 s with 8 threads and full reordering), well ahead of the 48-second run time posted by the fastest single-threaded implementation, a speedup of roughly 1.4x.





Source:  OpenStax, Efficient real-time filter design for recording multichannel neural activity. OpenStax CNX. Dec 11, 2012 Download for free at http://cnx.org/content/col11461/1.1