I've been working on a project that requires a low-pass filter on sampled waveforms. There is a generic low-pass algorithm:
W[n] = C*S[n]+(1-C)*S[n-1]
S = Input buffer
W = Output buffer
C = Filter coefficient
This is a fast and functional low-pass filter, but it introduces a phase shift into the signal. The distortion was too great for the filter to be usable with the coefficient we required.
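For reference, here is the formula above as a minimal C sketch. The function name and the handling of the first sample (which has no S[n-1], so it is passed through) are my own assumptions:

```c
#include <stddef.h>

/* W[n] = C*S[n] + (1-C)*S[n-1] */
void lowpass_filter(const float *S, float *W, size_t len, float C)
{
    if (len == 0)
        return;
    W[0] = S[0];  /* assumption: no S[-1] exists, so pass the first sample through */
    for (size_t n = 1; n < len; n++)
        W[n] = C * S[n] + (1.0f - C) * S[n - 1];
}
```

With C near 1 the output tracks the input closely; smaller values of C weight the previous sample more heavily, smoothing (and phase-shifting) the signal more.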
Another way of doing a low-pass filter is to do so in the frequency domain. This requires a discrete Fourier transform (DFT) on the waveform, nulling out the high frequencies, and an inverse DFT. We had done this successfully with some C code we found on the net. Since it was implemented in a hurry, it was hardly efficient. I knew implementations must exist that took advantage of specialized CPU instructions such as MMX, SSE, and 3DNow!, all of which the CPU I'm working with supports. SSE and 3DNow!, in particular, could benefit an FFT because they add SIMD floating-point instructions to the hardware.
In my quest for a better algorithm, I kept coming back to FFTW (the Fastest Fourier Transform in the West), a DFT library developed at MIT (Massachusetts Institute of Technology). At first, I shied away from this library, as it seemed to be far more than I was looking for: capable of discrete Fourier transforms on multi-dimensional arrays of complex data. The mere task of transforming a one-dimensional array of real data seemed insulting to such a powerful tool. Why would the writers bother concentrating optimization effort on such a simple task? As I found out, however, a good number of applications use FFTW, applications that need a fast, simple DFT; specifically, the LAME mp3 encoder library. Clearly, that project needs something quick.
As my implementation started, I ran into a good number of problems. First of all, I had no idea how my existing FFT algorithm was going about its business. I knew the FFT algorithm returned data in the frequency domain, and I understood the filtering function. However, reading about FFTs, it was clear to me the transform had to be returning complex data. Complex numbers are actually two numbers: a real part and an imaginary part. A complex number is typically expressed as (a + b*i), where "a" is the real part, "b" is the imaginary part, and "i" is the square root of -1.
There was a second problem as well. The old FFT algorithm also offset the data by one element, prepending a 0 to the front before the transform. FFTW states specifically that it does not do this. This turned out not to be much of an issue at all. The biggest problem was figuring out what output was created by the phantom FFT.
The third problem wasn't much of an issue either, once I understood what was happening. The output of FFTW is not normalized. Thus, one cannot transform the input data to the frequency domain and back again without an intermediate step. That step simply involves dividing every element in the result by the number of samples.
It turns out the problem was in the ordering. The phantom algorithm returned complex data, as it must. The data was arranged interleaved: r[0],i[0],r[1],i[1],...,r[n],i[n]. FFTW's transform function for real data returns its output in "halfcomplex" order: r[0],r[1],...,r[n],i[n],...,i[2],i[1]. So, rather than every other element being an imaginary part, the second half of the output holds the imaginary parts in reverse order. I can only assume it is arranged this way because it is the most efficient layout for the algorithm. Once I understood the layout of the two FFT algorithms' output, I succeeded in modifying the filtering portion of the low-pass filter code to deal with it.
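Given that layout, the nulling step can operate directly on the halfcomplex array. This is a sketch under my own naming, assuming an N-point transform where index k holds r[k] for k = 0..N/2 and index N-k holds i[k], which is how I read the ordering above:

```c
#include <stddef.h>

/* Zero every frequency bin at or above `cutoff` in an N-point
 * halfcomplex spectrum: index k holds r[k] for k = 0..N/2, and
 * index N-k holds i[k] for the bins that have an imaginary part. */
void lowpass_halfcomplex(double *hc, size_t N, size_t cutoff)
{
    for (size_t k = cutoff; k <= N / 2; k++) {
        hc[k] = 0.0;               /* real part of bin k */
        if (k > 0 && k < N - k)
            hc[N - k] = 0.0;       /* matching imaginary part */
    }
}
```

After this pass, the inverse transform (plus the divide-by-N normalization) yields the filtered waveform.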
The result: a 15x speed improvement over the previous algorithm. Considering that this is a real-time application, that's a world of difference.
Pictured is the isthmus at Riverside Park in Beloit.