November 06, 2017
Impulse Noise Filtering - Implementation of Median Filter
When implementing the median filter there are two options: off-line or real-time. If not done off-line then the filter loops though each data point and calculates the median over a window of the data. Here is an example with a buffer size of 5 and the median window size of 3:
55 |
42 |
1000 |
86 |
61 |
55 |
42 |
1000 |
86 |
61 |
55 |
42 |
1000 |
86 |
61 |
Taking the median for the number highlighted in blue for each row, the filtered data would then be:
55 |
86 |
86 |
Note how the spike of 1000 is filtered out. Note also the resulting buffer is shorter. The filtered data size is always the initial data size less the window size.
If running the filter in real-time, there needs to be a ring-buffer that holds the window size samples. The filtered result is the median of this buffer. New samples replace old samples and the filter continues. Starting this filter requires either loading some starting values or waiting until enough samples have been taken to fill the buffer.
If the following is the initial ring buffer at start:
21 |
30 |
71 |
62 |
72 |
Then the first output is 62. Assume the current into into the buffer is the second sample. When a new sample arrives, it will replace the second sample. For instance if the new sample is 41, the ring buffer becomes:
21 |
41 |
71 |
62 |
72 |
The median of this is again 62 which is the output of the filter. In this way the filter can be used in real-time.
Getting the filter started is up to the implementer. Most of the time simply using an initial buffer full of zeros works just fine. One can also delay producing filtered output until enough samples have been acquired. The entire buffer can be loaded entirely with the first sample taken. Or the window used for calculations can gradually grow until it reaches the desired size. Which option used depends on the application.
Basic implementation of the median calculation itself is simple. Sort the data and choose the middle data point. However sorting data is a time consuming task and fast sorting algorithms don’t have fixed run times. In addition if one is using a ring-buffer to hold samples the sorting cannot be done in place.
The easiest filter to implement is a selection sort. It is simple and always takes the same number of steps for any window size. Selection sort is also the slowest sort with a fixed complexity of O( n2 ). One bit of acceleration can be gained by only sorting half the data. Since the median is looking for the middle data point in a sorted set, one needs only to find the middle data point—everything else is ignored.
My co-worker Flux suggested I look into using an insertion sort for small filter window sizes. Worst case an insertion sort has the same speed, O( n2 ) as a selection sort. However the speed improves the more sorted the data initially is. If already sorted the speed of the algorithm is O( n ). Since we are running as a windowed filter we are only ever changing one piece of data at a time. Thus the data remains mostly sorted. As a result the insertion sort runs closer to O( n ) complexity and worst case O( 2n )—once through the array, and once again needing to move the last element to the first element. That allows this filter to still be useful in real-time applications, especially for small window sizes.
Here is a simple C implementation of a median filter useful for real-time applications.
// Uses:
// Return the median of the data buffer.
// Input:
// data - Data from which to select median.
// indexes - Buffer to hold sort indexes (must contain 'windowSize' elements).
// windowSize - Number of data points.
// Output:
// Median of data points.
// Notes:
// Designed for small data sizes.
// The 'indexes' must be pre-initialized at least once. After
// initialization it can be used for all subsequent calls of the same
// window size.
// for ( int index = 0; index < windowSize; index += 1 )
// indexes[ index ] = index;
// Author:
// Andrew Que <https://www.DrQue.net/>
//-----------------------------------------------------------------------------
static inline int median
(
int const * data,
int * indexes,
int windowSize
)
{
// Insert sort the data using swap buffer.
for ( int indexA = 1; indexA < windowSize; indexA += 1 )
{
int value = data[ indexes[ indexA ] ];
int swapIndex = indexes[ indexA ];
int indexB = indexA - 1;
while ( ( indexB >= 0 )
&& ( data[ indexes[ indexB ] ] > value ) )
{
indexes[ indexB + 1 ] = indexes[ indexB ];
indexB -= 1;
}
indexes[ indexB + 1 ] = swapIndex;
}
// Return median.
return data[ indexes[ windowSize >> 1 ] ];
}
The code is designed for a ring-buffer and requires a second buffer to hold indexes. One should replace all int with the necessary word size for the application but otherwise the code should be drop-in. The algorithm is fairly efficient up to a few tens of window size. After that some more efficient sorting techniques should be used.