Andrew Que Sites list Photos
Projects Contact
Main

September 22, 2012

Generating Normally Distributed Random Numbers with a Banded Range

In doing my analysis of my implementation of the Box-Muller transform function I started to ask questions and began expanding upon the idea. The theoretical output of this function is -∞ to +∞. This assumes that the random numbers going into the function have an infinite precision and can truly be any number between 0 and 1. In practice the numbers are limited by the floating-point resolution of the system the function runs on. Calculating the actual range can be done, but it made me ask the question: could I control the range of the output and still have data that has normal distribution? That is, can we have a function that has an upper and lower limit with distribution is (more or less) Gaussian?

My approach for getting such a function had to do with analyzing the basic form of the Box–Muller transform. Let us first break the function into smaller parts and remove the scale (standard deviation multiplier) and offset (mean addition) :

 

The output of sine and cosine ranges between -1 and 1. So the scale of this function depends solely on a. If we scale a, we scale the range total range of the function. Sinceα is random it is possible to make sure this is never below some value, which will guarantee that a is never above some value. This value must relate the to span of the function we define as the limit from the target mean of the function. Thus:

 

So now we have an equation that defines the smallest value that acan be, amin when sis the maximum span. We can reincorporate this into the Box–Muller transform.

 

Now the transform will produce an output from ranging from -s to s. That isn't the desired range, so we need a scale the function. What exactly s turns out to be is the number of standard deviations that form the distribution. This function still requires a target standard deviation and a mean. However to keep the everything within a given span, the standard deviation needs to be set based on the span and the value of s. We will again use σ for standard deviation.

 

Where %u2206 is the desired span. The finial function then becomes:

 

Here the definition of what we want in a function must be clarified. Since we are applying upper and lower limits to the function we can no longer generate data with a known standard deviation. Instead, we generate data with an approximate standard deviation with a distribution similar to a Gaussian function.

Using this function requires understanding how to use s. This basically determines the shape of the distribution.

This set of histograms shows the same scale divided into some number of standard deviations. Here %u2206 is constant, and s changes from 1 to 5 for over each of the graphs. Note that 1 standard deviation accounts for about 68% of the data in normal distribution, 2 about 95%, 3 about 99.7%, 4 about 99.993% and 5 about 99.99994%. For most cases, a value of 3 should make a fairly good bell curve.

Here is the implementation of the function:

//-----------------------------------------------------------------------------
// USES:
//   Produce a mostly normally distributed random sample with a defined center,
//   maximum span, and with a distribution curve following a given number of
//   standard deviations.
// INPUT:
//   mean - Desired average.
//   span - Max deviation (plus or minus) from mean.
//   stddevs - Number of standard deviations to to range over this span.
// OUTPUT:
//   Random number between 'mean - span' and 'mean + span'.
// NOTES:
//   A value of 3 or 4 in 'stddevs' will reproduce a fairly normal
// distribution (i.e. a bell curve).
//-----------------------------------------------------------------------------
double getNormSpanRandom( double mean, double span, double stddevs )
{
  // Since this functions generates two words at a time, this variable is
  // used to cache the second word.
  static double next = DBL_MAX;

  double result = next;

  // Calculate the desired standard deviation.  This is based on the number
  // of standard deviations covering the span.  The larger the number of
  // standard deviations, the closer the resulting values will achieve this
  // value.
  double stddev = span / ( 2.0 * stddevs );

  // If we don't have a cached word, we make a pair...
  if ( DBL_MAX == next )
  {
    // Calculate minimum value that will produce the desired range.
    // This is used the scale the random number.  Forces the output to be
    // within the given span.
    double minimum = exp( -0.5 * ( stddevs * stddevs ) );

    // Two random numbers are used to generate a point inside a circle.
    // 'a' corresponds to the radius of the circle, and 'b' to the angle.
    double a = sqrt( -2.0 * log( getRandom() * ( 1 - minimum ) + minimum ) );
    double b = 2.0 * acos( -1.0 ) * getRandom();
    result = a * cos( b );
    next   = a * sin( b );
  }
  else
    next = DBL_MAX;

    // Scale by standard deviation and add mean.
  return result * stddev + mean;

} // getNormSpanRandom

I don't have a project yet to make use of this function, but I know one day I will come up with some thing.

September 21, 2012

Generating Random Numbers with a Given Standard Deviation

There are several times I've wondered about how to generate a set of random numbers that will have a given standard deviation, but I never really dove into this until recently. At work I had measured an analog signal that had a lot of electrical noise. I decided to analyze this data in a spreadsheet to see how this noise correlated to the accuracy of the operation this signal helped control. In doing so, I started asking questions like “how much would half as much noise change this operation?”. I was running simulation models that required random errors within a set standard deviation—now I had a reason to implement this function.

After a bit of searching around, I came across the Box–Muller transform. This algorithm does precisely what I needed. The first implementation of this function I came across used the polar form of the function, which is what I used at work. However, this method requires a loop that keeps generating random numbers until they fall within a desired range. This eliminates the need for a sine/cosine operation, but isn't deterministic (in theory, should your random numbers be truly random, you might never leave this loop). So I've implemented the basic form, and it is this form the rest of the article will be based.

First, the algorithm:

 

The inputs are α and β are random numbers that are greater than 0 but less than 1 (0 < α, β < 1). σ is the desired standard deviation, and m is the desired average. Note that this function returns a vector—2 outputs. The parts of the vectors are almost identical except that the first part uses cosine, and the second part uses sine.

The reason this function works starts with this part of the function:

 

Here is a graph of the result of this function as α goes from 0 to 1.

The x-axis is α, and the y-axis the function of α. If you rotate this graph 90 degrees, it has half the shape of a bell-curve. In fact, it is the shape of the bell curve over 4 standard deviations—4 because the y-axis scale runs from 0 to 4. So this part of the function produces the shape.

The second part of the function involves sine and cosine.

 

This vector should be easily recognizable. As β goes from 0 to 1, this vector produces a circle with radius 1—the unit circle. Then one way to think about this function is to take the graph above and spin it along the x-axis. It will produce a bell-like shape. Unfortunately, I couldn't find the software to be able to make a 3d plot of this shape. The shape of the function is forcing values toward the 0, and the likelihood of values outside this range becomes lower and lower.

These two parts of the function produce a number between 0 and +infinity. Multiplying by the desired standard deviation scales the shape of the curve, and adding the mean offsets the value.

In software we only want one value at a time. Since the algorithm generates two values at once we can either discard one value, or save the second value the first time and use it the next. Since the latter is more efficient that's the best option. The code:

//-----------------------------------------------------------------------------
// USES:
//   Generate random numbers that have a given average and standard deviation.
// INPUT:
//   mean - Desired average.
//   stddev - Desired standard deviation.
// OUTPUT:
//   Random number between -infinity and +infinity characterized by the
//   parameters given.
// NOTES:
//   Uses the Box–Muller transform to accomplish this task.  Each transform
// results in two usable values.  One of these values is saved for the next
// call, and the other is returned.  The next call the previous saved value
// is used.
//   There is a form of this algorithm that doesn't require sine/cosine, but
// it is not deterministic because the random numbers can be rejected.  In this
// implementation all random values are used.
//-----------------------------------------------------------------------------
double getNormalizedRandom( double mean, double stddev )
{
  // Since this functions generates two words at a time, this variable is
  // used to cache the second word.
  static double next = DBL_MAX;

  double result = next;

  // If we don't have a cached word, we make a pair...
  if ( DBL_MAX == next )
  {
    // Two random numbers are used to generate a point inside a circle.
    // 'a' corresponds to the radius of the circle, and 'b' to the angle.
    double a = sqrt( -2.0 * log( getRandom() ) );
    double b = 2.0 * acos( -1.0 ) * getRandom();
    result = a * cos( b );
    next   = a * sin( b );
  }
  else
    next = DBL_MAX;

    // Scale by standard deviation and add mean.
  return result * stddev + mean;

} // normalizedRandom

Note the getRandom function. The value of RAND_MAX could be as low as 32767 so this function will concatenate as many random values as necessary to get the granularity desired. In the GNU C compiler, RAND_MAX is 231. So this function has a granularity of 1/ 262 or round 2.168E-19.

Here is an example of a histogram of the function's output:

This histogram using 100 million samples of the function using a mean of 0, standard deviation of 25. The resulting image is a text book example of bell curve. The true average over this sample set was -0.00104991 and the true standard deviation was 25.000994—so very close to the desired values. This function was exactly what I needed on my work project so I could evaluate the effectiveness of various filter coefficients.

It has been a few years in the making, but at long last the high-deffinition video walk through of my virtual Victorian house is complete.

This took around 2 days to render the 2025 frames, but the upgrades to the Blue-Dragon greatly improved the speeds.  I did have to compromise on some of the rendering settings.  I used no "fuzzy tracing" it is just resulted in strange and ugly sparkles.  This meant I had to change some of the materials I used, but it did work.

The model was drawn in Google Sketchup, and the rendering done in Kerkythea.  Everything in the model was drawn by Andrew Que.  The music was composed by Jasun Karner for this animation and is called "A Victorian Walts".

September 18, 2012

1D Linear Regression

The other day at work I ran across a problem I thought I'd write about. I had an analog input and I had to determine the direction of travel—that is, if the analog input was rising or falling over some time period. This seems like a simple problem because by using two samples at different times one can easily see if the second sample was larger or smaller than the first. However, the analog input is fairly noisy. So how does one find direction with a noisy sample?

One of the simplest filters used all the time in software is a moving average, and it works well for removing noise from a steady state signal. On a signal that is changing a moving average introduces lag, or phase shift. This lag means the signal is always off a little, and how much depends on the rate of change.

A moving average by itself will not indicate the direction of travel. But comparing the output of the average with that of an earlier time will indicate direction of travel. Since my project requires the continuous monitoring of the direction of travel, this means I have to keep a rolling history of previous averages. While that isn't terribly difficult, there is an other option.

Since I already had a rolling history of samples from the average I considered how to determine direction from this despite the noise. I've done a fair bit of work in the past with linear regression. By it's nature, it is determining a line that best describes a noisy set of samples.

Above is an example of 200 data points with a small signal-to-noise ratio (i.e. there is a lot of noise). By simply looking at the data it is not easy to see the data is increasing. The linear regression line (in red) shows the general trend, and the increase is visible. So calculating linear regression for our set of data will also produce direction. Let us start with the equation for linear regression.

This is the equation in it's expanded matrix form—I left in all the redundant parts such as x0 and x1. Recall that xi and yi are the data, and n is the number of data points. When we solve the equation for b and m and we have the intercept and slope for the linear regression line. Here all we need is the slope, and more specifically, the sign of the slope. Before we solve the matrix equation there are some assumptions that will simplify the equation.

An array used for a rolling average only contains data for yi. The xi data we have to generate. This isn't difficult—all this data represents is the location (time) of the sample's position. We can simply use the index i for this. Rewriting the equation with this assumption produces the following:

Using summation identities we can simplify this further.

Now there are only two summations. In this application, the array size is fixed, so the equations involving n are just constants. Let us now solve the matrix equation and reduce the result.

For this application we are only using the slope m. In addition the only concern is the direction of travel, not the rate. So the direction, d, is as follows:

However, it is faster to simply use an “if” statement can check if the slope is greater or less than 0 rather than preform the absolute and divide needed by d and then use an “if” statement to check of that is greater or less than zero.

//------------------------------------------------------------------------------
// USES:
//   Least-square slope of a set of data.
// INPUT:
//   data - Data points for which the slope is to be calculated.
//   dataSize - Number of data points in 'data'.
// OUTPUT:
//   Slope of data.
//------------------------------------------------------------------------------
double getSlope( double const * data, int dataSize )
{
  double result = 0;

  if ( dataSize > 0 )
  {
    // Sums start at zero.
    double sumY  = 0.0;
    double sumXY = 0.0;

    // Loop through all data points.
    int index;
    for ( index = 0; index < dataSize; ++index )
    {
      // Accumulate a sum of all the Y values.
      sumY  += data[ index ];

      // Accumulate a sum of X*Y for all values.
      sumXY += index * data[ index ];
    }

    dataSize -= 1;

    // Calculate numerator of fraction.
    double numerator;
    numerator  = 12.0 * sumXY;
    numerator -= 6.0 * dataSize * sumY;

    // Calculate denominator of fraction.
    double denominator = dataSize * ( dataSize + 1 ) * ( dataSize + 2 );

    result = numerator / denominator;
  }

  return result;
}

In software the algorithm is fairly simple. There is a for-loop where the two sums are accumulated, and then the multiplication and division of the sums.

In conclusion, the 1D slope algorithm works well for determining the direction of travel on a noisy set data. It's implementation is much the same as linear regression, but can be simplified thereby calculated faster.

September 17, 2012

Keyboard Repair

Keyboard repair

Keyboard repair

   I've had this Korg keyboard since sometime in 1998.  In it's many years it has gone through a lot of moves and one of the keys broke and stood higher than the rest.  A small plastic stop with rubber pads had broken off.  So I disassembled the keyboard today to see if I couldn't fix this.  Lucky for me, the plastic piece that had broken off was still inside the keyboard, and a little super glue had the piece right back where it needed to be.  But since I had the keyboard open, I decided to preform some cleaning maintance.  I removed all the keys and washed them, and gave the inside of a through dusting with high-pressure air and a toothbrush.  After I reassembled everything it was almost as good as new.  I still have one key that doesn't always respond as it should.  But this keyboard is 16 years old, so that's not too bad.