I've written in the past about various algorithms I came up with for generating weighted random numbers, but ever since learning the Box-Muller transform I've been refining my ideas. The Box-Muller transform allows the generation of random numbers with a Gaussian (normal) distribution, that is, a set of numbers with a known mean and standard deviation. That, in turn, allows for the generation of numbers with some standard statistical properties.
Today I needed a random number between two values, but I wanted something with a known standard deviation. A Gaussian distribution has no minimum or maximum; it can theoretically produce any number from negative infinity to positive infinity. However, the standard deviation tells us that around 68.2% of the values will fall within 1 standard deviation, 95.4% within 2, and 99.73% within 3 standard deviations. So by dividing the output of the Box-Muller transform by 3, 99.73% of the values will be between -1 and 1, and only 0.27% will fall outside this range. If we discard values outside this range we can build a ranged random number generator with a given standard deviation.
Let us first start with the Box-Muller transform.
There are two inputs, r1 and r2, and two results. The inputs are uniform random values such that 0 < r1 ≤ 1 and 0 ≤ r2 ≤ 1 (r1 must be strictly positive because the transform takes its logarithm). Since there are two input values, there are also two output values. The output is such that -∞ < x < ∞. The Box-Muller transform is kind of the unit circle of Gaussian distribution, with 68.2% of outputs between -1 and 1, 95.4% between -2 and 2, 99.73% between -3 and 3, and so forth. Multiplying the output of the Box-Muller transform by a constant results in a stream of numbers with a standard deviation equal to that multiplier.
Now, to get this into a fixed range we just need to scale the results. The problem is that the output range is infinite. However, we know that most of the values are close to the center, so we can simply discard values outside the desired number of standard deviations. To illustrate this, let's say we want numbers between 50 and 100, averaging around 75, with a standard deviation of 8.
Let xn be the output of the Box-Muller transform. To get a known standard deviation we need to scale down xn. Let s be the scale factor: s = 2d/Δ, where d is the desired standard deviation and Δ = max - min. For our example, d = 8 and Δ = (100 - 50) = 50, thus s = 2(8)/50 = 0.32.
Let y = s·xn. Now discard any y where |y| > 1; that is, if the result is larger than one in magnitude, get a new random number and try again. We now have a value y such that -1 ≤ y ≤ 1, which we can scale to the desired range. Let z = yΔ/2 + Δ/2 + m, where m is the minimum value and M is the maximum value. The result z is such that m ≤ z ≤ M.
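To make that last mapping concrete, here is a small helper applying z = yΔ/2 + Δ/2 + m; the name scaleToRange is my own.

```c
/* Map a value y in [-1, 1] linearly onto [m, M], where M > m.
   This computes z = y * delta/2 + delta/2 + m with delta = M - m,
   so y = -1 lands on m, y = 1 lands on M, and y = 0 on the midpoint. */
double scaleToRange(double y, double m, double M)
{
    double delta = M - m;
    return y * delta / 2.0 + delta / 2.0 + m;
}
```

For the running example, scaleToRange(-1, 50, 100) gives 50, scaleToRange(1, 50, 100) gives 100, and scaleToRange(0, 50, 100) gives the midpoint 75.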
Here is a graph of the example:
This chart shows 1,000 points generated at random with this algorithm. The actual standard deviation is 7.972, which is pretty close to the desired 8.0, with an average of 75.222, a minimum of 51.950, and a maximum of 99.961. So our algorithm seems to work.
There are some limitations to this system which I haven't yet been able to analyze in depth. This algorithm gets close to the desired standard deviation, but usually comes in under. My guess is that this is due to the amount of data discarded: the larger the standard deviation, the more data is discarded, and thus the more this missing data affects the resulting standard deviation. This becomes really noticeable for standard deviations above about 40% of the range. For our example range of 50 to 100, that is 0.40 * (100 - 50) = 20. There may be a way to compensate for this by increasing the desired standard deviation to account for the loss, but I would have to dive more into the math to come up with a compensation factor.
Here is the C code for the algorithm:
In this example, the function getRandom returns a uniform random number between 0 and 1, but its exact implementation is left unspecified; the one requirement is that it never return exactly 0, since the Box-Muller transform takes the logarithm of its input.