Andrew Que

 06/15/2012 - Least-Square Regression Demo

The other day I received an e-mail with questions about my least-square regression PHP class. I haven't touched this implementation since I wrote it for an article on least-square regression in June of 2009, so it's time for a demo.

The original implementation used Cramer's Rule to solve the resulting system of equations. In May of 2011 I wrote an article about a faster method for doing this, so I decided to implement that method and then make a demo to show how the regression curve fits data.

This demo has a number of black points that can be moved around to form a polynomial curve. The thin red line in the center represents the true polynomial curve. The blue dots represent data points along the true curve with random error introduced. The scatter and concentration of the error can be controlled with the two sliders. The higher the concentration value, the closer the error will fall toward the curve. The scatter magnitude controls how far the error can deviate from the true data. The data with the random errors is then used as input to the least-square regression function, and the output of that function is displayed in green. So the green curve should closely match the red curve.

What this simulation shows is the ability of the regression function to recover polynomial coefficients from a signal with a fairly low signal-to-noise ratio with pretty good accuracy. The function must assume the data is from a polynomial of a specific degree. The real-world applications are probably limited, but surely exist—especially with lower degree polynomials.
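As a rough illustration of recovering coefficients for a polynomial of an assumed degree, here is a minimal sketch. It is not the class from this site: the function name `fitPolynomial` is hypothetical, and it solves the normal equations with Gaussian elimination and partial pivoting, which is an assumption since the faster method is not named here. It also assumes at least degree + 1 distinct x values.

```php
<?php
// Fit a polynomial of the given degree to [ x, y ] points by least squares.
// Returns coefficients ordered lowest power first.
function fitPolynomial( array $points, int $degree ): array
{
  $size = $degree + 1;

  // Build the augmented normal-equation matrix [ A | b ], where
  // A[r][c] = sum( x^(r+c) ) and b[r] = sum( y * x^r ).
  $matrix = array_fill( 0, $size, array_fill( 0, $size + 1, 0.0 ) );
  foreach ( $points as [ $x, $y ] )
    for ( $row = 0; $row < $size; ++$row )
    {
      for ( $column = 0; $column < $size; ++$column )
        $matrix[ $row ][ $column ] += pow( $x, $row + $column );

      $matrix[ $row ][ $size ] += $y * pow( $x, $row );
    }

  // Forward elimination with partial pivoting.
  for ( $pivot = 0; $pivot < $size; ++$pivot )
  {
    $best = $pivot;
    for ( $row = $pivot + 1; $row < $size; ++$row )
      if ( abs( $matrix[ $row ][ $pivot ] ) > abs( $matrix[ $best ][ $pivot ] ) )
        $best = $row;

    [ $matrix[ $pivot ], $matrix[ $best ] ] = [ $matrix[ $best ], $matrix[ $pivot ] ];

    for ( $row = $pivot + 1; $row < $size; ++$row )
    {
      $factor = $matrix[ $row ][ $pivot ] / $matrix[ $pivot ][ $pivot ];
      for ( $column = $pivot; $column <= $size; ++$column )
        $matrix[ $row ][ $column ] -= $factor * $matrix[ $pivot ][ $column ];
    }
  }

  // Back substitution.
  $coefficients = array_fill( 0, $size, 0.0 );
  for ( $row = $size - 1; $row >= 0; --$row )
  {
    $sum = $matrix[ $row ][ $size ];
    for ( $column = $row + 1; $column < $size; ++$column )
      $sum -= $matrix[ $row ][ $column ] * $coefficients[ $column ];

    $coefficients[ $row ] = $sum / $matrix[ $row ][ $row ];
  }

  return $coefficients;
}

// Hypothetical points lying exactly on y = 2 - 3x + 0.5x^2.
$points = [ [ -2, 10 ], [ -1, 5.5 ], [ 0, 2 ], [ 1, -0.5 ], [ 2, -2 ] ];
print_r( fitPolynomial( $points, 2 ) );
```

With noise-free input like this, the returned coefficients should come out very close to 2, -3 and 0.5. With noisy input they will only approximate the true values.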

From experimentation, it seems that curves that have higher curvature are reconstructed the best. That is, curves that change a lot do better than curves that are fairly flat.

The fit of the curve is measured with the residual sum of squares. The lower this value, the closer the regression curve is to the actual curve, with zero being perfect. In this graph, values below 0.5 are pretty good fits, and values below 0.01 make the true curve (red) visually coincide with the regression curve (green).
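For reference, a residual sum of squares can be computed along these lines. This is only a sketch: the function names are hypothetical, and it assumes the sum is taken over a list of sample points against a polynomial whose coefficients are ordered lowest power first, which may differ from how the demo samples the two curves.

```php
<?php
// Evaluate a polynomial using Horner's method.
// $coefficients are ordered lowest power first: c0 + c1*x + c2*x^2 + ...
function evaluatePolynomial( array $coefficients, float $x ): float
{
  $result = 0.0;
  for ( $index = count( $coefficients ) - 1; $index >= 0; --$index )
    $result = $result * $x + $coefficients[ $index ];

  return $result;
}

// Residual sum of squares between observed points and a polynomial.
// $points is a list of [ x, y ] pairs.
function residualSumOfSquares( array $coefficients, array $points ): float
{
  $sum = 0.0;
  foreach ( $points as [ $x, $y ] )
  {
    $residual = $y - evaluatePolynomial( $coefficients, $x );
    $sum += $residual * $residual;
  }

  return $sum;
}

// Hypothetical data scattered around y = 1 + 2x.
$points = [ [ 0, 1.1 ], [ 1, 2.9 ], [ 2, 5.2 ], [ 3, 6.8 ] ];
printf( "RSS = %.4f\n", residualSumOfSquares( [ 1, 2 ], $points ) );
```

The residuals in this example are 0.1, -0.1, 0.2 and -0.2, so the sum of their squares is 0.10.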

I updated the least-square regression PHP class page with the new version of the class, added some documentation, and some examples. If one person found this class useful, maybe more people will as well.

 06/01/2012 - Unevenly Weighted Random Numbers

A couple of weeks ago, I wrote about weighted random numbers. After implementation and some experimentation, I settled on a versatile function that incorporates all the features of the weighting system. Mathematically, it's a little ugly because there is an “if” statement and we end up with a piecewise function.

Where m is the minimum value, M is the maximum value, c is the center point (m ≤ c ≤ M), S is the concentration coefficient (useful range 1 ≤ S < ∞), and α and β are random numbers between 0 and 1. The core of this function is the weighting.

This has been scaled so the output is in a given range.

Here, m ≤ w_s ≤ M, whereas 0 ≤ w ≤ 1. From here, the body of the function is split before and after the center point. For this we require a second random number, α. This value is used to determine if the value is to the left or right of center, and the min and max of the function are adjusted accordingly.
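Written out, the function described here looks roughly like the following. This is a reconstruction inferred from the PHP listing in this article and from the properties stated in the text (uniform output at S = 1, output confined to the range m to M), so treat the exact form of w, in particular the (1 − β) numerator, as an assumption. The name u for the final function is hypothetical.

```latex
% Weighting: curves a uniform random number \beta toward 0 as S grows.
w(\beta, S) = \frac{1 - \beta}{\beta\,(S - 1) + 1}, \qquad 0 \le w \le 1

% Piecewise result: \alpha picks the side of the center, w picks the distance.
u(m, M, c, S, \alpha, \beta) =
\begin{cases}
  c - (c - m)\, w(\beta, S) & \text{if } \alpha < \dfrac{c - m}{M - m} \\[1.5ex]
  c + (M - c)\, w(\beta, S) & \text{otherwise}
\end{cases}
```

With S = 1 the weighting reduces to 1 − β, which is itself uniform, so the output is uniform between m and M.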

[Interactive demo: sliders for Min (m), Max (M), Center (c), and Concentration (S), with two graphs.]

The top graph shows the distribution of 1,000 samples, and the lower graph shows a histogram of the distribution. The average is calculated over all the samples. If the center value is halfway between min and max, the average should be the center value (or close to it). The center value corresponds to the highest peak in the histogram, which should always be close to the specified center.

There are some things you can do with this function that are not meaningful. Having a center value outside the min and max values will still generate values, but they are probably not useful for anything.

You can also use a concentration coefficient greater than zero and less than one (0 < S < 1). This has the effect of pushing the concentration away from the center point and toward the min and max values, basically acting as the reverse of the normal algorithm. This may be useful for generating a value that is usually either one value or another, with very little in between.

Here the min is 0, max is 100, center is 20, and the concentration coefficient is 0.1. Notice how the center point is the least populated area of the graph.

There are some ways to use this function to generate some of the other weighted functions. For example, let c = ½ (M – m) + m. This will make the function have equal distribution on both sides of the center point.

Here, the function C is a centered function, c is the center point, and s is the span by which the value can deviate from the center.

For a simple left or right weighted version of the function, simply set the center point to the min value (left weighted) or max value (right weighted).

Using a concentration coefficient of one (S = 1) results in plain uniform random data (assuming β is random). Small values of S are harder to notice in this demo, but become pronounced when more samples are used.

Here is an example of a center at 70, min of 0, max of 100, and a concentration coefficient of 2. At 1,000 samples it is not apparent there is any concentration, but at 100,000 samples it is easier to see. The larger sample set also makes the histogram clearer. Notice how the histogram falls to around 100 on both sides, but more rapidly to the right of center. This is necessary because of the uneven weight. So a 0 and a 100 are equally likely (or unlikely, as the case may be), but a 60 and an 80, despite being an equal distance from the center point, are not equally likely (60 is more likely than 80).

//----------------------------------------------------------------------------
// Return a weighted random number with an uneven distribution from center.
//   $min - Smallest possible value.
//   $max - Largest possible value.
//   $center - Location of highest concentration.
//   $concentration - How strongly to curve number--the higher the value,
//     the stronger the curve tends toward center.
//   $alpha - Number between 0 and 1, generally random.
//   $beta - Number between 0 and 1, generally random.
//----------------------------------------------------------------------------
function uneven( $min, $max, $center, $concentration, $alpha, $beta )
{
  // Curve beta.  The numerator is taken as ( 1 - $beta ) so the curved value
  // stays between 0 and 1 (reconstructed; an assumption).
  $numerator   = 1 - $beta;
  $denominator = $beta * ( $concentration - 1 ) + 1;
  $result      = $numerator / $denominator;

  // Get center point as a fraction of the full range.
  $centerDivide = ( $center - $min ) / ( $max - $min );

  // Figure out if this result is to the left or right of center.
  if ( $alpha < $centerDivide )
  {
    // Left of center: scale to the span below center and subtract.
    $result *= $center - $min;
    $result  = $center - $result;
  }
  else
  {
    // Right of center: scale to the span above center and add.
    $result *= $max - $center;
    $result += $center;
  }

  return $result;
}
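To see the uneven weighting in numbers, the following sketch draws 100,000 samples for the example above (min 0, max 100, center 70, concentration 2) and prints a coarse histogram. `unevenSketch` and `randomUnit` are hypothetical names. The function body is a compact restatement of the listing, with the curve taken as (1 - β) / (β(S - 1) + 1), an assumption chosen so that S = 1 is uniform and the result stays between min and max.

```php
<?php
// Hypothetical compact restatement of the article's function.
function unevenSketch( $min, $max, $center, $concentration, $alpha, $beta )
{
  // Curved value in [0, 1]; the (1 - beta) numerator is an assumption.
  $weight = ( 1 - $beta ) / ( $beta * ( $concentration - 1 ) + 1 );

  // Left of center with probability (center - min) / (max - min).
  if ( $alpha < ( $center - $min ) / ( $max - $min ) )
    return $center - $weight * ( $center - $min );

  return $center + $weight * ( $max - $center );
}

// Uniform random number between 0 and 1 (inclusive).
function randomUnit()
{
  return mt_rand() / mt_getrandmax();
}

// Histogram of samples in 10 bins of width 10.
$bins = array_fill( 0, 10, 0 );
for ( $sample = 0; $sample < 100000; ++$sample )
{
  $value = unevenSketch( 0, 100, 70, 2, randomUnit(), randomUnit() );
  $bins[ min( 9, (int) floor( $value / 10 ) ) ] += 1;
}

foreach ( $bins as $index => $count )
  printf( "%3d-%3d: %d\n", $index * 10, $index * 10 + 10, $count );
```

The exact counts vary from run to run, but the bin containing 70 should be the fullest, and the bins just left of center should hold more samples than the bins the same distance to the right.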

 05/19/2012 - Congratulations Tazz
Congratulations to our good friend Tazz Davies for finishing his degree at UW Madison.  Well done.
 05/14/2012 - Weighted Random Number

I've written articles about weighted random numbers in the past, but today I ran into a use I've been meaning to explain for a long time.

For example, when rolling two dice, the most likely number to roll is 7. With 4 dice, it's 14. These are weighted rolls in the context of this article, as the likely outcomes are not evenly distributed but tend toward some center point.
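The dice figures can be checked by counting every combination. A small sketch follows; the function names are hypothetical.

```php
<?php
// Count how many ways each total can be rolled with $dice dice of $sides sides.
// Returns an array mapping total => number of combinations.
function diceSumCounts( int $dice, int $sides ): array
{
  $counts = [ 0 => 1 ];   // Zero dice: one way to total zero.
  for ( $die = 0; $die < $dice; ++$die )
  {
    // Add one more die by spreading each existing total over every face.
    $next = [];
    foreach ( $counts as $total => $ways )
      for ( $face = 1; $face <= $sides; ++$face )
        $next[ $total + $face ] = ( $next[ $total + $face ] ?? 0 ) + $ways;

    $counts = $next;
  }

  ksort( $counts );
  return $counts;
}

// Totals that share the highest count.
function mostLikelyTotals( array $counts ): array
{
  return array_keys( $counts, max( $counts ) );
}

foreach ( [ 2, 4 ] as $dice )
  printf( "%d dice: most likely total %s\n", $dice, implode( ", ", mostLikelyTotals( diceSumCounts( $dice, 6 ) ) ) );
```

This prints 7 for two dice and 14 for four. With an odd number of dice, such as five, two adjacent totals (17 and 18) tie for most likely.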

One of the weighting algorithms I've written about in the past is Banded Inverse Root Nonuniform Scatter. This is the function:

Where α1 and α2 are random numbers between 0 and 1, and S is the “scatter coefficient”. The root of this function is the banding part.

This weights the roll toward 0. The larger the value of S, the stronger the pull toward 0. Using two of these functions together gives the function a peak centered at 0 with a range that extends both positive and negative. Note that the last part of the function normalizes the output so it is between 0 and 1. The process will be explained in a bit, but this function will be called nb(S). So in parts, the full function is:

This function can be simplified if the square root is removed. The root makes the curve more gradual, but this isn't needed.

The trick to this function is the use of the -1, +1 in the denominator. This allows the scatter coefficient to have a defined range between negative infinity and positive infinity (i.e. -∞ < S < +∞), although the useful range is 0 ≤ S < +∞.

The normalized function looks like this:

Rebuilding the center-weighted function results in:

So g( α1, α2, S ) is our weighted function. α1 and α2 are random numbers between 0 and 1. S is the scatter coefficient, 0 ≤ S < ∞. The larger the value of S, the more weighted the output is toward 0.

The graph above shows the histogram of the distribution for various scatter values and illustrates how the concentration toward the center increases as the scatter coefficient increases. Note that this function does not create a bell curve (or normal distribution). Instead it has a sharp point at the center. This means that for larger values of S the likelihood of being away from the center point diminishes very rapidly, much more than it would with a function that has a normal distribution. So the function favors the center point more strongly than those producing a normal distribution.

Now for some of the function's versatility. The function is normally used to generate values in some range.

Here, M is a scale factor (magnitude) and c is an offset that allows the function to have a range such that -(M + c) < v < (M + c). Now a function can be defined to return a value in a given range with some weight.

Where vmin < w( vmin, vmax, S ) < vmax. The floor function makes sure the values are integer numbers, and can be omitted if real numbers are desired. The center point will always be halfway between vmin and vmax.

This function can be modified slightly to simulate a dice roll. Let n be the number of dice, and s be the number of sides on each die. Then vmin = n, vmax = n * s. The scatter coefficient (S) can be varied, but the distribution will not be identical to that of an actual dice roll.

Here the floor function is required. n < d( n, s, S ) < n*s.

In this histogram, the difference in distribution can be seen between an actual dice roll (in this case, five 6-sided dice) and the simulated function d( n, s, S ) where S = 3. Note they both peak at the same location (between 17 and 18) with roughly the same likelihood for these numbers. However, the chances of rolling a 15 are greater with a true dice roll, and less in the simulation. Likewise, rolling an 8 is less likely with dice, and more likely in the simulation. Keep in mind that the simulated dice roll can do something an actual dice roll cannot: produce fractional results. If the floor function part of d( n, s, S ) is removed, any real number in the range can be returned. So while an exhaustive check of every dice roll is possible, an exhaustive check of every simulated roll is not. Thus, the graph above used one million samples to produce the simulated histogram.

There are some additional ways the function g( α1, α2, S ) can be used. If an uncentered value is desired, the random input can be fixed.

These histograms show the output of 10,000 samples of the function, where α is a random number (0 ≤ α ≤ 1). Note how in both cases, when S = 1 the distribution is uniform for all values. This is because when S = 1, the weighting function is doing nothing, and the random value α is being returned.

 05/11/2012 - SPAM
[Photo: Boston, 2012-03-09]
I started getting SPAM at two e-mail addresses from the same group: Dice Stars Casino. They somehow got my last.fm e-mail address as well as my LinkedIn e-mail address. I use a unique e-mail address for every online service so that when I get SPAM, I know the origin, and I can remove the address. It's strange that the same group got two addresses and started using them within days of each other, but it seems even stranger that large sites like LinkedIn and last.fm both somehow gave up my address. It's possible that my e-mail address alias list was somehow compromised, but that seems rather unlikely. Time to keep my eyes open.
From asdf
May 11th, 2012 at 5:13PM
 What? Companies give private user data to shady information brokers? I'm shocked, simply shocked!
 05/01/2012 - Star Polygons

I was doing some work in Google Sketchup, and started experimenting with star polygons. I was drawing an 8-point star when it dawned on me that there was more than one configuration that could be used to draw such a star. After a little reading, I discovered the nomenclature on this topic. Using the Schläfli symbol, I discovered that what I had normally been drawing when I made star polygons was of the form {p / ⌊p / 2⌋ - 1}. A Schläfli symbol is of the form {p / q}, where p is the number of points in the star, and q is the number of points between connecting lines. For example, an octagon shape is {8 / 1} as it has 8 points, and each line is connected to the very next point. A pentagram is {5 / 2}, having 5 points, and each line is connected to the 2nd closest point to either side. What I had been drawing was a star with the distance between points always ⌊p / 2⌋ - 1, or having the connecting points as far away from one another as possible for the star.

After learning this, I decided to create a little web application to demonstrate this.

[Interactive demo: controls for the number of points, the points between connections, and the line width.]

The demo is done using Scalable Vector Graphics (SVG), with some Javascript used to manipulate the image. The math is quite simple. First we get the distance between vertices (points on the edge). The distance is in degrees (or radians). For example, an 8-point star has 360º / 8 = 45º between points. To draw an octagon, we simply start at 0º and draw a line to 45º, and then from 45º to 90º, etc. To turn these angles into coordinates, we need a distance from the center (the radius), which depends on the height and width of the image. The SVG image uses standard computer coordinates, that is, (0,0) is the upper left part of the screen. Converting from polar coordinates requires first knowing where the center of the view port is located. This is half the width and height of the view port. Polar coordinates typically start with 0º on the right side, but I wanted it like a clock, with 0º on the top. So the polar to screen conversions are as follows: x = centerX + radius * sin( angle ), y = centerY - radius * cos( angle ).
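The conversion can be sketched as follows. The demo itself is Javascript; this version is written in PHP purely for illustration, the function names are hypothetical, and the signs assume angles increase clockwise from the top, like a clock.

```php
<?php
// Convert a clock-style polar coordinate (0 degrees at the top, increasing
// clockwise) to screen coordinates, where (0,0) is the upper-left corner.
function polarToScreen( float $angleDegrees, float $radius, float $centerX, float $centerY ): array
{
  $angle = deg2rad( $angleDegrees );
  return
  [
    $centerX + $radius * sin( $angle ),
    $centerY - $radius * cos( $angle ),
  ];
}

// Vertices of a regular polygon with $points corners, evenly spaced.
function polygonVertices( int $points, float $radius, float $centerX, float $centerY ): array
{
  $vertices = [];
  for ( $index = 0; $index < $points; ++$index )
    $vertices[] = polarToScreen( 360.0 * $index / $points, $radius, $centerX, $centerY );

  return $vertices;
}

// Octagon in a hypothetical 300x300 view port.
foreach ( polygonVertices( 8, 100, 150, 150 ) as [ $x, $y ] )
  printf( "%.1f,%.1f\n", $x, $y );
```

The first vertex lands at the top center of the view port, and the third (90º) lands at the right edge of the circle.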

The only trick comes when drawing star figures. For example, a {6 / 2} star is actually two {3 / 1} stars (the notation is 2 {3 / 1}), and not a single continuous path. For this case, we need only to know how many smaller polygons this figure is made from, and draw each of them offset one point from the previous. For example, a {15 / 6} is the same as 3 {5 / 2}. This means there are 3 pentagrams, each offset 24º (360º / 15 sides). So the first pentagram would be drawn with its tip at 0º, the second at 24º, etc.
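One way to find the smaller polygons is the greatest common divisor of p and q: a {p / q} figure splits into gcd(p, q) separate paths. The sketch below assumes that approach, which may not be how the demo does it, and uses hypothetical function names. Each path is a list of vertex indices, where index k sits at k * 360º / p.

```php
<?php
// Greatest common divisor (Euclid's algorithm).
function greatestCommonDivisor( int $a, int $b ): int
{
  while ( $b != 0 )
    [ $a, $b ] = [ $b, $a % $b ];

  return $a;
}

// Split the star {p / q} into closed paths, each a list of vertex indices
// in the range 0 to p - 1, in drawing order.
function starPaths( int $points, int $step ): array
{
  $pathCount = greatestCommonDivisor( $points, $step );
  $perPath   = intdiv( $points, $pathCount );

  $paths = [];
  for ( $offset = 0; $offset < $pathCount; ++$offset )
  {
    // Each path starts one point after the previous and jumps $step points.
    $path = [];
    for ( $index = 0; $index < $perPath; ++$index )
      $path[] = ( $offset + $index * $step ) % $points;

    $paths[] = $path;
  }

  return $paths;
}

// {15 / 6} splits into 3 pentagrams.
foreach ( starPaths( 15, 6 ) as $path )
  echo implode( " -> ", $path ), "\n";
```

For {15 / 6} the first path visits vertices 0, 6, 12, 3, 9, which is a pentagram on every third vertex. The second and third paths are the same shape shifted by one and two vertices (24º each).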

[Photo: D.C., 2012-03-06]

Ubuntu 12.04 was released today. Once it was out and I managed to get on their website, I started doing a torrent download of, well, all of them. If nothing else, it's a good test of our bandwidth. Our connection has been holding around 5.2 Mbit/sec, peaking at around 5.36 Mbit/sec. I don't even know what speeds our ISP says we should have, but it's nice to give them a workout from time to time.

Why do I need all the flavors of Ubuntu? Well, my main computer is a 64-bit machine, but I have several virtual machines set up, and several other computers that run Ubuntu as their primary OS. So having each of the types (desktop, server, and alternate, in both i386 and AMD64) will save me a step in the future. Otherwise I always find I need the version I don't have downloaded.