I was taking a look at hard drive prices over the two years for which I have data. The plots all include a linear regression to help predict where future prices are headed. However, the longer the time scale, the less useful the linear regression line is. Regression doesn't have to be linear. In fact, I already wrote about 2nd-degree (quadratic) regression back in September. I had to learn a little matrix math to do this, and now that I understand it slightly better, I can expand the regression to the nth degree. In doing so, there's a chance of finding a higher-degree regression that fits better over longer time frames of hard drive prices.
Let's first revisit the equation of quadratic curve fitting.
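In matrix form, assuming the fitted curve is y = ax^2 + bx + c and every sum runs over i = 1 to n, the least-squares (normal) equations are:

```latex
\[
\begin{bmatrix}
n & \sum x_i & \sum x_i^2 \\
\sum x_i & \sum x_i^2 & \sum x_i^3 \\
\sum x_i^2 & \sum x_i^3 & \sum x_i^4
\end{bmatrix}
\begin{bmatrix} c \\ b \\ a \end{bmatrix}
=
\begin{bmatrix} \sum y_i \\ \sum x_i y_i \\ \sum x_i^2 y_i \end{bmatrix}
\]
```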
Here, x and y are the arrays of data. The value n is the number of values in the array and a, b, c are the coefficients.
We can expand this to an arbitrary degree like so:
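Assuming the curve is written as y = c0 + c1 x + c2 x^2 + ... + cj x^j (if the quadratic is written y = ax^2 + bx + c, then c0 = c, c1 = b, c2 = a), the system becomes (j + 1) x (j + 1). Counting rows and columns from 0, the entry in row r and column k is the sum of x^(r+k), and the right-hand side in row r is the sum of x^r y:

```latex
\[
\begin{bmatrix}
n & \sum x_i & \cdots & \sum x_i^{j} \\
\sum x_i & \sum x_i^2 & \cdots & \sum x_i^{j+1} \\
\vdots & \vdots & \ddots & \vdots \\
\sum x_i^{j} & \sum x_i^{j+1} & \cdots & \sum x_i^{2j}
\end{bmatrix}
\begin{bmatrix} c_0 \\ c_1 \\ \vdots \\ c_j \end{bmatrix}
=
\begin{bmatrix} \sum y_i \\ \sum x_i y_i \\ \vdots \\ \sum x_i^{j} y_i \end{bmatrix}
\]
```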
Here, j is the polynomial degree desired and c0 through cj are the coefficients. We can use Cramer's rule to solve this matrix equation. Cramer's rule gives each coefficient as the ratio of two determinants, so we need a general determinant function: one that takes a square matrix of arbitrary size and computes its determinant. This can be done with a recursive function that keeps breaking the matrix into smaller sub-matrices until only 2x2 determinants remain. For example, the determinants of a 3x3 and a 4x4 matrix can be expanded like this:
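Expanding along the first row gives the following. The letters are local labels for the individual matrix entries and are unrelated to n, j and the coefficients used above:

```latex
\[
\begin{vmatrix} a & b & c \\ d & e & f \\ g & h & i \end{vmatrix}
= a\begin{vmatrix} e & f \\ h & i \end{vmatrix}
- b\begin{vmatrix} d & f \\ g & i \end{vmatrix}
+ c\begin{vmatrix} d & e \\ g & h \end{vmatrix}
\]

\[
\begin{vmatrix} a & b & c & d \\ e & f & g & h \\ i & j & k & l \\ m & n & o & p \end{vmatrix}
= a\begin{vmatrix} f & g & h \\ j & k & l \\ n & o & p \end{vmatrix}
- b\begin{vmatrix} e & g & h \\ i & k & l \\ m & o & p \end{vmatrix}
+ c\begin{vmatrix} e & f & h \\ i & j & l \\ m & n & p \end{vmatrix}
- d\begin{vmatrix} e & f & g \\ i & j & k \\ m & n & o \end{vmatrix}
\]
```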
The determinants above are computed by subdividing along rows and columns. In both, the first term is a multiplied by the determinant of the sub-matrix that excludes the row and column containing a. The signs alternate from column to column. This can be expressed more generally as:
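Here A is an m x m matrix (m is its size, not the number of data points), a_ij is the entry in row i and column j, and the expansion can use any fixed row i. Expanding along the first row, as described above, corresponds to i = 1:

```latex
\[
\det(A) = \sum_{j=1}^{m} (-1)^{i+j}\, a_{ij}\, \det(M_{ij})
\]
```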
where Mij is the sub-matrix left after deleting row i and column j. This turns out to be fairly easy to code; I found this example written in C.
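The following is a minimal C sketch of the same recursive cofactor expansion. It assumes the matrix is stored as a flat, row-major array of doubles. The function name determinant and that storage layout are my own choices for illustration and are not taken from the example mentioned above:

```c
#include <stdlib.h>

/* Determinant of a size x size matrix stored row-major in m[], computed by
   cofactor (Laplace) expansion along the first row. The recursion stops at
   2x2, whose determinant is a*d - b*c. */
double determinant(const double *m, int size)
{
    if (size == 1)
        return m[0];
    if (size == 2)
        return m[0] * m[3] - m[1] * m[2];

    int sub = size - 1;
    double *minor = malloc((size_t)sub * sub * sizeof *minor);
    if (minor == NULL)
        abort();  /* out of memory; a real implementation would report it */

    double det = 0.0;
    double sign = 1.0;
    for (int col = 0; col < size; ++col) {
        /* Build M(0,col): copy every row but the first, skipping column col. */
        int k = 0;
        for (int r = 1; r < size; ++r)
            for (int c = 0; c < size; ++c)
                if (c != col)
                    minor[k++] = m[r * size + c];

        det += sign * m[col] * determinant(minor, sub);
        sign = -sign;  /* signs alternate across the first row */
    }
    free(minor);
    return det;
}
```

For example, calling determinant on the row-major array {2, 0, 1, 1, 3, 2, 1, 1, 2} with size 3 returns 6. Each call makes size recursive calls, so the work grows factorially with the matrix size. That is manageable for the 7x7 system of a 6th-degree fit but quickly becomes expensive beyond it.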
Now that we can compute the determinant of a square matrix of any size, we have all the components needed to do the regression. I created this graph using a 6th-degree regression:
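Putting the pieces together, here is a minimal C sketch that builds the normal equations and solves them with Cramer's rule. It assumes the data are in plain double arrays. poly_regression is a hypothetical name and does not reflect the API of the PHP class. The sketch also uses ordinary double arithmetic, not arbitrary precision:

```c
#include <stdlib.h>
#include <string.h>

/* Recursive cofactor-expansion determinant (row-major, size x size). */
static double det(const double *m, int size)
{
    if (size == 1) return m[0];
    int sub = size - 1;
    double *minor = malloc((size_t)sub * sub * sizeof *minor);
    if (minor == NULL) abort();
    double total = 0.0, sign = 1.0;
    for (int col = 0; col < size; ++col, sign = -sign) {
        int k = 0;
        for (int r = 1; r < size; ++r)
            for (int c = 0; c < size; ++c)
                if (c != col) minor[k++] = m[r * size + c];
        total += sign * m[col] * det(minor, sub);
    }
    free(minor);
    return total;
}

/* Hypothetical helper: least-squares fit of y = c[0] + c[1]x + ... +
   c[degree]x^degree over n points. c must hold degree + 1 values.
   Returns 0 on success, or -1 if the matrix determinant is exactly zero. */
int poly_regression(const double *x, const double *y, int n,
                    int degree, double *c)
{
    int size = degree + 1;
    double *a = calloc((size_t)size * size, sizeof *a);  /* sums of x^(r+k) */
    double *b = calloc((size_t)size, sizeof *b);         /* sums of x^r * y */
    double *work = malloc((size_t)size * size * sizeof *work);
    if (!a || !b || !work) abort();

    /* Accumulate the normal equations one data point at a time. */
    for (int i = 0; i < n; ++i) {
        double xr = 1.0;                  /* x[i]^r */
        for (int r = 0; r < size; ++r) {
            double xrk = xr;              /* x[i]^(r+k) */
            for (int k = 0; k < size; ++k) {
                a[r * size + k] += xrk;
                xrk *= x[i];
            }
            b[r] += xr * y[i];
            xr *= x[i];
        }
    }

    int status = 0;
    double d = det(a, size);
    if (d == 0.0) {
        status = -1;
    } else {
        /* Cramer's rule: c[k] = det(a with column k replaced by b) / det(a). */
        for (int k = 0; k < size; ++k) {
            memcpy(work, a, (size_t)size * size * sizeof *work);
            for (int r = 0; r < size; ++r) work[r * size + k] = b[r];
            c[k] = det(work, size) / d;
        }
    }
    free(a);
    free(b);
    free(work);
    return status;
}
```

As a sanity check, degree 0 reduces to the 1x1 system n * c0 = sum of y, so it returns the mean of y. Degree 1 reproduces ordinary linear regression.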
Here in red we see the price per gigabyte of 1 terabyte hard drives from June 2008 to June 2009. In blue is the 6th-degree regression curve. It looks to be a pretty good fit. Unfortunately, this fit is a lucky coincidence for this one case. In general, I found that no single polynomial regression curve fit the trend data well across all time spans. As for prediction, even this curve, which appears to fit well, doesn't predict well: it turns upward around the middle of June. So something else will be needed if I am to make predictions about future prices.
Despite the fact that my least-squares curve-fitting algorithm failed to produce a good prediction, it is a good algorithm to have around. I went ahead and created a PHP class for calculating this regression to an arbitrary degree, and started a project page for it.
Above is a plot with regression curves at 7 different degrees, 0 through 6. The horizontal blue line is the 0th-degree polynomial, which is simply the mean of the data. The diagonal line is 1st degree, or linear regression. The remaining curves increase in degree up to the orange line, which is 6th degree.
The higher the degree, the larger the numbers in the summations become. In a 6th-degree polynomial, the matrix includes the sum of the x values raised to the 12th power. Because of this, I had to implement the functions using arbitrary-precision arithmetic, which slows an already slow process. The above plot takes about 20 seconds to calculate. Higher degrees take much longer, each one about twice as long as the previous. There are probably improvements that can be made to my algorithm, but for now, I have something that works at a general level. I'll have to wait until I take linear algebra before I can dive into this problem further.
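To put numbers on this, here is a small C sketch. It assumes x is simply the day index 1 through 365, roughly one year of daily prices. The real x values may be larger (timestamps, for example), which would make the spread worse. power_sum and entry_ratio are hypothetical helper names:

```c
#include <stddef.h>

/* Hypothetical helper: the sum of x[i]^k over all n samples, i.e. one
   entry of the normal-equation matrix. */
double power_sum(const double *x, size_t n, int k)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i) {
        double term = 1.0;
        for (int p = 0; p < k; ++p)  /* x[i]^k by repeated multiplication */
            term *= x[i];
        sum += term;
    }
    return sum;
}

/* Hypothetical helper: ratio of the largest matrix entry, the sum of
   x^(2*degree), to the smallest, the sum of x^0 = n (assumes every x >= 1). */
double entry_ratio(const double *x, size_t n, int degree)
{
    return power_sum(x, n, 2 * degree) / power_sum(x, n, 0);
}
```

With x = 1 through 365 and degree 6, the sum of x^12 is about 1.6 x 10^32, while the top-left entry is n = 365. entry_ratio therefore returns roughly 4.4 x 10^29. A C double carries only about 15 to 17 significant decimal digits, so double arithmetic that combines entries this far apart can silently drop the smaller contributions.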