One of the reasons I started researching polynomial regression was its use for interpolation. The applications are limited, as higher degree polynomials can easily oscillate wildly, but there are still times it can be useful. Today I found a use I want to share.
For the past several days our computer ππ has been logging the amount of light (in lux) that falls on our house's rooftop. To my dismay it often crashes, and I haven't figured out the cause, so gaps exist in my data. Today I got the most complete graph I've yet had, but there was a dropout just after 7:00 pm that lasted until 8:40 pm. The sun was setting during this time and all the day's useful data had pretty much been logged. Still, I am left with a gap in my data.
To fill in the gap we could just do a linear approximation between the two points on either end of the gap. When we do, our graph of the data right around the gap looks like this:
Here you can see the interpolation line in red does connect the two points closest to the gap, but doesn't follow the curve. The approximation is alright and would probably be functional for what I am trying to do. However, I know I can get a better curve. This is where polynomial regression can be used. Using a polynomial function to approximate the existing data will give us a continuous function that will also fill in the missing data. Since a line won't do a good job, there is no point trying linear regression. So let us start with quadratic regression (a 3 coefficient polynomial).
This isn't a great representation. Now cubic regression (4 coefficient polynomial).
This looks much better. The curve follows along most of the existing data points and makes a nice transition. We could stop here and calculate all the missing data points using this polynomial, but will a higher degree polynomial with more coefficients make the curve fit even better? Let us try quartic regression (5 coefficients).
This curve now runs through almost all the existing data points and is what we are looking for. At this point the curve is arguably a better picture of the light level than the raw readings, as the measured data has noise. This is the curve I used to interpolate the data.
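For anyone who wants to try this kind of gap fill outside the calculator, here is a minimal sketch in Python with NumPy. It is not the code behind the calculator. The function name `fill_gap`, the time axis in minutes after noon, and the synthetic lux values are all assumptions made for illustration.

```python
import numpy as np
from numpy.polynomial import Polynomial


def fill_gap(times, values, gap_times, degree=4):
    """Least-squares fit a polynomial to the known samples and
    evaluate it at the times where samples are missing."""
    # Polynomial.fit rescales the time axis internally, which keeps the
    # least-squares problem well conditioned for large time values.
    fit = Polynomial.fit(times, values, degree)
    return fit(gap_times)


# Synthetic stand-in for the logged data (NOT the real readings):
# a smooth falling curve plus a little noise, sampled every 5 minutes.
rng = np.random.default_rng(0)
t = np.arange(360.0, 600.0, 5.0)  # minutes after noon, 6:00 pm to 9:55 pm
lux = 1000.0 / (1.0 + np.exp((t - 450.0) / 25.0)) + rng.normal(0.0, 5.0, t.size)

# Pretend the logger was down from just after 7:00 pm until 8:40 pm.
known = (t <= 420.0) | (t >= 520.0)

# Quartic regression (5 coefficients) on the surviving samples only.
gap_lux = fill_gap(t[known], lux[known], t[~known], degree=4)
```

Changing `degree` to 2 or 3 mirrors the earlier steps. How well each degree does will depend on the data it is given.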
For the sake of inquiry we can continue to increase the degree of the polynomial. Octic regression (9 coefficients) produces this:
Here we start to see an oscillation being introduced into the curve. While it fits and may be what happened during the missing data period, it isn't any better than the curve produced with quartic regression.
Some strange things happen once more than 9 coefficients are used. At 10 coefficients we get this:
Knowing the physical phenomenon taking place, this graph isn't possible. However, it does actually have a better coefficient of determination than the lower degree polynomials. So it does make sense to select which curve to use based on its visual attributes. The algorithm is designed simply to minimize the residual error for a given degree of polynomial, and knows nothing of the constraints imposed by the data source.
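The better score is to be expected. A lower degree polynomial is a special case of a higher degree one, so adding a coefficient can never make a least-squares fit worse on the points it was given (numerical trouble aside). Here is a rough Python sketch of that effect. The helper `r_squared` and the synthetic data are assumptions for illustration, not the calculator's code or the logged readings.

```python
import numpy as np
from numpy.polynomial import Polynomial


def r_squared(x, y, degree):
    """Coefficient of determination of a least-squares polynomial fit."""
    fit = Polynomial.fit(x, y, degree)
    ss_res = np.sum((y - fit(x)) ** 2)       # residual sum of squares
    ss_tot = np.sum((y - np.mean(y)) ** 2)   # total sum of squares
    return 1.0 - ss_res / ss_tot


# Synthetic noisy samples with a hole in the middle (NOT the real data).
rng = np.random.default_rng(1)
t = np.arange(360.0, 600.0, 5.0)
lux = 1000.0 / (1.0 + np.exp((t - 450.0) / 25.0)) + rng.normal(0.0, 5.0, t.size)
known = (t <= 420.0) | (t >= 520.0)

# Score every degree from 1 to 9 (2 to 10 coefficients) on the known samples.
scores = {d: r_squared(t[known], lux[known], d) for d in range(1, 10)}
for d, score in scores.items():
    print(f"degree {d}: R^2 = {score:.6f}")
```

The score only measures agreement with the points that exist, so it says nothing about how the curve behaves inside the gap.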
This work was all done using my online polynomial regression calculator and only took a couple of minutes. A slightly modified version produced the graphs for this article as I removed the (meaningless) units and allowed larger degree polynomials.