Yesterday I wrote about the outline of my PHP implementation of the Gauss–Newton method. I found that there were some issues that may have been related to floating-point precision. The way around this is to use arbitrary-precision arithmetic. I've done this in the past, and my polynomial regression PHP library uses PHP's BC Math functions because higher-order polynomial regression requires large numbers.
PHP's BC Math functions, however, are very basic. There is no exponential function, for example. On the PHP forums, I saw a basic implementation by Thomas Oldbury of several functions in BC Math, including factorial, exponential, natural log, and exponentiation. That was work I didn't have to do, so I went about porting my PHP code to work with the BC Math functions. But I started to run into problems.
Today, I'm going on a tangent away from the Gauss–Newton method to explore some topics in the implementation of series expansions of math functions. To understand the issues I encountered, we will need to dive into Mr. Oldbury's functions. Here they are, copied from the PHP forums:
This implementation uses the power series representation of the exponential function. That function is:
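The series itself appeared as an image in the original post; the power series for the exponential function is:

```latex
e^x = \sum_{n=0}^{\infty} \frac{x^n}{n!}
```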
This is exactly the definition of the function. However, no one has time to sum the series to infinity, so an approximated version is used:
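Truncated at some finite m, the series becomes:

```latex
e^x \approx \sum_{n=0}^{m} \frac{x^n}{n!}
```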
One of the artifacts of this is that, for any fixed m, the accuracy decreases as x increases. If m = 7 and x = 4, the calculation will be off by about 5%, and if x = 10 the error jumps to about 78%.
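These error figures can be recomputed with a quick floating-point check (Python here, purely for illustration; the function name is my own):

```python
import math

def truncation_error(x, m):
    # relative error of the truncated series sum_{n=0}^{m} x^n/n! versus e^x
    approx = sum(x ** n / math.factorial(n) for n in range(m + 1))
    return 1 - approx / math.exp(x)
```

With m = 7, `truncation_error(4, 7)` comes out near 5%, while `truncation_error(10, 7)` is far worse, which is exactly the problem with a fixed iteration count.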
There is another problem too, namely with the line “$res +=”. This is probably an oversight, but it is clearly a bug: BC Math values are strings, so when accumulating the result the function bcadd needs to be used instead of the native += operator.
This error will affect the accuracy of the exponentiation (x to the power n) function as well. To see why, we have to understand how exponentiation is computed. For integer exponents, exponentiation is easy: it is just repeated multiplication.
But this only works when n is an integer. We know that 2^3 = 8, but what about 2^2.5? We can solve this kind of problem with logarithms, which is usually taught in algebra II. I wrote about this some time ago.
The general form is as follows:
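The identity (shown as an image in the original post) is:

```latex
x^n = e^{\,n \ln x}
```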
So if you have a natural log and an exponential function, you can calculate any exponentiation. Thus, if the natural log and exponential functions are not accurate, the exponentiation will not be accurate either.
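The identity is easy to demonstrate with plain floats (Python here, just to illustrate the math; the function name is my own):

```python
import math

def power(x, n):
    # any real exponent reduces to exp and ln: x^n = e^(n * ln x)
    return math.exp(n * math.log(x))
```

For example, `power(2, 2.5)` agrees with `2 ** 2.5`, which is why errors in exp or ln leak directly into pow.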
How can this be fixed? I wrote about computing an exponential using Taylor series. Using an actual Taylor series requires look-up tables. That was fine for my C implementation because the precision of the floating-point gave definite limits to how large the table needed to be. That isn't going to work as well for arbitrary-precision. However, there was a trick I used that can be ported. Rather than calculate a set number of iterations, the calculation can be repeated until the values no longer change. It's a trick that takes advantage of the limitations of the implementation. The calculations will keep track of the most significant digits, and at some point the less-significant digits are discarded. This is true even of arbitrary-precision math (one sets an upper limit to the precision). So I incorporated this change into the exponential function.
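Here is a minimal sketch of that stop-when-unchanged loop, using Python's decimal module as a stand-in for BC Math (the function name and the scale handling are my own, not the PHP implementation; note the extra guard digits, in the spirit of the "add 5 to the scale" compensation described later):

```python
from decimal import Decimal, getcontext

def bc_exp(x, scale=50):
    # Sum the power series until adding a term no longer changes the
    # running total: the working precision itself ends the loop.
    getcontext().prec = scale + 5   # extra guard digits
    x = Decimal(x)
    term = Decimal(1)
    total = Decimal(0)
    n = 0
    while True:
        new_total = total + term
        if new_total == total:      # term fell below the precision limit
            break
        total = new_total
        n += 1
        term = term * x / n         # next term: x^n / n!
    return total
```

For small arguments this converges quickly; `bc_exp(1)` reproduces e to the working precision.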
While this fixes the accuracy issues, it also means the number of iterations is not fixed. For small values this isn't a problem, but for large values the loop can take a very long time. Since the exponential series converges faster for small values, we need a way of forcing the input to be small. It turns out there is a trick that can be used.
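The identity behind the trick (the original post showed it as an image) is presumably:

```latex
e^x = \left( e^{x/m} \right)^{m}
```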
What makes this easier is that we can divide x down to something small. Using a floor function on the log means m is always an integer. So once x has been divided by m, take its exponential, and then raise that result back to the mth power by multiplying in a loop to get the final answer.
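A sketch of the reduce-then-multiply idea, again with Python's decimal as a stand-in for BC Math. This version halves the argument and squares the result, a close cousin of the divide-by-m trick described above rather than the exact PHP implementation:

```python
from decimal import Decimal, getcontext

def exp_reduced(x, scale=50):
    # Reduce the argument so the series converges quickly:
    # e^x = (e^(x/2^k))^(2^k), with k chosen so |x/2^k| <= 1.
    getcontext().prec = scale + 5
    x = Decimal(x)
    k = 0
    while abs(x) > 1:
        x /= 2
        k += 1
    # power series on the small argument
    term, total, n = Decimal(1), Decimal(0), 0
    while total + term != total:
        total += term
        n += 1
        term = term * x / n
    # undo the reduction: square the result k times
    for _ in range(k):
        total *= total
    return total
```

`exp_reduced(10)` now needs only a handful of series terms plus a few squarings, instead of summing a slowly-converging series at x = 10.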
As it turns out, the natural log function suffered from the same kind of problem as the original exponential function, in that its series converges slowly for some values. There are often several series that can describe the same function; the calculation of π, for example, has several series, some of which converge more rapidly than others. I found that the Euler transform of the Mercator series converged more quickly, and implemented that instead. It takes the following form:
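The transformed series appeared as an image in the original post; the Euler transform of the Mercator series gives:

```latex
\ln x = \sum_{n=1}^{\infty} \frac{1}{n} \left( \frac{x-1}{x} \right)^{n}
```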
This form also eliminates some of the large numbers that come with a series containing factorials. One thing this version cannot deal with is x = 1, but that isn't a problem because ln( 1 ) = 0 and we can simply test for that. This function converges faster, but still doesn't deal well with large values. So there is one more trick to speed the function up.
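Reconstructed from the description that follows, the trick pulls out the integer part of the base-10 logarithm:

```latex
\ln x = \ln\!\left( \frac{x}{10^{p}} \right) + p \ln 10,
\qquad p = \left\lfloor \log_{10} x \right\rfloor
```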
Proof this works:
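The derivation (shown as an image in the original) presumably ran along these lines:

```latex
x = \frac{x}{10^{p}} \cdot 10^{p}
\;\Longrightarrow\;
\ln x = \ln\!\left( \frac{x}{10^{p}} \right) + \ln\!\left( 10^{p} \right)
      = \ln\!\left( \frac{x}{10^{p}} \right) + p \ln 10
```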
Again, we are using a floor function to keep the integer part of the log base-10. This is easy to compute because it just requires moving around the decimal point. It's easier to see in an example.
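For instance, using one of the values tested later in the post, p = ⌊log₁₀ 985988091368⌋ = 11, so:

```latex
\ln 985988091368 = \ln 9.85988091368 + 11 \ln 10
```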
Now there are two natural logs, but both of their arguments are 10 or less. Taking the log of a constant means the value can be cached, so after it is computed once it doesn't need to be computed again. The reason log base-10 is used is that BC Math numbers are just text, and all an integer power of 10 really does is move the decimal point around. That's simple string manipulation.
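Putting the series and the base-10 reduction together, here is a sketch in Python's decimal module (a stand-in for BC Math; the function name is my own, and decimal's `adjusted`/`scaleb` play the role of the decimal-point string shuffle):

```python
from decimal import Decimal, getcontext

def bc_ln(x, scale=50):
    # ln(x) = ln(x / 10^p) + p*ln(10), with p = floor(log10(x)),
    # using the Euler-transformed Mercator series
    #   ln(y) = sum_{n>=1} ((y-1)/y)^n / n
    getcontext().prec = scale + 5
    x = Decimal(x)
    p = x.adjusted()        # integer part of log10 (decimal-point position)
    y = x.scaleb(-p)        # shift the decimal point so 1 <= y < 10

    def series(y):
        q = (y - 1) / y
        term, total, n = q, Decimal(0), 1
        while total + term / n != total:
            total += term / n
            term *= q
            n += 1
        return total

    # series(Decimal(10)) could be cached after the first call
    return series(y) + p * series(Decimal(10))
```

Both series calls now see arguments of 10 or less, so convergence is bounded no matter how large x is.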
After a good deal of testing I found these two functions worked well. There are still some issues with rounding. One method I used to compensate was to increase the scale while running the calculation, and then truncate the answer. Preliminary tests showed that adding 5 to the scale was enough to generate accurate answers at the requested scale. All my numbers were tested against the GP/PARI calculator, a computer algebra system that can deal with arbitrary precision. Examples I tried included exp( 2 ), exp( 100 ), exp( 63531 ), ln( 2 ), ln( 100 ), ln( 985988091368 ), and pow( 12.3, 43.21 ). All were accurate to 100 digits, which was the selected precision. Note that exp( 63531 ) is a 27,000+ digit number, but only the first 100 most-significant digits were accurate.
The new implementation is more complicated than the original, could probably use some more cleanup, and definitely needs some more testing. But the new version is more accurate and a lot faster. This project has been a fun tangent. I'm probably one of the only people who enjoyed the series work in calculus II, and I love it when I get to put it into practice. And thanks to Thomas Oldbury for sharing his implementation, as it was a great starting point to work from.
With these functions defined, we need three matrix functions: multiplication, transposition, and inversion. Multiplication and transposition are fairly trivial, but inversion is not. While I've not implemented inversion directly, I have implemented solving a system of equations in the past. The way the equation is being used, this prior implementation will work fine and actually saves one matrix multiplication.
I wanted to make an implementation of the Gauss–Newton method, so I decided to do this in a PHP class. First I created an abstract class that performs the steps of the Gauss-Newton algorithm and has the three matrix functions needed for the algorithm. There are three abstract functions: the function itself, the partial derivative with respect to a coefficient, and a function that returns the number of coefficients. A child class is derived from this base abstract class and defines the functions for the mathematical function it solves for. I created six exponential functions, and a power function:
Most of these functions work, except those with 4 or 5 coefficients, which generally just oscillate wildly.
The implementation was fairly easy and I had the basics working the first day. What I found was that as the number of coefficients increases, a better initial guess is needed. The function f( t ) = a b^(ct + d) + g doesn't work at all, even if the search starts very close to the real coefficients. It simply begins to oscillate wildly. This may be due to the limitations of floating-point precision, and an arbitrary-precision implementation may follow in the future.
For now, I plan on cleaning up the PHP implementation and placing it on the web.
Yesterday I wrote about how the Gauss-Newton algorithm is organized. Today I would like to show an example of the algorithm in action.
Some months ago at work I was trying to model the way a hot device cooled over time. I had plenty of data, and I knew the function. However, I could only guess at the coefficients as I had no tools to curve fit the data. Now that I know the Gauss-Newton algorithm, I decided to see what it determined should be the coefficients.
First is the function being used to model the data:
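The original post showed the function as an image; based on the description (an exponential decay toward an ambient value, with b negative), it is presumably of the standard Newton's-law-of-cooling form:

```latex
f(t) = a e^{bt} + c
```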
Note that b is going to be negative. I made several guesses at the coefficients back when I was working with this data. Here is a graph showing the true data (in blue) and the modeled data with my best guess at coefficients.
Now here is the same chart with the coefficients filled in from the Gauss- Newton algorithm:
Clearly, this is a much better fit.
The measured data comes from more than 43,000 samples spanning 12 hours. My PHP implementation was able to perform 100 iterations of the Gauss-Newton algorithm on this data, and took 147 seconds (just under 2 ½ minutes) to compute.
In the past I have written about a couple of forms of least-squares curve fitting. The two I've written about are linear regression and polynomial regression. I've played with a couple of other functions that can be linearized, and therefore solved in a similar way. However, I've wanted to explore a more general method that can fit an arbitrary function.
The basic principle of curve fitting is to have a set of data points that exhibit a shape, and then try to find a mathematical function that matches this data. One must have an idea of what function they think fits the data. In most real-world applications the functions are known. The trick is getting the coefficients of the function set correctly so that the data is most accurately mimicked. Trial and error can sometimes get close, but for functions with several coefficients this can be tricky.
The other day I came across the Gauss–Newton algorithm, which is a method for curve fitting that helps find the coefficients for a given function. I started by reading the Wikipedia article on the subject. However, it wasn't much help. So after a bit of searching I found this video of a lecture by Professor C. Balaji of the Indian Institute of Technology Madras. It was exactly what I needed to break down the steps involved in setting up the Gauss-Newton algorithm, and that is what I will attempt to explain in detail in this article.
First, the known values: x, y, and f. x and y are arrays of data points that correspond to one another. That is, y_n corresponds to x_n for some index n. The function f is some function that describes the curve in the x/y data, and in a perfect world f( x_n ) = y_n. In reality it's usually just that f( x_n ) ≈ y_n, and that's the point: the function approximates (i.e. fits) the data.
The function f has some unknown coefficients, stored in a single-column matrix c. Some examples of function f are as follows:
In these examples, c_n refers to row n. But matrix c is also iterative. So when it is superscripted, c^n refers to iteration n, and c_m^n references row m of iteration n. If the iteration is not specified, c is assumed to be of the current iteration.
Now the mathematical representation of the algorithm:
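The update formula appeared as an image in the original post; the standard Gauss–Newton iteration it describes is:

```latex
c^{\,n+1} = c^{\,n} + \left( J_f^{\mathsf{T}} J_f \right)^{-1} J_f^{\mathsf{T}} \, r_f\!\left( c^{\,n} \right)
```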
All the values are matrices. This is an iterative function, meaning you run it for as long as needed to get the answer, or close enough to the answer. So the function says that the next guess is based on the previous coefficients. This requires an initial guess, c^0. If the initial guess isn't very good the algorithm may oscillate or diverge (i.e. not produce anything useful). The variable J_f is the Jacobian matrix of the function f( x ), and the function r_f is the residual error function (explained further down).
The Jacobian matrix has the following form:
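Reconstructed from the description below (one column per coefficient, one row per data point):

```latex
J_f = \begin{bmatrix}
\dfrac{\partial f(x_1)}{\partial c_0} & \cdots & \dfrac{\partial f(x_1)}{\partial c_{m-1}} \\[1ex]
\vdots & \ddots & \vdots \\[1ex]
\dfrac{\partial f(x_k)}{\partial c_0} & \cdots & \dfrac{\partial f(x_k)}{\partial c_{m-1}}
\end{bmatrix}
```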
The Jacobian matrix is a set of partial derivatives with respect to each of the coefficients. The number of columns is the number of coefficients needing to be solved, and the number of rows is the number of data points. An easy example is the following function:
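The example itself isn't reproduced here; since the text below refers to c_0 and c_1, a representative two-coefficient example is:

```latex
f(x) = c_0 \, e^{c_1 x}
```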
For the function, the Jacobian matrix becomes:
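For a two-coefficient exponential f(x) = c_0 e^{c_1 x} (a reconstruction; the original example was shown as an image), the Jacobian would be:

```latex
J_f = \begin{bmatrix}
e^{c_1 x_1} & c_0 \, x_1 \, e^{c_1 x_1} \\
\vdots & \vdots \\
e^{c_1 x_k} & c_0 \, x_k \, e^{c_1 x_k}
\end{bmatrix}
```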
Note that c_0 and c_1 are the current guesses at the coefficients. The matrix will be filled in with values computed from the current guess.
The only other part that needs explaining is the function r. This is a function that returns the residual error. That is, how close the function, using the current coefficients, is to the actual data provided. It is simply:
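In other words, r_i = y_i − f( x_i ) for each data point: measured minus modeled. Putting all the pieces together, here is a minimal pure-Python sketch of the whole iteration for a hypothetical three-coefficient model f(t) = a·e^(b·t) + d (this is an illustration of the algorithm, not the PHP class itself):

```python
import math

def model(t, c):
    # hypothetical example model: f(t) = a * exp(b*t) + d
    a, b, d = c
    return a * math.exp(b * t) + d

def jacobian_row(t, c):
    # partial derivatives of the model with respect to a, b, d
    a, b, _ = c
    e = math.exp(b * t)
    return [e, a * t * e, 1.0]

def solve(A, v):
    # Gaussian elimination with partial pivoting for the small normal system
    n = len(A)
    M = [row[:] + [v[i]] for i, row in enumerate(A)]
    for col in range(n):
        p = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[p] = M[p], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def gauss_newton(ts, ys, c, iterations=25):
    for _ in range(iterations):
        J = [jacobian_row(t, c) for t in ts]
        r = [y - model(t, c) for t, y in zip(ts, ys)]   # residuals
        n = len(c)
        # normal equations: (J^T J) step = J^T r
        JtJ = [[sum(J[i][p] * J[i][q] for i in range(len(ts)))
                for q in range(n)] for p in range(n)]
        Jtr = [sum(J[i][p] * r[i] for i in range(len(ts))) for p in range(n)]
        step = solve(JtJ, Jtr)
        c = [ci + si for ci, si in zip(c, step)]
    return c

# demo: recover coefficients from samples of y = 5 e^(-0.5 t) + 2
ts = [i * 0.5 for i in range(11)]
ys = [5 * math.exp(-0.5 * t) + 2 for t in ts]
a, b, d = gauss_newton(ts, ys, [4.0, -0.4, 1.5])
```

Note that the step is found by solving the linear system (JᵀJ)·step = Jᵀr rather than inverting JᵀJ, which is the same "solve instead of invert" shortcut mentioned in the matrix-function discussion.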
Tomorrow I will demonstrate how I used the algorithm on a set of data.