
May 13, 2013

Gauss-Newton algorithm, Part 1

In the past I have written about a couple of forms of least-squares curve fitting. The two I've written about are linear regression and polynomial regression. I've also played with a couple of other functions that can be linearized, and therefore solved in a similar way. However, I've wanted to explore a more general method that can curve fit an arbitrary function.

The basic principle of curve fitting is to take a set of data points that exhibit a shape, and try to find a mathematical function that matches this data. One must have an idea of what function fits the data; in most real-world applications the function is known. The trick is setting the coefficients of the function so that the data is most accurately mimicked. Trial and error can sometimes get close, but for functions with several coefficients this can be tricky.

The other day I came across the Gauss–Newton algorithm, a method for curve fitting that helps find the coefficients for a given function. I started by reading the Wikipedia article on the subject. However, it wasn't much help. So after a bit of searching I found a video of a lecture by Professor C. Balaji of the Indian Institute of Technology Madras. It was exactly what I needed to break down the steps involved in setting up the Gauss-Newton algorithm, and it is what I will attempt to explain in detail in this article.

First, the known values: x, y, and f. x and y are arrays of data points that correspond to one another. That is, y_n corresponds to x_n for each index n. The function f is some function that describes the curve in the x/y data, and in a perfect world f( x_n ) = y_n. In reality it's usually just that f( x_n ) ≈ y_n, and that's the point: the function approximates (i.e. fits) the data.

The function f has some unknown coefficients, a single column matrix c. Some examples of function f are as follows:
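The equation images from the original post are unavailable, so here are two stand-ins of my own choosing, an exponential and a power function, just to show the kind of thing f might be:

$$f( x ) = c_0\,e^{c_1 x} + c_2 \qquad\qquad f( x ) = c_0\,x^{c_1}$$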

In these examples, c_n refers to row n. But matrix c is also iterative, so when it is superscripted, c^n refers to iteration n. Thus c_m^n references row m of iteration n. If the iteration is not specified, c is assumed to be of the current iteration.

Now the mathematical representation of the algorithm:

All the values are matrices. This is an iterative algorithm, meaning you run it for as long as needed to get the answer, or close enough to the answer. The function says that the next guess is based on the previous coefficients. This requires an initial guess, c^0. If the initial guess isn't very good, the algorithm may oscillate or diverge (i.e. not produce anything useful). The variable J_f is the Jacobian matrix with respect to the function f( x ), and the function r_f is the residual error function (explained further down).
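The equation itself was an image that did not survive, but it can be reconstructed from this description and the Part 3 source code. Each iteration solves the normal equations and adds the result to the previous guess:

$$c^{\,n+1} = c^{\,n} + \left( J_f^T J_f \right)^{-1} J_f^T\, r_f( c^{\,n} )$$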

The Jacobian matrix has the following form:
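The matrix image is missing, but its form is spelled out in the comments of the Part 3 source code, for data points x_0 … x_n and coefficients c_0 … c_m:

$$J_f = \begin{bmatrix}
\partial f( x_0 ) / \partial c_0 & \cdots & \partial f( x_0 ) / \partial c_m \\
\vdots & \ddots & \vdots \\
\partial f( x_n ) / \partial c_0 & \cdots & \partial f( x_n ) / \partial c_m
\end{bmatrix}$$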

The Jacobian matrix is a set of partial derivatives of the function, each with respect to one of the coefficients. The number of columns is the number of coefficients to be solved for, and the number of rows is the number of data points. An easy example is the following function:
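The pictured function is gone; as a stand-in of my own choosing, take a two-coefficient exponential:

$$f( x ) = c_0\,e^{c_1 x}$$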


For the function, the Jacobian matrix becomes:
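Using a two-coefficient exponential f( x ) = c_0 e^{c_1 x} as a stand-in of my own (the original image is unavailable), each row holds the partial derivatives evaluated at one data point:

$$J = \begin{bmatrix}
e^{c_1 x_0} & c_0\,x_0\,e^{c_1 x_0} \\
\vdots & \vdots \\
e^{c_1 x_n} & c_0\,x_n\,e^{c_1 x_n}
\end{bmatrix}$$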

Note that c0 and c1 are the current guess at coefficients. The matrix will be filled in with computed values for the current guess.

The only other part that needs explaining is the function r. This function returns the residual error: that is, how close the function, using the current coefficients, comes to the actual data provided. It is simply:
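The formula was an image; reconstructed from getErrorMatrix in the Part 3 source (the delta is added to the coefficients, so the residual is data minus model), each entry is:

$$r_f( c )_n = y_n - f( x_n )$$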

Tomorrow I will demonstrate how I used the algorithm on a set of data.

May 14, 2013

Gauss-Newton algorithm, Part 2

Yesterday I wrote about how the Gauss-Newton algorithm is organized. Today I would like to show an example of the algorithm in action.

Some months ago at work I was trying to model the way a hot device cooled over time. I had plenty of data, and I knew the function. However, I could only guess at the coefficients as I had no tools to curve fit the data. Now that I know the Gauss-Newton algorithm, I decided to see what it determined should be the coefficients.

First is the function being used to model the data:
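The equation image is missing. A typical model for a cooling body, and my best guess at the form used here given the note about b that follows, is:

$$T( t ) = a\,e^{b t} + c$$

where c would be the ambient temperature the device settles toward.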

Note that b is going to be negative. I made several guesses at the coefficients back when I was working with this data. Here is a graph showing the true data (in blue) and the modeled data with my best guess at coefficients.

Now here is the same chart with the coefficients filled in from the Gauss-Newton algorithm:

Clearly, this is a much better fit.

The measured data comes from more than 43,000 samples spanning 12 hours. My PHP implementation was able to perform 100 iterations of the Gauss-Newton algorithm on this data, and took 147 seconds (just over 2 ½ minutes) to compute.

May 15, 2013

Gauss-Newton algorithm, Part 3

With these functions defined, we need three matrix operations: multiplication, transpose, and inversion. Multiplication and transposition are fairly trivial, but inversion is not. While I've not implemented inversion directly, I have implemented solving a system of equations in the past. The way the equation is being used, this prior implementation works fine and actually saves one matrix multiplication.

abstract class NewtonGaussRegression
{
  // Partial differential of the function with respect to the coefficient
  // at $coefficientIndex, evaluated at $x.
  abstract public function partialDifferential( $x, $coefficientIndex, $coefficients );

  // Number of coefficients in the function.
  abstract public function getNumberOfCoefficients();

  // The function itself, evaluated at $x with the given coefficients.
  abstract public function getFunction( $x, $coefficients );

  private function transpose( $matrix ) { /* ... */ }
  private function multiply( $matrix, $multiplicand ) { /* ... */ }
  private function getErrorMatrix( $x, $y, $coefficients ) { /* ... */ }
  private function solve( $matrix, $answers ) { /* ... */ }

  // Run an iteration of the Newton-Gauss algorithm.
  // INPUT:
  //   $x - array of x data points.
  //   $y - array of y data points that correspond to x.
  //   $coefficients - Initial or last guess at coefficients.
  // OUTPUT:
  //   An array that is the new guess at coefficients.
  public function refineCoefficents( $x, $y, $coefficients )
  {
    // Number of coefficients.
    $numberOfCoefficients = $this->getNumberOfCoefficients();

    // Compute Jacobian matrix.
    // The Jacobian matrix consists of the partial-differentials.
    //  [ d f( x_0 ) / d c_0   d f( x_0 ) / d c_1  ...  d f( x_0 ) / d c_m ]
    //  [ d f( x_1 ) / d c_0   d f( x_1 ) / d c_1  ...  d f( x_1 ) / d c_m ]
    //  [       .                  .              .           .            ]
    //  [       .                  .                .         .            ]
    //  [       .                  .                  .       .            ]
    //  [ d f( x_n ) / d c_0   d f( x_n ) / d c_1  ...  d f( x_n ) / d c_m ]
    //   Where f() is the non-linear function, d f() / d c is the partial-
    // differential with respect to c.  n is the number of rows in x, and m
    // is the number of coefficients in the equation.
    $jacobianMatrix = array();
    for ( $row = 0; $row < count( $x ); ++$row )
    {
      $jacobianMatrix[ $row ] = array();
      for ( $column = 0; $column < $numberOfCoefficients; ++$column )
        $jacobianMatrix[ $row ][ $column ] =
          $this->partialDifferential( $x[ $row ], $column, $coefficients );
    }

    // Compute the transpose of Jacobian matrix.
    $jacobianTranspose = $this->transpose( $jacobianMatrix );

    // Compute J^T * J.
    // Where J is the Jacobian matrix, and J^T is the transpose.
    $jacobianIntermediate =
      $this->multiply( $jacobianTranspose, $jacobianMatrix );

    // Get the residual error R.  This matrix defines how "good" the fit is.
    $errorMatrix = $this->getErrorMatrix( $x, $y, $coefficients );

    // Compute J^T * R.
    $errorIntermediate = $this->multiply( $jacobianTranspose, $errorMatrix );

    // Compute the amount of change needed to each coefficient.
    $coefficientsDelta =
      $this->solve( $jacobianIntermediate, $errorIntermediate );

    // Adjust coefficient by change needed.
    for ( $index = 0; $index < $numberOfCoefficients; ++$index )
      $coefficients[ $index ] += $coefficientsDelta[ $index ][ 0 ];

    // Return new guess at coefficients.
    return $coefficients;
  }

  // Coefficient of determination for the fit.
  public function getR_Squared( $x, $y, $coefficients ) { /* ... */ }

} // NewtonGaussRegression


I wanted to make an implementation of the Gauss–Newton method, so I decided to do this as a PHP class. First I created an abstract class that performs the steps of the Gauss-Newton algorithm, and has the matrix functions needed for the algorithm. There are three abstract functions: the function itself, the partial derivative with respect to a coefficient, and a function that returns the number of coefficients. A child class is derived from this abstract base class and defines these functions for the mathematical function it solves for. I created six exponential functions, and a power function:

Most of these functions work, except those with 4 or 5 coefficients, which generally just oscillate wildly.
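To make the procedure concrete, here is a compact, self-contained sketch of my own (not the published class; all names here are invented for illustration) that fits f( x ) = a e^( b x ) to synthetic data, with Cramer's rule standing in for the general solver:

```php
<?php
// Minimal two-coefficient Gauss-Newton fit of f( x ) = a * exp( b * x ).
// This is an illustrative sketch, not the published class.

// Solve the 2x2 system $m * delta = $v by Cramer's rule.
function solve2x2( $m, $v )
{
  $det = $m[ 0 ][ 0 ] * $m[ 1 ][ 1 ] - $m[ 0 ][ 1 ] * $m[ 1 ][ 0 ];
  return array(
    ( $v[ 0 ] * $m[ 1 ][ 1 ] - $v[ 1 ] * $m[ 0 ][ 1 ] ) / $det,
    ( $m[ 0 ][ 0 ] * $v[ 1 ] - $m[ 1 ][ 0 ] * $v[ 0 ] ) / $det );
}

// One Gauss-Newton iteration: refine coefficients $c = [ a, b ].
function refine( $x, $y, $c )
{
  // Accumulate the normal equations J^T * J (2x2) and J^T * r (2x1).
  $jtj = array( array( 0.0, 0.0 ), array( 0.0, 0.0 ) );
  $jtr = array( 0.0, 0.0 );
  for ( $i = 0; $i < count( $x ); ++$i )
  {
    $e = exp( $c[ 1 ] * $x[ $i ] );

    // Jacobian row: [ d f / d a, d f / d b ] at this data point.
    $j = array( $e, $c[ 0 ] * $x[ $i ] * $e );

    // Residual: y - f( x ).
    $r = $y[ $i ] - $c[ 0 ] * $e;

    for ( $row = 0; $row < 2; ++$row )
    {
      for ( $column = 0; $column < 2; ++$column )
        $jtj[ $row ][ $column ] += $j[ $row ] * $j[ $column ];
      $jtr[ $row ] += $j[ $row ] * $r;
    }
  }

  // Coefficient change, added to the current guess.
  $delta = solve2x2( $jtj, $jtr );
  return array( $c[ 0 ] + $delta[ 0 ], $c[ 1 ] + $delta[ 1 ] );
}

// Synthetic data from a = 2, b = -0.5.
$x = array();
$y = array();
for ( $i = 0; $i <= 20; ++$i )
{
  $x[] = $i * 0.25;
  $y[] = 2.0 * exp( -0.5 * $i * 0.25 );
}

// Iterate from a rough initial guess.
$c = array( 1.5, -0.4 );
for ( $iteration = 0; $iteration < 20; ++$iteration )
  $c = refine( $x, $y, $c );

printf( "a = %.4f, b = %.4f\n", $c[ 0 ], $c[ 1 ] );
```

Cramer's rule only covers the 2x2 case here; for more coefficients a general system-of-equations solver is needed, as described above.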

The implementation was fairly easy and I had the basics working the first day. What I found was that as the number of coefficients increases, a better initial guess is needed. The function f( t ) = a b^( c t ) + g doesn't work at all, even if the iteration starts very close to the real coefficients. It simply begins to oscillate wildly. This may be due to the limitations of floating-point precision, and an arbitrary-precision implementation may help in the future.

For now, I plan on cleaning up the PHP implementation and placing it on the web.

May 27, 2013

Gauss-Newton page complete.

   After several weeks of work, I have completed the implementation, documentation, and website for my Gauss-Newton class.  The page is linked in the projects section.  The release includes 16 functions (5 of which don't converge) with a test suite.  Functions are defined with both conventional floating point as well as BC Math for arbitrary precision.  I included four demos on the page, and I'm pretty happy with the inline-demo method I used to place them on the page.  The code is displayed, run, and its output also displayed.  This makes adding additional demos easy.  The documentation was done in phpDocumentor.  I have mixed feelings about it, as I think the documentation layout has become less intuitive in more recent versions.  However, it functions, so I'm not going to complain too badly.
   I ran across a couple of problems when trying to publish this website.  First, I had called my polynomial regression class "least-square regression."  This is partly correct, but not the best description.  So I renamed it, which required changes in a lot of locations, both on the website and in software.  The second complication was getting the software documented and a test suite written.  I had implemented several functions but not bothered to test them.  Once implemented and tested using regular floating point, I had to repeat the implementation in BC Math.  None of this was hard, just time consuming.  The BC Math implementation required creating three new functions: pi, sine, and cosine.  I'll probably write an article about this in the future.
   For building the website, I generally use a simple template that I have used on most of my project pages.  I pick some colors and do a fade for the background.  The fade has always been a cosine fade because I think those look better.  I've set up my image editor to do this, but it's not a perfect cosine fade because I simply gave it a bunch of points to mimic the fade.  When my fade looked too grainy, I decided to do the fade with PHP.  This is another side project I plan to clean up and publish.
   Lastly, I found the comment system that I wrote back in 2003 had issues as well.  It used several deprecated functions, and the captcha didn't want to work.  I brought it back to the world of the functional, but it could use some cleanup.
   It has been a lot of work to get all of this assembled, but hopefully someone will find this library useful.