In the past I have written about a couple of forms of least-squares curve fitting: linear regression and polynomial regression. I've also played with a few other functions that can be linearized and therefore solved in a similar way. However, I've wanted to explore a more general method that can curve fit to an arbitrary function.

The basic principle of curve fitting is to have a set of data points that exhibit a shape, and then try to find a mathematical function that matches this data. One must have an idea of what function is likely to fit the data. In most real-world applications the function is known. The trick is setting the coefficients of the function correctly so that the data is mimicked as accurately as possible. Trial and error can sometimes get close, but for functions with several coefficients this can be tricky.

The other day I came across the Gauss–Newton algorithm, a method for curve fitting that helps find the coefficients for a given function. I started by reading the Wikipedia article on the subject, but it wasn't much help. After a bit of searching I found this video of a lecture by Professor C. Balaji of the Indian Institute of Technology Madras. It was exactly what I needed to break down the steps involved in setting up the Gauss–Newton algorithm, and that is what I will attempt to explain in detail in this article.

First, the known values: *x*, *y*, and *f*. *x* and *y* are arrays of data points that correspond to one another. That is, *y*_{n} corresponds to *x*_{n} for some value *n*. The function *f* is some function that describes the curve in the *x*/*y* data, and in a perfect world *f*( *x*_{n} ) = *y*_{n}. In reality it's usually just that *f*( *x*_{n} ) ≈ *y*_{n}, and that's the point: the function approximates (i.e. fits) the data.

The function *f* has some unknown coefficients, a single column matrix *c*. Some examples of function *f* are as follows:
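For illustration, a few functions of this form (these are stand-in examples of my own, written in LaTeX, using a superscript to denote the row of *c*) might be:

```latex
f(x) = c^0 + c^1 x + c^2 x^2 \\
f(x) = c^0 e^{c^1 x} \\
f(x) = c^0 \sin\!\left( c^1 x + c^2 \right)
```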

In these examples, *c*^{n} refers to row *n*. But matrix *c* is also iterative. So when it is subscripted, *c*_{n} refers to iteration *n*. Thus *c*^{m}_{n} references row *m* of iteration *n*. If the iteration is not specified, *c* is assumed to be of the current iteration.

Now the mathematical representation of the algorithm:
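A standard statement of the Gauss–Newton update, written here in LaTeX using the residual convention *r* = *y* − *f*( *x* ) (the sign of the step flips if the opposite convention is used), is:

```latex
c_{n+1} = c_n + \left( J_f^{\mathsf T} J_f \right)^{-1} J_f^{\mathsf T} \, r_f(c_n)
```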

All the values are matrices. This is an iterative formula, meaning you run it for as long as needed to get the answer, or close enough to it. The formula says that the next guess for the coefficients is based on the previous guess. This requires an initial guess, a *c*_{0}. If the initial guess isn't very good, the algorithm may oscillate or diverge (i.e. not produce anything useful). The matrix *J*_{f} is the Jacobian of the function *f*( *x* ) with respect to the coefficients *c*, and the function *r*_{f} is the residual error function (explained further down).

The Jacobian matrix has the following form:
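With *n* data points and *m* coefficients, the general shape (sketched in LaTeX) is one row per data point and one column per coefficient:

```latex
J_f =
\begin{bmatrix}
\dfrac{\partial f(x_1)}{\partial c^0} & \dfrac{\partial f(x_1)}{\partial c^1} & \cdots & \dfrac{\partial f(x_1)}{\partial c^{m-1}} \\
\vdots & \vdots & \ddots & \vdots \\
\dfrac{\partial f(x_n)}{\partial c^0} & \dfrac{\partial f(x_n)}{\partial c^1} & \cdots & \dfrac{\partial f(x_n)}{\partial c^{m-1}}
\end{bmatrix}
```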

The Jacobian matrix is a set of partial derivatives of the function, each taken with respect to one of the coefficients. The number of columns is the number of coefficients to be solved for, and the number of rows is the number of data points. An easy example is the following function:
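Since no specific equation survives here, I'll assume a simple two-coefficient exponential as the working example:

```latex
f(x) = c^0 e^{c^1 x}
```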

Therefore:
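For the assumed example *f*( *x* ) = *c*^{0}e^{*c*^{1}*x*}, the partial derivatives with respect to each coefficient are:

```latex
\frac{\partial f}{\partial c^0} = e^{c^1 x}, \qquad
\frac{\partial f}{\partial c^1} = c^0 x \, e^{c^1 x}
```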

For the function, the Jacobian matrix becomes:
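Again using the assumed exponential example, each row evaluates the two partial derivatives at one data point:

```latex
J_f =
\begin{bmatrix}
e^{c^1 x_1} & c^0 x_1 e^{c^1 x_1} \\
\vdots & \vdots \\
e^{c^1 x_n} & c^0 x_n e^{c^1 x_n}
\end{bmatrix}
```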

Note that *c*^{0} and *c*^{1} are the current guesses at the coefficients. The matrix is filled in with values computed from the current guess.

The only other part that needs explaining is the function *r*. This function returns the residual error, that is, how close the function, using the current coefficients, comes to the actual data provided. It is simply:
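Using the data-minus-model convention (one common choice; the opposite sign also works if used consistently with the update step):

```latex
r_i = y_i - f(x_i)
```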

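Putting the pieces together, here is a minimal sketch of the algorithm in Python with NumPy. The function names and the exponential model are my own illustration, not taken from the lecture:

```python
import numpy as np

def gauss_newton(f, jacobian, x, y, c0, iterations=50):
    """Iteratively refine coefficients c so that f(x, c) fits y."""
    c = np.asarray(c0, dtype=float)
    for _ in range(iterations):
        r = y - f(x, c)                 # residual: data minus model
        J = jacobian(x, c)              # one row per point, one column per coefficient
        # Solve the normal equations (J^T J) delta = J^T r for the step.
        delta = np.linalg.solve(J.T @ J, J.T @ r)
        c = c + delta
    return c

# Assumed example model: f(x) = c0 * exp(c1 * x)
f = lambda x, c: c[0] * np.exp(c[1] * x)
jac = lambda x, c: np.column_stack([np.exp(c[1] * x),
                                    c[0] * x * np.exp(c[1] * x)])

# Synthetic data generated with known coefficients (2.0, 1.5).
x = np.linspace(0.0, 1.0, 20)
y = 2.0 * np.exp(1.5 * x)

c = gauss_newton(f, jac, x, y, c0=[1.0, 1.0])
```

Starting from the guess (1, 1), the iteration should recover coefficients very close to (2.0, 1.5), since the synthetic data has zero noise.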
Tomorrow I will demonstrate how I used the algorithm on a set of data.