In November of last year I wrote about calculating linear regression. If the function d(x) returns some y for a given x, then linear regression finds m and b such that m x + b ≈ d(x). The linear regression algorithm is a
curve fitting algorithm that works well for data sets that center around a straight line. The function
m x + b is a first degree polynomial. Adding one more term extends it to the second degree polynomial a x² + b x + c. Those who have taken pre-calculus will know that this produces a U-shaped graph. If d(x) curves slightly, then a second degree polynomial is required to reproduce the function.
There is an algebraic answer to how to obtain a, b, and c such that a x² + b x + c ≈ d(x). However, this requires some linear algebra. It may be best to start with the matrix form of linear regression.
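Here is my reconstruction of that matrix form: the normal equations for fitting m x + b, drawn in the same style as the quadratic system later in this post:

```
/            \ /   \   /      \
| n      ∑x  | | b |   | ∑y   |
|            | |   | = |      |
| ∑x    ∑x^2 | | m |   | ∑xy  |
\            / \   /   \      /
```

where n is the number of data points. Solving this 2 × 2 system for m and b gives exactly the linear regression formulas.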
Although at this level the pattern isn't too apparent, it becomes so when we add a third term. For each row and column, the power of x keeps increasing. This curve fitting technique will work for as high an order polynomial as one desires and is the basis for polynomial interpolation, which I wrote about in January.
Summations are quite easy to work with, despite looking intimidating in equations. In software, summations are just for loops where a value is accumulated on each iteration. These particular summations are very simple: most are sums of the x values raised to some power. To simplify the equation, a letter will be used to represent each summation. The matrix thus becomes:
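Reconstructed from the comments in the source code below, with p = ∑x, q = ∑x², r = ∑x³, s = ∑x⁴ on the left and t = ∑y, u = ∑xy, v = ∑x²y on the right:

```
/       \ /   \   /   \
| n p q | | c |   | t |
|       | |   |   |   |
| p q r | | b | = | u |
|       | |   |   |   |
| q r s | | a |   | v |
\       / \   /   \   /
```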
Turning the matrix form into plain arithmetic so the algorithm could be coded wasn't something I was initially prepared to do. Thanks to Cortney, I found my way to Wikipedia's articles on
Cramer's Rule and
Determinants, and thus was able to see why the math I had worked.
What is needed are the determinants, which can be multiplied out.
The final step of Cramer's Rule is to divide the determinants for a, b, and c by the determinant D.
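As a sketch of what "multiplying out" a determinant means, here is a small helper of my own (not part of the original program) that expands a 3 × 3 determinant along its diagonals:

```php
<?php
// Determinant of a 3x3 matrix given as rows: [ [a,b,c] , [d,e,f] , [g,h,i] ]
// Expanded by the rule of Sarrus: aei + bfg + cdh - ceg - bdi - afh
function Determinant3( $m )
{
    return $m[ 0 ][ 0 ] * $m[ 1 ][ 1 ] * $m[ 2 ][ 2 ]
         + $m[ 0 ][ 1 ] * $m[ 1 ][ 2 ] * $m[ 2 ][ 0 ]
         + $m[ 0 ][ 2 ] * $m[ 1 ][ 0 ] * $m[ 2 ][ 1 ]
         - $m[ 0 ][ 2 ] * $m[ 1 ][ 1 ] * $m[ 2 ][ 0 ]
         - $m[ 0 ][ 1 ] * $m[ 1 ][ 0 ] * $m[ 2 ][ 2 ]
         - $m[ 0 ][ 0 ] * $m[ 1 ][ 2 ] * $m[ 2 ][ 1 ];
}
```

Each determinant in the curve fit below is just this expansion with the letters n, p, q, r, s, t, u, and v substituted in.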
Now that the math is broken down, it is time for a demonstration. I used my
banded nonuniform scatter function on a second order polynomial to simulate imperfect data and plotted it with
X/Y Plot. The red dots represent the input data. The blue line represents the "best guess" curve fit. In the margins are the true values used for a,b,c and those calculated by the curve fitting algorithm.
Below is some PHP source code for calculating the quadratic curve fit function.
//---------------------------------------------------------------------------
// Find values for a quadratic curve fit
// Finds a,b,c such that ax^2 + bx + c fits closest to data in $x and $y
// Returns array with a,b,c
//---------------------------------------------------------------------------
function FindQuadraticCurve( $x , $y )
{
    // Arrays must be the same size (i.e. each x must have a y and vice versa)
    assert( count( $x ) == count( $y ) );

    // Elements in array
    $n = count( $x );

    // Various summations
    $p = array_sum( $x );    // ∑ x
    $q = PowerSum( $x , 2 ); // ∑ x^2
    $r = PowerSum( $x , 3 ); // ∑ x^3
    $s = PowerSum( $x , 4 ); // ∑ x^4
    $t = array_sum( $y );    // ∑ y

    // ∑ xy
    $u = 0;
    for ( $Index = 0; $Index < $n; ++$Index )
        $u += $x[ $Index ] * $y[ $Index ];

    // ∑ (x^2 * y)
    $v = 0;
    for ( $Index = 0; $Index < $n; ++$Index )
        $v += pow( $x[ $Index ] , 2 ) * $y[ $Index ];

    //
    // The following math comes from the linear algebra:
    //
    //    /       \ /   \   /   \
    //    | n p q | | c |   | t |
    //    |       | |   |   |   |
    //    | p q r | | b | = | u |
    //    |       | |   |   |   |
    //    | q r s | | a |   | v |
    //    \       / \   /   \   /
    //
    // We can solve for a,b,c using Cramer's Rule.  We need the determinant
    // D of the matrix, then the determinants Da, Db, Dc made by replacing
    // each unknown's column with the right-hand side.
    //
    //        | n p q |
    //        |       |
    //    D = | p q r | = nqs + prq + qpr - nrr - pps - qqq
    //        |       |
    //        | q r s |
    //
    $d = $n * $q * $s
       + $p * $q * $r
       + $p * $q * $r
       - $q * $q * $q
       - $p * $p * $s
       - $n * $r * $r;

    //         | n p t |
    //         |       |
    //    Da = | p q u | = nqv + puq + tpr - nur - ppv - tqq
    //         |       |
    //         | q r v |
    //
    $da = $n * $q * $v
        + $p * $r * $t
        + $p * $q * $u
        - $q * $q * $t
        - $p * $p * $v
        - $n * $r * $u;

    //         | n t q |
    //         |       |
    //    Db = | p u r | = nus + trq + qpv - nrv - tps - quq
    //         |       |
    //         | q v s |
    //
    $db = $n * $s * $u
        + $p * $q * $v
        + $q * $r * $t
        - $q * $q * $u
        - $p * $s * $t
        - $n * $r * $v;

    //         | t p q |
    //         |       |
    //    Dc = | u q r | = tqs + prv + qur - trr - pus - qqv
    //         |       |
    //         | v r s |
    //
    $dc = $q * $s * $t
        + $q * $r * $u
        + $p * $r * $v
        - $q * $q * $v
        - $p * $s * $u
        - $r * $r * $t;

    // Each unknown is its determinant divided by the determinant D
    $a = $da / $d;
    $b = $db / $d;
    $c = $dc / $d;

    return array( $a , $b , $c );
}
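The code above calls PowerSum, which isn't shown in this post. A minimal version, along with a quick sanity check on noise-free data (the example values are my own), might look like this:

```php
<?php
//---------------------------------------------------------------------------
// Sum of each element of $Values raised to $Power, i.e. ∑ x^Power
//---------------------------------------------------------------------------
function PowerSum( $Values , $Power )
{
    $Sum = 0;
    foreach ( $Values as $Value )
        $Sum += pow( $Value , $Power );
    return $Sum;
}

// Sanity check: points taken straight from 2x^2 - 3x + 1, with no noise,
// should come back as (approximately) a=2, b=-3, c=1
$x = array( -2 , -1 , 0 , 1 , 2 , 3 );
$y = array();
foreach ( $x as $Value )
    $y[] = 2 * $Value * $Value - 3 * $Value + 1;

list( $a , $b , $c ) = FindQuadraticCurve( $x , $y );
echo $a , " " , $b , " " , $c , "\n"; // approximately: 2 -3 1
```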
And lastly a shout out to Erica--keep with the math!