Over 10 years ago I was working on a DSP that did not having floating-point abilities, and I needed to preform a square root. I posted in comp.dsp to ask about an approach to this, and received a reply that implemented a very clever solution.
Here we are taking the square root of a 32-bit unsigned integer. The result will always fit into 16-bits, thus why a 16-bit result is returned. Why is due to this relationship:
In this case, n = 32.
The loop is done with a while statement, but it is like a for loop counting down by dividing in integer halfs each loop. It will always loop 32 times, once for each bit. This scales to any integer size.
Here is where the clever part happens. The loop basically asks if setting the current bit will cause the square to be greater than the answer. If not, the bit is set, otherwise the bit is left clear. Why this is fast is because (unlike the square root) computing the square is easy—just a multiply.
With the DSP I was working with multiply instructions took the same amount of time as any other instruction—one of the reasons that gave DSPs an advantage over other processors. On some systems multiplication is costly, and if you are using fixed-point math chances are you are on such a system. For this, the code can be reworked to avoid having to do the multiplication.
Here the multiplication is avoided because we accumulate the square as part of the loop, using shifts rather than multiplication. It takes a few more operations, but will likely outrun the loop above if multiplication is an expensive operation.
This system works for fixed-point values too, but with low accuracy. Fixed-point number can be represented as such:
Where x is some real number, and f is the number of shifts. Ignoring the floor function, the fixed-point square root can be simplified:
What this means to the algorithm is that the fixed-point shift has changed after the square root has been taken. For example, if a Q16:16 number went into the square root function, a Q24:8 number would come out—and the upper 16 bits all zero. To get a number with the original fixed-point format a shift to the left of half the fractional shift is required. So for the Q16:16 number, a shift to the left of 8 is required after the square root has been taken.
The requirement for this finial shift also means the fractional precision is halved. A Q16:16 fixed-point number has 1/2 16 (≈0.000015) fractional precision, but after the square root it would only have 1/2 8 (≈0.0039). That's the loss of two decimal places.
Is this still useful today? Sure. Many embedded processors do not have hardware floating-point, and software floating-point is quite slow. For code where the square root is computed often, being able to use an integer or fixed-point square root can be a great performance boost.
Whatever it was from the darkness was approaching. The lights were now out except for a room just ahead. Cobwebs and dust covered everything. Through the dirty window she could see one person alone in the room. A single florescent bulb in the fixture above him provided all the illumination. The person had his back to the window, looking down, his hands folded in his lap. She knew him, or knew of him. Never anyone she really talked to as he was not at all popular. At present, however, this person seemed her only hope.