Andrew Que Sites list Photos
Projects Contact

June 27, 2018

C compilers and circular shifts

I was evaluating some pseudorando m number generator (PRNG) and started wondering about compiler optimization. Many modern processors have a circular shift instruction, but languages such as C have no direct support for this. As a result, optimization has been designed to recognize circular shifts.

First, a circular shift:

output = ( input << shift ) | ( input >> ( sizeof( input ) * 8 - shift ) )

As noted most C compiler recognize this line of code as a circular shift and are able to implement the circular shift instruction rather than using two shifts and an OR.

I analyzed one of the smallest, fastest, and highest entropy quality PRNGs I tested was Xoroshiro128+. It is able to produce 1 TB of random data in 3 minutes (on a 3 GHz Core i7). The algorithm is quite simple:

result = state0 + state1
state1 = state0 ^ state1
state0 = ( state0 <<< 55 ) ^ state1 ^ ( state1 << 14 )
state1 = ( state1 <<< 36 )

All values are 64-bit.  The triple less than sign here means a circular shift left. The algorithm has a 128-bit state, meaning a period of 2128 (340 undecillion, or 340 trillion trillion trillion). On a native 64-bit machine the assembly is quite small.

64-bit x86 64-bit ARM
mov rax, QWORD PTR [rcx]
mov r9, QWORD PTR 8[rcx]
mov rdx, rax
mov r8, rax
add rax, r9
xor rdx, r9
ror r8, 9
mov r10, rdx
xor r8, rdx
ror rdx, 28
sal r10, 14
mov QWORD PTR 8[rcx], rdx
xor r8, r10
mov QWORD PTR [rcx], r8
mov x3, x0
ldp x2, x0, [x0]
eor x1, x2, x0
add x0, x2, x0
eor x2, x1, x2, ror 9
ror x4, x1, 28
eor x1, x2, x1, lsl 14
stp x1, x4, [x3]

Note that although the ARM assembly is less instructions, the x86 assembly can do several steps in parallel. I found this was an interesting exploration of both an algorithm and of modern C compilers.

April 28, 2018

Multiplying by Power of 10

Ran across a really clever algorithm for integer multiplication by a power of 10.

C source code:

static inline int multiplyByPowerOf10( int value, int power )
  value <<= power;
  while ( 0 != power )
    value += value << 2;
    power -= 1;

  return value;

The first thing to notice is there isn’t a single multiplication. This function is done entirely with shifts (multiplying by powers-of-two) and adds.  The function does work, but what exactly is going on?

The algorithm takes advantage of this relationship: 10n = 2n·5n. The first left shift is the multiplication by 2n and the following loop performs the multiply by 5n. The multiply by 2n is fairly obvious, but the loop is more complicated. Start with noting that 5= 4x which is (<< 2). This is done n times by the loop. Mathematically what we have is the following equivalence:

The while loop doesn't look like this as written, but keep in mind following lines are identical operations:

value += value << 2;
value += 4 * value;
value = value + 4 * value;
value = value * ( 1 + 4 );
value = value * 5;

Putting it all together the function is:

At the assembly level the function can be implemented in just a few instructions.

Intel x86 (32-bit)

  sal   eax, cl           # value <<= power
  lea   eax, [eax+eax*4]  # value = value + value * 4
  sub   ecx, 1            # power -= 1
  jne   L4

Atmel AVR (8-bit)

  mov r0,r22 # value <<= power
  rjmp 2f
  lsl r24
  rol r25
  dec r22
  brpl 1b
  # value += value << 2;
  mov r18,r24
  mov r19,r25
  lsl r18
  rol r19
  lsl r18
  rol r19
  add r24,r18
  adc r25,r19
  subi r22,lo8(-(-1))    # power -= 1
  brne .L4

ARM (32-bit)

  lsl  r0, r0, r1
  add  r0, r0, r0, lsl #2
  subs  r1, r1, #1
  bne  .L2

Atmel AVR (32-bit)

  lsl  r12, r12, r11
  sub  r11, 1
  add  r12, r12, r12 << 2
  cp.w  r11, 0
  brne .L2

The 32-bit assembly is just a few instructions (assuming power is not zero). The 8-bit Atmel assembly is larger for a couple reasons. Firstly, because this is an 8-bit processors, 16-bit integers operations require two instruction for each shift and add. Second, the AVR instruction set can only shift one bit at a time. Although the ARM is a RISC processor as and the Intel CISC, the number of ARM and x86 instructions are identical. Unlike the ARM’s subs instruction, the 32-bit AVR’s sub instruction doesn’t set flags requiring an extra compare instruction. Otherwise all three 32-bit implementations are equal.

While this technique is clever, modern CPUs have very fast multiply instructions. The ARM requires 1 to 2 clocks for a multiply. Intel’s multiplication takes 2 operations when operating on registers (the exact timing is more complex). Thus a loop simply multiplying by 10 would be faster. Still, this algorithm is interesting and was fun to investigate.

February 01, 2018

Auto-Incrementing Build Number In MakeFile

Tracing binaries back to archived sources is a necessity as a programmer. When writing software there is generally a process for releasing software where the binaries and accompanying source code are archived. Using version control it is fairly easy to tag some specific snapshot of the source code so it can be linked (or include) the resulting binary. However this formal release system doesn’t work as well during development. Releases are often lengthy processes not conducive to rapid development and debug.

One system I have been using since the early 2000s is an automated build number. Each time the source code is compiled, a build number is incremented. As an embedded developer the software usually cannot be directly identified, but since the build number is increased each compile it is generated to be unique (even if two builds are otherwise identical). This build number is accessible from the outside world and thus pinpoints exactly what software a controller is running. In addition I archived every binary ever produced so that if a problem was found on a unit under test I could duplicate the setup. That was important for a small company where software was constantly changing and tests constantly being run. Devices in the field were better controlled, but development was not.

When I started using version control more I kept the build number concept. With Subversion I was able to link in an ID related to the version in the repository. It required software first be fully checked into version control before compiling binaries but worked well enough. There was no good way to extend this functionality to Git. So I fell back on the build number concept.

Recently I discovered what I think is the best incarnation of the build number. I tend to use make files for compiling, and this section of code takes care of build numbers.

BUILD_NUMBER_FILE = buildNumber.txt


$(BUILD_NUMBER_FILE): $(cFiles) $(filter-out ./$(BUILD_NUMBER_HEADER), $(hFiles)) $(sFiles)
	@read lastNumber < $(BUILD_NUMBER_FILE);                               \
	newNumber=$$(($$lastNumber + 1));                                      \
	echo "$$newNumber" > $(BUILD_NUMBER_FILE)

# Create the build number header file.
# Increments the build number and places this in the header.
	@read buildNumber < $(BUILD_NUMBER_FILE);                                  \
	export BUILD_DATE=`stat --format=%y $(BUILD_NUMBER_FILE) | cut -c 1-23`;   \
	echo "#ifndef _BUILDNUMBER_H_"                   > $(BUILD_NUMBER_HEADER); \
	echo "#define _BUILDNUMBER_H_"                  >> $(BUILD_NUMBER_HEADER); \
	echo ""                                         >> $(BUILD_NUMBER_HEADER); \
	echo "#define BUILD_NUMBER  $$buildNumber"      >> $(BUILD_NUMBER_HEADER); \
	echo "#define BUILD_DATE    \"$$BUILD_DATE\""   >> $(BUILD_NUMBER_HEADER); \
	echo ""                                         >> $(BUILD_NUMBER_HEADER); \
	echo "#endif // _BUILDNUMBER_H_"                >> $(BUILD_NUMBER_HEADER); \
	echo ""                                         >> $(BUILD_NUMBER_HEADER);

There is a file called buildNumber.txt which contains a number that is incremented when the source code changes. There is a header file included by something in the project called buildNumber.h. This contains two defines that have the current build number, and a time stamp of when the compilation took place. The header file is auto-generated. It can be left out of the main branch of the repository, but should be included in branches for tagged releases. The build number file must be checked into the repository for both trunk and branches.

What is going on in the make file two rules. The first is to regenerate the build number text file. It depends on all the source code (less the build header). Thus, should any source file change, the build header must be recompiled. The second rule is for the build header file. It must be regenerated if the build number text file changes. So any change to source code causes a new build header file to be created. As long as the dependencies for the build header are properly handled this system will always ensure a unique build number when the software changes. However, unlike my previous system, the build number will not change if the software has not changed. This allows source code in a repository to be exactly reproduced. As long as source code was checked in, binaries can be traced back to source code and reproduced from source code.

Generally I use a universal Makefile for my projects. It simply compiles all C files and computes dependencies automatically. There was some configuration parameters such as the name of the resulting executable, but for the most part the Makefile can be dropped into any new project and with minimal change compile any project.

November 16, 2017

Impulse Noise Filtering - Slew Limit Filter

In the past few articles in this series I have demonstrated using a median filter to mitigate impulse noise. The problem with a median filter is that even using a small window size and taking advantage of the speed of a mostly sorted list entering an insertion sort, the filter is still significantly slower than a windowed average. There is another option. It does not filter as nicely as a median filter but does help to dampen impulse noise.

The slew rate is a measure of how quickly voltage changes over time. A windowed average acts as limiter on slew rate but usually not enough to attenuate impulse noise. However one can directly limit the slew rate of a signal. The equation is quite simple:

For each sample of the input (an) the filter output (fn) has its change (Δn) limited to the last to some maximum (smax). If the magnitude of change is less than the maximum, the filter output is the same as the input. If the magnitude exceeds the change the filtered output is modified by the maximum change but no further.

Following the examples of previous article on this subject, here is an example of the filter output:

There is some distortion on the signal caused by the impulse noise but the filtered output is fairly effective at eliminating the large spikes. When used in combination with a first-order low-pass filter the input signal is fairly clear.

The reason this kind of filter is attractive is because of how simple it is to implement.

// Uses:
//   In-place filter of input buffer.
// Input:
//   data - Data to filtered..
//   dataSize - Number of data points.
//   limit - Maximum change allowed.
// Output:
//   Filtered output is returned in 'data'.
// Author:
//   Andrew Que <>
static inline void slewLimitFilter
  int * data,
  size_t dataSize,
  int limit
  for ( size_t index = 1; index < dataSize; index += 1 )
    int delta = data[ index ] - data[ index - 1 ];
    if ( delta > limit )
      data[ index ] = data[ index - 1 ] + limit;
    if ( delta < -limit )
      data[ index ] = data[ index - 1 ] - limit;

Just as with any filter, artifacts are introduced by the slew limit filter.

This sine wave has become a triangle wave because the slew limit is too low. This shape is the result of the filter attenuating the actual data signal. This is an other demonstration.

Here is a clean signal, but because the slew limit is too low the filter cannot keep up with the rate of change of the true signal.

The other problem with the slew limit filter is that it is not very strong.

Here the impulse noise has been doubled and the filter has produced a very messy result. So the filter is clearly not a replacement for a median filter. If you can afford the processing power, use the median filter. If you cannot, the slew limit filter may be a possibility. It depends on what the impulse noise looks like, and how much signal distortion can be tolerated.