Andrew Que Sites list Photos
Projects Contact
Main

April 10, 2021

Non-Recursive, fully Static AVL Tree

I started this series of articles on a journey looking to accomplish the task using a binary search tree to speed the lookup of function handlers using a string as the key. My initial implementation worked it was not optimal because the search tree was not balanced. However my method would still have worked if the tree was balanced. Let’s look one of the simplest self-balancing binary search tree: the AVL tree.

While I could go into the inner workings of an AVL tree there better resources for this. I got a fairly basic AVL tree working without too much issues. However, like most implementations the tree insertions work using recursion. For some applications recursive functions are forbidden. This is because they use stack space for each call and determining total stack usage may not be possible at compile time. Thus you cannot prove the stack will not overflow. An easy solution is to simply disallow recursive functions.

Our implementation only implements insertion and search functions, and the only place where recursion is needed is the insertion. This is because we need to re-balance each parent node after inserting the new node, and recursion solves the problem of keeping track of the path to the new node. Let’s examine the reference implementation (originally found here) of the insertion.

AVL_Node * avlInsert( AVL_Node * newNode, AVL_Node * node, AVL_NodeKey key, AVL_NodeValue value )
{
  // 1.  Perform the normal BST insertion
  if ( node == NULL )
  {
    node = newNode;
    node->key    = key;
    node->value  = value;
    node->left   = NULL;
    node->right  = NULL;
    node->height = 1;  // new node is initially added at leaf
  }
  else
  {
    int compare = strcmp( key, node->key );

    if ( compare < 0 )
      node->left = avlInsert( newNode, node->left, key, value );
    else
    if ( compare > 0 )
      node->right = avlInsert( newNode, node->right, key, value );

    // 2. Update height of this ancestor node
    node->height = 1 + max( height( node->left ), height( node->right ) );

    // 3. Get the balance factor of this ancestor
    //    node to check whether this node became
    //    unbalanced
    int balance = getBalance( node );

    // If this node becomes unbalanced, then
    // there are 4 cases

    if ( balance > 1  )
    {
      int compare = strcmp( key, node->left->key );

      // Left Left Case
      if ( compare < 0 )
        node = rightRotate( node );
      else
      // Left Right Case
      {
        node->left = leftRotate( node->left );
        node = rightRotate( node );
      }
    }
    else
    if ( balance < -1 )
    {
      int compare = strcmp( key, node->right->key );

      // Right Right Case
      if ( compare > 0 )
        node = leftRotate( node );
      else
      // Right Left Case
      {
        node->right = rightRotate( node->right );
        node = leftRotate( node );
      }
    }
  }

  return node;
}

The rebalance task can be accomplish without recursion using a stack. Let’s start with the populating the stack during the search for the insert location. This is a fairly textbook example of traversing a binary tree.

bool avlInsert
(
  AVL_Node * newNode,
  AVL_Node ** tree,
  AVL_NodeKey key,
  AVL_NodeValue value
)
{
  bool isError = false;

  // Start at the base of the tree.
  AVL_Node ** nodePointer = tree;

  // Traverse tree and find location to insert new node.
  // Keeps a stack of the node path traversed.
  while ( ( NULL != *nodePointer )
       && ( ! isError ) )
  {
    // Alias pointer (make things easier to read).
    AVL_Node * node = *nodePointer;

    // Compare key of this node to the one being inserted.
    int compare = strcmp( key, node->key );

    // Check for duplicate key which is not allowed.
    if ( 0 == compare )
      isError = true;
    else
    // Select the right or left node depending on compare results.
    if ( compare < 0 )
      nodePointer = &node->right;
    else
      nodePointer = &node->left;
  }

  if ( ! isError )
  {
    // Setup new node.
    newNode->key    = key;
    newNode->value  = value;
    newNode->left   = NULL;
    newNode->right  = NULL;
    newNode->height = 1;

    // Add node to tree.
    *nodePointer = newNode;
  }

Note that the parameter for the base of the tree is now a double-pointer. Why will become apparent latter. The goal is simply to traverse the tree until a insert location is found. At this point, the node will be NULL meaning the end of the tree branch has been reached. This is the location the new record should be placed. The double pointer means that we have the address of the pointer to this location, and thus that address can be changed. So at the end of the loop *nodePointer will be NULL, but nodePointer is either the the left or right that points to NULL. Simply setting *nodePointer will insert the new node at this location.

So at this point, we have the new node inserted in a iterative rather than recursive manner. However, the tree may not be balanced. For that we need to add in the stack so we can traverse back up the tree after the insertion has been made.

bool avlInsert
(
  AVL_Node * newNode,
  AVL_Node ** nodeStack[],
  unsigned nodeStackLength,
  AVL_Node ** tree,
  AVL_NodeKey key,
  AVL_NodeValue value
)
{
  bool isError = false;

  // Index into `nodeStack` for storage and overflow checking.
  unsigned stackIndex = 0;

  // Start at the base of the tree.
  AVL_Node ** nodePointer = tree;

  // Traverse tree and find location to insert new node.
  // Keeps a stack of the node path traversed.
  while ( ( NULL != *nodePointer )
       && ( ! isError ) )
  {
    // Stack overflow?
    isError = ( stackIndex >= nodeStackLength );

    if ( ! isError )
    {
      // Add node to stack.
      nodeStack[ stackIndex ] = nodePointer;
      stackIndex += 1;

      // Alias pointer (make things easier to read).
      AVL_Node * node = *nodePointer;

      // Compare key of this node to the one being inserted.
      int compare = strcmp( key, node->key );

      // Check for duplicate key which is not allowed.
      if ( 0 == compare )
        isError = true;
      else
      // Select the right or left node depending on compare results.
      if ( compare < 0 )
        nodePointer = &node->right;
      else
        nodePointer = &node->left;
    }
  }

  if ( ! isError )
  {
    // Setup new node.
    newNode->key    = key;
    newNode->value  = value;
    newNode->left   = NULL;
    newNode->right  = NULL;
    newNode->height = 1;

    // Add node to tree.
    *nodePointer = newNode;
  }

  ...

Here we add a stack. There isn’t much change except that each step through the tree the node is saved. This provides the path taken from the trunk of the tree to the newly inserted branch. In the recursive version this happens because each recursive step remembers the starting node and simply unrolls as the recursive functions exit. Now for rebalancing the tree.

    // Walk back up the tree and rebalance as we go.
  while ( ( ! isError )
       && ( stackIndex > 0 ) )
  {
    // Pull node off stack.
    stackIndex -= 1;
    nodePointer = nodeStack[ stackIndex ];

    // Alias pointer (make things easier to read).
    AVL_Node * node = *nodePointer;

    // Update height of this node.
    node->height = 1 + max( height( node->left ), height( node->right ) );

    // Calculate the balance at this node.
    int balance = getBalance( node );

    // Left imbalance?
    if ( balance > 1 )
    {
      int compare = strcmp( key, node->left->key );

      // Left/left
      if ( compare > 0 )
        *nodePointer = rightRotate( node );
      else
      // Left/right
      {
        node->left = leftRotate( node->left );
        *nodePointer = rightRotate( node );
      }
    }
    else
    // Right imbalance?
    if ( balance < -1 )
    {
      int compare = strcmp( key, node->right->key );

      // Right/right
      if ( compare < 0 )
        *nodePointer = leftRotate( node );
      else
      // Right/left
      {
        node->right = rightRotate( node->right );
        *nodePointer = leftRotate( node );
      }
    }

Note that the first thing we do in the loop is pop an element off the stack. The node pointer starts at the node just inserted, but we know since it is the last node in the list that this node is balanced so it can be skipped.

We measure the balance of the node and rotate if needed, just as in the recursive version. However, we must modify the parent node with the rotated value. Since our list is a point to a node pointer, this is easy, but this is the key step in switching from a recursive to an iterative implementation.

The size of the stack is the maximum depth for any node—the maximum search depth. Balanced binary search trees all have a maximum depth of ⌈ log2 n ⌉ where n is the number total number of elements in the tree. If we are constructing a constant lookup table, this size is easy to calculate at compile time.

#include <stdint.h>

// Macro to return the ceiling of log base 2 of value.
#define log2_Ceiling( value )                               \
  ( ( (value) > ( 1U <<  0 ) ) + ( (value) > ( 1U <<  1 ) ) \
  + ( (value) > ( 1U <<  2 ) ) + ( (value) > ( 1U <<  3 ) ) \
  + ( (value) > ( 1U <<  4 ) ) + ( (value) > ( 1U <<  5 ) ) \
  + ( (value) > ( 1U <<  6 ) ) + ( (value) > ( 1U <<  7 ) ) \
  + ( (value) > ( 1U <<  8 ) ) + ( (value) > ( 1U <<  9 ) ) \
  + ( (value) > ( 1U << 10 ) ) + ( (value) > ( 1U << 11 ) ) \
  + ( (value) > ( 1U << 12 ) ) + ( (value) > ( 1U << 13 ) ) \
  + ( (value) > ( 1U << 14 ) ) + ( (value) > ( 1U << 15 ) ) )

Here we have a macro that does the ceiling of the log base 2 for numbers up to 32-bits. It is a brute-force method of calculating a value but one that can be done at compile time. This macro will handle up to 65535 records. The should be more than sufficient for a command lookup table, but the pattern should be fairly obvious and could easily be expanded to 32 or even 64 bits.

Using the macro to define the stack size is simple.

enum { ITEMS      = 50 };
enum { ITEM_DEPTH = log2_Ceiling( ITEMS ) };

static AVL_Node nodes[ ITEMS ];
static AVL_Node ** nodeStack[ ITEM_DEPTH ];

Here the node storage is define as well as the stack needed for insertion.

The downside of using a stack is that if we are also using static allocation this is wasted memory after the tree has been created. Since lookup tables are generally created when the system starts one could recycle other memory. For a communications stack the send/receive buffers can be used because before the lookup table is initialized these buffers are not yet in use. This is one option for a small micro controller.

Overall there are not a lot of benefits to using an AVL tree for a static lookup table. The lookup speed is comparable to a binary search. As outlined already using an iterative heapsort or using a compile-time sorting is a better alternative. However the implementation of a non-recursive AVL tree might have other uses and worth sharing.

April 09, 2021

Compile-time Sorted Lookup Table

So far the last few articles of this series have talked about sorting a lookup table at run time to enable the use of a binary search to speed up string lookups. Sorting the constant list at run time is a bit inefficient and sorting the list at compile time would eliminate the sorting setup time as well as any additional memory. It is actually fairly simple to do if using a make based project build. In this article we will examine the details of using a Makefile to build the lookup table as part of the build process.

There are a number of ways this can be done the method explored here is to have the lookup table declared as a string table. The file in which it is declared compiles in one of two ways. When a compiler define is set the file is designed to run on the build machine. It will sort the lookup table and print the results. When run without the define it will include a file that is the output of the previous system and use that lookup table.

#ifndef LOOKUP_LIST
  #include <string.h>
  #include "lookuptable.h"

  // All command handler files.
  #include "a.h"
  #include "b.h"
  #include "c.h"
  #include "d.h"
  #include "e.h"

#endif

// This section only compiles if generating the sorted lookup table.
// New command handlers should be added here.
#ifdef LOOKUP_LIST

typedef struct
{
  char const * command;
  char const * handler;
} CommandHandler;

// Table of command names and their command handlers.
// Add new commands here.
// Note: Declare command handlers as a string of their function name.
static CommandHandler const COMMAND_HANDLERS[] =
{
  // A functions.
  { "directories",     "directoriesHandler"     },
  { "cairo",           "cairoHandler"           },
  { "melee",           "meleeHandler"           },
  { "cresting",        "crestingHandler"        },
  { "oppress",         "oppressHandler"         },
  { "polls",           "pollsHandler"           },
  { "modifier",        "modifierHandler"        },
  { "suppositories",   "suppositoriesHandler"   },
  { "domestically",    "domesticallyHandler"    },

  // B functions.
  { "drainage",        "drainageHandler"        },
  { "trained",         "trainedHandler"         },
  { "illegality",      "illegalityHandler"      },
  { "prohibitive",     "prohibitiveHandler"     },
  { "bowed",           "bowedHandler"           },
  { "knolls",          "knollsHandler"          },
  { "thorn",           "thornHandler"           },
  { "loftiness",       "loftinessHandler"       },
  { "blackouts",       "blackoutsHandler"       },
  { "blacked",         "blackedHandler"         },

  // C functions.
  { "reciprocally",    "reciprocallyHandler"    },
  { "hinterlands",     "hinterlandsHandler"     },
  { "lolling",         "lollingHandler"         },
  { "griping",         "gripingHandler"         },
  { "castigate",       "castigateHandler"       },
  { "chucked",         "chuckedHandler"         },
  { "summarily",       "summarilyHandler"       },
  { "ligands",         "ligandsHandler"         },
  { "scolded",         "scoldedHandler"         },
  { "seisin",          "seisinHandler"          },
  { "disaffiliated",   "disaffiliatedHandler"   },
  { "spoilage",        "spoilageHandler"        },
  { "statures",        "staturesHandler"        },

  // D functions.
  { "postmark",        "postmarkHandler"        },
  { "approximation",   "approximationHandler"   },
  { "arranged",        "arrangedHandler"        },
  { "freeforall",      "freeforallHandler"      },
  { "owning",          "owningHandler"          },
  { "materialists",    "materialistsHandler"    },
  { "economically",    "economicallyHandler"    },
  { "fairs",           "fairsHandler"           },
  { "whitely",         "whitelyHandler"         },
  { "velodrome",       "velodromeHandler"       },
  { "nominates",       "nominatesHandler"       },
  { "wallow",          "wallowHandler"          },
  { "banishment",      "banishmentHandler"      },
  { "openheart",       "openheartHandler"       },
  { "corresponding",   "correspondingHandler"   },
  { "enthusiasts",     "enthusiastsHandler"     },
  { "sjambok",         "sjambokHandler"         },
  { "punctilious",     "punctiliousHandler"     },

  // E functions.
  { "que",             "queHandler"             },
};

// Don't compile this unless we are building the lookup table sort executable.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

enum { LIST_SIZE = sizeof( COMMAND_HANDLERS ) / sizeof( COMMAND_HANDLERS[ 0 ] ) };

//-----------------------------------------------------------------------------
// Uses:
//   Compare function used by quick sort.
// Input:
//   valueA - Pointer to unsigned integer that is the index into `COMMAND_HANDLERS`.
//   valueB - Pointer to unsigned integer that is the index into `COMMAND_HANDLERS`.
// Output:
//   0 if two values are identical, values less than 0 if first value is less
//   than second, and greater than 0 if first value is greater than second.
//-----------------------------------------------------------------------------
static int sortCompare( void const * valueA, void const * valueB )
{
  unsigned indexA = *(unsigned*)valueA;
  unsigned indexB = *(unsigned*)valueB;

  CommandHandler const * keyPairA = &COMMAND_HANDLERS[ indexA ];
  CommandHandler const * keyPairB = &COMMAND_HANDLERS[ indexB ];

  return strcmp( keyPairA->command, keyPairB->command );
}

//-----------------------------------------------------------------------------
// Build and print the sorted lookup table.
//-----------------------------------------------------------------------------
int main()
{
  static unsigned sorting[ LIST_SIZE ];

  // Start with a 1:1 sorting.
  for ( int index = 0; index < LIST_SIZE; index += 1 )
    sorting[ index ] = index;

  // Sort the indexes in `sorting`.
  qsort( sorting, LIST_SIZE, sizeof( sorting[ 0 ] ), sortCompare );

  // Print the header on top the file.
  printf( "  // Sorted command list handler lookup table.\n" );
  printf( "  // This file auto generated by '" __FILE__ "' at " __TIME__ " on " __DATE__ ".\n" );
  printf( "  //\n" );
  printf( "  // Edits to this file will be overwritten during the build process.\n" );
  printf( "  // IMPORTANT: This list must be sorted for the binary search to function.\n" );
  printf( "  //\n" );

  // Print the sorted list.
  for ( int index = 0; index < LIST_SIZE; index += 1 )
  {
    unsigned sortedIndex = sorting[ index ];
    CommandHandler const * keyPair = &COMMAND_HANDLERS[ sortedIndex ];

    // Quote command value.
    enum { STRING_BUFFER_SIZE = 30 };
    char stringBuffer[ STRING_BUFFER_SIZE ];
    snprintf( stringBuffer, STRING_BUFFER_SIZE, "\"%s\"", keyPair->command );

    // Print output.
    printf( "  { %-30s, %-40s },\n", stringBuffer, keyPair->handler );
  }

  return 0;
}

#else // LOOKUP_LIST

// Alphabetically sorted table of all commands.
// The contents of this table are imported from a sorted that is generated
// during the build process.
static CommandHandlerEntry const HANDLERS[] =
{
  #include "sortedTable.inc"
};

// Calculated number of commands.
enum { NUMBER_OF_HANDLERS = sizeof( HANDLERS ) / sizeof( HANDLERS[ 0 ] ) };

//-----------------------------------------------------------------------------
// Uses:
//   Look up a command and find the function to handle it.
// Input:
//   command - Command to lookup.
// Output:
//   Function for handling command.  NULL if command was not found.
//-----------------------------------------------------------------------------
CommandHandler findHandler( char const * command )
{
  CommandHandler result = NULL;

  // Bounds start by encompassing all handlers.
  int left = 0;
  int right = NUMBER_OF_HANDLERS - 1;

  // Use a binary search to find command handler.
  // Loop until we either what we are looking for, or we've searched all
  // options.
  while ( ( left <= right )
       && ( NULL == result ) )
  {
    // Find middle of selected area.
    int index = ( left + right ) / 2;

    int compare = strcmp( command, HANDLERS[ index ].command );

    // Did we find it?
    if ( 0 == compare )
      result = HANDLERS[ index ].handler;
    else
    // Select either right or left segment for next check.
    if ( compare > 0 )
      left = index + 1;
    else
      right = index - 1;
  }

  return result;
}

#endif // ! LOOKUP_LIST

Here we see an example of a lookup table. It starts by including all of the files that declare the various command handlers. Note how these are wrapped in a #if section. When building the lookup table these files are not needed—only when actually declaring the lookup table.

The next section is the lookup table itself. Note that although the lookup table is actually a string and a function, the deceleration is done with two strings. The sort program needs strings, but will output the command handler without quotes in the built output.

Following the lookup table is the program that can sort the list just above it. This program uses the standard library’s build-in qsort function for doing the sorting. The sorted output is printed to standard out and formatted to be the body of an array of command handlers. That is the end of all the code that runs at compile time.

The last section starts with declaring the actual lookup table used by the program. It includes a file in the middle of the HANDLERS table. This works because in C, a #include is like any other pre-processor macro. All it does is substitute the contents of the specified file at that location. In this case it is simply importing all the values of the lookup table from what is presumably the output of the previous section of the file.

Lastly is a function to actually use the lookup table. It is a binary search of the sorted array to find and return the handler for the given command just like we’ve outlines in previous articles. That’s it for the lookup table itself.

The other part of the setup is the Makefile.

PROJECT_NAME = lookup

# Location to store object files.
OBJECT_DIRECTORY ?= /tmp/$(PROJECT_NAME)

# Lookup table files.
LOOKUP_FILE := lookupTable.c    # <- File to create lookup table.
LOOKUP_OUT  := sortedTable.inc  # <- Sorted lookup table output file.

# Compiler flags.
# Turn on just about every warning and treat them all as errors.
# Enforce strict C99 compliance.
# Level 2 optimization (some warnings only happen when inline functions are
# enabled).
CFLAGS =                                    \
  -Wall                                     \
  -Werror                                   \
  -Wextra                                   \
  -Werror-implicit-function-declaration     \
  -Wpointer-arith                           \
  -std=c99                                  \
  -pedantic                                 \
  -O2

# Code files and object file lists.
DIRECTORIES := $(shell find . -type d -not -path '*/\.*' )
C_FILES     := $(foreach dir,$(DIRECTORIES),$(wildcard $(dir)/*.c))
H_FILES     := $(foreach dir,$(DIRECTORIES),$(wildcard $(dir)/*.h))
OBJECTS     := $(patsubst %.c,$(OBJECT_DIRECTORY)/%.o,$(C_FILES))
EXECUTABLE  := $(PROJECT_NAME)
INCLUDES    := $(foreach dir,$(DIRECTORIES),-I$(dir))
OBJECT_DIRECTORIES := $(foreach dir,$(DIRECTORIES),$(OBJECT_DIRECTORY)/$(patsubst ./%,%,$(dir)))

# Compiler for platform code.
GCC_PATH =
PREFIX = # arm-none-eabi- for ARM-based processors.
CC = $(GCC_PATH)$(PREFIX)gcc
LD = $(GCC_PATH)$(PREFIX)gcc

# Local compiler.
LCC = gcc

# Default build type (everything).
default: $(EXECUTABLE)

# Phony targets.
.PHONY : all clean list

# Dependencies
-include $(C_FILES:%.c=$(OBJECT_DIRECTORY)/%.d)

#-----------------------------------------------------------------------------
#	Define build targets.
#-----------------------------------------------------------------------------

# Object directories.
$(OBJECT_DIRECTORY):
	@echo "Creating object directory: $(OBJECT_DIRECTORY)"
	@mkdir -p $(OBJECT_DIRECTORIES) $(ARTICACTS_DIRECTORY)

# Sorted lookup table.
$(LOOKUP_OUT): $(LOOKUP_FILE) | $(OBJECT_DIRECTORY)
	@echo "Lookup table: $(LOOKUP_OUT)"
	@$(LCC) -DLOOKUP_LIST $(CFLAGS) $(LOOKUP_FILE) -o $(OBJECT_DIRECTORY)/lookupList
	@$(OBJECT_DIRECTORY)/lookupList > $(LOOKUP_OUT)

# Compile C files.
$(OBJECT_DIRECTORY)/%.o: %.c | $(OBJECT_DIRECTORY) $(LOOKUP_OUT)
	@echo "Compiling: $<"
	@$(CC) $(INCLUDES) -c $(CFLAGS) -MMD -o $@ $<

# Linker.
$(EXECUTABLE): $(OBJECTS)
	@echo "Linking: $@"
	@$(LD) $(LFLAGS) -o $@ $(OBJECTS) $(LIBS)

# Clean object C files.
clean:
	@echo "Cleaning project build files..."
	@rm -Rf $(EXECUTABLE) $(LOOKUP_OUT)
	@rm -Rf $(OBJECT_DIRECTORY) $(projectName).a

# List all variables.
list:
	@echo "DIRECTORIES        = $(DIRECTORIES)"
	@echo "C_FILES            = $(C_FILES)"
	@echo "H_FILES            = $(H_FILES)"
	@echo "OBJECTS            = $(OBJECTS)"
	@echo "OBJECT_DIRECTORIES = $(OBJECT_DIRECTORIES)"
	@echo "EXECUTABLE         = $(EXECUTABLE)"
	@echo "OBJECT_DIRECTORY   = $(OBJECT_DIRECTORY)"

Here I am using a universal Makefile. It will simply build every C file in the directory tree. Where it departs for a typical universal Makefile is an additional dependency. There is a rule to make a file called sortedTable.inc. To do this, it runs the local compiler to build lookupTable.c with the define LOOKUP_LIST declared. This will compile a program that will print out the sorted command handler list. After building this program is run and the output redirected to sortedTable.inc thus creating the sorted lookup.

The only other step is to apply this dependency to all the C object files. Without this the build process is setup to first create all the object directories, compile all C files to objects, and then link all objects. The new process now creates all object directories, creates the sorted lookup table, compiles and links. Dependencies still work. A change made to the lookup table will result in a new sorted lookup table list and recompile of lookupTable.c. A change to one of the function handler C files will only recompile that handler. A change to a handler header file will cause lookupTable.c to recompile but not regenerate the lookup table.

This concept can be used for non-make based projects as well. Any compile system with a pre-build command can be used to build the sorted lookup table. Rather than compile and running a local program the Makefile could use a scripting language like Python or even shell utilities to build the sorted list. This would likely require some restructuring of how the initial lookup table is initially declared but is otherwise follows the same setup.

For very small micro processors this method might be the best. There is no sorting overhead on the target processor at all and the binary search adds a negligible amount of code to the lookup process while giving it a lookup speed increase that improves the larger the lookup table.

April 08, 2021

Non-Recursive Sorting for Command Lookup

In the last article in this series we showed how a command lookup table could be implemented be indirectly sorting an index table using the quick sort function of the standard library. There are applications where using the standard library isn’t allowed (such as with medical and avionics devices). In addition, recursive functions are also often not allowed because it can be difficult to ensure there is enough stack space.

The solution is to use a non-recursive sort function. When it comes to general purpose sorting almost nothing beats a quick sort. The downside of the algorithm is that it either needs to be recursive or use a stack. With purely static applications (i.e. applications that use no heap memory) the stack is wasted space after the command lookup table is sorted. It is possible the stack could use memory that is recycled latter on, such as the receive buffers for the communication system the lookup table is going to use after it has been initialized. However it would be better if there were no stack at all.

While you usually cannot beat a quick sort, you can sometimes match its performance. A heapsort has similar performance, and it can be implemented iteratively without the need for a stack.

#include <string.h>

typedef struct
{
  char const * key;
  int value;
} TableEntry;

static TableEntry const TABLE[] =
{
  { "which",   415 }{ "word",    466 }{ "thing",   910 },
  { "great",   3   }{ "should",  861 }{ "write",   257 },
  { "some",    811 }{ "ask",     520 }{ "after",   362 },
  { "every",   647 }{ "been",    883 }{ "if",      160 },
  { "any",     198 }{ "house",   916 }{ "day",     828 },
  { "way",     208 }{ "going",   205 }{ "their",   313 },
  { "from",    6   }{ "because"164 }{ "would",   863 },
  { "nice",    188 }{ "just",    691 }{ "live",    536 },
  { "first",   830 }{ "long",    141 }{ "put",     25  },
  { "were",    576 }{ "where",   440 }{ "rain",    156 },
  { "very",    260 }{ "thank",   615 }{ "could",   234 },
  { "before",  69  }{ "about",   206 }{ "hers",    177 },
  { "walk",    721 }{ "right",   80  }{ "your",    162 },
  { "by",      62  }{ "or",      430 }{ "jump",    150 },
  { "other",   372 }{ "think",   404 }{ "yours",   961 },
  { "back",    352 }{ "kind",    667 }{ "when",    89  },
  { "work",    499 }{ "now",     360 }{ "people",  720 },
  { "these",   995 }{ "than",    358 }{ "also",    619 },
  { "learn",   638 }{ "each",    233 }{ "another"689 },
  { "how",     941 }{ "may",     215 }{ "again",   661 },
  { "only",    783 }{ "had",     646 }{ "went",    176 },
  { "want",    648 }{ "high",    322 }{ "find",    291 },
  { "then",    381 }{ "give",    524 }{ "keep",    627 },
  { "more",    848 }{ "old",     667 }{ "over",    585 },
  { "know",    982 }{ "them",    455 }{ "much",    28  },
  { "use",     599 }{ "funny",   164 }{ "many",    970 },
};

enum { TABLE_LENGTH = sizeof( TABLE ) / sizeof( TABLE[ 0 ] ) };

static size_t sorting[ TABLE_LENGTH ];

static inline void swap( size_t * valueA, size_t * valueB )
{
  size_t hold = *valueA;
  *valueA = *valueB;
  *valueB = hold;
}

static inline char const * indexed( size_t index )
{
  return TABLE[ sorting[ index ] ].key;
}

void setup( void )
{
  // Start with 1:1 sorting.
  for ( size_t index = 0; index < TABLE_LENGTH; index += 1 )
    sorting[ index ] = index;

  // Build "Max Heap" where value of each child is always smaller
  // than value of their parent.
  for ( size_t indexA = 1; indexA < TABLE_LENGTH; indexA += 1 )
  {
    // If child is bigger than parent.
    if ( strcmp( indexed( indexA ), indexed( ( indexA - 1 ) / 2 ) ) > 0 )
    {
      size_t indexB = indexA;

      // Swap child and parent until parent is smaller.
      while ( strcmp( indexed( indexB ), indexed( ( indexB - 1 ) / 2 ) ) > 0 )
      {
        swap( &sorting[ indexB ]&sorting[ ( indexB - 1 ) / 2 ] );
        indexB = ( indexB - 1 ) / 2;
      }
    }
  }

  for ( size_t indexA = ( TABLE_LENGTH - 1 ); indexA > 0; indexA -= 1 )
  {
    // Swap value of first indexed with last indexed.
    swap( &sorting[ 0 ]&sorting[ indexA ] );

    // Maintaining heap property after each swapping.
    size_t indexB = 0;
    size_t indexC;

    do
    {
      indexC = ( 2 * indexB + 1 );

      // If left child is smaller than right child point indexC variable
      // to right child.
      if ( ( indexC < ( indexA - 1 ) )
        && ( strcmp( indexed( indexC ), indexed( indexC + 1 ) ) < 0 ) )
      {
        indexC++;
      }

      // If parent is smaller than child then swapping parent with child
      // having higher value.
      if ( ( indexC < indexA )
        && ( strcmp( indexed( indexB ), indexed( indexC ) ) < 0 ) )
      {
        swap( &sorting[ indexB ]&sorting[ indexC ] );
      }

      indexB = indexC;

    } while ( indexC < indexA );
  }
}

The setup function first initializes the sorting index table and then sorts it using a heapsort. This can be a drop-in replacement for the setup that uses a quicksort.

April 07, 2021

Sorting for Binary Search Lookup Table

In the last article of this series we looked at using a binary search to increase the speed of finding a command in a lookup table by reducing the number of comparison needing to be performed. However this requires the lookup table to be sorted. If the lookup table isn’t stored sorted, this will need to be done at run time.

By far the simplest sort to implement is a selection sort and is a good place to start.

void selectionSort( char const * array[], size_t arrayLength )
{
  for ( size_t indexA = 0; indexA < arrayLength - 1; indexA += 1 )
  {
    size_t lowest = indexA;
    for ( size_t indexB = indexA + 1; indexB < arrayLength; indexB += 1 )
    {
      int compare = strcmp( array[ indexB ], array[ lowest ] );
      if ( compare < 0 )
        lowest = indexB;
    }

    char const * hold = array[ indexA ];
    array[ indexA ] = array[ lowest ];
    array[ lowest ] = hold;
  }
}

Lookup tables are generally constant and with embedded systems it is best to keep them this way. This is because there is typically more non-volatile memory (typically flash memory) than RAM and constants can be kept in program storage. Our sort algorithm can still be used if instead of sorting the actual lookup table we simply sort a set of indexes into the lookup table.

static char const * TABLE[] =
{
  "which",  "word",    "thing",   "great",  "should""write",
  "some",   "ask",     "after",   "every",  "been",   "if",
  "any",    "house",   "day",     "way",    "going",  "their",
  "from",   "because""would",   "nice",   "just",   "live",
  "first",  "long",    "put",     "were",   "where",  "rain",
  "very",   "thank",   "could",   "before""about",  "hers",
  "walk",   "right",   "your",    "by",     "or",     "jump",
  "other",  "think",   "yours",   "back",   "kind",   "when",
  "work",   "now",     "people",  "these",  "than",   "also",
  "learn",  "each",    "another""how",    "may",    "again",
  "only",   "had",     "went",    "want",   "high",   "find",
  "then",   "give",    "keep",    "more",   "old",    "over",
  "know",   "them",    "much",    "use",    "funny",  "many",
};

enum { TABLE_LENGTH = sizeof( TABLE ) / sizeof( TABLE[ 0 ] ) };

static size_t sorting[ TABLE_LENGTH ];

static void selectionSort( void )
{
  // Start with 1:1 mapping.
  for ( size_t index = 0; index < TABLE_LENGTH; index += 1 )
    sorting[ index ] = index;

  // Selection sort mapping.
  for ( size_t indexA = 0; indexA < TABLE_LENGTH - 1; indexA += 1 )
  {
    size_t lowest = indexA;
    for ( size_t indexB = indexA + 1; indexB < TABLE_LENGTH; indexB += 1 )
    {
      // Get the indexed values.
      char const * newKey    = TABLE[ sorting[ indexB ] ];
      char const * lowestKey = TABLE[ sorting[ lowest ] ];

      // Compare these values.
      int compare = strcmp( newKey, lowestKey );

      // If this value is the new lowest value.
      if ( compare < 0 )
        lowest = indexB;
    }

    size_t hold = sorting[ indexA ];
    sorting[ indexA ] = sorting[ lowest ];
    sorting[ lowest ] = hold;
  }
}

We start by creating a 1:1 index into the lookup table. Then we sort this list. When it comes time to do the binary search, it too must use the index table.

static bool find( char const * key )
{
  bool isFound = false;

  int left = 0;
  int right = TABLE_LENGTH - 1;

  while ( ( left <= right )
       && ( ! isFound ) )
  {
    int index = ( left + right ) / 2;
    int resultIndex = sorting[ index ];
    int compare = strcmp( key, TABLE[ resultIndex ] );

    if ( 0 == compare )
      isFound = true;
    else
    if ( compare > 0 )
      left = index + 1;
    else
      right = index - 1;
  }

  return isFound;
}

A select sort has fairly terrible performance at O( n2 ). A quick sort with an average performance of O( n log2 ).

void quicksort( int first, int last )
{
  if ( first < last )
  {
    int pivot = first;
    int start = first;
    int end = last;

    while ( start < end )
    {
      while ( ( strcmp( TABLE[ sorting[ start ] ], TABLE[ sorting[ pivot ] ] ) <= 0 )
           && ( start < last ) )
      {
        start += 1;
      }

      while ( strcmp( TABLE[ sorting[ end ] ], TABLE[ sorting[ pivot ] ] ) > 0 )
        end -= 1;

      if ( start < end )
      {
        size_t temp     = sorting[ start ];
        sorting[ start ] = sorting[ end ];
        sorting[ end ]   = temp;
      }
    }

    size_t temp      = sorting[ pivot ];
    sorting[ pivot ] = sorting[ end ];
    sorting[ end ]   = temp;

    quicksort( first, end - 1 );
    quicksort( end + 1, last);
  }
}

There are typically so few elements in a function lookup table it is hardly worth taking the effort to write a quick sort function. However, quick sort is actually built into the C standard library. It requires writing a compare function. Since it already exists, may as well use it.

static int tableEntryCompare( void const * a, void const * b )
{
  TableEntry const * valueA = &TABLE[ *(size_t const *)];
  TableEntry const * valueB = &TABLE[ *(size_t const *)];

  return strcmp( valueA->key, valueB->key );
}

void setup( void)
{
  for ( size_t index = 0; index < TABLE_LENGTH; index += 1 )
    sorting[ index ] = index;

  qsort( sorting, TABLE_LENGTH, sizeof( sorting[ 0 ] ), tableEntryCompare );
}

The standard library actually has a binary search as well so we can do the entire lookup using this functionality.

static int searchCompare( void const * a, void const * b )
{
  char const * key = (char const*)a;
  size_t indexB    = *(size_t const *)b;

  char const * valueB = TABLE[ indexB ];

  return strcmp( key, valueB );
}

static bool find( char const * key )
{
  void * result = bsearch( key, sorting, TABLE_LENGTH, sizeof( sorting[ 0 ] ), searchCompare );

  return ( NULL != result );
}

Unfortunately the way the binary search is implemented it actually takes more lines of code (when cleaned up and fully commented) to use it than to implement it ourselves. So it is easier to simply write out the binary search in the lookup function.

We will finish off with a complete command handler using a binary search, indirect sorting and the standard library quick sort.

This implementation needs to be broken into several files because a demonstration with actual command handlers requires those handlers to be actual functions. However the use of such a system comes down to using the results of the lookup function.

// Get the command handler function.
CommandHandler handler = findHandler( command );
isError = ( NULL == handler );

if ( ! isError )
  // Run the command handler.
  handler();

This implementation is alright if one can live with the additional memory needed for the sorted index and a quick sort function from the standard library which is likely recursive.

April 06, 2021

Binary Search Lookup Table

In the last article of this series I asked the question if there was a more efficient way to implement a string lookup table for command handlers. One fairly simple method is to sort the lookup table by command and use a binary search.

If the command list is alphabetically sorted, one could start by checking the requested command against the command name midway through the list. If there isn’t a match, the command either comes before or after this point because the list is sorted. For example if the issues command is bar and the middle command is foo we know we must search before this point because b is before f. This is a classic binary search technique. Let’s look at a full example of a search.

Binary search example

Here we have a list of words that are alphabetically sorted. Shown is the search path for Train. It begins in the middle of the list which is Horse. We have two partitions, Bike to Foot and Mule to Wagon. Because the first letter of Train comes after the first letter in Horse, we move to the right partition Mule-Wagon and select the middle which is Starship. Now we have two new partitions: Mule to Rocket, and Train to Wagon. Train comes after Starship so the Tain-Wagon partition is selected with the middle being Truck. Partition again. To the left is just Train and to the right just Wagon. Train comes before Truck so we select the left partition. The middle is Train, which happens to be the item we are search to find and the search is complete. Now let us look at an implementation.

enum { TABLE_LENGTH = 15 };

static char const * const TABLE[ TABLE_LENGTH ] =
{
  "Bike",     "Boat",      "Bus",   "Car",   "Dog Sled",
  "Elephant""Foot",      "Horse""Mule",  "Plane",
  "Rocket",   "Spaceship""Train""Truck""Wagon"
};

bool find( char const * key )
{
  bool isFound = false;
  unsigned partition = ( TABLE_LENGTH + 1 ) / 2;
  unsigned index = TABLE_LENGTH / 2;

  while ( ( ! isFound )
       && ( partition > 0 ) )
  {
    partition /= 2;

    int compare = strcmp( key, TABLE[ index ] );

    if ( 0 == compare )
      isFound = true;
    else
    if ( compare > 0 )
      // Must be after this.
      index += partition;
    else
      // Must be before this.
      index -= partition;
  }

  return isFound;
}

In this example we are just looking for some value using the same word set as in the graphic above. The find function returns true if the word was found, false if not. It illustrates the basics of how the search works. To function as a lookup we need to associate data with the key term being searched for.

typedef struct
{
  char const * key;
  int value;
} TableEntry;

enum { TABLE_LENGTH = 15 };
static TableEntry const TABLE[ TABLE_LENGTH ] =
{
  { "Bike",       10 }{ "Boat",       20 }{ "Bus",        30 },
  { "Car",        40 }{ "Dog Sled",   50 }{ "Elephant",   60 },
  { "Foot",       70 }{ "Horse",      80 }{ "Mule",       90 },
  { "Plane",     110 }{ "Rocket",    120 }{ "Spaceship"130 },
  { "Train",     140 }{ "Truck",     150 }{ "Wagon",     160 }
};

int find( char const * key )
{
  int result = -1;
  unsigned partition = ( TABLE_LENGTH + 1 ) / 2;
  unsigned index = TABLE_LENGTH / 2;

  while ( ( -1 == result )
       && ( partition > 0 ) )
  {
    partition /= 2;
    int compare = strcmp( key, TABLE[ index ].key );

    if ( 0 == compare )
      result = TABLE[ index ].value;
    else
    if ( compare > 0 )
      index += partition;
    else
      index -= partition;
  }

  return result;
}

Here we have added an integer lookup. The implementation is the same no matter the data type of the value. One item to note is that we use a token return value for when the key isn’t found. In this case we use -1. If being used as a function lookup table NULL serves the same purpose. A token return value isn’t ideal for all situations. The function could return a Boolean result denoting if the find was successful or not, and a pointer to store the result passed as a parameter. This would be needed if the entire return space is valid.

typedef struct
{
  char const * key;
  uint8_t value;
} TableEntry;

enum { TABLE_LENGTH = 15 };
static TableEntry const TABLE[ TABLE_LENGTH ] =
{
  { "Bike",      0x00 }{ "Boat",      0x12 }{ "Bus",       0x24 },
  { "Car",       0x36 }{ "Dog Sled",  0x48 }{ "Elephant",  0x5B },
  { "Foot",      0x6D }{ "Horse",     0x7F }{ "Mule",      0x91 },
  { "Plane",     0xA3 }{ "Rocket",    0xB6 }{ "Spaceship"0xC8 },
  { "Train",     0xDA }{ "Truck",     0xEC }{ "Wagon",     0xFF }
};

bool find( char const * key, uint8_t * result )
{
  bool isFound = false;
  unsigned partition = ( TABLE_LENGTH + 1 ) / 2;
  unsigned index = TABLE_LENGTH / 2;

  while ( ( ! isFound )
       && ( partition > 0 ) )
  {
    partition /= 2;
    int compare = strcmp( key, TABLE[ index ].key );

    if ( 0 == compare )
    {
      isFound = true;
      *result = TABLE[ index ].value;
    }
    else
    if ( compare > 0 )
      index += partition;
    else
      index -= partition;
  }

  return isFound;
}

Our examples will always find a result in no more than 4 comparisons. The maximum number of comparisons done before finding a result is always ⌈ log2 n ⌉ where n is the number of entries in the table. For our example n is 15, so ⌈ log2 15 ⌉ = ⌈ 3.907⋯ ⌉ = 4.

The downside of a binary search is that it requires the data to be sorted. If you are able to specify the lookup table in sorted order this method is easy. If not a sort will be required and that will be the topic for the next article.

April 05, 2021

Basic String Command Handler Lookup Table

In the previous article of this series I wrote about how a lookup table can be used by serial protocols to map a command to a handler function for a clean and efficient method to process commands. In this article we will look at what changes if the protocol uses text rather than binary data.

Recently I had to implement a text-based protocol based on JSON in an embedded system. There is still a command as there is with the binary protocol, but now the field is a text string rather than an integer number. I can still use a lookup table, but it requires some changes to do so. Let us begin with the lookup table itself.

typedef void (*CommandHandler)( void );

typedef struct
{
  char const * command;
  CommandHandler handler;
} CommandEntry;

CommandHandler const COMMAND_HANDLERS[ NUMBER_OF_COMMANDS ] =
{
  { "version",     versionCommand      },
  { "errorStatus", errorStatusCommand  },
  { "getPosition", getPositionCommand  },
  { "start",       startCommand        },
  { "stop",        stopCommand         },
  { "forward",     forwardCommand      },
  { "backward",    backwardCommand     },

  { NULL, NULL },
};

The lookup table now has two elements: the command string, and the handler function. This implementation is NULL terminated. That is, the last element in the table has NULL values so the dispatch function can tell if it has reached the end of the table.

void dispatcher( char const * command )
{
  unsigned index = 0;
  bool isFound = false;

  // Find command handler.
  while ( ( NULL != COMMAND_HANDLERS[ index ].command )
       && ( ! isFound ) )
  {
    isFound = ( 0 == strcmp( command, COMMAND_HANDLERS[ index ].command ) );
    if ( ! isFound )
      index += 1;
  }

  if ( isFound )
    COMMAND_HANDLERS[ index ].handler();
}

The dispatcher now just looks through the list of command strings for a match the command is found. If the command is found, it can be run. If not, the the requested command doesn’t exist and the error can be handled accordingly.

This is a perfectly functional solution, but not very efficient. In the first article I pointed out that using a switch statement means that on average 50% of the compares can be assumed to happen for each command issued. The same is true with this form of lookup table because we have simply taken the switch statement and turned it into table form. The lookup table is functionally equivalent to:

void dispatcher( char const * command )
{
  if ( 0 == strcmp( command, "version" ) )
    versionCommand();
  else
  if ( 0 == strcmp( command, "errorStatus" ) )
    errorStatusCommand();
  else
  if ( 0 == strcmp( command, "getPosition" ) )
    getPositionCommand();
  else
  if ( 0 == strcmp( command, "start" ) )
    startCommand();
  else
  if ( 0 == strcmp( command, "stop" ) )
    stopCommand();
  else
  if ( 0 == strcmp( command, "forward" ) )
    forwardCommand();
  else
  if ( 0 == strcmp( command, "backward" ) )
    backwardCommand();
  else
  {
    // Command is not found.
  }
}

Now the complaint I had against using a switch statement again applies and the lookup process isn’t very efficient. So is there a better way?

April 04, 2021

Command Handler Lookup Table

Preamble: Before this series of articles begins it should be noted that the techniques explored here are generally not needed. The speed gains for most processors are negligible. Only specially constrained system—such as needing fast responses on very slow microprocessor—will benefit from the these techniques. However, there may be some lessons that port to other applications and make this series of exploratory articles worthwhile. My interest in writing this article set was to explore such options irrespective of how useful their real-world application might be.

As an embedded programmer I’ve written a number of serial protocols over the years. Serial communication is often used for the device to talk to a host computer or connect several devices together. The protocols consists of several characters grouped together to form a packet. These packets are most often binary data and use an natural integer number, the command, to denote the purpose of the packet transaction. This is very typical.

In software there is some method for dispatching specific commands to sections of code that handle that specific command. The most basic are simply switch statements.

typedef enum
{
  COMMAND__START,
  COMMAND__STOP,
  COMMAND__FORWARD,
  COMMAND__BACKWARD
} Command;

void handleCommand( Command command )
{
  switch ( command )
  {
    case COMMAND__START :    // ...
    case COMMAND__STOP :     // ...
    case COMMAND__FORWARD :  // ...
    case COMMAND__BACKWARD : // ...
  }
}

That becomes inefficient when there are a lot of commands because they are basically a series of if-statements. A rule of thumb one can assume that 50% of them are executed for each command, and if there are a 1000 commands that means 500 compares for every command. In addition this isn’t very modular. All the commands are handling code is in a single location. A better option is to use a lookup table that matches command numbers to a function that can handle the command.

void startCommand( void );
void stopCommand( void );
void forwardCommand( void );
void backwardCommand( void );

typedef enum
{
  COMMAND__START,
  COMMAND__STOP,
  COMMAND__FORWARD,
  COMMAND__BACKWARD,

  NUMBER_OF_COMMANDS
} Command;

typedef void (*CommandHandler)( void );

CommandHandler const COMMAND_HANDLERS[ NUMBER_OF_COMMANDS ] =
{
  startCommand,
  stopCommand,
  forwardCommand,
  backwardCommand
};

void handleCommand( Command command )
{
  COMMAND_HANDLERS[ command ]();
}

Most of the protocols I design use a lookup table. This table is defined in a C file with nothing else in it. Then a general dispatch can use this table without knowing anything about the commands being executed. This allows the protocol to remain general and only the commands themselves be application specific.

In implementation the command handler function has more parameters and a return value.

typedef bool (*CommandHandler)
(
  uint8_t const * input,
  uint16_t inputSize,
  uint8_t * output,
  uint16_t outputSize,
  uint16_t * outputBytes
);

Here is a more typical command handler function. It receives a pointer to either the entire packet, or the portion of the packet that has the command data. In addition, it receives a pointer to output data—data that is used in sending a response to the command. There is a location of where this data is to be stored, how much data can be stored there, and a pointer to set to denote how much data the handler function used. The handler itself is Boolean. I typically use a true on failure return result. This allows a Boolean flag to simply use a logical OR to run several function and at the end see if any of them failed.

bool isError = false;
isError |= function1();
isError |= function2();
isError |= function3();
isError |= function4();

if ( isError )
  // ...

Serial protocols differ depending on the application. The lookup table portion happens in the dispatcher. There can be a number of steps that take place before this such as packet integrity checking (such as verifying a CRC) as well as the maintenance of the transport layer. The dispatcher has to unpack enough of the packet to get the command number. It might also have to unpack an address to see if the packet is even directed to the current device. The command number also has to be verified.

In the example show, command number verification is as simple as checking that the number is less than the total number of commands. If so, there is a handler function in the lookup table. This isn’t always the case. Sometimes the command numbers are such there are gaps in command numbers.

typedef enum
{
  // Status commands
  COMMAND__VERSION = 0,
  COMMAND__ERROR_STATUS,
  COMMAND__GET_POSITION ,

  // Control commands
  COMMAND__START = 10,
  COMMAND__STOP,
  COMMAND__FORWARD,
  COMMAND__BACKWARD,

  NUMBER_OF_COMMANDS
} Command;

In this command numbering system, there are 7 commands, but 13 command table entries. The command table can simply use NULL to denote areas that are not valid commands. The dispatcher must then check to see if the command handler is valid before running it.

CommandHandler const COMMAND_HANDLERS[ NUMBER_OF_COMMANDS ] =
{
  versionCommand,      // 0  COMMAND__VERSION
  errorStatusCommand,  // 1  COMMAND__ERROR_STATUS
  getPositionCommand,  // 2  COMMAND__GET_POSITION
  NULL,                // 3
  NULL,                // 4
  NULL,                // 5
  NULL,                // 6
  NULL,                // 7
  NULL,                // 8
  NULL,                // 9
  startCommand,        // 10 COMMAND__START
  stopCommand,         // 11 COMMAND__STOP
  forwardCommand,      // 12 COMMAND__FORWARD
  backwardCommand      // 13 COMMAND__BACKWARD
};

void dispatcher( Command command )
{
  if ( ( command < NUMBER_OF_COMMANDS )
    && ( NULL != COMMAND_HANDLERS[ command ] ) )
  {
    COMMAND_HANDLERS[ command ]();
  }
}

I’ve worked on a number of applications needing a serial interface requiring a protocol like this and each application has unique protocol requirements. Some are single client/server configurations, some are multi-drop, some bidirectional, etc., etc. The lookup table command handler is part of all these system. What happens if the command is not a number, but instead a text string? That will be the topic of our next article.

   First outdoor ride of the year, and a beautiful day for it.  Temperatures were over 70°F (21°C).  There was a fairly steady wind from mostly from the west so I start my ride in that direction.  I did the my Martinsville-Waunakee loop which is right around 27 miles.  Typically I have to fight to 1,500 Calories out of the ride, but having taken such a break I burned 1,850.  Guess since I'm trying to burn more calories I don't feel bad about this, but it does show that I am in lower fitness.

April 02, 2021

Binary Search Adventure

Last Friday I had a dream about implementing some kind of linked-list tree to speed up string searches used in a command handler I had implemented for work. The following morning I learned the method has a name: Binary Search Tree (BST). By the end of the weekend I had written a lengthy article about implementing a binary search tree when I discovered I made a bad assumption early on.

My mistake: Thinking a generic binary search tree was self-balancing. While the tree worked as a lookup table, it didn’t offer any benefit because the way I had it implemented gave it the same properties as a linear search.

In the process of failing, I rediscovered a binary search. I dismissed using a binary search because you need to start with sorted data. I thought my binary tree was self-balancing, thus the sort would happen as part of the process of assembling the tree. I was also trying to avoid recursive functions and dynamic memory allocation. Quick sorts typically use recursive functions or need a stack that is best implemented with dynamic memory allocation.

The process wasn’t an entire waste, however. The I plan to write an article about using a binary search to improve the speed of a string-based lookup table.