Andrew Que Sites list Photos
Projects Contact
Main

In implementing the 8 queens puzzle I decided to crate a Javascript interface that allows one to try and place queens on the chessboard of arbitrary size.

 

By clicking on a cell a queen is placed or removed. Highlighted cells show the threats created by placing a queen at that location. Yellow cells show that the highlighted area poses a threat. Red cells show the board is in conflict. A fully green board shows a solution.

One can generally find solutions to the smaller boards pretty quickly. The larger boards get very complected but also have more possible solutions. The solve button will use the method of permutations to try and find a random solution. For the larger boards it may not find a solution and will time out after 30 seconds. However, trying again will eventually produce a result and sometimes quite fast. It really depends on how the board was shuffled.  The total number of solutions to test for any board is n! where n is the number of rows/columns.

Rows/Columns Total Permutations
4 24
5 120
6 720
7 5,040
8 40,320
9 362,880
10 3,628,800
11 39,916,800
12 479,001,600
13 6,227,020,800
14 87,178,291,200
15 1,307,674,368,000
16 20,922,789,888,000
17 355,687,428,096,000
18 6,402,373,705,728,000
19 121,645,100,408,832,000
20 2,432,902,008,176,640,000

That last number is 2.4 quintillion. For comparison a computer counting at 4 billion counts/second (4 GHz) would take 19 years just to count that high.

Download a zip file of the project: nQueens_v1.0.zip. SHA256: a0de13bc06dd9a2cb8cde3b535939825946b07e8f8164b51fb6f2af89f0c7f11.

July 02, 2017

8 Queens Puzzle Improvement, part 2

Shortly after finishing my last article about improving the 8 queens puzzle solver I thought up another improvement. This one took longer to figure out an implementation.

I had setup a brute-force method of solving where I tested every possible board layout with one queen in each row. However most of these are known to not be solutions. Every solution must have a single queen in each row, and in each column. Otherwise we know there is a row/column threat.

So how to we iterate over fields with one queen in each row and column? To start with I considered the simplest possible board.

             
             
             
             
             
             
             
             

Here we just have each queen occupy it's row/column index. We can generate each of the remaining possible fields by swapping rows. Then with every possible combination there will still be a single queen for each row and column. The number of possible combinations is a simple permutations problem. Since we have 8 rows we could swap around, the number of permutations is 8! or 40,320. That is a lot less than the 224 or 16.8 million possibilities we test in my previous methods.

The problem I had was how to iterate through the permutations. After thinking about the problem for awhile I found a parallel-permutation. The various combinations of rows was the same as if we had 8 letters to rearrange. I was pretty sure someone had come up with an efficient way to iterate through such combinations so I just had to find it. Doing a little searching I came across Heap's Algorithm. With some tweaks to the pseudocode I found in the example I had a loop that would produce each variation of the playing field that I could test. Once that work I just plugged in the test code. It produced 92 results but I had to make sure it came up with the same 92 results.

I would typically just check to see that the output from the new program matched the old, but the order in which functional boards are found is different. Since I store the board state as a 64-bit bitmap I could simply print the 64-bit value, then sort, and compare. Sure enough my new version comes up with identical results.

//=============================================================================
// Name: queens.c
// Uses: Print all possible combinations of placing 8 queens on chess board
//       without any queen threatening any other.
// Date: 2017-06-19
// Author: Andrew Que (https://www.DrQue.net/)
//
//                                Public Domain
//=============================================================================
#include <stdio.h>
#include <stdbool.h>
#include <inttypes.h>

// True to print the complete layout, false for just a hex representation.
// True for normal operation, false sometimes useful for debug.
static bool const FULL_PRINT = false;

// Rows/columns on a chess board.
enum { ROWS    = 8 };
enum { COLUMNS = 8 };

// The board state uses 3-bits/row.  This is because there are 8 columns and 2^3 = 8.
enum { COLUMN_BITS = 3 };

// Mask used to select bits for current column.
enum { COLUMN_BIT_MASK = ~-( 1 << COLUMN_BITS ) };

// Number of board permutations needing to be checked.  This is the number of
// possible combinations to arrange the board such that there is a queen in
// each row and column.
enum { PERMUTATIONS = 8*7*6*5*4*3*2 }// 8!

//-----------------------------------------------------------------------------
// Uses:
//   Print an 8x8 grid representing the queen placements for the given state.
// Input:
//   state - State of chess board.
//-----------------------------------------------------------------------------
static void printState( uint_fast32_t state )
{
  for ( unsigned row = 0; row < ROWS; row += 1 )
  {
    // Column selected in this row.
    unsigned selectedColumn = ( state & COLUMN_BIT_MASK );
    state >>= COLUMN_BITS;

    char rowText[] = "........";
    rowText[ selectedColumn ] = 'x';
    puts( rowText );
  }

  putchar( '\n' );
}

//-----------------------------------------------------------------------------
// Uses:
//   Display all the possible combinations of placing 8 queens on a chess board.
// Output:
//   Always returns 0.
//-----------------------------------------------------------------------------
int main()
{
  // Number of functional board layouts.
  unsigned goodCount = 0;

  // Initial state just has a queen in each row and each column.
  uint_fast32_t state =
       UINT32_C( 0 ) << ( 0 * COLUMN_BITS )
     | UINT32_C( 1 ) << ( 1 * COLUMN_BITS )
     | UINT32_C( 2 ) << ( 2 * COLUMN_BITS )
     | UINT32_C( 3 ) << ( 3 * COLUMN_BITS )
     | UINT32_C( 4 ) << ( 4 * COLUMN_BITS )
     | UINT32_C( 5 ) << ( 5 * COLUMN_BITS )
     | UINT32_C( 6 ) << ( 6 * COLUMN_BITS )
     | UINT32_C( 7 ) << ( 7 * COLUMN_BITS );

  uint_fast8_t permutations[ ROWS ] = { 00000000 };
  unsigned permutationIndex = 0;

  // For every possible board state that contains 8 queens...
  for ( uint_fast32_t permutation = 0; permutation < PERMUTATIONS; permutation += 1 )
  {
    // Keep track of the diagonals in use.  There can only ever
    // be one piece per row or column so no need to check that.
    uint_fast16_t leftDiagonal = 0;
    uint_fast16_t rightDiagonal = 0;

    // True if the current state has threats.  Assumes none at start.
    bool isBad = false;

    // Current row (0-7).
    unsigned row = 0;

    // Sub-state is a copy of state, but shifted as column bits are used.
    uint_fast32_t subState = state;

    // Loop through all rows, adding queens until all are placed or any of the
    // queens threaten another.
    while ( ( row < ROWS )
       && ( ! isBad ) )
    {
      // Each row must have a queen, so we just need to calculate the
      // column the queen is located from the game state.
      unsigned column = subState & COLUMN_BIT_MASK;
      subState >>= COLUMN_BITS;

      // Mask for the bits in the associated diagonal mask.
      uint_fast16_t leftDiagonalMask  = ( 1 << ( column + row ) );
      uint_fast16_t rightDiagonalMask = ( 1 << ( column + ROWS - 1 - row ) );

      // Check diagonals to see if this new piece creates a threat.
      // If so, this setup is bad.
      isBad |= ( leftDiagonal  & leftDiagonalMask  ) > 0;
      isBad |= ( rightDiagonal & rightDiagonalMask ) > 0;

      // Mark the appropriate diagonals as used.
      leftDiagonal  |= leftDiagonalMask;
      rightDiagonal |= rightDiagonalMask;

      // Next row...
      row += 1;
    }

    // If the loop finished and the state is not bad, this is one of the
    // possible layouts.
    if ( ! isBad )
    {
      // Show this layout.
      if ( FULL_PRINT )
        printState( state );
      else
        printf( "%06" PRIXFAST32 "\n", state );

      goodCount += 1;
    }

    //
    // Use Heap's algorithm to iterate through all permutations.
    //

    // Setup swap locations.
    unsigned indexA = permutationIndex * COLUMN_BITS;
    unsigned indexB = 0;
    if ( 1 == ( permutationIndex & 1 ) )
      indexB = permutations[ permutationIndex ] * COLUMN_BITS;

    // Masks for swap.
    uint_fast32_t leftDiagonalMask = COLUMN_BIT_MASK << indexA;
    uint_fast32_t rightDiagonalMask = COLUMN_BIT_MASK << indexB;

    // Swap the two bit maps.
    state = 0
      | ( state & rightDiagonalMask ) << ( indexA - indexB )
      | ( state & leftDiagonalMask  ) >> ( indexA - indexB )
      | ( state & ~( leftDiagonalMask | rightDiagonalMask ) );

    // Advance permutation index.
    permutations[ permutationIndex ] += 1;
    permutationIndex = 0;

    // Add plus carry for all permutations indices.
    while ( permutations[ permutationIndex ] >= permutationIndex )
    {
      permutations[ permutationIndex ] = 0;
      permutationIndex += 1;
    }

  } // for loop.

  // Display the total number of possible configurations.
  printf( "Total: %u\n", goodCount );

  return 0;
}

The code is very similar to the previous implementation with the new code being in how the state is generated from the permutations. The check code is smaller because we don't need to check for a column threat—each columns is guaranteed a single queen.

The speed of the new version is hardly comparable. Tests show it takes around 24 ms, but that number is low enough to have interference from the operating system scheduler. The improvement is around 9 times faster than the initial implementation, and 8 times the first improvement. Almost all of this speed is due to the fact there are more than 400 times as many checks to preform with the original implementations compared with the new.

Further improvements are possible. I could come up with more efficient ways to iterate, or even see about finding some tricks to iterate over scenarios with known diagonal threats. However I am pretty satisfied with this solution. I also learned a new algorithm as a result and who knows when need of such a thing will arise.

July 01, 2017

Integer Sizes

I wrote a simple program to display the sizes of the standard types provided by C99. Below is a table of the result of compiling and running this program on 4 different processors, two Intel x86 based processors and two ARM processors. Two are 32-bit, and two are 64-bit.

  AMD 64 i686 Cortex-A53 Cortex-A8
Char 8-bits 8-bits 8-bits 8-bits
Short 16-bits 16-bits 16-bits 16-bits
Int 32-bits 32-bits 32-bits 32-bits
Long 32-bits 64-bits 64-bits 32-bits
Long long 64-bits 64-bits 64-bits 64-bits
Float 32-bits 32-bits 32-bits 32-bits
Double 64-bits 64-bits 64-bits 64-bits
Long double 96-bits 128-bits 128-bits 64-bits
Data pointer 32-bits 64-bits 64-bits 32-bits
Function pointer 32-bits 64-bits 64-bits 32-bits
         
int_least8_t 8-bits 8-bits 8-bits 8-bits
int_least16_t 16-bits 16-bits 16-bits 16-bits
int_least32_t 32-bits 32-bits 32-bits 32-bits
int_least64_t 64-bits 64-bits 64-bits 64-bits
         
int_fast8_t 8-bits 8-bits 8-bits 8-bits
int_fast16_t 16-bits 64-bits 64-bits 32-bits
int_fast32_t 32-bits 64-bits 64-bits 32-bits
int_fast64_t 64-bits 64-bits 64-bits 64-bits
intmax_t 64-bits 64-bits 64-bits 64-bits

The results are mostly expected. The two 64-bit system treat long and long long as 64-bit values where as the 32-bit system treat long as 32-bit and long long as 64-bit. Pointers on a 64-bit system are also 64-bits wide as one would expect. Both data pointers and function pointers are identical in size. This is almost always the case with von Neumann architecture. Harvard architecture, which separates program and data memories, could have differences in data and function pointer sizes. This is something we embedded programmers pay attention to, but few others need worry about.

The fast integer section is kind of interesting. Standard 32-bit Intel, each fast integer is the it’s size. On the rest of the system, faster integers are the word size of the CPU. The latter doesn’t strike me as strange, but the fact the two x86 based CPUs differ makes me wonder. Why does a 32-bit Intel machine consider a 16-bit integer to be faster than a 32-bit integer when the CPU prefers to deal with 32-bit values? From an assembly standpoint I know the answer as x86 assembly breaks up registers by size. For example, the AX register is 16 bits, and consists of the two registers AL and AH (low and high word respectively). The is the 32-bit extension of this register, EAX and the 64-bit extension RAX. Instructions can deal with various operations depending on what registers are used, so the CPU knows how many bits are being worked on. I presume the 64-bit x86 instruction set still allows the use of the 32/16/8 bit instructions. So why is the 64-bit machine using 64-bit words for all fast types? It could be that memory is always accessed 64-bits at a time. This means reads of 16-bit values actually need to read 64-bits and then chop off the 48-bits not used. But the 32-bit processors need to do the same thing. So I’m not sure.

The only other item of notice was the size of long double. On the 32-bit x86 machine this defaults to 96-bits. Modern x86 FPUs can deal with 128-bit floating-point, but by default it looks like the size is 96-bits. Both 64-bit machines have 128-bit floating point. However, the 32-bit ARM machine doesn’t have a floating point value larger then 64-bit. I wasn’t aware that was the case.

One thing that has always bothered me in the embedded world is when programmers use the primitive integers (like int and long) when they are expecting fixed bit sizes. Yes, if you know the platform you are working with you know the primitive sizes. However, if that code ever has to move to another platform, or the CPU changes, all your assumptions are going to change. I’ve been on porting projects where this fact has required a lot of changes. Customer wants to transition from a 32-bit system to 64-bit, and now everything needs to be modified. Had they used the intxx_t and uintxx_t types, little else would have to be modified. I know, I’ve ported code written correctly with relative ease. These types have been available since the C99 standard as in the C standard formalized in 1999.

Some companies opt to alias types so they have the correct sizes. A general header file defines things like uint32 which can be defined as long (on a 16-bit system) or int (on a 32-bit system) or uint32_t on any system with a C99 compiler. That allows the use of an older compiler (C89 compliant or some K&R variant). For companies where code can live and be maintained for decades (Aerospace being an example) I think this practice is a good one. However, for new code that doesn’t need to consider really old compilers, I prefer the C99 types.

June 30, 2017

8 Queens Puzzle Improvement

In doing my write-up for the 8 queens puzzle article, I thought of a further improvement. I was creating a mask of threatened squares and checking the current layout of the chess board with it. However it occurred to me that we didn't need to check threats by row because the way the moves were being done only allowed a single piece per row. Thus there would never be common-row threat.

Then I started thinking about columns. If we simply marked a column as used, any new piece could simply check to see if that column was in use already and know the new piece was creating a threat. But what about diagonals?

For this I started with a labeling all the possible diagonals.

  0 1 2 3 4 5 6 7
0 0 1 2 3 4 5 6 7
1 1 2 3 4 5 6 7 8
2 2 3 4 5 6 7 8 9
3 3 4 5 6 7 8 9 10
4 4 5 6 7 8 9 10 11
5 5 6 7 8 9 10 11 12
6 6 7 8 9 10 11 12 13
7 7 8 9 10 11 12 13 14

 

The pattern for this diagonal is simple: the diagonal occupied by any piece is the row number plus the column number.

  0 1 2 3 4 5 6 7
0 7 8 9 10 11 12 13 14
1 6 7 8 9 10 11 12 13
2 5 6 7 8 9 10 11 12
3 4 5 6 7 8 9 10 11
4 3 4 5 6 7 8 9 10
5 2 3 4 5 6 7 8 9
6 1 2 3 4 5 6 7 8
7 0 1 2 3 4 5 6 7

The opposite diagonal is similar, but it is column plus 7 subtract row.

Both diagonals just require 14-bits, and the column check 8-bits. So I made a quick modification to my program.

//=============================================================================
// Name: queens.c
// Uses: Print all possible combinations of placing 8 queens on chess board
//       without any queen threatening any other.
// Date: 2017-06-15
// Author: Andrew Que (https://www.DrQue.net/)
//
//                                Public Domain
//=============================================================================
#include <stdio.h>
#include <stdbool.h>
#include <inttypes.h>

// Rows/columns on a chess board.
enum { ROWS    = 8 };
enum { COLUMNS = 8 };

// The board state uses 3-bits/row.  This is because there are 8 columns and 2^3 = 8.
enum { COLUMN_BITS = 3 };

// There 3-bits/row times 8 rows for a total of 2^24 possible board states.
enum { STATES = 1 << ( COLUMN_BITS * ROWS ) };

// Mask used to select bits for current column.
enum { COLUMN_BIT_MASK = ~-( 1 << COLUMN_BITS ) };

//-----------------------------------------------------------------------------
// Uses:
//   Print an 8x8 grid representing the queen placements for the given state.
// Input:
//   state - State of chess board.
//-----------------------------------------------------------------------------
static void printState( uint_fast32_t state )
{
  for ( unsigned row = 0; row < ROWS; row += 1 )
  {
    // Column selected in this row.
    unsigned selectedColumn = ( state & COLUMN_BIT_MASK );
    state >>= COLUMN_BITS;

    char rowText[] = "........";
    rowText[ selectedColumn ] = 'x';
    puts( rowText );
  }

  putchar( '\n' );
}

//-----------------------------------------------------------------------------
// Uses:
//   Display all the possible combinations of placing 8 queens on a chess board.
// Output:
//   Always returns 0.
//-----------------------------------------------------------------------------
int main()
{
  // Number of functional board layouts.
  unsigned goodCount = 0;

  // For every possible board state that contains 8 queens...
  for ( uint_fast32_t state = 0; state < STATES; state += 1 )
  {
    // Keep track of the columns and diagonals in use.  There can only ever
    // be one piece per row so no need to check that.
    uint_fast16_t columnUse = 0;
    uint_fast16_t leftDiagonal = 0;
    uint_fast16_t rightDiagonal = 0;

    // True if the current state has threats.  Assumes none at start.
    bool isBad = false;

    // Current row (0-7).
    unsigned row = 0;

    // Sub-state is a copy of state, but shifted as column bits are used.
    uint_fast32_t subState = state;

    // Loop through all rows, adding queens until all are placed or any of the
    // queens threaten another.
    while ( ( row < ROWS )
       && ( ! isBad ) )
    {
      // Each row must have a queen, so we just need to calculate the
      // column the queen is located from the game state.
      unsigned column = subState & COLUMN_BIT_MASK;
      subState >>= COLUMN_BITS;

      // Check column and diagonals to see if this new piece creates a threat.
      // If so, this setup is bad.
      uint_fast16_t mask = ( 1 << column );
      isBad |= ( columnUse & mask ) > 0;
      isBad |= ( leftDiagonal & ( mask << row ) ) > 0;
      isBad |= ( rightDiagonal & ( mask << ( 7 - row ) ) ) > 0;

      // Mark the used column and diagonals.
      columnUse |= mask;
      leftDiagonal |= ( mask << row );
      rightDiagonal |= ( mask << ( 7 - row ) );

      // Next row...
      row += 1;
    }

    // If the loop finished and the state is not bad, this is one of the
    // possible layouts.
    if ( ! isBad )
    {
      if ( false )
        printState( state );
      else
        printf( "%08" PRIXFAST32 "\n", state );
      goodCount += 1;
    }
  }

  // Display the total number of possible configurations.
  printf( "Total: %u\n", goodCount );

  return 0;
}

The results produced by the program are identical. On the same computer as the first program I went from 350 ms to around 190 ms—a 54% speed improvement. Makes we wonder: what other improvements could be made?