The other day a coworker came to me with a problem they were having. We traced the problem down to the assembly level and noticed values from an array were not being loaded into CPU registers as expected. Looking at the C code, I noticed something atypical. There was an array defined like this:
int array[ ARRAY_LENGTH ];
And in a header file, declared like this:
extern int * array;
The rationale was that an array of integers is just a pointer to some integers, because the following works:
int array[ ARRAY_LENGTH ];
int * pointer = array;
However, this does not imply that array is functionally identical to pointer. Comparing address values (and ignoring the differing types), the following is true:
array == &array;
pointer != &pointer;
A pointer occupies memory of its own to hold the address it points to. An array needs no such storage.
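The comparison above can be checked with a minimal sketch. The function names and the `ARRAY_LENGTH` value are made up for illustration, and both sides are cast to `void *` because `array` and `&array` have different types even though they hold the same address.

```c
#include <stdio.h>

#define ARRAY_LENGTH 4   /* assumed value, for illustration only */

static int array[ARRAY_LENGTH] = {1, 2, 3, 4};
static int *pointer = array;

/* Returns 1 when 'array' and '&array' are the same address. */
int array_matches_own_address(void)
{
    /* 'array' decays to int *, while '&array' is int (*)[ARRAY_LENGTH];
       casting both to void * makes the comparison well-typed. */
    return (void *)array == (void *)&array;
}

/* Returns 1 when the address stored in 'pointer' is the address of
   'pointer' itself. It is not: the pointer lives in its own storage. */
int pointer_matches_own_address(void)
{
    return (void *)pointer == (void *)&pointer;
}

/* Prints all four addresses so the difference can be seen directly. */
void print_addresses(void)
{
    printf("array   = %p, &array   = %p\n", (void *)array, (void *)&array);
    printf("pointer = %p, &pointer = %p\n", (void *)pointer, (void *)&pointer);
}
```

Calling `print_addresses()` from `main` should show one address repeated on the first line and two different addresses on the second.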
An array is just the elements of the array, one after the other:
| 1 | 2 | … |
A pointer, on the other hand, is just a reference to some memory. It is (for a 32-bit system) 4 bytes of memory that hold the address of some other location.
| 0x12345 | → | 1 | 2 | … |
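The two layouts can also be seen through `sizeof`, as a rough sketch: the array's size is the size of all its elements, while the pointer's size is that of a single address, no matter how many elements it points at. The function names and `ARRAY_LENGTH` value are assumptions for the example.

```c
#include <stddef.h>

#define ARRAY_LENGTH 8   /* assumed value, for illustration only */

static int array[ARRAY_LENGTH];
static int *pointer = array;

/* Storage occupied by the array: every element, one after the other. */
size_t array_storage_bytes(void)
{
    return sizeof array;
}

/* Storage occupied by the pointer: just one address. */
size_t pointer_storage_bytes(void)
{
    return sizeof pointer;
}
```

On a 32-bit system with 4-byte `int`, these would be expected to return 32 and 4 respectively. The exact numbers depend on the platform.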
For an array, the location of the elements is known at link time, so code can address them directly. For a pointer, the linker only knows the location of the memory that holds the address; the elements themselves are found at run time by reading that address. Let's examine some assembly output for the following code:
extern int * array;
printf( "%i\n", array[ 0 ] );
Output (unoptimized 32-bit x86 assembly):
movl _array, %eax // Load the address stored in 'array' into register eax.
movl (%eax), %eax // Load array[ 0 ] from the address in eax into eax.
movl %eax, 4(%esp) // Put the value of array[ 0 ] in stack word 1.
movl $LC0, (%esp) // Location of the string to print in stack word 0.
call _printf // Print.
And now for the code:
extern int array[];
printf( "%i\n", array[ 0 ] );
Output:
movl _array, %eax // Moves the value of array[ 0 ] into eax.
movl %eax, 4(%esp) // Put the value of array[ 0 ] in stack word 1.
movl $LC0, (%esp) // Location of the string to print in stack word 0.
call _printf // Print.
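Here, eax is loaded directly with the value, whereas the pointer version needs an intermediate load. When the object is defined as an array but declared as a pointer, that extra load treats the first bytes of the array as an address and then reads from wherever it happens to point. That one extra instruction (on a different CPU architecture) was causing a world of hurt because arrays and pointers are not equivalent.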
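To make the failure concrete, here is a small sketch of what the mismatched code ends up using as its "pointer". It copies the first address-sized chunk of the array into an integer instead of dereferencing it, so it is safe to run. The function names and array contents are invented for the example, and `uintptr_t` is assumed to be available, as it is on common platforms.

```c
#include <stdint.h>
#include <string.h>

#define ARRAY_LENGTH 4   /* assumed value, for illustration only */

/* Hypothetical definition, as it would appear in the implementation file. */
static int array[ARRAY_LENGTH] = {1, 2, 3, 4};

/* The copy below needs at least one address worth of array storage. */
_Static_assert(sizeof array >= sizeof(uintptr_t), "array too small for this demo");

/* What code compiled against 'extern int * array;' would load as the
   pointer value: the first bytes of the array's own elements. */
uintptr_t bits_read_as_pointer(void)
{
    uintptr_t bits;
    memcpy(&bits, array, sizeof bits);
    return bits;
}

/* Where the elements actually live. */
uintptr_t real_array_address(void)
{
    return (uintptr_t)(void *)array;
}
```

The two results differ: the first is built from element values such as 1 and 2, not from any meaningful address, which is why dereferencing it reads garbage or faults.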
Note that the GNU C compiler will not let you make this mistake when it sees both declarations in the same file. The following will not compile:
extern int * array;
int array[ ARRAY_LENGTH ];
This can happen, however, if each declaration is in a separate file: one in the implementation and one in the header file. Neither the C compiler nor the linker is able to verify, across separately compiled files, that global functions and variables were defined with the types expected by the objects that use them. But one can guard against this weakness by including the header file in the implementation file. That is, file.c must include file.h so that all the function prototypes and external variable declarations are visible when file.c is compiled. If not, which was the case for this problem, it is possible to define externals in a way that conflicts with their declarations.
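As a rough single-file sketch of that guard, the header and implementation contents are shown together below, with comments marking where each would live. The file names follow the text, while `ARRAY_LENGTH`, the initial values and `first_element` are hypothetical.

```c
/* ---- contents of file.h (hypothetical header) ---- */
#define ARRAY_LENGTH 4
extern int array[ARRAY_LENGTH];   /* declaration: an array, not a pointer */
int first_element(void);

/* ---- contents of file.c (hypothetical implementation) ----
   In a real project this part would start with: #include "file.h"
   so the compiler sees the declaration and the definition together. */
int array[ARRAY_LENGTH] = {1, 2, 3, 4};   /* definition matches the declaration */

int first_element(void)
{
    return array[0];
}
```

With this arrangement, changing the header line to `extern int * array;` should make the compiler reject file.c with a conflicting-types error instead of letting the mismatch through to link time.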