Contributor's Corner

A collection of hopefully helpful information
Functionality Clarity Elegance

Pointers


Confustoids

Confustoid is an age-old technical term I just made up. It is a factoid that has some tendency to introduce confusion despite its factuality. In their zeal to make our programming life easier, our language writers sometimes introduce shortcuts. Things may be expressed in more than one way. Perhaps there’s a way highly amenable to the human way of thinking and another way more suited to the actual architecture of the machine we are attempting to manipulate. Sometimes these variations are great for the initiated but may be confustoids when encountered by the neophyte. Pointers, when used with arrays, strew the path with a number of these little beasties, the confustoids, but they're nothing an aspiring geek can't overcome and learn to appreciate.

Array notation is generally converted by the compiler to what amounts to pointer arithmetic. In its implementation, your compiler does things that may lead to misperceptions. The next sections deal with the subject in more detail, but consider this code,

int iArray [4] = {2, 4, 6, 8}; 
int *pArray = iArray; 
Now, assuming 4-byte ints and an array that happens to start at address 0x1000, memory looks like this:
0x1000: iArray [4] contents: 2 
0x1004:            contents: 4 
0x1008:            contents: 6 
0x100c:            contents: 8  
0x1010: pArray     contents: 0x1000 
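Perhaps the second statement would have been clearer had it been written,
int *pArray = &iArray[0];
('&' in this context meaning "address of"). Note that &iArray, without the subscript, yields the same address but a different type (pointer to an array of four ints), so it will not initialize an int * without a cast or a complaint from the compiler.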

The confustoid occurs because of what the compiler chooses to do behind the scenes with the items, iArray and pArray. It often treats them on the surface as if they are the same thing!

Suppose you wish to access the third element, the one at index 2. You may use a statement like iArray[2] or pArray[2], each of which delivers the same value. Since iArray is the name of an array, the compiler will take the address of its beginning element and add two elements' worth of bytes (2 * sizeof(int)) to it. pArray, however, is the name of a pointer. The compiler knows this. The compiler will take the value from the pointer (which is the same as the address of the beginning of the array) and add the same offset. Confustoids incoming!!

C/C++ does not have a built-in array object, per se. Nothing that behaves atomically, takes care of itself, changes its underwear daily, and all that stuff. An array is merely a collection of contiguous elements of a given type, as defined by the programmer. Because it is not a single value (though you may view it as a single object) you may not use the assignment operator ("=") to assign to it; nor can you usefully compare it as a single value with the equality ("==") and relational (">=", et al.) operators. Such a comparison will compile, but it compares the addresses of the two arrays and not their contents. The individual elements must have their values assigned or compared individually.

Since an array is a distributed collection of values, how does one refer to it? What is an "array reference"? By convention, the address of the beginning of the array constitutes such a reference. Consequently, iArray (by compiler convention), &iArray (address of), and &iArray[0] (address of the first element) all evaluate to the same address. They do not all have the same type, however: iArray and &iArray[0] are pointers to an int, while &iArray is a pointer to the whole array of four ints. This is a distinct confustoid.

It is particularly so in the case of the first, iArray. A name (or label, or address), otherwise unqualified, normally indicates the value stored at the address represented by that name. It is simply a convention of the language and not universally true. Since an array, as a whole, has no "single" value, the notation has been treated specially, again by convention, to indicate the address and not the value. (The main exceptions are when the name is the operand of sizeof or of '&', where it still stands for the whole array.) In most cases, such a meaning would require the '&' operator. If one writes, "newVar = iVariable" (the variables being integers, say), one gets the value of iVariable transferred to newVar. If, on the other hand, iVariable is an integer array, newVar (presuming it's now an integer pointer) will receive the address of iVariable's first element, and not the contents. Yeppers, watch your step around the confustoids.

Another confustoid rears its ugly head when we write,

char *myString = "This is the string in question"; 
This little beauty certainly appears to be assigning an entire array to a pointer, thereby violating a couple of the rules and getting away with it. Again, our language writers' tendencies to spoil us have come to the fore. This would be less confusing if we were required to write,
char anonymousMemory [32] = "This is the string in question"; 
char *myString = anonymousMemory; 
We have been delivered from this requirement at the risk of finding a confustoid in our stew. The compiler has been taught to recognize this particular construct and perform the individual tasks for us. The equivalence is not exact, though. A significant number of modern machines will write-protect the memory containing the character-storage area, and the language leaves the result of modifying a string literal undefined. If we attempt to modify the string through myString we may generate a memory-violation error, which is why careful code declares such a pointer as const char *.

Less damaging, perhaps, is the similar statement, but without the sizing constant:

char myString [] = "This is another string"; 
Our compilers have told us again and again that we need a constant when we declare the size, and have slapped our wrists often enough. (The fact that compilers supporting C99 variable-length arrays will accept a variable in the size declaration is entirely ignored here.) This example is a case of the compiler making appropriate inferences regarding the required size and the contents of the array: it counts the characters in the initializer, adds one for the terminating null character, and sizes the array to fit.
