Section 6. Arrays and Pointers

6.1: I had the definition char a[6] in one source file, and in another I declared extern char *a. Why didn't it work?

A: In one source file you defined an array of characters and in the other you declared a pointer to characters. The declaration extern char *a simply does not match the actual definition. The type pointer-to-type-T is not the same as array-of-type-T. Use extern char a[].

References: ISO Sec. 6.5.4.2; CT&P Sec. 3.3 pp. 33-4, Sec. 4.5 pp. 64-5.

6.2: But I heard that char a[] was identical to char *a.

A: Not at all. (What you heard has to do with formal parameters to functions; see question 6.4.) Arrays are not pointers. The array declaration char a[6] requests that space for six characters be set aside, to be known by the name "a". That is, there is a location named "a" at which six characters can sit. The pointer declaration char *p, on the other hand, requests a place which holds a pointer, to be known by the name "p". This pointer can point almost anywhere: to any char, or to any contiguous array of chars, or nowhere (see also questions 5.1 and 1.30).

As usual, a picture is worth a thousand words. The declarations

    char a[] = "hello";
    char *p = "world";

would initialize data structures which could be represented like this:

       +---+---+---+---+---+---+
    a: | h | e | l | l | o |\0 |
       +---+---+---+---+---+---+
       +-----+     +---+---+---+---+---+---+
    p: |  *======> | w | o | r | l | d |\0 |
       +-----+     +---+---+---+---+---+---+

It is important to realize that a reference like x[3] generates different code depending on whether x is an array or a pointer. Given the declarations above, when the compiler sees the expression a[3], it emits code to start at the location "a", move three past it, and fetch the character there. When it sees the expression p[3], it emits code to start at the location "p", fetch the pointer value there, add three to the pointer, and finally fetch the character pointed to.
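The two declarations can be tried directly. What follows is only a sketch: the helper names (size_of_a, size_of_p, a_at_3, p_at_3) are made up for illustration, and the value returned by size_of_p depends on the platform's pointer size.

```c
#include <stddef.h>

char a[] = "hello";   /* array: six chars stored at the location named a */
char *p = "world";    /* pointer: holds the address of a separate, unnamed array */

/* Hypothetical helpers, named only for this sketch. */

/* sizeof applied to the array counts the characters themselves: 6. */
size_t size_of_a(void) { return sizeof a; }

/* sizeof applied to the pointer is the size of a pointer, whatever
   that happens to be on this platform. */
size_t size_of_p(void) { return sizeof p; }

/* Start at the location a, move three past it, fetch the char there. */
char a_at_3(void) { return a[3]; }

/* Fetch the pointer value stored in p, add three, fetch the char there. */
char p_at_3(void) { return p[3]; }
```

Both subscripts yield 'l', but only a reserves storage for its six characters; p merely records where some characters live.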
In other words, a[3] is three places past (the start of) the object *named* a, while p[3] is three places past the object *pointed to* by p. In the example above, both a[3] and p[3] happen to be the character 'l', but the compiler gets there differently. (The essential difference is that the values of an array like a and a pointer like p are computed differently *whenever* they appear in expressions, whether or not they are being subscripted, as explained further in the next question.)

References: K&R2 Sec. 5.5 p. 104; CT&P Sec. 4.5 pp. 64-5.

6.3: So what is meant by the "equivalence of pointers and arrays" in C?

A: Much of the confusion surrounding arrays and pointers in C can be traced to a misunderstanding of this statement. Saying that arrays and pointers are "equivalent" means neither that they are identical nor even interchangeable. What it means is that array and pointer arithmetic is defined such that a pointer can be conveniently used to access an array or to simulate an array.

Specifically, the cornerstone of the equivalence is this key definition:

    An lvalue of type array-of-T which appears in an expression decays (with three exceptions) into a pointer to its first element; the type of the resultant pointer is pointer-to-T.

That is, whenever an array appears in an expression, the compiler implicitly generates a pointer to the array's first element, just as if the programmer had written &a[0]. (The exceptions are when the array is the operand of a sizeof or & operator, or is a string literal initializer for a character array.)

As a consequence of this definition, the compiler doesn't apply the array subscripting operator [] that differently to arrays and pointers, after all. In an expression of the form a[i], the array decays into a pointer, following the rule above, and is then subscripted just as would be a pointer variable in the expression p[i] (although the eventual memory accesses will be different, as explained in question 6.2).
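The decay rule and two of its exceptions (sizeof and &) can be checked with a few one-line functions. This is a minimal sketch; the array a of ten ints and all of the function names are assumptions chosen for the illustration.

```c
#include <stddef.h>

static int a[10];

/* In an ordinary expression, a decays to a pointer to its first
   element, exactly as if &a[0] had been written. */
int decays_to_first_element(void)
{
    int *p = a;
    return p == &a[0];
}

/* a[i] is evaluated as *(a + i): the decayed pointer is offset
   by i elements and then dereferenced. */
int subscript_matches_pointer_form(size_t i)
{
    a[i] = 42;
    return *(a + i) == 42 && &a[i] == a + i;
}

/* Exception: as the operand of sizeof, a does not decay, so the
   result is the size of the whole array. */
size_t whole_array_size(void)
{
    return sizeof a;
}

/* Exception: as the operand of &, a does not decay either; the
   result has type pointer-to-array-of-10-int, though it holds the
   same address as the decayed pointer. */
int address_of_keeps_array_type(void)
{
    int (*pa)[10] = &a;
    return sizeof *pa == sizeof a && (void *)pa == (void *)a;
}
```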
If you were to assign the array's address to the pointer:

    p = a;

then p[3] and a[3] would access the same element. See also questions 6.8 and 6.14.

References: K&R1 Sec. 5.3 pp. 93-6; K&R2 Sec. 5.3 p. 99; ISO Sec. 6.2.2.1, Sec. 6.3.2.1, Sec. 6.3.6; H&S Sec. 5.4.1 p. 124.

6.4: Then why are array and pointer declarations interchangeable as function formal parameters?

A: It's supposed to be a convenience. Since arrays decay immediately into pointers, an array is never actually passed to a function. Allowing pointer parameters to be declared as arrays is simply a way of making it look as though an array was being passed, perhaps because the parameter will be used within the function as if it were an array. Specifically, any parameter declarations which "look like" arrays, e.g.

    void f(char a[])
    { ... }

are treated by the compiler as if they were pointers, since that is what the function will receive if an array is passed:

    void f(char *a)
    { ... }

This conversion holds only within function formal parameter declarations, nowhere else. If the conversion bothers you, avoid it; many programmers have concluded that the confusion it causes outweighs the small advantage of having the declaration "look like" the call or the uses within the function. See also question 6.21.

References: K&R1 Sec. 5.3 p. 95, Sec. A10.1 p. 205; K&R2 Sec. 5.3 p. 100, Sec. A8.6.3 p. 218, Sec. A10.1 p. 226; ISO Sec. 6.5.4.3, Sec. 6.7.1, Sec. 6.9.6; H&S Sec. 9.3 p. 271; CT&P Sec. 3.3 pp. 33-4.

6.7: How can an array be an lvalue, if you can't assign to it?

A: The ANSI C Standard defines a "modifiable lvalue," which an array is not.

References: ISO Sec. 6.2.2.1; Rationale Sec. 3.2.2.1; H&S Sec. 7.1 p. 179.

6.8: Practically speaking, what is the difference between arrays and pointers?

A: Arrays automatically allocate space, but can't be relocated or resized. Pointers must be explicitly assigned to point to allocated space (perhaps using malloc), but can be reassigned (i.e.
pointed at different objects) at will, and have many other uses besides serving as the base of blocks of memory.

Due to the so-called equivalence of arrays and pointers (see question 6.3), arrays and pointers often seem interchangeable, and in particular a pointer to a block of memory assigned by malloc is frequently treated (and can be referenced using []) exactly as if it were a true array. See questions 6.14 and 6.16. (Be careful with sizeof, though.)

See also questions 1.32 and 20.14.

6.9: Someone explained to me that arrays were really just constant pointers.

A: This is a bit of an oversimplification. An array name is "constant" in that it cannot be assigned to, but an array is *not* a pointer, as the discussion and pictures in question 6.2 should make clear. See also questions 6.3 and 6.8.

6.11: I came across some "joke" code containing the "expression" 5["abcdef"] . How can this be legal C?

A: Yes, Virginia, array subscripting is commutative in C. This curious fact follows from the pointer definition of array subscripting, namely that a[e] is identical to *((a)+(e)), for *any* two expressions a and e, as long as one of them is a pointer expression and one is integral. This unsuspected commutativity is often mentioned in C texts as if it were something to be proud of, but it finds no useful application outside of the Obfuscated C Contest (see question 20.36).

References: Rationale Sec. 3.3.2.1; H&S Sec. 5.4.1 p. 124, Sec. 7.4.1 pp. 186-7.

6.12: Since array references decay into pointers, if arr is an array, what's the difference between arr and &arr?

A: The type.

In Standard C, &arr yields a pointer, of type pointer-to-array-of-T, to the entire array. (In pre-ANSI C, the & in &arr generally elicited a warning, and was generally ignored.) Under all C compilers, a simple reference (without an explicit &) to an array yields a pointer, of type pointer-to-T, to the array's first element. (See also questions 6.3, 6.13, and 6.18.)

References: ISO Sec. 6.2.2.1, Sec.
6.3.3.2; Rationale Sec. 3.3.3.2; H&S Sec. 7.5.6 p. 198.

6.13: How do I declare a pointer to an array?

A: Usually, you don't want to. When people speak casually of a pointer to an array, they usually mean a pointer to its first element.

Instead of a pointer to an array, consider using a pointer to one of the array's elements. Arrays of type T decay into pointers to type T (see question 6.3), which is convenient; subscripting or incrementing the resultant pointer will access the individual members of the array. True pointers to arrays, when subscripted or incremented, step over entire arrays, and are generally useful only when operating on arrays of arrays, if at all. (See question 6.18.)

If you really need to declare a pointer to an entire array, use something like "int (*ap)[N];" where N is the size of the array. (See also question 1.21.) If the size of the array is unknown, N can in principle be omitted, but the resulting type, "pointer to array of unknown size," is useless.

See also question 6.12 above.

References: ISO Sec. 6.2.2.1.

6.14: How can I set an array's size at run time? How can I avoid fixed-sized arrays?

A: The equivalence between arrays and pointers (see question 6.3) allows a pointer to malloc'ed memory to simulate an array quite effectively. After executing

    #include <stdlib.h>
    int *dynarray;
    dynarray = malloc(10 * sizeof(int));

(and if the call to malloc succeeds), you can reference dynarray[i] (for i from 0 to 9) almost as if dynarray were a conventional, statically-allocated array (int a[10]). The only difference is that sizeof will not give the size of the "array". See also questions 1.31b, 6.16, and 7.7.

6.15: How can I declare local arrays of a size matching a passed-in array?

A: Until recently, you couldn't. Array dimensions in C traditionally had to be compile-time constants.
C9X will introduce variable-length arrays (VLAs) which will solve this problem; local arrays may have sizes set by variables or other expressions, perhaps involving function parameters. (gcc has provided parameterized arrays as an extension for some time.) If you can't use C9X or gcc, you'll have to use malloc(), and remember to call free() before the function returns. See also questions 6.14, 6.16, 6.19, 7.22, and maybe 7.32.

References: ISO Sec. 6.4, Sec. 6.5.4.2; C9X Sec. 6.5.5.2.

6.16: How can I dynamically allocate a multidimensional array?

A: The traditional solution is to allocate an array of pointers, and then initialize each pointer to a dynamically-allocated "row." Here is a two-dimensional example:

    #include <stdlib.h>

    int **array1 = malloc(nrows * sizeof(int *));
    for(i = 0; i < nrows; i++)
        array1[i] = malloc(ncolumns * sizeof(int));

(In real code, of course, all of malloc's return values would be checked.)

You can keep the array's contents contiguous, at the cost of making later reallocation of individual rows more difficult, with a bit of explicit pointer arithmetic:

    int **array2 = malloc(nrows * sizeof(int *));
    array2[0] = malloc(nrows * ncolumns * sizeof(int));
    for(i = 1; i < nrows; i++)
        array2[i] = array2[0] + i * ncolumns;

In either case, the elements of the dynamic array can be accessed with normal-looking array subscripts: arrayx[i][j] (for 0 <= i < nrows and 0 <= j < ncolumns).

If the double indirection implied by the above schemes is for some reason unacceptable, you can simulate a two-dimensional array with a single, dynamically-allocated one-dimensional array:

    int *array3 = malloc(nrows * ncolumns * sizeof(int));

However, you must now perform subscript calculations manually, accessing the i,jth element with array3[i * ncolumns + j].
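Here is one way the single-block scheme might look with the allocation checked and the subscript arithmetic kept in one place. It is a sketch only: flat_alloc and flat_at are hypothetical names invented for this example, and the overflow test on the size computation is an added precaution rather than part of the technique itself.

```c
#include <stdint.h>
#include <stdlib.h>

/* Allocate nrows * ncolumns ints as one block, in the manner of
   array3.  Returns NULL if malloc fails or if the byte count would
   not fit in a size_t.  Release the block with a single free(). */
int *flat_alloc(size_t nrows, size_t ncolumns)
{
    if (ncolumns != 0 && nrows > SIZE_MAX / ncolumns / sizeof(int))
        return NULL;
    return malloc(nrows * ncolumns * sizeof(int));
}

/* Address of the i,jth element.  The row width must be supplied,
   because the block itself carries no record of its dimensions.
   The caller is responsible for keeping i and j in range. */
int *flat_at(int *array3, size_t ncolumns, size_t i, size_t j)
{
    return &array3[i * ncolumns + j];
}
```

A caller would write *flat_at(array3, ncolumns, i, j) = value; and later free(array3);.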
(A macro could hide the explicit calculation, but invoking it would require parentheses and commas which wouldn't look exactly like multidimensional array syntax, and the macro would need access to at least one of the dimensions, as well. See also question 6.19.)

Yet another option is to use pointers to arrays:

    int (*array4)[NCOLUMNS] = malloc(nrows * sizeof(*array4));

but the syntax starts getting horrific and at most one dimension may be specified at run time.

With all of these techniques, you may of course need to remember to free the arrays (which may take several steps; see question 7.23) when they are no longer needed, and you cannot necessarily intermix dynamically-allocated arrays with conventional, statically-allocated ones (see question 6.20, and also question 6.18).

Finally, in C9X you can use a variable-length array.

All of these techniques can also be extended to three or more dimensions.

References: C9X Sec. 6.5.5.2.

6.17: Here's a neat trick: if I write

    int realarray[10];
    int *array = &realarray[-1];

I can treat "array" as if it were a 1-based array.

A: Although this technique is attractive (and was used in old editions of the book _Numerical Recipes in C_), it is not strictly conforming to the C Standard. Pointer arithmetic is defined only as long as the pointer points within the same allocated block of memory, or to the imaginary "terminating" element one past it; otherwise, the behavior is undefined, *even if the pointer is not dereferenced*. The code above could fail if, while subtracting the offset, an illegal address were generated (perhaps because the address tried to "wrap around" past the beginning of some memory segment).

References: K&R2 Sec. 5.3 p. 100, Sec. 5.4 pp. 102-3, Sec. A7.7 pp. 205-6; ISO Sec. 6.3.6; Rationale Sec. 3.2.2.3.

6.18: My compiler complained when I passed a two-dimensional array to a function expecting a pointer to a pointer.

A: The rule (see question 6.3) by which arrays decay into pointers is not applied recursively.
An array of arrays (i.e. a two-dimensional array in C) decays into a pointer to an array, not a pointer to a pointer. Pointers to arrays can be confusing, and must be treated carefully; see also question 6.13.

If you are passing a two-dimensional array to a function:

    int array[NROWS][NCOLUMNS];
    f(array);

the function's declaration must match:

    void f(int a[][NCOLUMNS])
    { ... }

or

    void f(int (*ap)[NCOLUMNS])    /* ap is a pointer to an array */
    { ... }

In the first declaration, the compiler performs the usual implicit parameter rewriting of "array of array" to "pointer to array" (see questions 6.3 and 6.4); in the second form the pointer declaration is explicit. Since the called function does not allocate space for the array, it does not need to know the overall size, so the number of rows, NROWS, can be omitted. The width of the array is still important, so the column dimension NCOLUMNS (and, for three- or more dimensional arrays, the intervening ones) must be retained.

If a function is already declared as accepting a pointer to a pointer, it is almost certainly meaningless to pass a two-dimensional array directly to it.

See also questions 6.12 and 6.15.

References: K&R1 Sec. 5.10 p. 110; K&R2 Sec. 5.9 p. 113; H&S Sec. 5.4.3 p. 126.

6.19: How do I write functions which accept two-dimensional arrays when the width is not known at compile time?

A: It's not always easy. One way is to pass in a pointer to the [0][0] element, along with the two dimensions, and simulate array subscripting "by hand":

    void f2(int *aryp, int nrows, int ncolumns)
    { ... array[i][j] is accessed as aryp[i * ncolumns + j] ...
    }

This function could be called with the array from question 6.18 as

    f2(&array[0][0], NROWS, NCOLUMNS);

It must be noted, however, that a program which performs multidimensional array subscripting "by hand" in this way is not in strict conformance with the ANSI C Standard; according to an official interpretation, the behavior of accessing (&array[0][0])[x] is not defined for x >= NCOLUMNS.

C9X will allow variable-length arrays, and once compilers which accept C9X's extensions become widespread, this will probably become the preferred solution. (gcc has supported variable-sized arrays for some time.)

When you want to be able to use a function on multidimensional arrays of various sizes, one solution is to simulate all the arrays dynamically, as in question 6.16.

See also questions 6.18, 6.20, and 6.15.

References: ISO Sec. 6.3.6; C9X Sec. 6.5.5.2.

6.20: How can I use statically- and dynamically-allocated multidimensional arrays interchangeably when passing them to functions?

A: There is no single perfect method.
Given the declarations

    int array[NROWS][NCOLUMNS];
    int **array1;               /* ragged */
    int **array2;               /* contiguous */
    int *array3;                /* "flattened" */
    int (*array4)[NCOLUMNS];

with the pointers initialized as in the code fragments in question 6.16, and functions declared as

    void f1a(int a[][NCOLUMNS], int nrows, int ncolumns);
    void f1b(int (*a)[NCOLUMNS], int nrows, int ncolumns);
    void f2(int *aryp, int nrows, int ncolumns);
    void f3(int **pp, int nrows, int ncolumns);

where f1a() and f1b() accept conventional two-dimensional arrays, f2() accepts a "flattened" two-dimensional array, and f3() accepts a pointer-to-pointer, simulated array (see also questions 6.18 and 6.19), the following calls should work as expected:

    f1a(array, NROWS, NCOLUMNS);
    f1b(array, NROWS, NCOLUMNS);
    f1a(array4, nrows, NCOLUMNS);
    f1b(array4, nrows, NCOLUMNS);
    f2(&array[0][0], NROWS, NCOLUMNS);
    f2(*array, NROWS, NCOLUMNS);
    f2(*array2, nrows, ncolumns);
    f2(array3, nrows, ncolumns);
    f2(*array4, nrows, NCOLUMNS);
    f3(array1, nrows, ncolumns);
    f3(array2, nrows, ncolumns);

The following calls would probably work on most systems, but involve questionable casts, and work only if the dynamic ncolumns matches the static NCOLUMNS:

    f1a((int (*)[NCOLUMNS])(*array2), nrows, ncolumns);
    f1b((int (*)[NCOLUMNS])(*array2), nrows, ncolumns);
    f1a((int (*)[NCOLUMNS])array3, nrows, ncolumns);
    f1b((int (*)[NCOLUMNS])array3, nrows, ncolumns);

It must again be noted that passing &array[0][0] (or, equivalently, *array) to f2() is not strictly conforming; see question 6.19.

If you can understand why all of the above calls work and are written as they are, and if you understand why the combinations that are not listed would not work, then you have a *very* good understanding of arrays and pointers in C.

Rather than worrying about all of this, one approach to using multidimensional arrays of various sizes is to make them *all* dynamic, as in question 6.16.
If there are no static multidimensional arrays -- if all arrays are allocated like array1 or array2 in question 6.16 -- then all functions can be written like f3().

6.21: Why doesn't sizeof properly report the size of an array when the array is a parameter to a function?

A: The compiler pretends that the array parameter was declared as a pointer (see question 6.4), and sizeof reports the size of the pointer.

References: H&S Sec. 7.5.2 p. 195.
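The effect described in question 6.21 is easy to observe. In this sketch the two function names are hypothetical, and the value returned by size_seen_by_callee depends on the platform's pointer size.

```c
#include <stddef.h>

/* The parameter is written as an array of 10 ints, but it is
   treated as if it had been declared int *a, so sizeof yields the
   size of a pointer rather than 10 * sizeof(int). */
size_t size_seen_by_callee(int a[10])
{
    return sizeof a;
}

/* Here a is a genuine array, so sizeof yields the size of all ten
   elements. */
size_t size_seen_by_caller(void)
{
    int a[10];
    return sizeof a;
}
```

Some compilers issue a warning for the sizeof in the first function, which is a fair hint that the result is rarely what was intended.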