Section 11. ANSI/ISO Standard C

11.1: What is the "ANSI C Standard?"
A: In 1983, the American National Standards Institute (ANSI) commissioned a committee, X3J11, to standardize the C language. After a long, arduous process, including several widespread public reviews, the committee's work was finally ratified as ANS X3.159-1989 on December 14, 1989, and published in the spring of 1990. For the most part, ANSI C standardizes existing practice, with a few additions from C++ (most notably function prototypes) and support for multinational character sets (including the controversial trigraph sequences). The ANSI C standard also formalizes the C run-time library support routines.

More recently, the Standard has been adopted as an international standard, ISO/IEC 9899:1990, and this ISO Standard replaces the earlier X3.159 even within the United States (where it is known as ANSI/ISO 9899-1990 [1992]). As an ISO Standard, it is subject to ongoing revision through the release of Technical Corrigenda and Normative Addenda.

In 1994, Technical Corrigendum 1 (TC1) amended the Standard in about 40 places, most of them minor corrections or clarifications, and Normative Addendum 1 (NA1) added about 50 pages of new material, mostly specifying new library functions for internationalization. In 1995, TC2 added a few more minor corrections.

As of this writing, a complete revision of the Standard is in its final stages. The new Standard is nicknamed "C9X" on the assumption that it will be finished by the end of 1999. (Many of this article's answers have been updated to reflect new C9X features.)

The original ANSI Standard included a "Rationale," explaining many of its decisions, and discussing a number of subtle points, including several of those covered here. (The Rationale was "not part of ANSI Standard X3.159-1989, but... included for information only," and is not included with the ISO Standard. A new one is being prepared for C9X.)

11.2: How can I get a copy of the Standard?
A: Copies are available in the United States from

        American National Standards Institute
        11 W. 42nd St., 13th floor
        New York, NY  10036  USA
        (+1) 212 642 4900

and

        Global Engineering Documents
        15 Inverness Way E
        Englewood, CO  80112  USA
        (+1) 303 397 2715
        (800) 854 7179  (U.S. & Canada)

In other countries, contact the appropriate national standards body, or ISO in Geneva at:

        ISO Sales
        Case Postale 56
        CH-1211 Geneve 20
        Switzerland

(or see URL http://www.iso.ch or check the comp.std.internat FAQ list, Standards.Faq).

The last time I checked, the cost was $130.00 from ANSI or $400.50 from Global. Copies of the original X3.159 (including the Rationale) may still be available at $205.00 from ANSI or $162.50 from Global. Note that ANSI derives revenues to support its operations from the sale of printed standards, so electronic copies are *not* available.

In the U.S., it may be possible to get a copy of the original ANSI X3.159 (including the Rationale) as "FIPS PUB 160" from

        National Technical Information Service (NTIS)
        U.S. Department of Commerce
        Springfield, VA  22161
        703 487 4650

The mistitled _Annotated ANSI C Standard_, with annotations by Herbert Schildt, contains most of the text of ISO 9899; it is published by Osborne/McGraw-Hill, ISBN 0-07-881952-0, and sells in the U.S. for approximately $40. It has been suggested that the price differential between this work and the official standard reflects the value of the annotations: they are plagued by numerous errors and omissions, and a few pages of the Standard itself are missing. Many people on the net recommend ignoring the annotations entirely. A review of the annotations ("annotated annotations") by Clive Feather can be found on the web at http://www.lysator.liu.se/c/schildt.html .
The text of the Rationale (not the full Standard) can be obtained by anonymous ftp from ftp.uu.net (see question 18.16) in directory doc/standards/ansi/X3.159-1989, and is also available on the web at http://www.lysator.liu.se/c/rat/title.html . The Rationale has also been printed by Silicon Press, ISBN 0-929306-07-4.

Public review drafts of C9X are available from ISO/IEC JTC1/SC22/WG14's web site, http://www.dkuug.dk/JTC1/SC22/WG14/ .

See also question 11.2b below.

11.2b: Where can I get information about updates to the Standard?
A: You can find information (including C9X drafts) at the web sites http://www.lysator.liu.se/c/index.html, http://www.dkuug.dk/JTC1/SC22/WG14/, and http://www.dmk.com/ .

11.3: My ANSI compiler complains about a mismatch when it sees

        extern int func(float);

        int func(x)
        float x;
        { ...

A: You have mixed the new-style prototype declaration "extern int func(float);" with the old-style definition "int func(x) float x;". It is usually possible to mix the two styles (see question 11.4), but not in this case.

Old C (and ANSI C, in the absence of prototypes, and in variable-length argument lists; see question 15.2) "widens" certain arguments when they are passed to functions. floats are promoted to double, and characters and short integers are promoted to int. (For old-style function definitions, the values are automatically converted back to the corresponding narrower types within the body of the called function, if they are declared that way there.)

This problem can be fixed either by using new-style syntax consistently in the definition:

        int func(float x) { ... }

or by changing the new-style prototype declaration to match the old-style definition:

        extern int func(double);

(In this case, it would be clearest to change the old-style definition to use double as well, if possible.)

It is arguably much safer to avoid "narrow" (char, short int, and float) function arguments and return types altogether.

See also question 1.25.

References: K&R1 Sec.
A7.1 p. 186; K&R2 Sec. A7.3.2 p. 202; ISO Sec. 6.3.2.2, Sec. 6.5.4.3; Rationale Sec. 3.3.2.2, Sec. 3.5.4.3; H&S Sec. 9.2 pp. 265-7, Sec. 9.4 pp. 272-3.

11.4: Can you mix old-style and new-style function syntax?
A: Doing so is legal, but requires a certain amount of care (see especially question 11.3). Modern practice, however, is to use the prototyped form in both declarations and definitions. (The old-style syntax is marked as obsolescent, so official support for it may be removed some day.)

References: ISO Sec. 6.7.1, Sec. 6.9.5; H&S Sec. 9.2.2 pp. 265-7, Sec. 9.2.5 pp. 269-70.

11.5: Why does the declaration

        extern int f(struct x *p);

give me an obscure warning message about "struct x introduced in prototype scope"?

A: In a quirk of C's normal block scoping rules, a structure declared (or even mentioned) for the first time within a prototype cannot be compatible with other structures declared in the same source file (it goes out of scope at the end of the prototype).

To resolve the problem, precede the prototype with the vacuous-looking declaration

        struct x;

which places an (incomplete) declaration of struct x at file scope, so that all following declarations involving struct x can at least be sure they're referring to the same struct x.

References: ISO Sec. 6.1.2.1, Sec. 6.1.2.6, Sec. 6.5.2.3.

11.8: I don't understand why I can't use const values in initializers and array dimensions, as in

        const int n = 5;
        int a[n];

A: The const qualifier really means "read-only"; an object so qualified is a run-time object which cannot (normally) be assigned to. The value of a const-qualified object is therefore *not* a constant expression in the full sense of the term. (C is unlike C++ in this regard.) When you need a true compile-time constant, use a preprocessor #define (or perhaps an enum).

References: ISO Sec. 6.4; H&S Secs. 7.11.2, 7.11.3 pp. 226-7.

11.9: What's the difference between "const char *p" and "char * const p"?
A: "const char *p" (which can also be written "char const *p") declares a pointer to a constant character (you can't change the character); "char * const p" declares a constant pointer to a (variable) character (i.e. you can't change the pointer).

Read these "inside out" to understand them; see also question 1.21.

References: ISO Sec. 6.5.4.1; Rationale Sec. 3.5.4.1; H&S Sec. 4.4.4 p. 81.

11.10: Why can't I pass a char ** to a function which expects a const char **?
A: You can use a pointer-to-T (for any type T) where a pointer-to-const-T is expected. However, the rule (an explicit exception) which permits slight mismatches in qualified pointer types is not applied recursively, but only at the top level.

You must use explicit casts (e.g. (const char **) in this case) when assigning (or passing) pointers which have qualifier mismatches at other than the first level of indirection.

References: ISO Sec. 6.1.2.6, Sec. 6.3.16.1, Sec. 6.5.3; H&S Sec. 7.9.1 pp. 221-2.

11.12a: What's the correct declaration of main()?
A: Either

        int main()
        int main(void)

or

        int main(int argc, char *argv[])

(with alternate spellings of argc and *argv[] obviously allowed). See also questions 11.12b to 11.15 below.

References: ISO Sec. 5.1.2.2.1, Sec. G.5.1; H&S Sec. 20.1 p. 416; CT&P Sec. 3.10 pp. 50-51.

11.12b: Can I declare main() as void, to shut off these annoying "main returns no value" messages?
A: No. main() must be declared as returning an int, and as taking either zero or two arguments, of the appropriate types. If you're calling exit() but still getting warnings, you may have to insert a redundant return statement (or use some kind of "not reached" directive, if available).

Declaring a function as void does not merely shut off or rearrange warnings: it may also result in a different function call/return sequence, incompatible with what the caller (in main's case, the C run-time startup code) expects.
(Note that this discussion of main() pertains only to "hosted" implementations; none of it applies to "freestanding" implementations, which may not even have main(). However, freestanding implementations are comparatively rare, and if you're using one, you probably know it. If you've never heard of the distinction, you're probably using a hosted implementation, and the above rules apply.)

References: ISO Sec. 5.1.2.2.1, Sec. G.5.1; H&S Sec. 20.1 p. 416; CT&P Sec. 3.10 pp. 50-51.

11.13: But what about main's third argument, envp?
A: It's a non-standard (though common) extension. If you really need to access the environment in ways beyond what the standard getenv() function provides, though, the global variable environ is probably a better avenue (though it's equally non-standard).

References: ISO Sec. G.5.1; H&S Sec. 20.1 pp. 416-7.

11.14: I believe that declaring void main() can't fail, since I'm calling exit() instead of returning, and anyway my operating system ignores a program's exit/return status.
A: It doesn't matter whether main() returns or not, or whether anyone looks at the status; the problem is that when main() is misdeclared, its caller (the runtime startup code) may not even be able to *call* it correctly (due to the potential clash of calling conventions; see question 11.12b).

It has been reported that programs using void main() and compiled using BC++ 4.5 can crash. Some compilers (including DEC C V4.1 and gcc with certain warnings enabled) will complain about void main().

Your operating system may ignore the exit status, and void main() may work for you, but it is not portable and not correct.

11.15: The book I've been using, _C Programing for the Compleat Idiot_, always uses void main().
A: Perhaps its author counts himself among the target audience. Many books unaccountably use void main() in examples, and assert that it's correct. They're wrong.

11.16: Is exit(status) truly equivalent to returning the same status from main()?
A: Yes and no.
The Standard says that they are equivalent. However, a return from main() cannot be expected to work if data local to main() might be needed during cleanup; see also question 16.4. A few very old, nonconforming systems may once have had problems with one or the other form. (Finally, the two forms are obviously not equivalent in a recursive call to main().)

References: K&R2 Sec. 7.6 pp. 163-4; ISO Sec. 5.1.2.2.3.

11.17: I'm trying to use the ANSI "stringizing" preprocessing operator `#' to insert the value of a symbolic constant into a message, but it keeps stringizing the macro's name rather than its value.
A: You can use something like the following two-step procedure to force a macro to be expanded as well as stringized:

        #define Str(x) #x
        #define Xstr(x) Str(x)
        #define OP plus
        char *opname = Xstr(OP);

This code sets opname to "plus" rather than "OP".

An equivalent circumlocution is necessary with the token-pasting operator ## when the values (rather than the names) of two macros are to be concatenated.

References: ISO Sec. 6.8.3.2, Sec. 6.8.3.5.

11.18: What does the message "warning: macro replacement within a string literal" mean?
A: Some pre-ANSI compilers/preprocessors interpreted macro definitions like

        #define TRACE(var, fmt) printf("TRACE: var = fmt\n", var)

such that invocations like

        TRACE(i, %d);

were expanded as

        printf("TRACE: i = %d\n", i);

In other words, macro parameters were expanded even inside string literals and character constants.

Macro expansion is *not* defined in this way by K&R or by Standard C. When you do want to turn macro arguments into strings, you can use the new # preprocessing operator, along with string literal concatenation (another new ANSI feature):

        #define TRACE(var, fmt) \
                printf("TRACE: " #var " = " #fmt "\n", var)

See also question 11.17 above.

References: H&S Sec. 3.3.8 p. 51.

11.19: I'm getting strange syntax errors inside lines I've #ifdeffed out.
A: Under ANSI C, the text inside a "turned off" #if, #ifdef, or #ifndef must still consist of "valid preprocessing tokens." This means that the characters " and ' must each be paired just as in real C code, and the pairs mustn't cross line boundaries. (Note particularly that an apostrophe within a contracted word looks like the beginning of a character constant.) Therefore, natural-language comments and pseudocode should always be written between the "official" comment delimiters /* and */. (But see question 20.20, and also 10.25.)

References: ISO Sec. 5.1.1.2, Sec. 6.1; H&S Sec. 3.2 p. 40.

11.20: What are #pragmas and what are they good for?
A: The #pragma directive provides a single, well-defined "escape hatch" which can be used for all sorts of (nonportable) implementation-specific controls and extensions: source listing control, structure packing, warning suppression (like lint's old /* NOTREACHED */ comments), etc.

References: ISO Sec. 6.8.6; H&S Sec. 3.7 p. 61.

11.21: What does "#pragma once" mean? I found it in some header files.
A: It is an extension implemented by some preprocessors to help make header files idempotent; it is equivalent to the #ifndef trick mentioned in question 10.7, though less portable.

11.22: Is char a[3] = "abc"; legal? What does it mean?
A: It is legal in ANSI C (and perhaps in a few pre-ANSI systems), though useful only in rare circumstances. It declares an array of size three, initialized with the three characters 'a', 'b', and 'c', *without* the usual terminating '\0' character. The array is therefore not a true C string and cannot be used with strcpy, printf %s, etc.

Most of the time, you should let the compiler count the initializers when initializing arrays (in the case of the initializer "abc", of course, the computed size will be 4).

References: ISO Sec. 6.5.7; H&S Sec. 4.6.4 p. 98.

11.24: Why can't I perform arithmetic on a void * pointer?
A: The compiler doesn't know the size of the pointed-to objects.
Before performing arithmetic, convert the pointer either to char * or to the pointer type you're trying to manipulate (but see also question 4.5).

References: ISO Sec. 6.1.2.5, Sec. 6.3.6; H&S Sec. 7.6.2 p. 204.

11.25: What's the difference between memcpy() and memmove()?
A: memmove() offers guaranteed behavior if the source and destination arguments overlap. memcpy() makes no such guarantee, and may therefore be more efficiently implementable. When in doubt, it's safer to use memmove().

References: K&R2 Sec. B3 p. 250; ISO Sec. 7.11.2.1, Sec. 7.11.2.2; Rationale Sec. 4.11.2; H&S Sec. 14.3 pp. 341-2; PCS Sec. 11 pp. 165-6.

11.26: What should malloc(0) do? Return a null pointer or a pointer to 0 bytes?
A: The ANSI/ISO Standard says that it may do either; the behavior is implementation-defined (see question 11.33).

References: ISO Sec. 7.10.3; PCS Sec. 16.1 p. 386.

11.27: Why does the ANSI Standard not guarantee more than six case-insensitive characters of external identifier significance?
A: The problem is older linkers which are under the control of neither the ANSI/ISO Standard nor the C compiler developers on the systems which have them. The limitation is only that identifiers be *significant* in the first six characters, not that they be restricted to six characters in length. This limitation is marked in the Standard as "obsolescent", and will be removed in C9X.

References: ISO Sec. 6.1.2, Sec. 6.9.1; Rationale Sec. 3.1.2; C9X Sec. 6.1.2; H&S Sec. 2.5 pp. 22-3.

11.29: My compiler is rejecting the simplest possible test programs, with all kinds of syntax errors.
A: Perhaps it is a pre-ANSI compiler, unable to accept function prototypes and the like.

See also questions 1.31, 10.9, 11.30, and 16.1b.

11.30: Why are some ANSI/ISO Standard library functions showing up as undefined, even though I've got an ANSI compiler?
A: It's possible to have a compiler available which accepts ANSI syntax, but not to have ANSI-compatible header files or run-time libraries installed. (In fact, this situation is rather common when using a non-vendor-supplied compiler such as gcc.) See also questions 11.29, 13.25, and 13.26.

11.31: Does anyone have a tool for converting old-style C programs to ANSI C, or vice versa, or for automatically generating prototypes?
A: Two programs, protoize and unprotoize, convert back and forth between prototyped and "old style" function definitions and declarations. (These programs do *not* handle full-blown translation between "Classic" C and ANSI C.) These programs are part of the FSF's GNU C compiler distribution; see question 18.3.

The unproto program (/pub/unix/unproto5.shar.Z on ftp.win.tue.nl) is a filter which sits between the preprocessor and the next compiler pass, converting most of ANSI C to traditional C on-the-fly.

The GNU GhostScript package comes with a little program called ansi2knr.

Before converting ANSI C back to old-style, beware that such a conversion cannot always be made both safely and automatically. ANSI C introduces new features and complexities not found in K&R C. You'll especially need to be careful of prototyped function calls; you'll probably need to insert explicit casts. See also questions 11.3 and 11.29.

Several prototype generators exist, many as modifications to lint. A program called CPROTO was posted to comp.sources.misc in March, 1992. There is another program called "cextract." Many vendors supply simple utilities like these with their compilers. See also question 18.16. (But be careful when generating prototypes for old functions with "narrow" parameters; see question 11.3.)

11.32: Why won't the Frobozz Magic C Compiler, which claims to be ANSI compliant, accept this code? I know that the code is ANSI, because gcc accepts it.
A: Many compilers support a few non-Standard extensions, gcc more so than most.
Are you sure that the code being rejected doesn't rely on such an extension? It is usually a bad idea to perform experiments with a particular compiler to determine properties of a language; the applicable standard may permit variations, or the compiler may be wrong. See also question 11.35.

11.33: People seem to make a point of distinguishing between implementation-defined, unspecified, and undefined behavior. What's the difference?
A: Briefly: implementation-defined means that an implementation must choose some behavior and document it. Unspecified means that an implementation should choose some behavior, but need not document it. Undefined means that absolutely anything might happen. In no case does the Standard impose requirements; in the first two cases it occasionally suggests (and may require a choice from among) a small set of likely behaviors.

Note that since the Standard imposes *no* requirements on the behavior of a compiler faced with an instance of undefined behavior, the compiler can do absolutely anything. In particular, there is no guarantee that the rest of the program will perform normally. It's perilous to think that you can tolerate undefined behavior in a program; see question 3.2 for a relatively simple example.

If you're interested in writing portable code, you can ignore the distinctions, as you'll want to avoid code that depends on any of the three behaviors.

See also questions 3.9 and 11.34.

References: ISO Sec. 3.10, Sec. 3.16, Sec. 3.17; Rationale Sec. 1.6.

11.34: I'm appalled that the ANSI Standard leaves so many issues undefined. Isn't a Standard's whole job to standardize these things?
A: It has always been a characteristic of C that certain constructs behaved in whatever way a particular compiler or a particular piece of hardware chose to implement them.
This deliberate imprecision often allows compilers to generate more efficient code for common cases, without having to burden all programs with extra code to assure well-defined behavior of cases deemed to be less reasonable. Therefore, the Standard is simply codifying existing practice.

A programming language standard can be thought of as a treaty between the language user and the compiler implementor. Parts of that treaty consist of features which the compiler implementor agrees to provide, and which the user may assume will be available. Other parts, however, consist of rules which the user agrees to follow and which the implementor may assume will be followed. As long as both sides uphold their guarantees, programs have a fighting chance of working correctly. If *either* side reneges on any of its commitments, nothing is guaranteed to work.

See also question 11.35.

References: Rationale Sec. 1.1.

11.35: People keep saying that the behavior of i = i++ is undefined, but I just tried it on an ANSI-conforming compiler, and got the results I expected.
A: A compiler may do anything it likes when faced with undefined behavior (and, within limits, with implementation-defined and unspecified behavior), including doing what you expect. It's unwise to depend on it, though.

See also questions 11.32, 11.33, and 11.34.
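
As a rough illustration of the advice in 11.35, here is a small sketch of two well-defined ways to say what i = i++ is usually meant to express. The expression is undefined because it modifies i twice with no intervening sequence point; each version below modifies the object exactly once per statement. The function names incremented() and fetch_then_increment() are hypothetical, made up for this example; they are not part of the Standard or of any library.

```c
/* Hypothetical helper: the programmer simply wanted i to end up
   one larger.  A plain increment says so, with no undefined behavior. */
int incremented(int i)
{
    i++;                /* equivalently: i = i + 1;  or  i += 1; */
    return i;
}

/* Hypothetical helper: the programmer wanted the old value as a
   result *and* wanted the variable incremented.  Doing the read and
   the update in separate statements keeps them on opposite sides of
   a sequence point. */
int fetch_then_increment(int *ip)
{
    int old = *ip;      /* read the old value first */
    *ip = old + 1;      /* then modify the object exactly once */
    return old;
}
```

Either form behaves the same way on every conforming compiler, which is exactly the guarantee that an experiment with one compiler cannot provide for i = i++.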