Section 20. Miscellaneous

20.1: How can I return multiple values from a function?

A: Either pass pointers to several locations which the function can fill in, or have the function return a structure containing the desired values, or (in a pinch) consider global variables. See also questions 2.7, 4.8, and 7.5a.

20.3: How do I access command-line arguments?

A: They are pointed to by the argv array with which main() is called. See also questions 8.2, 13.7, and 19.20.

References: K&R1 Sec. 5.11 pp. 110-114; K&R2 Sec. 5.10 pp. 114-118; ISO Sec. 5.1.2.2.1; H&S Sec. 20.1 p. 416; PCS Sec. 5.6 pp. 81-2, Sec. 11 p. 159, pp. 339-40 Appendix F; Schumacher, ed., _Software Solutions in C_ Sec. 4 pp. 75-85.

20.5: How can I write data files which can be read on other machines with different word size, byte order, or floating point formats?

A: The most portable solution is to use text files (usually ASCII), written with fprintf() and read with fscanf() or the like. (Similar advice also applies to network protocols.) Be skeptical of arguments which imply that text files are too big, or that reading and writing them is too slow. Not only is their efficiency frequently acceptable in practice, but the advantages of being able to interchange them easily between machines, and manipulate them with standard tools, can be overwhelming.

If you must use a binary format, you can improve portability, and perhaps take advantage of prewritten I/O libraries, by making use of standardized formats such as Sun's XDR (RFC 1014), OSI's ASN.1 (referenced in CCITT X.409 and ISO 8825 "Basic Encoding Rules"), CDF, netCDF, or HDF. See also questions 2.12 and 12.38.

References: PCS Sec. 6 pp. 86, 88.

20.6: If I have a char * variable pointing to the name of a function, how can I call that function?
A: The most straightforward thing to do is to maintain a correspondence table of names and function pointers:

    int func(), anotherfunc();

    struct { char *name; int (*funcptr)(); } symtab[] = {
        "func",         func,
        "anotherfunc",  anotherfunc,
    };

Then, search the table for the name, and call via the associated function pointer. See also questions 2.15, 18.14, and 19.36.

References: PCS Sec. 11 p. 168.

20.8: How can I implement sets or arrays of bits?

A: Use arrays of char or int, with a few macros to access the desired bit at the proper index. Here are some simple macros to use with arrays of char:

    #include <limits.h>     /* for CHAR_BIT */

    #define BITMASK(b) (1 << ((b) % CHAR_BIT))
    #define BITSLOT(b) ((b) / CHAR_BIT)
    #define BITSET(a, b) ((a)[BITSLOT(b)] |= BITMASK(b))
    #define BITTEST(a, b) ((a)[BITSLOT(b)] & BITMASK(b))

(If you don't have <limits.h>, try using 8 for CHAR_BIT.)

References: H&S Sec. 7.6.7 pp. 211-216.

20.9: How can I determine whether a machine's byte order is big-endian or little-endian?

A: One way is to use a pointer:

    int x = 1;
    if(*(char *)&x == 1)
        printf("little-endian\n");
    else
        printf("big-endian\n");

It's also possible to use a union.

See also question 10.16.

References: H&S Sec. 6.1.2 pp. 163-4.

20.10: How can I convert integers to binary or hexadecimal?

A: Make sure you really know what you're asking. Integers are stored internally in binary, although for most purposes it is not incorrect to think of them as being in octal, decimal, or hexadecimal, whichever is convenient. The base in which a number is expressed matters only when that number is read in from or written out to the outside world.

In source code, a non-decimal base is indicated by a leading 0 or 0x (for octal or hexadecimal, respectively). During I/O, the base of a formatted number is controlled in the printf and scanf family of functions by the choice of format specifier (%d, %o, %x, etc.) and in the strtol() and strtoul() functions by the third argument.
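As a rough illustration of the point about bases and I/O, here is a minimal sketch. The helper names parse_in_base() and format_in_bases() are made up for this example; only strtol(), snprintf(), and assert() are standard library facilities (snprintf() assumes a C99 or later library). The stored integer is the same in every case; only the text going in or out differs.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: convert a numeric string to a long.  The base is
   passed straight through as strtol()'s third argument: 2 through 36
   names a base explicitly, and 0 tells strtol() to honor a leading
   0 (octal) or 0x (hexadecimal) prefix. */
long parse_in_base(const char *text, int base)
{
    assert(text != NULL);
    return strtol(text, NULL, base);
}

/* Hypothetical helper: write the same value in decimal, octal, and
   hexadecimal, selected purely by the format specifiers.  Returns
   whatever snprintf() returns. */
int format_in_bases(unsigned int value, char *buf, size_t size)
{
    assert(buf != NULL && size > 0);
    return snprintf(buf, size, "%u %o %x", value, value, value);
}
```

For instance, parse_in_base("101010", 2), parse_in_base("052", 0), and parse_in_base("0x2A", 0) all yield 42, and format_in_bases() applied to 42 stores the string "42 52 2a".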
If you need to output numeric strings in arbitrary bases, you'll have to supply your own function to do so (it will essentially be the inverse of strtol).

During *binary* I/O, however, the base again becomes immaterial.

For more information about "binary" I/O, see question 2.11. See also questions 8.6 and 13.1.

References: ISO Secs. 7.10.1.5, 7.10.1.6.

20.11: Can I use base-2 constants (something like 0b101010)? Is there a printf() format for binary?

A: No, on both counts. You can convert base-2 string representations to integers with strtol(). See also question 20.10.

20.12: What is the most efficient way to count the number of bits which are set in an integer?

A: Many "bit-fiddling" problems like this one can be sped up and streamlined using lookup tables (but see question 20.13 below).

20.13: What's the best way of making my program efficient?

A: By picking good algorithms, implementing them carefully, and making sure that your program isn't doing any extra work. For example, the most microoptimized character-copying loop in the world will be beaten by code which avoids having to copy characters at all.

When worrying about efficiency, it's important to keep several things in perspective. First of all, although efficiency is an enormously popular topic, it is not always as important as people tend to think it is. Most of the code in most programs is not time-critical. When code is not time-critical, it is usually more important that it be written clearly and portably than that it be written maximally efficiently. (Remember that computers are very, very fast, and that seemingly "inefficient" code may be quite efficiently compilable, and run without apparent delay.)

It is notoriously difficult to predict what the "hot spots" in a program will be. When efficiency is a concern, it is important to use profiling software to determine which parts of the program deserve attention.
Often, actual computation time is swamped by peripheral tasks such as I/O and memory allocation, which can be sped up by using buffering and caching techniques.

Even for code that *is* time-critical, one of the least effective optimization techniques is to fuss with the coding details. Many of the "efficient coding tricks" which are frequently suggested (e.g. substituting shift operators for multiplication by powers of two) are performed automatically by even simpleminded compilers. Heavyhanded optimization attempts can make code so bulky that performance is actually degraded, and are rarely portable (i.e. they may speed things up on one machine but slow them down on another). In any case, tweaking the coding usually results in at best linear performance improvements; the big payoffs are in better algorithms.

For more discussion of efficiency tradeoffs, as well as good advice on how to improve efficiency when it is important, see chapter 7 of Kernighan and Plauger's _The Elements of Programming Style_, and Jon Bentley's _Writing Efficient Programs_.

20.14: Are pointers really faster than arrays? How much do function calls slow things down? Is ++i faster than i = i + 1?

A: Precise answers to these and many similar questions depend of course on the processor and compiler in use. If you simply must know, you'll have to time test programs carefully. (Often the differences are so slight that hundreds of thousands of iterations are required even to see them. Check the compiler's assembly language output, if available, to see if two purported alternatives aren't compiled identically.)

It is "usually" faster to march through large arrays with pointers rather than array subscripts, but for some processors the reverse is true.

Function calls, though obviously incrementally slower than in-line code, contribute so much to modularity and code clarity that there is rarely good reason to avoid them.
Before rearranging expressions such as i = i + 1, remember that you are dealing with a compiler, not a keystroke-programmable calculator. Any decent compiler will generate identical code for ++i, i += 1, and i = i + 1. The reasons for using ++i or i += 1 over i = i + 1 have to do with style, not efficiency. (See also question 3.12.)

20.15b: People claim that optimizing compilers are good and that we no longer have to write things in assembler for speed, but my compiler can't even replace i/=2 with a shift.

A: Was i signed or unsigned? If it was signed, a shift is not equivalent (hint: think about the result if i is negative and odd), so the compiler was correct not to use it.

20.15c: How can I swap two values without using a temporary?

A: The standard hoary old assembly language programmer's trick is:

    a ^= b;
    b ^= a;
    a ^= b;

But this sort of code has little place in modern, HLL programming. Temporary variables are essentially free, and the idiomatic code using three assignments, namely

    int t = a;
    a = b;
    b = t;

is not only clearer to the human reader, it is more likely to be recognized by the compiler and turned into the most-efficient code (e.g. using a swap instruction, if available). The latter code is obviously also amenable to use with pointers and floating-point values, unlike the XOR trick. See also questions 3.3b and 10.3.

20.17: Is there a way to switch on strings?

A: Not directly. Sometimes, it's appropriate to use a separate function to map strings to integer codes, and then switch on those. Otherwise, of course, you can fall back on strcmp() and a conventional if/else chain. See also questions 10.12, 20.18, and 20.29.

References: K&R1 Sec. 3.4 p. 55; K&R2 Sec. 3.4 p. 58; ISO Sec. 6.6.4.2; H&S Sec. 8.7 p. 248.

20.18: Is there a way to have non-constant case labels (i.e. ranges or arbitrary expressions)?

A: No.
The switch statement was originally designed to be quite simple for the compiler to translate; therefore, case labels are limited to single, constant, integral expressions. You *can* attach several case labels to the same statement, which will let you cover a small range if you don't mind listing all cases explicitly.

If you want to select on arbitrary ranges or non-constant expressions, you'll have to use an if/else chain.

See also question 20.17.

References: K&R1 Sec. 3.4 p. 55; K&R2 Sec. 3.4 p. 58; ISO Sec. 6.6.4.2; Rationale Sec. 3.6.4.2; H&S Sec. 8.7 p. 248.

20.19: Are the outer parentheses in return statements really optional?

A: Yes.

Long ago, in the early days of C, they were required, and just enough people learned C then, and wrote code which is still in circulation, that the notion that they might still be required is widespread.

(As it happens, parentheses are optional with the sizeof operator, too, under certain circumstances.)

References: K&R1 Sec. A18.3 p. 218; ISO Sec. 6.3.3, Sec. 6.6.6; H&S Sec. 8.9 p. 254.

20.20: Why don't C comments nest? How am I supposed to comment out code containing comments? Are comments legal inside quoted strings?

A: C comments don't nest mostly because PL/I's comments, which C's are borrowed from, don't either. Therefore, it is usually better to "comment out" large sections of code, which might contain comments, with #ifdef or #if 0 (but see question 11.19).

The character sequences /* and */ are not special within double-quoted strings, and do not therefore introduce comments, because a program (particularly one which is generating C code as output) might want to print them.

Note also that // comments, as in C++, are not yet legal in C, so it's not a good idea to use them in C programs (even if your compiler supports them as an extension).

References: K&R1 Sec. A2.1 p. 179; K&R2 Sec. A2.2 p. 192; ISO Sec. 6.1.9, Annex F; Rationale Sec. 3.1.9; H&S Sec. 2.2 pp. 18-9; PCS Sec. 10 p. 130.
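The advice in question 20.20 might be sketched as follows. The function names scale() and mentions_comment_opener() are invented for this illustration and are not from any library. The first function disables a region that itself contains comments by using #if 0; the second relies on the fact that the comment-opening characters are ordinary data inside a string literal.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical example function: the region between #if 0 and #endif is
   skipped by the preprocessor, so the comments inside it cause no
   trouble.  Wrapping the same region in one large comment would fail,
   because that comment would end at the first close-comment sequence. */
int scale(int x)
{
#if 0
    /* disabled experiment */
    x = x * 100;    /* old scaling factor */
#endif
    return x * 10;
}

/* Hypothetical example function: returns 1 if text contains the two
   characters that open a C comment.  Within the double-quoted string
   passed to strstr(), those characters do not start a comment. */
int mentions_comment_opener(const char *text)
{
    assert(text != NULL);
    return strstr(text, "/*") != NULL;
}
```

With these definitions, scale(2) returns 20 (the disabled multiplication by 100 never runs), and mentions_comment_opener() returns 1 for a string such as "int x; /* note */" and 0 for a string with no comment opener in it.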
20.20b: Is C a great language, or what? Where else could you write something like a+++++b ?

A: Well, you can't meaningfully write it in C, either. The rule for lexical analysis is that at each point during a straightforward left-to-right scan, the longest possible token is determined, without regard to whether the resulting sequence of tokens makes sense. The fragment in the question is therefore interpreted as

    a ++ ++ + b

and cannot be parsed as a valid expression.

References: K&R1 Sec. A2 p. 179; K&R2 Sec. A2.1 p. 192; ISO Sec. 6.1; H&S Sec. 2.3 pp. 19-20.

20.24: Why doesn't C have nested functions?

A: It's not trivial to implement nested functions such that they have the proper access to local variables in the containing function(s), so they were deliberately left out of C as a simplification. (gcc does allow them, as an extension.) For many potential uses of nested functions (e.g. qsort comparison functions), an adequate if slightly cumbersome solution is to use an adjacent function with static declaration, communicating if necessary via a few static variables. (A cleaner solution, though unsupported by qsort(), is to pass around a pointer to a structure containing the necessary context.)

20.24b: What is assert() and when would I use it?

A: It is a macro, defined in <assert.h>, for testing "assertions". An assertion essentially documents an assumption being made by the programmer, an assumption which, if violated, would indicate a serious programming error. For example, a function which was supposed to be called with a non-null pointer could write

    assert(p != NULL);

A failed assertion terminates the program. Assertions should *not* be used to catch expected errors, such as malloc() or fopen() failures.

References: K&R2 Sec. B6 pp. 253-4; ISO Sec. 7.2; H&S Sec. 19.1 p. 406.

20.25: How can I call FORTRAN (C++, BASIC, Pascal, Ada, LISP) functions from C? (And vice versa?)
A: The answer is entirely dependent on the machine and the specific calling sequences of the various compilers in use, and may not be possible at all. Read your compiler documentation very carefully; sometimes there is a "mixed-language programming guide," although the techniques for passing arguments and ensuring correct run-time startup are often arcane. More information may be found in FORT.gz by Glenn Geers, available via anonymous ftp from suphys.physics.su.oz.au in the src directory.

cfortran.h, a C header file, simplifies C/FORTRAN interfacing on many popular machines. It is available via anonymous ftp from zebra.desy.de or at http://www-zeus.desy.de/~burow .

In C++, a "C" modifier in an external function declaration indicates that the function is to be called using C calling conventions.

References: H&S Sec. 4.9.8 pp. 106-7.

20.26: Does anyone know of a program for converting Pascal or FORTRAN (or LISP, Ada, awk, "Old" C, ...) to C?

A: Several freely distributable programs are available:

    p2c     A Pascal to C converter written by Dave Gillespie, posted to comp.sources.unix in March, 1990 (Volume 21); also available by anonymous ftp from csvax.cs.caltech.edu, file pub/p2c-1.20.tar.Z .

    ptoc    Another Pascal to C converter, this one written in Pascal (comp.sources.unix, Volume 10, also patches in Volume 13?).

    f2c     A FORTRAN to C converter jointly developed by people from Bell Labs, Bellcore, and Carnegie Mellon. To find out more about f2c, send the mail message "send index from f2c" to netlib@research.att.com or research!netlib. (It is also available via anonymous ftp on netlib.att.com, in directory netlib/f2c.)

This FAQ list's maintainer also has available a list of a few other commercial translation products, and some for more obscure languages.

See also questions 11.31 and 18.16.

20.27: Is C++ a superset of C? Can I use a C++ compiler to compile C code?
A: C++ was derived from C, and is largely based on it, but there are some legal C constructs which are not legal C++. Conversely, ANSI C inherited several features from C++, including prototypes and const, so neither language is really a subset or superset of the other; the two also define the meaning of some common constructs differently. In spite of the differences, many C programs will compile correctly in a C++ environment, and many recent compilers offer both C and C++ compilation modes. See also questions 8.9 and 20.20.

References: H&S p. xviii, Sec. 1.1.5 p. 6, Sec. 2.8 pp. 36-7, Sec. 4.9 pp. 104-107.

20.28: I need a sort of an "approximate" strcmp routine, for comparing two strings for close, but not necessarily exact, equality.

A: Some nice information and algorithms having to do with approximate string matching, as well as a useful bibliography, can be found in Sun Wu and Udi Manber's paper "AGREP -- A Fast Approximate Pattern-Matching Tool."

Another approach involves the "soundex" algorithm, which maps similar-sounding words to the same codes. Soundex was designed for discovering similar-sounding names (for telephone directory assistance, as it happens), but it can be pressed into service for processing arbitrary words.

References: Knuth Sec. 6 pp. 391-2 Volume 3; Wu and Manber, "AGREP -- A Fast Approximate Pattern-Matching Tool" .

20.29: What is hashing?

A: Hashing is the process of mapping strings to integers, usually in a relatively small range. A "hash function" maps a string (or some other data structure) to a bounded number (the "hash bucket") which can more easily be used as an index in an array, or for performing repeated comparisons. (Obviously, a mapping from a potentially huge set of strings to a small set of integers will not be unique. Any algorithm using hashing therefore has to deal with the possibility of "collisions.") Many hashing functions and related algorithms have been developed; a full treatment is beyond the scope of this list.
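To make the idea concrete, here is a minimal sketch of one simple string hash, a multiplicative hash with multiplier 31 in the style of the one in K&R2 Sec. 6.6. The name hash_string() and the nbuckets parameter are choices made for this illustration; this is one of many possible hash functions, not a recommended one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: map a string to a bucket number in the range
   0 .. nbuckets-1.  Each character is folded into the running value h,
   which is multiplied by 31 at every step; unsigned arithmetic wraps
   around harmlessly on overflow.  The final modulus bounds the result. */
unsigned int hash_string(const char *s, unsigned int nbuckets)
{
    unsigned int h = 0;

    assert(s != NULL && nbuckets > 0);
    for (; *s != '\0'; s++)
        h = 31 * h + (unsigned char)*s;
    return h % nbuckets;
}
```

The result can be used directly as an index into an array of nbuckets entries. Collisions are easy to find: in ASCII, the strings "Aa" and "BB" produce the same value of h under this function, so they land in the same bucket for any table size, and code using the table must still compare the strings themselves.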
References: K&R2 Sec. 6.6; Knuth Sec. 6.4 pp. 506-549 Volume 3; Sedgewick Sec. 16 pp. 231-244.

20.31: How can I find the day of the week given the date?

A: Use mktime() or localtime() (see questions 13.13 and 13.14, but beware of DST adjustments if tm_hour is 0), or Zeller's congruence (see the sci.math FAQ list), or this elegant code by Tomohiko Sakamoto:

    int dayofweek(int y, int m, int d)  /* 0 = Sunday */
    {
        static int t[] = {0, 3, 2, 5, 0, 3, 5, 1, 4, 6, 2, 4};
        y -= m < 3;
        return (y + y/4 - y/100 + y/400 + t[m-1] + d) % 7;
    }

See also questions 13.14 and 20.32.

References: ISO Sec. 7.12.2.3.

20.32: Will 2000 be a leap year? Is (year % 4 == 0) an accurate test for leap years?

A: Yes and no, respectively. The full expression for the present Gregorian calendar is

    year % 4 == 0 && (year % 100 != 0 || year % 400 == 0)

See a good astronomical almanac or other reference for details. (To forestall an eternal debate: references which claim the existence of a 4000-year rule are wrong.) See also questions 13.14 and 13.14b.

20.34: Here's a good puzzle: how do you write a program which produces its own source code as output?

A: It is actually quite difficult to write a self-reproducing program that is truly portable, due particularly to quoting and character set difficulties.

Here is a classic example (which ought to be presented on one line, although it will fix itself the first time it's run):

    char*s="char*s=%c%s%c;main(){printf(s,34,s,34);}";
    main(){printf(s,34,s,34);}

(This program, like many of the genre, neglects to #include <stdio.h>, and assumes that the double-quote character " has the value 34, as it does in ASCII.)

20.35: What is "Duff's Device"?

A: It's a devastatingly deviously unrolled byte-copying loop, devised by Tom Duff while he was at Lucasfilm.
In its "classic" form, it looks like:

    register n = (count + 7) / 8;   /* count > 0 assumed */
    switch (count % 8)
    {
    case 0:    do { *to = *from++;
    case 7:         *to = *from++;
    case 6:         *to = *from++;
    case 5:         *to = *from++;
    case 4:         *to = *from++;
    case 3:         *to = *from++;
    case 2:         *to = *from++;
    case 1:         *to = *from++;
                  } while (--n > 0);
    }

where count bytes are to be copied from the array pointed to by from to the memory location pointed to by to (which is a memory-mapped device output register, which is why to isn't incremented). It solves the problem of handling the leftover bytes (when count isn't a multiple of 8) by interleaving a switch statement with the loop which copies bytes 8 at a time. (Believe it or not, it *is* legal to have case labels buried within blocks nested in a switch statement like this. In his announcement of the technique to C's developers and the world, Duff noted that C's switch syntax, in particular its "fall through" behavior, had long been controversial, and that "This code forms some sort of argument in that debate, but I'm not sure whether it's for or against.")

20.36: When will the next International Obfuscated C Code Contest (IOCCC) be held? How can I get a copy of the current and previous winning entries?

A: The contest is in a state of flux; see http://www.ioccc.org/index.html for current details.

Contest winners are usually announced at a Usenix conference, and are posted to the net sometime thereafter. Winning entries from previous years (back to 1984) are archived at ftp.uu.net (see question 18.16) under the directory pub/ioccc/; see also http://www.ioccc.org/index.html .

20.37: What was the entry keyword mentioned in K&R1?

A: It was reserved to allow the possibility of having functions with multiple, differently-named entry points, a la FORTRAN. It was not, to anyone's knowledge, ever implemented (nor does anyone remember what sort of syntax might have been imagined for it). It has been withdrawn, and is not a keyword in ANSI C.
(See also question 1.12.)

References: K&R2 p. 259 Appendix C.

20.38: Where does the name "C" come from, anyway?

A: C was derived from Ken Thompson's experimental language B, which was inspired by Martin Richards's BCPL (Basic Combined Programming Language), which was a simplification of CPL (Cambridge Programming Language). For a while, there was speculation that C's successor might be named P (the third letter in BCPL) instead of D, but of course the most visible descendant language today is C++.

20.39: How do you pronounce "char"?

A: You can pronounce the C keyword "char" in at least three ways: like the English words "char," "care," or "car" (or maybe even "character"); the choice is arbitrary.

20.39b: What do "lvalue" and "rvalue" mean?

A: Simply speaking, an "lvalue" is an expression that could appear on the left-hand side of an assignment; you can also think of it as denoting an object that has a location. (But see question 6.7 concerning arrays.) An "rvalue" is any expression that has a value (and that can therefore appear on the right-hand side of an assignment).

20.40: Where can I get extra copies of this list? What about back issues?

A: An up-to-date copy may be obtained from ftp.eskimo.com in directory u/s/scs/C-faq/. You can also just pull it off the net; it is normally posted to comp.lang.c on the first of each month, with an Expires: line which should keep it around all month. A parallel, abridged version is available (and posted), as is a list of changes accompanying each significantly updated version.

The various versions of this list are also posted to the newsgroups comp.answers and news.answers . Several sites archive news.answers postings and other FAQ lists, including this one; two sites are rtfm.mit.edu (directories pub/usenet/news.answers/C-faq/ and pub/usenet/comp.lang.c/) and ftp.uu.net (directory usenet/news.answers/C-faq/).
If you don't have ftp access, a mailserver at rtfm.mit.edu can mail you FAQ lists: send a message containing the single word "help" to mail-server@rtfm.mit.edu . See the meta-FAQ list in news.answers for more information.

A hypertext (HTML) version of this FAQ list is available on the World-Wide Web; the URL is http://www.eskimo.com/~scs/C-faq/top.html . A comprehensive site which references all Usenet FAQ lists is http://www.faqs.org/faqs/ .

An extended version of this FAQ list has been published by Addison-Wesley as _C Programming FAQs: Frequently Asked Questions_ (ISBN 0-201-84519-9). An errata list is at http://www.eskimo.com/~scs/C-faq/book/Errata.html and on ftp.eskimo.com in u/s/scs/ftp/C-faq/book/Errata .

This list is an evolving document containing questions which have been Frequent since before the Great Renaming; it is not just a collection of this month's interesting questions. Older copies are obsolete and don't contain much, except the occasional typo, that the current list doesn't.