Section 20. Miscellaneous

20.1:   How can I return multiple values from a function?

A:      Either pass pointers to several locations which the function can
	fill in, or have the function return a structure containing the
	desired values, or (in a pinch) consider global variables.  See
	also questions 2.7, 4.8, and 7.5a.
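	Here is a minimal sketch of the first two approaches, using a
	hypothetical function which finds both the minimum and the
	maximum of an int array (it assumes at least one element):

```c
#include <stddef.h>

/* Approach 1: return the values through pointer parameters.
   Assumes n >= 1. */
void minmax_ptr(const int a[], size_t n, int *minp, int *maxp)
{
    size_t i;

    *minp = *maxp = a[0];
    for (i = 1; i < n; i++) {
        if (a[i] < *minp)
            *minp = a[i];
        if (a[i] > *maxp)
            *maxp = a[i];
    }
}

/* Approach 2: return a structure holding all the values. */
struct minmax {
    int min, max;
};

struct minmax minmax_struct(const int a[], size_t n)
{
    struct minmax r;

    minmax_ptr(a, n, &r.min, &r.max);
    return r;
}
```

	The structure version is often more convenient for the caller,
	while the pointer version lets the caller choose where the
	results go.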

20.3:   How do I access command-line arguments?

A:      They are pointed to by the argv array with which main() is
	called.  See also questions 8.2, 13.7, and 19.20.
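	A minimal sketch (print_args() is a hypothetical helper; a real
	main(int argc, char *argv[]) would simply pass its own argc and
	argv along).  argv[0] conventionally holds the program name,
	argv[1] through argv[argc-1] are the arguments, and argv[argc] is
	a null pointer:

```c
#include <stdio.h>

/* Print each argument after the program name, one per line,
   and return how many were printed. */
int print_args(int argc, char *argv[])
{
    int i;

    for (i = 1; i < argc; i++)      /* skip argv[0], the program name */
        printf("argument %d: %s\n", i, argv[i]);
    return argc > 1 ? argc - 1 : 0;
}
```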

	References: K&R1 Sec. 5.11 pp. 110-114; K&R2 Sec. 5.10 pp. 114-
	118; ISO Sec. 5.1.2.2.1; H&S Sec. 20.1 p. 416; PCS Sec. 5.6 pp.
	81-2, Sec. 11 p. 159, pp. 339-40 Appendix F; Schumacher, ed.,
	_Software Solutions in C_ Sec. 4 pp. 75-85.

20.5:   How can I write data files which can be read on other machines
	with different word size, byte order, or floating point formats?

A:      The most portable solution is to use text files (usually ASCII),
	written with fprintf() and read with fscanf() or the like.
	(Similar advice also applies to network protocols.)  Be
	skeptical of arguments which imply that text files are too big,
	or that reading and writing them is too slow.  Not only is their
	efficiency frequently acceptable in practice, but the advantages
	of being able to interchange them easily between machines, and
	manipulate them with standard tools, can be overwhelming.

	If you must use a binary format, you can improve portability,
	and perhaps take advantage of prewritten I/O libraries, by
	making use of standardized formats such as Sun's XDR (RFC 1014),
	OSI's ASN.1 (referenced in CCITT X.409 and ISO 8825 "Basic
	Encoding Rules"), CDF, netCDF, or HDF.  See also questions 2.12
	and 12.38.
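	If you do roll your own binary format, one common technique (a
	sketch, not one of the formats named above, though XDR likewise
	fixes integers as 32-bit big-endian quantities) is to write each
	integer's bytes explicitly, most significant first, so that the
	host's own byte order never matters.  put_u32() and get_u32() are
	hypothetical names; the sketch assumes files opened in binary
	mode and 8-bit bytes in the file:

```c
#include <stdio.h>

/* Write the low 32 bits of v as four bytes, most significant first.
   Returns 0 on success, EOF on error. */
int put_u32(unsigned long v, FILE *fp)
{
    int shift;

    for (shift = 24; shift >= 0; shift -= 8)
        if (putc((int)((v >> shift) & 0xFF), fp) == EOF)
            return EOF;
    return 0;
}

/* Read four bytes written by put_u32() and reassemble the value.
   Returns 0 on success, EOF on end of file or error. */
int get_u32(FILE *fp, unsigned long *vp)
{
    unsigned long v = 0;
    int i, c;

    for (i = 0; i < 4; i++) {
        if ((c = getc(fp)) == EOF)
            return EOF;
        v = (v << 8) | (unsigned long)c;
    }
    *vp = v;
    return 0;
}
```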

	References: PCS Sec. 6 pp. 86, 88.

20.6:   If I have a char * variable pointing to the name of a function,
	how can I call that function?

A:      The most straightforward thing to do is to maintain a
	correspondence table of names and function pointers:

		int func(void), anotherfunc(void);

		struct {
			const char *name;
			int (*funcptr)(void);
		} symtab[] = {
			{ "func",         func },
			{ "anotherfunc",  anotherfunc },
		};

	Then, search the table for the name, and call via the associated
	function pointer.  See also questions 2.15, 18.14, and 19.36.

	References: PCS Sec. 11 p. 168.
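	The search step might look like this minimal sketch, which
	repeats the table with prototyped declarations.  The bodies of
	func() and anotherfunc() are placeholders, and call_by_name() is
	a hypothetical name:

```c
#include <stddef.h>
#include <string.h>

/* Placeholder functions to populate the table. */
static int func(void) { return 1; }
static int anotherfunc(void) { return 2; }

static const struct {
    const char *name;
    int (*funcptr)(void);
} symtab[] = {
    { "func",        func },
    { "anotherfunc", anotherfunc },
};

/* Look up name; if found, call the function, store its result in
   *result, and return 1.  Return 0 if name is not in the table. */
int call_by_name(const char *name, int *result)
{
    size_t i;

    for (i = 0; i < sizeof symtab / sizeof symtab[0]; i++) {
        if (strcmp(symtab[i].name, name) == 0) {
            *result = (*symtab[i].funcptr)();
            return 1;
        }
    }
    return 0;
}
```

	A linear search is fine for a handful of names; for a large table,
	keep it sorted and use bsearch(), or hash the names (see question
	20.29).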

20.8:   How can I implement sets or arrays of bits?

A:      Use arrays of char or int, with a few macros to access the
	desired bit at the proper index.  Here are some simple macros to
	use with arrays of char:

		#include <limits.h>             /* for CHAR_BIT */

		#define BITMASK(b) (1 << ((b) % CHAR_BIT))
		#define BITSLOT(b) ((b) / CHAR_BIT)
		#define BITSET(a, b) ((a)[BITSLOT(b)] |= BITMASK(b))
		#define BITTEST(a, b) ((a)[BITSLOT(b)] & BITMASK(b))

	(If you don't have <limits.h>, try using 8 for CHAR_BIT.)
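	As an illustration, here is a sketch of the Sieve of Eratosthenes
	using these macros, plus two additional helpers, BITCLEAR and
	BITNSLOTS, which are not part of the set above.  It uses unsigned
	char so that setting the high bit of a slot never stores an
	out-of-range value into a signed char; count_primes() and
	PRIME_LIMIT are hypothetical names:

```c
#include <limits.h>   /* CHAR_BIT */
#include <string.h>   /* memset */

#define BITMASK(b) (1 << ((b) % CHAR_BIT))
#define BITSLOT(b) ((b) / CHAR_BIT)
#define BITSET(a, b) ((a)[BITSLOT(b)] |= BITMASK(b))
#define BITTEST(a, b) ((a)[BITSLOT(b)] & BITMASK(b))
/* Extra helpers: clear a bit; slots needed to hold nb bits. */
#define BITCLEAR(a, b) ((a)[BITSLOT(b)] &= ~BITMASK(b))
#define BITNSLOTS(nb) (((nb) + CHAR_BIT - 1) / CHAR_BIT)

#define PRIME_LIMIT 100

/* Count the primes below PRIME_LIMIT, marking composites in a bit array. */
int count_primes(void)
{
    unsigned char composite[BITNSLOTS(PRIME_LIMIT)];
    int i, j, n = 0;

    memset(composite, 0, sizeof composite);
    for (i = 2; i < PRIME_LIMIT; i++) {
        if (BITTEST(composite, i))
            continue;
        n++;                              /* i is prime */
        for (j = i + i; j < PRIME_LIMIT; j += i)
            BITSET(composite, j);         /* mark its multiples */
    }
    return n;
}
```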

	References: H&S Sec. 7.6.7 pp. 211-216.

20.9:   How can I determine whether a machine's byte order is big-endian
	or little-endian?

A:      One way is to use a pointer:

		int x = 1;
		if (*(char *)&x == 1)
			printf("little-endian\n");
		else
			printf("big-endian\n");

	It's also possible to use a union.
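	A sketch of the union version (is_little_endian() is a
	hypothetical name).  Reading a union member other than the one
	last stored was implementation-defined in C89, and later
	Standards describe it as reinterpreting the stored bytes, so this
	works on essentially all real compilers.  Like the pointer
	version, it only distinguishes "low-order byte first" from
	everything else:

```c
/* Return 1 if the low-order byte of an int is stored first
   (little-endian), 0 otherwise.  Assumes sizeof(int) > 1. */
int is_little_endian(void)
{
    union {
        int i;
        unsigned char c[sizeof(int)];
    } u;

    u.i = 1;
    return u.c[0] == 1;
}
```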

	See also question 10.16.

	References: H&S Sec. 6.1.2 pp. 163-4.

20.10:  How can I convert integers to binary or hexadecimal?

A:      Make sure you really know what you're asking.  Integers are
	stored internally in binary, although for most purposes it is
	not incorrect to think of them as being in octal, decimal, or
	hexadecimal, whichever is convenient.  The base in which a
	number is expressed matters only when that number is read in
	from or written out to the outside world.

	In source code, a non-decimal base is indicated by a leading 0
	or 0x (for octal or hexadecimal, respectively).  During I/O, the
	base of a formatted number is controlled in the printf and scanf
	family of functions by the choice of format specifier (%d, %o,
	%x, etc.) and in the strtol() and strtoul() functions by the
	third argument.  If you need to output numeric strings in
	arbitrary bases, you'll have to supply your own function to do
	so (it will essentially be the inverse of strtol).  During
	*binary* I/O, however, the base again becomes immaterial.
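	Such a function might look like this minimal sketch; the name
	ultostr() and its calling convention are hypothetical.  It
	peels off digits least significant first, then copies them out
	in reverse:

```c
#include <limits.h>   /* CHAR_BIT */

/* Write the digits of n in the given base (2 through 36) into buf,
   which must hold at least sizeof(unsigned long) * CHAR_BIT + 1 chars.
   An invalid base yields an empty string.  Returns buf. */
char *ultostr(unsigned long n, int base, char *buf)
{
    static const char digits[] = "0123456789abcdefghijklmnopqrstuvwxyz";
    char tmp[sizeof(unsigned long) * CHAR_BIT];
    int i = 0, j = 0;

    if (base < 2 || base > 36) {
        buf[0] = '\0';
        return buf;
    }
    do {                            /* least significant digit first */
        tmp[i++] = digits[n % base];
        n /= base;
    } while (n != 0);
    while (i > 0)                   /* reverse into the caller's buffer */
        buf[j++] = tmp[--i];
    buf[j] = '\0';
    return buf;
}
```

	With base 2, this also serves as the missing "printf format for
	binary" of question 20.11.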

	For more information about "binary" I/O, see question 2.11.
	See also questions 8.6 and 13.1.

	References: ISO Secs. 7.10.1.5, 7.10.1.6.

20.11:  Can I use base-2 constants (something like 0b101010)?
	Is there a printf() format for binary?

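A:      No, on both counts, in versions of Standard C before C23
	(which finally added binary constants such as 0b101010 and a %b
	printf conversion).  You can convert base-2 string
	representations to integers with strtol().  See also question
	20.10.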
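	A minimal sketch of the strtol() approach, with the error checks
	strtol() makes possible (parse_binary() is a hypothetical name;
	overflow checking via errno and ERANGE is omitted for brevity):

```c
#include <stdlib.h>

/* Convert a string of binary digits such as "101010" to a long.
   On success, store the value in *out and return 1.  Return 0 if the
   string has no digits or anything follows the binary digits. */
int parse_binary(const char *s, long *out)
{
    char *end;
    long v = strtol(s, &end, 2);

    if (end == s || *end != '\0')
        return 0;
    *out = v;
    return 1;
}
```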

20.12:  What is the most efficient way to count the number of bits which
	are set in an integer?

A:      Many "bit-fiddling" problems like this one can be sped up and
	streamlined using lookup tables (but see question 20.13 below).
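	A minimal sketch of the table approach; bitcount() and the table
	name are hypothetical.  The table, with one entry per possible
	unsigned char value, is filled in once, using the fact that i has
	one more set bit than i/2 exactly when i is odd.  It assumes
	unsigned int is wider than a char, since shifting by the full
	width of a type is undefined.  Whether it beats a simple
	bit-at-a-time loop on a given machine is something to measure:

```c
#include <limits.h>   /* CHAR_BIT, UCHAR_MAX */

/* bits_in_byte[i] is the number of set bits in the byte value i. */
static unsigned char bits_in_byte[UCHAR_MAX + 1];

static void init_bits_in_byte(void)
{
    int i;

    for (i = 1; i <= UCHAR_MAX; i++)
        bits_in_byte[i] = (unsigned char)((i & 1) + bits_in_byte[i / 2]);
}

/* Count the set bits in n, one byte (CHAR_BIT bits) at a time. */
int bitcount(unsigned int n)
{
    static int initialized = 0;
    int count = 0;

    if (!initialized) {
        init_bits_in_byte();
        initialized = 1;
    }
    while (n != 0) {
        count += bits_in_byte[n & UCHAR_MAX];
        n >>= CHAR_BIT;
    }
    return count;
}
```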

20.13:  What's the best way of making my program efficient?

A:      By picking good algorithms, implementing them carefully, and
	making sure that your program isn't doing any extra work.  For
	example, the most microoptimized character-copying loop in the
	world will be beaten by code which avoids having to copy
	characters at all.

	When worrying about efficiency, it's important to keep several
	things in perspective.  First of all, although efficiency is an
	enormously popular topic, it is not always as important as
	people tend to think it is.  Most of the code in most programs
	is not time-critical.  When code is not time-critical, it is
	usually more important that it be written clearly and portably
	than that it be written maximally efficiently.  (Remember that
	computers are very, very fast, and that seemingly "inefficient"
	code may be quite efficiently compilable, and run without
	apparent delay.)

	It is notoriously difficult to predict what the "hot spots" in a
	program will be.  When efficiency is a concern, it is important
	to use profiling software to determine which parts of the
	program deserve attention.  Often, actual computation time is
	swamped by peripheral tasks such as I/O and memory allocation,
	which can be sped up by using buffering and caching techniques.

	Even for code that *is* time-critical, one of the least
	effective optimization techniques is to fuss with the coding
	details.  Many of the "efficient coding tricks" which are
	frequently suggested (e.g. substituting shift operators for
	multiplication by powers of two) are performed automatically by
	even simpleminded compilers.  Heavy-handed optimization attempts
	can make code so bulky that performance is actually degraded,
	and are rarely portable (i.e. they may speed things up on one
	machine but slow them down on another).  In any case, tweaking
	the coding usually yields at best constant-factor performance
	improvements; the big payoffs come from better algorithms.

	For more discussion of efficiency tradeoffs, as well as good
	advice on how to improve efficiency when it is important, see
	chapter 7 of Kernighan and Plauger's _The Elements of
	Programming Style_, and Jon Bentley's _Writing Efficient
	Programs_.

20.14:  Are pointers really faster than arrays?  How much do function
	calls slow things down?  Is ++i faster than i = i + 1?

A:      Precise answers to these and many similar questions depend of
	course on the processor and compiler in use.  If you simply must
	know, you'll have to time test programs carefully.  (Often the
	differences are so slight that hundreds of thousands of
	iterations are required even to see them.  Check the compiler's
	assembly language output, if available, to see whether two
	purported alternatives are in fact compiled identically.)

	It is "usually" faster to march through large arrays with
	pointers rather than array subscripts, but for some processors
	the reverse is true.
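	For concreteness, here are the two styles side by side in a pair
	of hypothetical functions which compute the same sum.  Comparing
	their assembly output, or timing them over many calls as
	suggested above, is the only reliable way to tell which is faster
	with a particular compiler and processor:

```c
#include <stddef.h>

/* Sum an array using subscripts. */
long sum_subscript(const int a[], size_t n)
{
    long total = 0;
    size_t i;

    for (i = 0; i < n; i++)
        total += a[i];
    return total;
}

/* The same loop, marching a pointer through the array. */
long sum_pointer(const int *a, size_t n)
{
    long total = 0;
    const int *end = a + n;

    for (; a < end; a++)
        total += *a;
    return total;
}
```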

	Function calls, though obviously incrementally slower than in-
	line code, contribute so much to modularity and code clarity
	that there is rarely good reason to avoid them.

	Before rearranging expressions such as i = i + 1, remember that
	you are dealing with a compiler, not a keystroke-programmable
	calculator.  Any decent compiler will generate identical code
	for ++i, i += 1, and i = i + 1.  The reasons for using ++i or
	i += 1 over i = i + 1 have to do with style, not efficiency.
	(See also question 3.12.)

20.15b: People claim that optimizing compilers are good and that we no
	longer have to write things in assembler for speed, but my
	compiler can't even replace i/=2 with a shift.

A:      Was i signed or unsigned?  If it was signed, a shift is not
	equivalent (hint: think about the result if i is negative and
	odd), so the compiler was correct not to use it.
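	A small sketch of the difference (the function names are
	hypothetical).  Since C99, integer division truncates toward
	zero, so div2(-3) is -1; C89 left the rounding of negative
	quotients implementation-defined.  Right-shifting a negative
	value is implementation-defined, and on the usual two's-complement
	machines with an arithmetic shift, shr1(-3) yields -2 instead.
	For unsigned operands, the two always agree:

```c
/* Signed division: truncates toward zero (guaranteed since C99). */
int div2(int i)
{
    return i / 2;
}

/* Right shift: implementation-defined for negative i; commonly
   rounds toward minus infinity instead. */
int shr1(int i)
{
    return i >> 1;
}

/* Unsigned: u / 2 and u >> 1 are always the same. */
unsigned int half(unsigned int u)
{
    return u >> 1;
}
```

	So a compiler can always turn an unsigned i/=2 into a shift, but
	for a signed i it must emit extra instructions to fix up
	negative odd values, or use a division.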

20.15c: How can I swap two values without using a temporary?

A:      The standard hoary old assembly language programmer's trick is:

		a ^= b;
		b ^= a;
		a ^= b;

	But this sort of code has little place in modern high-level
	language programming.  Temporary variables are essentially free,
	and the idiomatic code using three assignments, namely

		int t = a;
		a = b;
		b = t;

	is not only clearer to the human reader, it is more likely to be
	recognized by the compiler and turned into the most efficient
	code (e.g. using a swap instruction, if available).  The latter
	code also works with pointers and floating-point values, unlike
	the XOR trick, which moreover zeroes the value if a and b
	designate the same object.  See also questions 3.3b and 10.3.
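	The aliasing failure is easy to demonstrate when swapping through
	pointers; xor_swap() and temp_swap() are hypothetical names:

```c
/* XOR swap through pointers: wipes out the value if a == b. */
void xor_swap(int *a, int *b)
{
    *a ^= *b;
    *b ^= *a;
    *a ^= *b;
}

/* Swap using a temporary: correct even if a == b. */
void temp_swap(int *a, int *b)
{
    int t = *a;

    *a = *b;
    *b = t;
}
```

	Calling xor_swap(&x, &x) sets x to 0, because the first step
	computes x ^ x; temp_swap(&x, &x) leaves x unchanged.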

20.17:  Is there a way to switch on strings?

A:      Not directly.  Sometimes, it's appropriate to use a separate
	function to map strings to integer codes, and then switch on
	those.  Otherwise, of course, you can fall back on strcmp() and
	a conventional if/else chain.  See also questions 10.12, 20.18,
	and 20.29.

	References: K&R1 Sec. 3.4 p. 55; K&R2 Sec. 3.4 p. 58; ISO
	Sec. 6.6.4.2; H&S Sec. 8.7 p. 248.
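	A minimal sketch of the mapping approach; the command names,
	codes, and functions are all hypothetical:

```c
#include <string.h>

enum command { CMD_UNKNOWN, CMD_OPEN, CMD_CLOSE, CMD_QUIT };

static const struct {
    const char *name;
    enum command code;
} commands[] = {
    { "open",  CMD_OPEN },
    { "close", CMD_CLOSE },
    { "quit",  CMD_QUIT },
};

/* Map a string to its code, or CMD_UNKNOWN if it isn't listed. */
enum command lookup_command(const char *s)
{
    size_t i;

    for (i = 0; i < sizeof commands / sizeof commands[0]; i++)
        if (strcmp(commands[i].name, s) == 0)
            return commands[i].code;
    return CMD_UNKNOWN;
}

/* An ordinary switch can now dispatch on the code. */
const char *describe(const char *s)
{
    switch (lookup_command(s)) {
    case CMD_OPEN:  return "opening";
    case CMD_CLOSE: return "closing";
    case CMD_QUIT:  return "quitting";
    default:        return "unknown command";
    }
}
```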

20.18:  Is there a way to have non-constant case labels (i.e. ranges or
	arbitrary expressions)?

A:      No.  The switch statement was originally designed to be quite
	simple for the compiler to translate, so case labels are
	limited to single, constant, integral expressions.  You *can*
	attach several case labels to the same statement, which will let
	you cover a small range if you don't mind listing all cases
	explicitly.

	If you want to select on arbitrary ranges or non-constant
	expressions, you'll have to use an if/else chain.
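	Both techniques in one hypothetical function: stacked case labels
	for a short list of values, and an if for a genuine range:

```c
/* Days in month m (1-12) of a non-leap year; -1 if m is invalid. */
int days_in_month(int m)
{
    switch (m) {
    case 4: case 6: case 9: case 11:  /* several labels, one statement */
        return 30;
    case 2:
        return 28;
    default:
        break;
    }
    if (m >= 1 && m <= 12)            /* a true range needs an if */
        return 31;
    return -1;
}
```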

	See also question 20.17.

	References: K&R1 Sec. 3.4 p. 55; K&R2 Sec. 3.4 p. 58; ISO
	Sec. 6.6.4.2; Rationale Sec. 3.6.4.2; H&S Sec. 8.7 p. 248.

20.19:  Are the outer parentheses in return statements really optional?

A:      Yes.

	Long ago, in the early days of C, they were required, and just
	enough people learned C then, and wrote code which is still in
	circulation, that the notion that they might still be required
	is widespread.

	(As it happens, parentheses are optional with the sizeof
	operator, too, when its operand is an expression rather than a
	type name: sizeof x is legal, but sizeof int is not.)

	References: K&R1 Sec. A18.3 p. 218; ISO Sec. 6.3.3, Sec. 6.6.6;
	H&S Sec. 8.9 p. 254.

20.20:  Why don't C comments nest?  How am I supposed to comment out
	code containing comments?  Are comments legal inside quoted
	strings?

A:      C comments don't nest mostly because PL/I's comments, which C's
	are borrowed from, don't either.  Therefore, it is usually
	better to "comment out" large sections of code, which might
	contain comments, with #ifdef or #if 0 (but see question 11.19).
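	For example, in this hypothetical function the preprocessor
	discards everything between #if 0 and #endif, including the
	embedded comment, which a surrounding /* ... */ could not
	enclose:

```c
int commented_out_demo(void)
{
    int x = 42;
#if 0
    /* This code, comment and all, is skipped. */
    x = 0;
#endif
    return x;
}
```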

	The character sequences /* and */ are not special within double-
	quoted strings, and do not therefore introduce comments, because
	a program (particularly one which is generating C code as
	output) might want to print them.

	Note also that // comments, as in C++, were not legal in C89;
	they were added only in C99, so it's not a good idea to use them
	in C programs which must still be compiled by older compilers
	(even if your compiler supports them as an extension).

	References: K&R1 Sec. A2.1 p. 179; K&R2 Sec. A2.2 p. 192; ISO
	Sec. 6.1.9, Annex F; Rationale Sec. 3.1.9; H&S Sec. 2.2 pp. 18-
	9; PCS Sec. 10 p. 130.

20.20b: Is C a great language, or what?  Where else could you write
	something like a+++++b ?

A:      Well, you can't meaningfully write it in C, either.
	The rule for lexical analysis is that at each point during a
	straightforward left-to-right scan, the longest possible token
	is determined, without regard to whether the resulting sequence
	of tokens makes sense.  The fragment in the question is
	therefore interpreted as

		a ++ ++ + b

	and cannot be parsed as a valid expression.
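	Separating the tokens with white space yields the expression the
	puzzle presumably intended, a++ + ++b.  A minimal sketch (the
	function name is hypothetical); since a and b are distinct
	objects, each modified once, the result is well defined:

```c
/* Returns the old value of a plus the incremented value of b. */
int increments_demo(int a, int b)
{
    return a++ + ++b;
}
```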

	References: K&R1 Sec. A2 p. 179; K&R2 Sec. A2.1 p. 192; ISO
	Sec. 6.1; H&S Sec. 2.3 pp. 19-20.

20.24:  Why doesn't C have nested functions?

A:      It's not trivial to implement nested functions such that they
	have the proper access to local variables in the containing
	function(s), so they were deliberately left out of C as a
	simplification.  (gcc does allow them, as an extension.)  For
	many potential uses of nested functions (e.g. qsort comparison
	functions), an adequate if slightly cumbersome solution is to
	use an adjacent function with static declaration, communicating
	if necessary via a few static variables.  (A cleaner solution,
	though unsupported by qsort(), is to pass around a pointer to
	a structure containing the necessary context.)
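	A sketch of the static-variable workaround (sort_indices(),
	cmp_by_key(), and sort_keys are hypothetical names).  It sorts an
	array of indices by the values they refer to, which a comparison
	function can see only through file-scope state.  This makes the
	code non-reentrant; some libraries offer qsort() variants which
	pass a context pointer, but they are not universally available
	and their signatures differ:

```c
#include <stdlib.h>

/* Context for cmp_by_key(), set by sort_indices(). */
static const double *sort_keys;

static int cmp_by_key(const void *p1, const void *p2)
{
    double a = sort_keys[*(const int *)p1];
    double b = sort_keys[*(const int *)p2];

    return (a > b) - (a < b);
}

/* Reorder idx[0..n-1] so that keys[idx[i]] is ascending. */
void sort_indices(int idx[], size_t n, const double keys[])
{
    sort_keys = keys;
    qsort(idx, n, sizeof idx[0], cmp_by_key);
    sort_keys = NULL;
}
```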

20.24b: What is assert() and when would I use it?

A:      It is a macro, defined in <assert.h>, for testing "assertions".
	An assertion essentially documents an assumption being made by
	the programmer, an assumption which, if violated, would indicate
	a serious programming error.  For example, a function which was
	supposed to be called with a non-null pointer could write

		assert(p != NULL);

	A failed assertion terminates the program.  Assertions should
	*not* be used to catch expected errors, such as malloc() or
	fopen() failures.
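	A minimal sketch (sum_array() is a hypothetical function).  Note
	that if the macro NDEBUG is defined when <assert.h> is included,
	assert() expands to nothing, so an assertion must never contain
	side effects the program depends on:

```c
#include <assert.h>
#include <stddef.h>

/* Sum n elements.  Callers promise a non-null pointer; breaking that
   promise is a programming error, so it is checked with assert(). */
long sum_array(const int *p, size_t n)
{
    long total = 0;
    size_t i;

    assert(p != NULL);
    for (i = 0; i < n; i++)
        total += p[i];
    return total;
}
```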

	References: K&R2 Sec. B6 pp. 253-4; ISO Sec. 7.2; H&S Sec. 19.1
	p. 406.

20.25:  How can I call FORTRAN (C++, BASIC, Pascal, Ada, LISP) functions
	from C?  (And vice versa?)

A:      The answer is entirely dependent on the machine and the specific
	calling sequences of the various compilers in use, and may not
	be possible at all.  Read your compiler documentation very
	carefully; sometimes there is a "mixed-language programming
	guide," although the techniques for passing arguments and
	ensuring correct run-time startup are often arcane.  More
	information may be found in FORT.gz by Glenn Geers, available
	via anonymous ftp from suphys.physics.su.oz.au in the src
	directory.

	cfortran.h, a C header file, simplifies C/FORTRAN interfacing on
	many popular machines.  It is available via anonymous ftp from
	zebra.desy.de or at http://www-zeus.desy.de/~burow .

	In C++, an extern "C" linkage specification on a function
	declaration indicates that the function has C language linkage,
	i.e. that it uses C naming and calling conventions.
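	A common idiom, sketched here with a hypothetical header mylib.h
	and function add_ints(), lets one header serve both languages.
	Only a C++ compiler defines __cplusplus, so a C compiler never
	sees the extern "C" lines:

```c
/* mylib.h (hypothetical) */
#ifdef __cplusplus
extern "C" {
#endif

int add_ints(int a, int b);

#ifdef __cplusplus
}
#endif

/* mylib.c: the definition, compiled as C. */
int add_ints(int a, int b)
{
    return a + b;
}
```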

	References: H&S Sec. 4.9.8 pp. 106-7.

20.26:  Does anyone know of a program for converting Pascal or FORTRAN
	(or LISP, Ada, awk, "Old" C, ...) to C?

A:      Several freely distributable programs are available:

	p2c     A Pascal to C converter written by Dave Gillespie,
		posted to comp.sources.unix in March, 1990 (Volume 21);
		also available by anonymous ftp from
		csvax.cs.caltech.edu, file pub/p2c-1.20.tar.Z .

	ptoc    Another Pascal to C converter, this one written in
		Pascal (comp.sources.unix, Volume 10, also patches in
		Volume 13?).

	f2c     A FORTRAN to C converter jointly developed by people
		from Bell Labs, Bellcore, and Carnegie Mellon.  To find
		out more about f2c, send the mail message "send index
		from f2c" to netlib@research.att.com or research!netlib.
		(It is also available via anonymous ftp on
		netlib.att.com, in directory netlib/f2c.)

	This FAQ list's maintainer also has available a list of a few
	other commercial translation products, and some for more obscure
	languages.

	See also questions 11.31 and 18.16.

20.27:  Is C++ a superset of C?  Can I use a C++ compiler to compile C
	code?

A:      C++ was derived from C, and is largely based on it, but there
	are some legal C constructs which are not legal C++.
	Conversely, ANSI C inherited several features from C++,
	including prototypes and const, so neither language is really a
	subset or superset of the other; the two also define the meaning
	of some common constructs differently.  In spite of the
	differences, many C programs will compile correctly in a C++
	environment, and many recent compilers offer both C and C++
	compilation modes.  See also questions 8.9 and 20.20.
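	Two small examples of the differences (the function names are
	hypothetical).  In C a character constant such as 'a' has type
	int, while in C++ it has type char; and C converts void *
	implicitly to other object pointer types, while C++ requires a
	cast:

```c
#include <stdlib.h>

/* sizeof(int) in C; 1 in C++. */
size_t char_constant_size(void)
{
    return sizeof 'a';
}

/* Valid C; a C++ compiler rejects the uncast conversion. */
int *make_array(size_t n)
{
    int *p = malloc(n * sizeof *p);
    return p;
}
```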

	References: H&S p. xviii, Sec. 1.1.5 p. 6, Sec. 2.8 pp. 36-7,
	Sec. 4.9 pp. 104-107.

20.28:  I need a sort of an "approximate" strcmp routine, for comparing
	two strings for close, but not necessarily exact, equality.

A:      Some nice information and algorithms having to do with
	approximate string matching, as well as a useful bibliography,
	can be found in Sun Wu and Udi Manber's paper "AGREP -- A Fast
	Approximate Pattern-Matching Tool."

	Another approach involves the "soundex" algorithm, which maps
	similar-sounding words to the same codes.  Soundex was designed
	for discovering similar-sounding names (for telephone directory
	assistance, as it happens), but it can be pressed into service
	for processing arbitrary words.
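	Here is a sketch of the common "American" variant of Soundex
	(soundex() and soundex_code() are hypothetical names).  It keeps
	the first letter, codes the following consonants as digits, drops
	vowels, collapses adjacent equal codes (including ones separated
	only by h or w), and pads to four characters.  Two words are then
	"approximately equal" if their codes match:

```c
#include <ctype.h>

/* '1'-'6' for coded consonants; '0' for vowels and y (which separate
   repeated codes); 'h' for h and w (which do not). */
static char soundex_code(int c)
{
    switch (tolower(c)) {
    case 'b': case 'f': case 'p': case 'v':
        return '1';
    case 'c': case 'g': case 'j': case 'k':
    case 'q': case 's': case 'x': case 'z':
        return '2';
    case 'd': case 't':
        return '3';
    case 'l':
        return '4';
    case 'm': case 'n':
        return '5';
    case 'r':
        return '6';
    case 'h': case 'w':
        return 'h';
    default:
        return '0';
    }
}

/* Store the Soundex code of name (e.g. "R163" for "Robert") in out.
   Non-letters are ignored; a name with no letters gives "". */
void soundex(const char *name, char out[5])
{
    const char *p = name;
    char prev, c;
    int n = 1;

    while (*p != '\0' && !isalpha((unsigned char)*p))
        p++;
    if (*p == '\0') {
        out[0] = '\0';
        return;
    }
    out[0] = (char)toupper((unsigned char)*p);
    prev = soundex_code((unsigned char)*p);
    for (p++; *p != '\0' && n < 4; p++) {
        if (!isalpha((unsigned char)*p))
            continue;
        c = soundex_code((unsigned char)*p);
        if (c == 'h')
            continue;             /* h, w: keep prev unchanged */
        if (c != '0' && c != prev)
            out[n++] = c;
        prev = c;                 /* a vowel resets prev to '0' */
    }
    while (n < 4)
        out[n++] = '0';           /* pad short codes with zeros */
    out[4] = '\0';
}
```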

	References: Knuth Sec. 6 pp. 391-2 Volume 3; Wu and Manber,
	"AGREP -- A Fast Approximate Pattern-Matching Tool" .

20.29:  What is hashing?

A:      Hashing is the process of mapping strings to integers, usually
	in a relatively small range.  A "hash function" maps a string
	(or some other data structure) to a bounded number (the "hash
	bucket") which can more easily be used as an index in an array,
	or for performing repeated comparisons.  (Obviously, a mapping
	from a potentially huge set of strings to a small set of
	integers will not be unique.  Any algorithm using hashing
	therefore has to deal with the possibility of "collisions.")
	Many hashing functions and related algorithms have been
	developed; a full treatment is beyond the scope of this list.
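	As one simple illustration, here is a sketch of a chained hash
	table whose hash function is modelled on the one in K&R2
	Sec. 6.6; the table size and all names are hypothetical choices.
	Colliding keys simply share a bucket's linked list:

```c
#include <stdlib.h>
#include <string.h>

#define HASHSIZE 101   /* number of buckets */

struct entry {
    struct entry *next;   /* next entry in the same bucket */
    const char *key;      /* not copied: must outlive the table */
    int value;
};

static struct entry *table[HASHSIZE];

/* Map a string to a bucket number in [0, HASHSIZE). */
unsigned hash(const char *s)
{
    unsigned h = 0;

    for (; *s != '\0'; s++)
        h = (unsigned char)*s + 31 * h;
    return h % HASHSIZE;
}

/* Find key, or return NULL if it is absent. */
struct entry *lookup(const char *key)
{
    struct entry *e;

    for (e = table[hash(key)]; e != NULL; e = e->next)
        if (strcmp(e->key, key) == 0)
            return e;
    return NULL;
}

/* Add key or update its value.  Returns 0 if out of memory. */
int install(const char *key, int value)
{
    struct entry *e = lookup(key);

    if (e == NULL) {
        unsigned b = hash(key);

        if ((e = malloc(sizeof *e)) == NULL)
            return 0;
        e->key = key;
        e->next = table[b];
        table[b] = e;
    }
    e->value = value;
    return 1;
}
```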

	References: K&R2 Sec. 6.6; Knuth Sec. 6.4 pp. 506-549 Volume 3;
	Sedgewick Sec. 16 pp. 231-244.

20.31:  How can I find the day of the week given the date?

A:      Use mktime() or localtime() (see questions 13.13 and 13.14, but
	beware of DST adjustments if tm_hour is 0), or Zeller's
	congruence (see the sci.math FAQ list), or this elegant code by
	Tomohiko Sakamoto:

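		int dayofweek(int y, int m, int d)      /* 0 = Sunday */
		{
			/* assumes a Gregorian date, y > 0, 1 <= m <= 12 */
			static const int t[] = {0, 3, 2, 5, 0, 3, 5, 1, 4, 6, 2, 4};
			y -= m < 3;	/* Jan. and Feb. count as the previous year */
			return (y + y/4 - y/100 + y/400 + t[m-1] + d) % 7;
		}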

	See also questions 13.14 and 20.32.
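	A sketch of the mktime() route (dayofweek_mktime() is a
	hypothetical name).  mktime() normalizes the structure and fills
	in tm_wday; setting tm_hour to noon rather than 0 keeps a DST
	adjustment from moving the time into a different day, and
	tm_isdst = -1 asks mktime() to work out whether DST applies:

```c
#include <time.h>

/* Day of week (0 = Sunday), or -1 if mktime() cannot represent
   the date. */
int dayofweek_mktime(int y, int m, int d)
{
    struct tm tm = {0};

    tm.tm_year = y - 1900;
    tm.tm_mon = m - 1;
    tm.tm_mday = d;
    tm.tm_hour = 12;
    tm.tm_isdst = -1;
    if (mktime(&tm) == (time_t)-1)
        return -1;
    return tm.tm_wday;
}
```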

	References: ISO Sec. 7.12.2.3.

20.32:  Will 2000 be a leap year?  Is (year % 4 == 0) an accurate test
	for leap years?

A:      Yes and no, respectively.  The full expression for the present
	Gregorian calendar is

		year % 4 == 0 && (year % 100 != 0 || year % 400 == 0)

	See a good astronomical almanac or other reference for details.
	(To forestall an eternal debate: references which claim the
	existence of a 4000-year rule are wrong.)  See also questions
	13.14 and 13.14b.
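	Wrapped in a function (the name is hypothetical), the test reads:

```c
/* Nonzero if year is a leap year in the Gregorian calendar. */
int is_leap(int year)
{
    return year % 4 == 0 && (year % 100 != 0 || year % 400 == 0);
}
```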

20.34:  Here's a good puzzle: how do you write a program which produces
	its own source code as output?

A:      It is actually quite difficult to write a self-reproducing
	program that is truly portable, due particularly to quoting and
	character set difficulties.

	Here is a classic example (which ought to be presented on one
	line, although it will fix itself the first time it's run):

		char*s="char*s=%c%s%c;main(){printf(s,34,s,34);}";
		main(){printf(s,34,s,34);}

	(This program, like many of the genre, neglects to #include
	<stdio.h>, and assumes that the double-quote character " has the
	value 34, as it does in ASCII.)

20.35:  What is "Duff's Device"?

A:      It's a devastatingly deviously unrolled byte-copying loop,
	devised by Tom Duff while he was at Lucasfilm.  In its "classic"
	form, it looks like:

		register int n = (count + 7) / 8;   /* count > 0 assumed */
		switch (count % 8)
		{
		case 0:    do { *to = *from++;
		case 7:         *to = *from++;
		case 6:         *to = *from++;
		case 5:         *to = *from++;
		case 4:         *to = *from++;
		case 3:         *to = *from++;
		case 2:         *to = *from++;
		case 1:         *to = *from++;
			      } while (--n > 0);
		}

	where count bytes are to be copied from the array pointed to by
	from to the memory location pointed to by to (which is a memory-
	mapped device output register, which is why to isn't
	incremented).  It solves the problem of handling the leftover
	bytes (when count isn't a multiple of 8) by interleaving a
	switch statement with the loop which copies bytes 8 at a time.
	(Believe it or not, it *is* legal to have case labels buried
	within blocks nested in a switch statement like this.  In his
	announcement of the technique to C's developers and the world,
	Duff noted that C's switch syntax, in particular its "fall
	through" behavior, had long been controversial, and that "This
	code forms some sort of argument in that debate, but I'm not
	sure whether it's for or against.")
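	For experimentation, here is a hypothetical adaptation to
	ordinary memory-to-memory copying, in which to is incremented as
	well (unlike the classic form, whose destination is a device
	register).  It is a sketch for study; in practice, memcpy() is
	almost certainly faster:

```c
#include <stddef.h>

/* Copy count bytes (count > 0) from from to to, 8 per iteration. */
void duff_copy(char *to, const char *from, size_t count)
{
    size_t n = (count + 7) / 8;

    switch (count % 8) {
    case 0: do { *to++ = *from++;
    case 7:      *to++ = *from++;
    case 6:      *to++ = *from++;
    case 5:      *to++ = *from++;
    case 4:      *to++ = *from++;
    case 3:      *to++ = *from++;
    case 2:      *to++ = *from++;
    case 1:      *to++ = *from++;
            } while (--n > 0);
    }
}
```

	With count = 10, for example, the switch jumps to case 2 and
	copies 2 bytes, then the loop runs once more to copy the
	remaining 8.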

20.36:  When will the next International Obfuscated C Code Contest
	(IOCCC) be held?  How can I get a copy of the current and
	previous winning entries?

A:      The contest is in a state of flux; see
	http://www.ioccc.org/index.html for current details.

	Contest winners are usually announced at a Usenix conference,
	and are posted to the net sometime thereafter.  Winning entries
	from previous years (back to 1984) are archived at ftp.uu.net
	(see question 18.16) under the directory pub/ioccc/; see also
	http://www.ioccc.org/index.html .

20.37:  What was the entry keyword mentioned in K&R1?

A:      It was reserved to allow the possibility of having functions
	with multiple, differently-named entry points, a la FORTRAN.  It
	was not, to anyone's knowledge, ever implemented (nor does
	anyone remember what sort of syntax might have been imagined for
	it).  It has been withdrawn, and is not a keyword in ANSI C.
	(See also question 1.12.)

	References: K&R2 p. 259 Appendix C.

20.38:  Where does the name "C" come from, anyway?

A:      C was derived from Ken Thompson's experimental language B, which
	was inspired by Martin Richards's BCPL (Basic Combined
	Programming Language), which was a simplification of CPL
	(Cambridge Programming Language).  For a while, there was
	speculation that C's successor might be named P (the third
	letter in BCPL) instead of D, but of course the most visible
	descendant language today is C++.

20.39:  How do you pronounce "char"?

A:      You can pronounce the C keyword "char" in at least three ways:
	like the English words "char," "care," or "car" (or maybe even
	"character"); the choice is arbitrary.

20.39b: What do "lvalue" and "rvalue" mean?

A:      Simply speaking, an "lvalue" is an expression that could appear
	on the left-hand side of an assignment; you can also think of it
	as denoting an object that has a location.  (But see question
	6.7 concerning arrays.)  An "rvalue" is any expression that has
	a value (and that can therefore appear on the right-hand side of
	an assignment).

20.40:  Where can I get extra copies of this list?
	What about back issues?

A:      An up-to-date copy may be obtained from ftp.eskimo.com in
	directory u/s/scs/C-faq/.  You can also just pull it off the
	net; it is normally posted to comp.lang.c on the first of each
	month, with an Expires: line which should keep it around all
	month.  A parallel, abridged version is available (and posted),
	as is a list of changes accompanying each significantly updated
	version.

	The various versions of this list are also posted to the
	newsgroups comp.answers and news.answers .  Several sites
	archive news.answers postings and other FAQ lists, including
	this one; two sites are rtfm.mit.edu (directories
	pub/usenet/news.answers/C-faq/ and pub/usenet/comp.lang.c/) and
	ftp.uu.net (directory usenet/news.answers/C-faq/).  If you don't
	have ftp access, a mailserver at rtfm.mit.edu can mail you FAQ
	lists: send a message containing the single word "help" to
	mail-server@rtfm.mit.edu .  See the meta-FAQ list in
	news.answers for more information.

	A hypertext (HTML) version of this FAQ list is available on the
	World-Wide Web; the URL is http://www.eskimo.com/~scs/C-faq/top.html .
	A comprehensive site which references all Usenet FAQ lists is
	http://www.faqs.org/faqs/ .

	An extended version of this FAQ list has been published by
	Addison-Wesley as _C Programming FAQs: Frequently Asked
	Questions_ (ISBN 0-201-84519-9).  An errata list is at
	http://www.eskimo.com/~scs/C-faq/book/Errata.html and on
	ftp.eskimo.com in u/s/scs/ftp/C-faq/book/Errata .

	This list is an evolving document containing questions which
	have been Frequent since before the Great Renaming; it is not
	just a collection of this month's interesting questions.  Older
	copies are obsolete and don't contain much, except the
	occasional typo, that the current list doesn't.