5.7.3 Alphabets

Applications often need to manipulate sets of characters, such as the set of alphabetic characters or the set of whitespace characters. The alphabet abstraction provides an efficient implementation of sets of Unicode code points.

— procedure: alphabet? object

Returns #t if object is a Unicode alphabet, otherwise returns #f.

— procedure: alphabet wide-char ...

Returns a Unicode alphabet containing the wide characters passed as arguments.

— procedure: code-points->alphabet items

Returns a Unicode alphabet containing the code points described by items. Items must satisfy well-formed-code-points-list?.

— procedure: alphabet->code-points alphabet

Returns a well-formed code-points list that describes the code points represented by alphabet.

— procedure: well-formed-code-points-list? object

Returns #t if object is a well-formed code-points list, otherwise returns #f. A well-formed code-points list is a proper list, each element of which is either a code point or a pair of code points. A pair of code points represents a contiguous range of code points. The car of the pair is the lower limit, and the cdr is the upper limit. Both limits are inclusive, and the lower limit must be strictly less than the upper limit.
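The definition above can be sketched in portable Scheme. This is only an illustration of the stated rules, not the actual implementation: the names sketch-code-point?, sketch-range?, and sketch-well-formed? are hypothetical, and the assumption that a code point is any exact integer from 0 through #x10FFFF may be looser than the real predicate, which could also impose constraints not described here.

```scheme
;; Hypothetical helper: treats any exact integer in [0, #x10FFFF] as a
;; code point.  This is an assumption; the real check may be stricter.
(define (sketch-code-point? object)
  (and (integer? object)
       (exact? object)
       (<= 0 object #x10FFFF)))

;; A range is a pair (LOW . HIGH) of code points with LOW < HIGH.
(define (sketch-range? object)
  (and (pair? object)
       (sketch-code-point? (car object))
       (sketch-code-point? (cdr object))
       (< (car object) (cdr object))))

;; Walks the list; the final cdr must be the empty list (proper list).
;; Circular lists are not handled by this sketch.
(define (sketch-well-formed? object)
  (cond ((null? object) #t)
        ((pair? object)
         (and (or (sketch-code-point? (car object))
                  (sketch-range? (car object)))
              (sketch-well-formed? (cdr object))))
        (else #f)))

(sketch-well-formed? '(65 (97 . 122)))   ; => #t
(sketch-well-formed? '((122 . 97)))      ; => #f, limits reversed
(sketch-well-formed? '((65 . 65)))       ; => #f, limits must differ
(sketch-well-formed? '(65 . 66))         ; => #f, not a proper list
```

Note that a single code point is written as a bare integer, so a one-element range such as (65 . 65) is rejected under the strict inequality.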

— procedure: char-in-alphabet? char alphabet

Returns #t if char is a member of alphabet, otherwise returns #f.
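As a brief illustration of the constructors and the membership test, the following sketch uses only the procedures documented above. It assumes an MIT/GNU Scheme release that provides them; the variable names are illustrative, and the results shown in comments follow from the descriptions above.

```scheme
;; Build an alphabet directly from characters.
(define vowels (alphabet #\a #\e #\i #\o #\u))

(alphabet? vowels)                  ; => #t
(char-in-alphabet? #\e vowels)      ; => #t
(char-in-alphabet? #\z vowels)      ; => #f

;; Build an alphabet from a code-points list.
;; Code points 48 through 57 are the digits 0-9; 95 is the underscore.
(define digits-and-underscore
  (code-points->alphabet '((48 . 57) 95)))

(char-in-alphabet? #\5 digits-and-underscore)   ; => #t
(char-in-alphabet? #\a digits-and-underscore)   ; => #f
```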

Character sets and alphabets can be converted to one another. The conversion is lossless only when the alphabet contains only 8-bit code points, because 8-bit code points in Unicode map directly to ISO-8859-1 characters, which are what character sets contain.

— procedure: char-set->alphabet char-set

Returns a Unicode alphabet containing the code points that correspond to characters that are members of char-set.

— procedure: alphabet->char-set alphabet

Returns a character set containing the characters that correspond to 8-bit code points that are members of alphabet. (Code points outside the 8-bit range are ignored.)

— procedure: string->alphabet string

Returns a Unicode alphabet containing the code points corresponding to the characters in string. Equivalent to

          (char-set->alphabet (string->char-set string))

— procedure: alphabet->string alphabet

Returns a newly allocated string containing the characters corresponding to the 8-bit code points in alphabet. (Code points outside the 8-bit range are ignored.)

— procedure: 8-bit-alphabet? alphabet

Returns #t if alphabet contains only 8-bit code points, otherwise returns #f.
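The following sketch shows how the conversions treat code points outside the 8-bit range. It assumes an MIT/GNU Scheme release that provides these procedures; the variable names are illustrative, and the results in comments follow from the descriptions above.

```scheme
(define ascii-digits (string->alphabet "0123456789"))
(8-bit-alphabet? ascii-digits)          ; => #t

;; 97 is the letter a; 945 (U+03B1, Greek small alpha) is not 8-bit.
(define mixed (code-points->alphabet '(97 945)))
(8-bit-alphabet? mixed)                 ; => #f

;; Converting to a character set and back drops code point 945.
(define round-trip
  (char-set->alphabet (alphabet->char-set mixed)))
(8-bit-alphabet? round-trip)            ; => #t
(char-in-alphabet? #\a round-trip)      ; => #t
```

Checking 8-bit-alphabet? before converting is one way to detect when a conversion would silently discard members.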

— procedure: alphabet+ alphabet ...

Returns a Unicode alphabet that contains each code point that is a member of any of the alphabet arguments.

— procedure: alphabet- alphabet1 alphabet2

Returns a Unicode alphabet that contains each code point that is a member of alphabet1 and is not a member of alphabet2.
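A short sketch of union and difference, using only procedures documented in this section and assuming an MIT/GNU Scheme release that provides them. The variable names are illustrative, and the results in comments follow from the descriptions above.

```scheme
(define lower (code-points->alphabet '((97 . 122))))   ; a-z
(define digits (code-points->alphabet '((48 . 57))))   ; 0-9
(define vowels (alphabet #\a #\e #\i #\o #\u))

;; Union: members of either argument.
(define lower-or-digit (alphabet+ lower digits))

;; Difference: members of the first argument that are not in the second.
(define consonants (alphabet- lower vowels))

(char-in-alphabet? #\7 lower-or-digit)  ; => #t
(char-in-alphabet? #\e consonants)      ; => #f
(char-in-alphabet? #\t consonants)      ; => #t
```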