6.8.2 REXP abstraction

In addition to providing standard regular-expression support, MIT/GNU Scheme also provides the REXP abstraction. This is an alternative way to write regular expressions that is easier to read and understand than the standard notation. Regular expressions written in this notation can be translated into the standard notation.

The REXP abstraction is a set of combinators that are composed into a complete regular expression. Each combinator directly corresponds to a particular piece of regular-expression notation. For example, the expression (rexp-any-char) corresponds to the . character in standard regular-expression notation, while (rexp* rexp) corresponds to the * character.

The primary advantages of REXP are that it makes the nesting structure of regular expressions explicit, and that it simplifies the description of complex regular expressions by allowing them to be built up using straightforward combinators.
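As a small illustration of composing combinators, the sketch below builds a pattern for "foo", followed by any run of characters, followed by "bar", and translates it into standard notation. It assumes that the regular-expression support has to be loaded with load-option before use; the names foo-then-bar and foo-then-bar-string are chosen only for this example.

```scheme
;; Assumption: the regular-expression support is a loadable option.
(load-option 'regular-expression)

;; "foo", then zero or more non-newline characters, then "bar".
(define foo-then-bar
  (rexp-sequence "foo"
                 (rexp* (rexp-any-char))
                 "bar"))

;; Translate the combinator form into standard notation (a string).
(define foo-then-bar-string
  (rexp->regexp foo-then-bar))

(write-string foo-then-bar-string)
(newline)
```

Because the nesting is expressed with ordinary procedure calls, each piece of the pattern can be named, reused, and tested separately.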

— procedure: rexp? object

Returns #t if object is a REXP expression, or #f otherwise. A REXP is one of: a string, which represents the pattern matching that string; a character set, which represents the pattern matching a character in that set; or an object returned by calling one of the procedures defined here.
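The three kinds of REXP listed above can be checked directly. The following sketch assumes that the regular-expression option has been loaded and that string->char-set is available for building a character set; the variable names are hypothetical.

```scheme
;; Assumption: the regular-expression support is a loadable option.
(load-option 'regular-expression)

(define string-pattern "abc")                      ; a string
(define set-pattern (string->char-set "abc"))      ; a character set
(define built-pattern (rexp-any-char))             ; result of a REXP procedure
(define not-a-pattern 42)                          ; none of the above

;; Print the result of rexp? for each object, one per line.
(for-each (lambda (object)
            (write (rexp? object))
            (newline))
          (list string-pattern set-pattern built-pattern not-a-pattern))
```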

— procedure: rexp->regexp rexp

Converts rexp to standard regular-expression notation, returning a newly-allocated string.

— procedure: rexp-compile rexp

Converts rexp to standard regular-expression notation, then compiles it and returns the compiled result. Equivalent to

          (re-compile-pattern (rexp->regexp rexp) #f)

— procedure: rexp-any-char

Returns a REXP that matches any single character except a newline. This is equivalent to the . construct.

— procedure: rexp-line-start

Returns a REXP that matches the start of a line. This is equivalent to the ^ construct.

— procedure: rexp-line-end

Returns a REXP that matches the end of a line. This is equivalent to the $ construct.
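The line anchors are typically combined with rexp-sequence and then compiled. The sketch below is one possible use, assuming that re-string-search-forward and re-match-start-index from the preceding section accept the arguments shown (pattern, string, start index; and group number, registers); end-line, text and registers are names invented for the example.

```scheme
;; Assumption: the regular-expression support is a loadable option.
(load-option 'regular-expression)

;; A line consisting only of the word "end".
(define end-line
  (rexp-compile
   (rexp-sequence (rexp-line-start) "end" (rexp-line-end))))

(define text "begin\nend\n")

;; Search forward from index 0; a false result means no match.
(define registers
  (re-string-search-forward end-line text 0))

;; If a match was found, print the index at which it starts.
(if registers
    (begin
      (write (re-match-start-index 0 registers))
      (newline)))
```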

— procedure: rexp-string-start

Returns a REXP that matches the start of the text being matched. This is equivalent to the \` construct.

— procedure: rexp-string-end

Returns a REXP that matches the end of the text being matched. This is equivalent to the \' construct.

— procedure: rexp-word-edge

Returns a REXP that matches the start or end of a word. This is equivalent to the \b construct.

— procedure: rexp-not-word-edge

Returns a REXP that matches anywhere that is not the start or end of a word. This is equivalent to the \B construct.

— procedure: rexp-word-start

Returns a REXP that matches the start of a word. This is equivalent to the \< construct.

— procedure: rexp-word-end

Returns a REXP that matches the end of a word. This is equivalent to the \> construct.

— procedure: rexp-word-char

Returns a REXP that matches any word-constituent character. This is equivalent to the \w construct.

— procedure: rexp-not-word-char

Returns a REXP that matches any character that isn't a word constituent. This is equivalent to the \W construct.
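The word-related procedures can be combined to pick out a whole word. This is a minimal sketch, assuming the search and extraction procedures of the preceding section (re-string-search-forward and re-match-extract) take the arguments shown; whole-word, sample and registers are hypothetical names.

```scheme
;; Assumption: the regular-expression support is a loadable option.
(load-option 'regular-expression)

;; One or more word-constituent characters, bounded by word edges.
(define whole-word
  (rexp-sequence (rexp-word-start)
                 (rexp+ (rexp-word-char))
                 (rexp-word-end)))

(define sample "  hello, world")

;; Find the first word in the sample string.
(define registers
  (re-string-search-forward (rexp-compile whole-word) sample 0))

;; Group 0 is the text of the entire match.
(if registers
    (begin
      (write (re-match-extract sample registers 0))
      (newline)))
```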

The next two procedures accept a syntax-type argument specifying the syntax class to be matched against. This argument is a symbol selected from the following list. Each symbol is followed by the equivalent character used in standard regular-expression notation.

          whitespace          space character
          punctuation         .
          word                w
          symbol              _
          open                (
          close               )
          quote               '
          string-delimiter    "
          math-delimiter      $
          escape              \
          char-quote          /
          comment-start       <
          comment-end         >

— procedure: rexp-syntax-char syntax-type

Returns a REXP that matches any character of type syntax-type. This is equivalent to the \s construct.

— procedure: rexp-not-syntax-char syntax-type

Returns a REXP that matches any character not of type syntax-type. This is equivalent to the \S construct.
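For illustration, the syntax-type symbol is passed as a quoted symbol. The sketch below only builds the patterns and shows their standard notation; which characters belong to each syntax class depends on the syntax table in effect, so no particular match result is assumed. The names blanks and non-word are hypothetical.

```scheme
;; Assumption: the regular-expression support is a loadable option.
(load-option 'regular-expression)

;; One or more characters of the whitespace syntax class.
(define blanks
  (rexp+ (rexp-syntax-char 'whitespace)))

;; A single character that is not of the word syntax class.
(define non-word
  (rexp-not-syntax-char 'word))

;; Show the standard notation produced for each pattern.
(write-string (rexp->regexp blanks))
(newline)
(write-string (rexp->regexp non-word))
(newline)
```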

— procedure: rexp-sequence rexp ...

Returns a REXP that matches each rexp argument in sequence. If no rexp argument is supplied, the result matches the null string. This is equivalent to concatenating the regular expressions corresponding to each rexp argument.

— procedure: rexp-alternatives rexp ...

Returns a REXP that matches any of the rexp arguments. This is equivalent to concatenating the regular expressions corresponding to each rexp argument, separating them by the \| construct.

— procedure: rexp-group rexp ...

rexp-group is like rexp-sequence, except that the result is marked as a match group. This is equivalent to the \( ... \) construct.
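The three procedures above are often used together: alternatives choose between sub-patterns, groups mark the parts to be extracted, and a sequence ties them together. The following sketch assumes the matching procedures of the preceding section (re-string-match and re-match-extract) and assumes that the two explicit groups are numbered 1 and 2 in the order they appear; digit, dimension and input are names made up for the example.

```scheme
;; Assumption: the regular-expression support is a loadable option.
(load-option 'regular-expression)

(define digit (string->char-set "0123456789"))

;; Group 1: the key, either "width" or "height".
;; Group 2: the value, one or more digits.
(define dimension
  (rexp-sequence
   (rexp-group (rexp-alternatives "width" "height"))
   "="
   (rexp-group (rexp+ digit))))

(define input "height=42")

(define registers
  (re-string-match (rexp-compile dimension) input))

;; Print the text of each marked group, if the match succeeded.
(if registers
    (begin
      (write (re-match-extract input registers 1))
      (newline)
      (write (re-match-extract input registers 2))
      (newline)))
```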

The next three procedures in principle accept a single REXP argument. For convenience, they accept multiple arguments, which are converted into a single argument by rexp-group. Note, however, that if only one REXP argument is supplied, and it's very simple, no grouping occurs.

— procedure: rexp* rexp ...

Returns a REXP that matches zero or more instances of the pattern matched by the rexp arguments. This is equivalent to the * construct.

— procedure: rexp+ rexp ...

Returns a REXP that matches one or more instances of the pattern matched by the rexp arguments. This is equivalent to the + construct.

— procedure: rexp-optional rexp ...

Returns a REXP that matches zero or one instances of the pattern matched by the rexp arguments. This is equivalent to the ? construct.
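The repetition procedures can be sketched with a pattern for an optionally signed integer preceded by optional blanks. The example assumes that string->char-set is available and that re-string-match from the preceding section accepts a compiled pattern and a string; digit, signed-integer and padded-integer are hypothetical names.

```scheme
;; Assumption: the regular-expression support is a loadable option.
(load-option 'regular-expression)

(define digit (string->char-set "0123456789"))

;; An optional minus sign followed by one or more digits.
(define signed-integer
  (rexp-sequence (rexp-optional "-")
                 (rexp+ digit)))

;; Zero or more leading spaces, then the integer.
(define padded-integer
  (rexp-sequence (rexp* " ") signed-integer))

;; A false result means the string does not match at its start.
(define registers
  (re-string-match (rexp-compile padded-integer) "   -17"))

(write (if registers 'matched 'no-match))
(newline)
```

Each repetition here is applied to a single, simple argument, so according to the note above no match group should be introduced.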

— procedure: rexp-case-fold rexp

Returns a REXP that matches the same pattern as rexp, but is insensitive to character case. This has no equivalent in standard regular-expression notation.
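A possible use of rexp-case-fold is to match a keyword regardless of how it is capitalized. The sketch assumes re-string-match from the preceding section; keyword and try are names invented here, and the result of each attempt is printed rather than assumed.

```scheme
;; Assumption: the regular-expression support is a loadable option.
(load-option 'regular-expression)

;; The word "select", intended to match in any mixture of cases.
(define keyword
  (rexp-compile (rexp-case-fold "select")))

;; Hypothetical helper: report whether STRING matches at its start.
(define (try string)
  (write-string string)
  (write-string ": ")
  (write (if (re-string-match keyword string) 'matched 'no-match))
  (newline))

(for-each try '("select" "SELECT" "Select" "insert"))
```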