from the string until it is the correct length; if it is @code{#f}
then the string is returned unchanged. The grapheme clusters are
removed from the beginning of the string if @code{where} is
-@code{leading}, otherwise from the end of the string.
+@code{leading}, otherwise from the end of the string. The default
+value of this argument is @code{#t}.
@end itemize
Some examples:
@end example
@end deffn
-@deffn procedure string-trimmer where trim-char? copy?
+@deffn procedure string-trimmer where to-trim copy?
@cindex trimming, of string
This procedure's arguments are keyword arguments; that is, each
argument is a symbol of the same name followed by its value. The
@code{both}.
@item
@findex char-whitespace?
-@var{trim-char?} is a procedure that accepts a single character
-argument and returns a true value for a character that should be
-removed by the trimmer, or a false value for a character that should
-be retained. The default value of this argument is @code{char-whitespace?}.
+@var{to-trim} is a character, a character set, or more generally a
+procedure that accepts a single character argument and returns a
+boolean value. The trimmer removes the characters that @var{to-trim}
+matches: the given character, the members of the character set, or
+the characters for which the procedure returns true. The default
+value of this argument is @code{char-whitespace?}.
@item
@var{copy?} is a boolean: if @code{#t}, the trimmer returns an
immutable copy of the trimmed string, if @code{#f} it returns a slice.
((string-trimmer) " ABC DEF ")
@result{} "ABC DEF"
-((string-trimmer 'trim-char? char-numeric? 'where 'leading)
+((string-trimmer 'to-trim char-numeric? 'where 'leading)
"21 East 21st Street #3")
@result{} " East 21st Street #3"
-((string-trimmer 'trim-char? char-numeric? 'where 'trailing)
+((string-trimmer 'to-trim char-numeric? 'where 'trailing)
"21 East 21st Street #3")
@result{} "21 East 21st Street #"
-((string-trimmer 'trim-char? char-numeric?)
+((string-trimmer 'to-trim char-numeric?)
"21 East 21st Street #3")
@result{} " East 21st Street #"
@end example
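+
+As a further sketch, @var{to-trim} can also be given as a character
+rather than a predicate. Assuming the behavior described above, the
+following call should remove hyphens from both ends of its argument
+while keeping the interior one (the input string is only an
+illustration):
+
+@example
+((string-trimmer 'to-trim #\-)
+ "--ABC-DEF--")
+@end example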
@node Regular Expressions, , Searching and Matching Strings, Strings
@section Regular Expressions
-MIT/GNU Scheme provides support for using regular expressions to search and
-match strings. This manual does not define regular expressions; instead
-see @ref{Regexps, , Syntax of Regular Expressions, emacs, The Emacs
-Editor}.
-
-In addition to providing standard regular-expression support, MIT/GNU
-Scheme also provides the @acronym{REXP} abstraction. This is an
-alternative way to write regular expressions that is easier to read
-and understand than the standard notation. Regular expressions
-written in this notation can be translated into the standard
-notation.
-
-The regular-expression support is a run-time-loadable option. To use
-it, execute
-
-@example
-(load-option 'regular-expression)
-@end example
-
-@noindent
-once before calling any of the procedures defined here.
+MIT/GNU Scheme provides support for matching and searching strings
+against regular expressions. This is considerably more flexible than
+ordinary string matching and searching, but potentially much slower.
+On the other hand, it is less powerful than the mechanism described in
+@ref{Parser Language}.
+
+Traditional regular expressions are defined with string patterns in
+which characters like @samp{[} and @samp{*} have special meanings.
+Unfortunately, the syntax of these patterns is not only baroque but
+also comes in many different and mutually incompatible varieties. As
+a consequence, we have chosen to specify regular expressions using an
+s-expression syntax, which we call a @dfn{regular s-expression},
+abbreviated as @dfn{regsexp}.
+
+Previous releases of MIT/GNU Scheme provided a regular-expression
+mechanism nearly identical to that of GNU Emacs version 18. This
+mechanism still exists but is deprecated and will be removed in a
+future release.
@menu
-* Regular-expression procedures::
-* REXP abstraction::
+* Regular S-Expressions::
+* Regsexp Procedures::
@end menu
-@node Regular-expression procedures, REXP abstraction, Regular Expressions, Regular Expressions
-@subsection Regular-expression procedures
-@cindex searching, for regular expression
-@cindex regular expression, searching string for
-
-Procedures that perform regular-expression match and search accept
-standardized arguments. @var{Regexp} is the regular expression; it is
-either a string representation of a regular expression, or a compiled
-regular expression object. @var{String} is the string being matched
-or searched. Procedures that operate on substrings also accept
-@var{start} and @var{end} index arguments with the usual meaning. The
-optional argument @var{case-fold?} says whether the match/search is
-case-sensitive; if @var{case-fold?} is @code{#f}, it is
-case-sensitive, otherwise it is case-insensitive. The optional
-argument @var{syntax-table} is a character syntax table that defines
-the character syntax, such as which characters are legal word
-constituents. This feature is primarily for Edwin, so character
-syntax tables will not be documented here. Supplying @code{#f} for
-(or omitting) @var{syntax-table} will select the default character
-syntax, equivalent to Edwin's @code{fundamental} mode.
-
-@deffn procedure re-string-match regexp string [case-fold? [syntax-table]]
-@deffnx procedure re-substring-match regexp string start end [case-fold? [syntax-table]]
-These procedures match @var{regexp} against the respective string or
-substring, returning @code{#f} for no match, or a set of match registers
-(see below) if the match succeeds. Here is an example showing how to
-extract the matched substring:
+@node Regular S-Expressions, Regsexp Procedures, Regular Expressions, Regular Expressions
+@subsection Regular S-Expressions
-@example
-@group
-(let ((r (re-substring-match @var{regexp} @var{string} @var{start} @var{end})))
- (and r
- (substring @var{string} @var{start} (re-match-end-index 0 r))))
-@end group
-@end example
-@end deffn
+A regular s-expression is either a character or a string, which
+matches itself, or one of the following forms.
-@deffn procedure re-string-search-forward regexp string [case-fold? [syntax-table]]
-@deffnx procedure re-substring-search-forward regexp string start end [case-fold? [syntax-table]]
-Searches @var{string} for the leftmost substring matching @var{regexp}.
-Returns a set of match registers (see below) if the search is
-successful, or @code{#f} if it is unsuccessful.
-
-@code{re-substring-search-forward} limits its search to the specified
-substring of @var{string}; @code{re-string-search-forward} searches all
-of @var{string}.
-@end deffn
-
-@deffn procedure re-string-search-backward regexp string [case-fold? [syntax-table]]
-@deffnx procedure re-substring-search-backward regexp string start end [case-fold? [syntax-table]]
-Searches @var{string} for the rightmost substring matching @var{regexp}.
-Returns a set of match registers (see below) if the search is
-successful, or @code{#f} if it is unsuccessful.
-
-@code{re-substring-search-backward} limits its search to the specified
-substring of @var{string}; @code{re-string-search-backward} searches all
-of @var{string}.
-@end deffn
-
-When a successful match or search occurs, the above procedures return a
-set of @dfn{match registers}. The match registers are a set of index
-registers that record indexes into the matched string. Each index
-register corresponds to an instance of the regular-expression grouping
-operator @samp{\(}, and records the start index (inclusive) and end
-index (exclusive) of the matched group. These registers are numbered
-from @code{1} to @code{9}, corresponding left-to-right to the grouping
-operators in the expression. Additionally, register @code{0}
-corresponds to the entire substring matching the regular expression.
-
-@deffn procedure re-match-start-index n registers
-@deffnx procedure re-match-end-index n registers
-@var{N} must be an exact integer between @code{0} and @code{9}
-inclusive. @var{Registers} must be a match-registers object as returned
-by one of the regular-expression match or search procedures above.
-@code{re-match-start-index} returns the start index of the corresponding
-regular-expression register, and @code{re-match-end-index} returns the
-corresponding end index.
-@end deffn
-
-@deffn procedure re-match-extract string registers n
-@var{Registers} must be a match-registers object as returned by one of
-the regular-expression match or search procedures above. @var{String}
-must be the string that was passed as an argument to the procedure that
-returned @var{registers}. @var{N} must be an exact integer between
-@code{0} and @code{9} inclusive. If the matched regular expression
-contained @var{m} grouping operators, then the value of this procedure
-is undefined for @var{n} strictly greater than @var{m}.
-
-This procedure extracts the substring corresponding to the match
-register specified by @var{registers} and @var{n}. This is equivalent
-to the following expression:
+These forms match one or more characters literally:
-@example
-@group
-(substring @var{string}
- (re-match-start-index @var{n} @var{registers})
- (re-match-end-index @var{n} @var{registers}))
-@end group
-@end example
+@deffn {regsexp} char-ci char
+Matches @var{char} without considering case.
@end deffn
-@deffn procedure regexp-group alternative @dots{}
-Each @var{alternative} must be a string representation of a regular
-expression. The returned value is a new string representation of a
-regular expression that consists of the @var{alternative}s combined by
-a grouping operator. For example:
-
-@example
-@group
-(regexp-group "foo" "bar" "baz")
- @result{} "\\(foo\\|bar\\|baz\\)"
-@end group
-@end example
+@deffn {regsexp} string-ci string
+Matches @var{string} without considering case.
@end deffn
-@deffn procedure re-compile-pattern regexp-string
-@var{Regexp-string} must be the string representation of a regular
-expression. Returns a compiled regular expression object of the
-represented regular expression.
-
-Procedures that apply regular expressions, such as
-@code{re-string-search-forward}, are sometimes faster when used with
-compiled regular expression objects than when used with the string
-representations of regular expressions, so applications that reuse
-regular expressions may speed up matching and searching by caching the
-compiled regular expression objects. However, the regular expression
-procedures have some internal caches as well, so this is likely to
-improve performance only for applications that use a large number of
-different regular expressions before cycling through the same ones
-again.
+@deffn {regsexp} any-char
+Matches one character other than @code{#\newline}.
@end deffn
-@node REXP abstraction, , Regular-expression procedures, Regular Expressions
-@subsection REXP abstraction
-
-@cindex REXP abstraction
-In addition to providing standard regular-expression support, MIT/GNU
-Scheme also provides the @acronym{REXP} abstraction. This is an
-alternative way to write regular expressions that is easier to read
-and understand than the standard notation. Regular expressions
-written in this notation can be translated into the standard notation.
-
-The @acronym{REXP} abstraction is a set of combinators that are
-composed into a complete regular expression. Each combinator directly
-corresponds to a particular piece of regular-expression notation. For
-example, the expression @code{(rexp-any-char)} corresponds to the
-@code{.} character in standard regular-expression notation, while
-@code{(rexp* @var{rexp})} corresponds to the @code{*} character.
+@deffn {regsexp} char-set datum @dots{}
+@deffnx {regsexp} inverse-char-set datum @dots{}
+Matches one character in (not in) the character set specified by
+@code{(char-set @var{datum @dots{}})}.
+@end deffn
-The primary advantages of @acronym{REXP} are that it makes the nesting
-structure of regular expressions explicit, and that it simplifies the
-description of complex regular expressions by allowing them to be
-built up using straightforward combinators.
+These forms match no characters, but only at specific locations in the
+input string:
-@deffn procedure rexp? object
-Returns @code{#t} if @var{object} is a @acronym{REXP} expression, or
-@code{#f} otherwise. A @acronym{REXP} is one of: a string, which
-represents the pattern matching that string; a character set, which
-represents the pattern matching a character in that set; or an object
-returned by calling one of the procedures defined here.
+@deffn {regsexp} line-start
+@deffnx {regsexp} line-end
+Matches no characters at the start (end) of a line.
@end deffn
-@deffn procedure rexp->regexp rexp
-Converts @var{rexp} to standard regular-expression notation, returning
-a newly-allocated string.
+@deffn {regsexp} string-start
+@deffnx {regsexp} string-end
+Matches no characters at the start (end) of the string.
@end deffn
-@deffn procedure rexp-compile rexp
-Converts @var{rexp} to standard regular-expression notation, then
-compiles it and returns the compiled result. Equivalent to
+These forms match repetitions of a given regsexp. Most of them come
+in two forms, one of which is @dfn{greedy} and the other @dfn{shy}.
+The greedy form matches as many repetitions as it can, then uses
+failure backtracking to reduce the number of repetitions one at a
+time. The shy form matches the minimum number of repetitions, then
+uses failure backtracking to increase the number of repetitions one at
+a time. The keyword of each shy form is the keyword of the
+corresponding greedy form with a @code{?} appended.
-@example
-(re-compile-pattern (rexp->regexp @var{rexp}) #f)
-@end example
+@deffn {regsexp} ? regsexp
+@deffnx {regsexp} ?? regsexp
+Matches @var{regsexp} zero or one time.
@end deffn
-@deffn procedure rexp-any-char
-Returns a @acronym{REXP} that matches any single character except a
-newline. This is equivalent to the @code{.} construct.
+@deffn {regsexp} * regsexp
+@deffnx {regsexp} *? regsexp
+Matches @var{regsexp} zero or more times.
@end deffn
-@deffn procedure rexp-line-start
-Returns a @acronym{REXP} that matches the start of a line. This is
-equivalent to the @code{^} construct.
+@deffn {regsexp} + regsexp
+@deffnx {regsexp} +? regsexp
+Matches @var{regsexp} one or more times.
@end deffn
-@deffn procedure rexp-line-end
-Returns a @acronym{REXP} that matches the end of a line. This is
-equivalent to the @code{$} construct.
-@end deffn
+@deffn {regsexp} ** n m regsexp
+@deffnx {regsexp} **? n m regsexp
+The @var{n} argument must be an exact nonnegative integer. The
+@var{m} argument must be either an exact integer greater than or equal
+to @var{n}, or else @code{#f}.
-@deffn procedure rexp-string-start
-Returns a @acronym{REXP} that matches the start of the text being
-matched. This is equivalent to the @code{\`} construct.
+Matches @var{regsexp} at least @var{n} times and at most @var{m}
+times; if @var{m} is @code{#f} then there is no upper limit.
@end deffn
-@deffn procedure rexp-string-end
-Returns a @acronym{REXP} that matches the end of the text being
-matched. This is equivalent to the @code{\'} construct.
+@deffn {regsexp} ** n regsexp
+This is an abbreviation for @code{(** @var{n} @var{n}
+@var{regsexp})}. This matcher is neither greedy nor shy since it
+matches a fixed number of repetitions.
@end deffn
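+
+To illustrate the difference between greedy and shy repetition, here
+is a minimal sketch of two regsexps that differ only in the
+repetition keyword. The names @code{greedy-tag} and @code{shy-tag}
+are hypothetical, and @code{(any-char)} is assumed to be written as a
+one-element list.
+
+@example
+;; Greedy: the repetition first consumes as much as it can, so
+;; against "<a><b>" the first successful match runs to the last #\>.
+(define greedy-tag '(seq #\< (* (any-char)) #\>))
+
+;; Shy: the repetition first consumes as little as it can, so
+;; against "<a><b>" the first successful match stops at the first #\>.
+(define shy-tag '(seq #\< (*? (any-char)) #\>))
+@end example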
-@deffn procedure rexp-word-edge
-Returns a @acronym{REXP} that matches the start or end of a word.
-This is equivalent to the @code{\b} construct.
-@end deffn
+These forms implement alternatives and sequencing:
-@deffn procedure rexp-not-word-edge
-Returns a @acronym{REXP} that matches anywhere that is not the start
-or end of a word. This is equivalent to the @code{\B} construct.
+@deffn {regsexp} alt regsexp @dots{}
+Matches one of the @var{regsexp} arguments, trying each in order from
+left to right.
@end deffn
-@deffn procedure rexp-word-start
-Returns a @acronym{REXP} that matches the start of a word.
-This is equivalent to the @code{\<} construct.
+@deffn {regsexp} seq regsexp @dots{}
+Matches the first @var{regsexp}, then continues the match with the
+next @var{regsexp}, and so on until all of the arguments are matched.
@end deffn
-@deffn procedure rexp-word-end
-Returns a @acronym{REXP} that matches the end of a word.
-This is equivalent to the @code{\>} construct.
-@end deffn
+These forms implement named @dfn{registers}, which store matched
+segments of the input string:
-@deffn procedure rexp-word-char
-Returns a @acronym{REXP} that matches any word-constituent character.
-This is equivalent to the @code{\w} construct.
-@end deffn
+@deffn {regsexp} group key regsexp
+The @var{key} argument must be a fixnum, a character, or a symbol.
-@deffn procedure rexp-not-word-char
-Returns a @acronym{REXP} that matches any character that isn't a word
-constituent. This is equivalent to the @code{\W} construct.
+Matches @var{regsexp}. If the match succeeds, the matched segment is
+stored in the register named @var{key}.
@end deffn
-The next two procedures accept a @var{syntax-type} argument specifying
-the syntax class to be matched against. This argument is a symbol
-selected from the following list. Each symbol is followed by the
-equivalent character used in standard regular-expression notation.
-@code{whitespace} (space character),
-@code{punctuation} (@code{.}),
-@code{word} (@code{w}),
-@code{symbol} (@code{_}),
-@code{open} (@code{(}),
-@code{close} (@code{)}),
-@code{quote} (@code{'}),
-@code{string-delimiter} (@code{"}),
-@code{math-delimiter} (@code{$}),
-@code{escape} (@code{\}),
-@code{char-quote} (@code{/}),
-@code{comment-start} (@code{<}),
-@code{comment-end} (@code{>}).
+@deffn {regsexp} group-ref key
+The @var{key} argument must be a fixnum, a character, or a symbol.
-@deffn procedure rexp-syntax-char syntax-type
-Returns a @acronym{REXP} that matches any character of type
-@var{syntax-type}. This is equivalent to the @code{\s} construct.
+Matches the characters stored in the register named @var{key}. It is
+an error if that register has not been initialized with a
+corresponding @code{group} expression.
@end deffn
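+
+As a sketch of how these forms combine, the following hypothetical
+regsexp uses @code{group} and @code{group-ref} to match a doubled
+word such as @samp{the the}. The name @code{doubled-word} and the
+register key @code{word} are illustrative choices, not part of the
+interface.
+
+@example
+;; Store a run of one or more non-space characters in register
+;; `word', then require a space followed by the same characters.
+(define doubled-word
+  '(seq (group word (+ (inverse-char-set #\space)))
+        #\space
+        (group-ref word)))
+@end example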
-@deffn procedure rexp-not-syntax-char syntax-type
-Returns a @acronym{REXP} that matches any character not of type
-@var{syntax-type}. This is equivalent to the @code{\S} construct.
-@end deffn
-
-@deffn procedure rexp-sequence rexp @dots{}
-Returns a @acronym{REXP} that matches each @var{rexp} argument in
-sequence. If no @var{rexp} argument is supplied, the result matches
-the null string. This is equivalent to concatenating the regular
-expressions corresponding to each @var{rexp} argument.
-@end deffn
+@node Regsexp Procedures, , Regular S-Expressions, Regular Expressions
+@subsection Regsexp Procedures
-@deffn procedure rexp-alternatives rexp @dots{}
-Returns a @acronym{REXP} that matches any of the @var{rexp}
-arguments. This is equivalent to concatenating the regular
-expressions corresponding to each @var{rexp} argument, separating them
-by the @code{\|} construct.
-@end deffn
+Like many other regular-expression implementations, the regular
+s-expression implementation has two parts: a compiler that translates
+the pattern into an efficient form, and one or more procedures that
+use the compiled pattern to match or search inputs.
-@deffn procedure rexp-group rexp @dots{}
-@code{rexp-group} is like @code{rexp-sequence}, except that the result
-is marked as a match group. This is equivalent to the @code{\(}
-@dots{} @code{\)} construct.
+@deffn procedure compile-regsexp regsexp
+Compiles @var{regsexp} by translating it into a procedure that
+implements the specified matcher.
@end deffn
-The next three procedures in principal accept a single @acronym{REXP}
-argument. For convenience, they accept multiple arguments, which are
-converted into a single argument by @code{rexp-group}. Note, however,
-that if only one @acronym{REXP} argument is supplied, and it's very
-simple, no grouping occurs.
+The match and search procedures each return a list when they are
+successful, and @code{#f} when they fail. The returned list is of the
+form @code{(@var{s} @var{e} @var{register} @dots{})}, where @var{s} is
+the index at which the match starts, @var{e} is the index at which the
+match ends, and each @var{register} is a pair @code{(@var{key}
+. @var{contents})} where @var{key} is the register's name and
+@var{contents} is the contents of that register as a string.
-@deffn procedure rexp* rexp @dots{}
-Returns a @acronym{REXP} that matches zero or more instances of the
-pattern matched by the @var{rexp} arguments. This is equivalent to
-the @code{*} construct.
-@end deffn
+@deffn procedure regsexp-match-string crse string [start [end]]
+The @var{crse} argument must be a value returned by
+@code{compile-regsexp}. The @var{string} argument must satisfy
+@code{string-in-nfc?}.
-@deffn procedure rexp+ rexp @dots{}
-Returns a @acronym{REXP} that matches one or more instances of the
-pattern matched by the @var{rexp} arguments. This is equivalent to
-the @code{+} construct.
+Matches @var{string} against @var{crse} and returns the result.
@end deffn
-@deffn procedure rexp-optional rexp @dots{}
-Returns a @acronym{REXP} that matches zero or one instances of the
-pattern matched by the @var{rexp} arguments. This is equivalent to
-the @code{?} construct.
-@end deffn
+@deffn procedure regsexp-search-string-forward crse string [start [end]]
+The @var{crse} argument must be a value returned by
+@code{compile-regsexp}. The @var{string} argument must satisfy
+@code{string-in-nfc?}.
-@deffn procedure rexp-case-fold rexp
-Returns a @acronym{REXP} that matches the same pattern as @var{rexp},
-but is insensitive to character case. This has no equivalent in
-standard regular-expression notation.
+Searches @var{string} from left to right for a match against
+@var{crse} and returns the result.
@end deffn
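+
+A minimal usage sketch, assuming the argument conventions and result
+format described above. The pattern, the register key @code{word},
+and the input strings are only illustrations.
+
+@example
+;; Compile once, then reuse the compiled matcher.
+(define crse
+  (compile-regsexp
+   '(seq (group word (+ (inverse-char-set #\space #\!)))
+         #\!)))
+
+;; Search anywhere in the string.  On success the result is a list
+;; (s e register ...), so the registers are the tail after s and e.
+(let ((result (regsexp-search-string-forward crse "say hello! now")))
+  (and result
+       (cdr (assq 'word (cddr result)))))
+
+;; The search returns #f when nothing matches.
+(regsexp-search-string-forward crse "no exclamation here")
+@end example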