The matcher language is a declarative language for specifying a matcher procedure. A matcher procedure is a procedure that accepts a single parser-buffer argument and returns a boolean value indicating whether the match it performs was successful. If the match succeeds, the internal pointer of the parser buffer is moved forward over the matched text. If the match fails, the internal pointer is unchanged.
For example, here is a matcher procedure that matches the character ‘a’:
(lambda (b) (match-parser-buffer-char b #\a))
Here is another example that matches two given characters, c1 and c2, in sequence:
(lambda (b)
  (let ((p (get-parser-buffer-pointer b)))
    (if (match-parser-buffer-char b c1)
        (if (match-parser-buffer-char b c2)
            #t
            (begin
              (set-parser-buffer-pointer! b p)
              #f))
        #f)))
This code is clear, but it has lots of details that get in the way of understanding what it is doing. Here is the same example in the matcher language:
(*matcher (seq (char c1) (char c2)))
This is much simpler and more intuitive. And it generates virtually the same code:
(pp (*matcher (seq (char c1) (char c2))))
-| (lambda (#[b1])
-|   (let ((#[p1] (get-parser-buffer-pointer #[b1])))
-|     (and (match-parser-buffer-char #[b1] c1)
-|          (if (match-parser-buffer-char #[b1] c2)
-|              #t
-|              (begin
-|                (set-parser-buffer-pointer! #[b1] #[p1])
-|                #f)))))
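As a minimal sketch of how such a matcher procedure is used, the following applies the generated matcher to parser buffers built with string->parser-buffer. The names match-ab, hit and miss are hypothetical, and the load-option call assumes that the parser language has to be loaded as an option in your build.

```scheme
;; Assumption: the parser language is an optional feature that must be loaded.
(load-option '*parser)

;; Hypothetical matcher for the two characters a and b in sequence.
(define c1 #\a)
(define c2 #\b)
(define match-ab (*matcher (seq (char c1) (char c2))))

;; Successful match: the internal pointer moves forward over "ab".
(define hit (string->parser-buffer "abc"))
(define hit-result (match-ab hit))
(define hit-next (peek-parser-buffer-char hit))

;; Failed match: the second character differs, so the pointer is restored.
(define miss (string->parser-buffer "axc"))
(define miss-result (match-ab miss))
(define miss-next (peek-parser-buffer-char miss))
```

Here peek-parser-buffer-char is used only to observe where the internal pointer ended up after each attempt.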
Now that we have seen an example of the language, it’s time to look at the details. The *matcher special form, written (*matcher mexp), is the interface between the matcher language and Scheme.
The operand mexp is an expression in the matcher language. The *matcher expression expands into Scheme code that implements a matcher procedure.
Here are the predefined matcher expressions. New matcher expressions can be defined using the macro facility (see Parser-language Macros). We will start with the primitive expressions.
The char, char-ci, not-char, and not-char-ci expressions match a given character. In each case, the expression operand is a Scheme expression that must evaluate to a character at run time. The ‘-ci’ expressions do case-insensitive matching. The ‘not-’ expressions match any character other than the given one.
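The following sketch contrasts three of the character expressions. It assumes the expression names char, char-ci and not-char (implied by the ‘-ci’ and ‘not-’ wording above); the helper try and the other definitions are hypothetical.

```scheme
;; Assumption: the parser language is an optional feature that must be loaded.
(load-option '*parser)

(define match-x     (*matcher (char #\x)))
(define match-x-ci  (*matcher (char-ci #\x)))
(define match-not-x (*matcher (not-char #\x)))

;; Hypothetical helper: run a matcher against a fresh buffer over TEXT.
(define (try matcher text)
  (matcher (string->parser-buffer text)))

(define exact-on-upper (try match-x "X"))      ; case differs
(define ci-on-upper    (try match-x-ci "X"))   ; case is ignored
(define not-x-on-y     (try match-not-x "y"))  ; any character other than x
(define not-x-on-x     (try match-not-x "x"))  ; the excluded character
```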
The string and string-ci expressions match a given string. The expression operand is a Scheme expression that must evaluate to a string at run time. The string-ci expression does case-insensitive matching.
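A small sketch of the difference between the two string expressions, tried in turn against the same buffer. The definition names are hypothetical.

```scheme
;; Assumption: the parser language is an optional feature that must be loaded.
(load-option '*parser)

(define match-let    (*matcher (string "let")))
(define match-let-ci (*matcher (string-ci "let")))

(define buffer (string->parser-buffer "LET x"))

;; The case-sensitive match fails and leaves the pointer unchanged,
;; so the case-insensitive match can then be tried from the same position.
(define exact-result (match-let buffer))
(define ci-result (match-let-ci buffer))
(define next-char (peek-parser-buffer-char buffer))
```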
These expressions match a single character that is a member of a given character set. The expression operand is a Scheme expression that must evaluate to a character set at run time.
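For instance, assuming the char-set expression form that appears in the seq example in this section, a single-digit matcher could be sketched as follows. The definition names are hypothetical.

```scheme
;; Assumption: the parser language is an optional feature that must be loaded.
(load-option '*parser)

;; The operand is an ordinary Scheme expression evaluating to a character set.
(define match-digit (*matcher (char-set char-set:numeric)))

(define digit-result  (match-digit (string->parser-buffer "7up")))
(define letter-result (match-digit (string->parser-buffer "up7")))
```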
The end-of-input expression is successful only when there are no more characters available to be matched.
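A sketch of anchoring a match at the end of the input. It assumes the expression is written with parentheses, as (end-of-input), because a bare symbol in the matcher language is read as an abbreviation for a sexp expression. The definition names are hypothetical.

```scheme
;; Assumption: the parser language is an optional feature that must be loaded.
(load-option '*parser)

;; Assumption: end-of-input takes no operands and is written (end-of-input).
(define match-only-ab (*matcher (seq "ab" (end-of-input))))

(define exact-result    (match-only-ab (string->parser-buffer "ab")))
(define trailing-result (match-only-ab (string->parser-buffer "abc")))
```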
The discard-matched expression always successfully matches the null string. However, it isn’t meant to be used as a matching expression; it is used for its effect. discard-matched causes all of the buffered text prior to this point to be discarded (i.e. it calls discard-parser-buffer-head! on the parser buffer).
Note that discard-matched may not be used in certain places in a matcher expression. The reason for this is that it deliberately discards information needed for backtracking, so it may not be used in a place where subsequent backtracking will need to back over it. As a rule of thumb, use discard-matched only in the last operand of a seq or alt expression (including any seq or alt expressions in which it is indirectly contained).
In addition to the above primitive expressions, there are two convenient abbreviations. A character literal (e.g. ‘#\A’) is a legal primitive expression, and is equivalent to a char expression with that literal as its operand (e.g. ‘(char #\A)’). Likewise, a string literal is equivalent to a string expression (e.g. ‘(string "abc")’).
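To illustrate the two abbreviations, this sketch defines the same matcher in long and short form and runs both on the same inputs. The definition names are hypothetical.

```scheme
;; Assumption: the parser language is an optional feature that must be loaded.
(load-option '*parser)

;; Long form, with explicit char and string expressions.
(define match-long  (*matcher (seq (char #\A) (string "bc"))))
;; Short form, using the literal abbreviations.
(define match-short (*matcher (seq #\A "bc")))

(define long-hit   (match-long  (string->parser-buffer "Abc")))
(define short-hit  (match-short (string->parser-buffer "Abc")))
(define long-miss  (match-long  (string->parser-buffer "abc")))
(define short-miss (match-short (string->parser-buffer "abc")))
```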
Next there are several combinator expressions. These closely correspond to similar combinators in regular expressions. Parameters named mexp are arbitrary expressions in the matcher language.
The seq expression matches each mexp operand in sequence. For example,
(seq (char-set char-set:alphabetic) (char-set char-set:numeric))
matches an alphabetic character followed by a numeric character, such as ‘H4’.
Note that if there are no mexp operands, the seq expression successfully matches the null string.
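The seq behaviour described above, including the empty case, can be sketched as follows. The definition names are hypothetical.

```scheme
;; Assumption: the parser language is an optional feature that must be loaded.
(load-option '*parser)

(define match-letter-digit
  (*matcher (seq (char-set char-set:alphabetic)
                 (char-set char-set:numeric))))

;; A seq with no operands matches the null string.
(define match-nothing (*matcher (seq)))

(define h4-result    (match-letter-digit (string->parser-buffer "H4")))
(define 4h-result    (match-letter-digit (string->parser-buffer "4H")))
(define empty-result (match-nothing (string->parser-buffer "")))
```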
The alt expression attempts to match each mexp operand in order from left to right. The first one that successfully matches becomes the match for the entire alt expression.
The alt expression participates in backtracking. If one of the mexp operands matches, but the overall match in which this expression is embedded fails, the backtracking mechanism will cause the alt expression to try the remaining mexp operands.
For example, if the expression
(seq (alt "ab" "a") "b")
is matched against the text ‘abc’, the alt expression will initially match its first operand. But it will then fail to match the second operand of the seq expression. This will cause the alt to be restarted, at which time it will match ‘a’, and the overall match will succeed.
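The backtracking example above can be tried directly. This sketch wraps the expression in *matcher and inspects the buffer afterwards; the definition names are hypothetical.

```scheme
;; Assumption: the parser language is an optional feature that must be loaded.
(load-option '*parser)

(define match-ab (*matcher (seq (alt "ab" "a") "b")))

(define buffer (string->parser-buffer "abc"))

;; The alt first takes "ab", the trailing "b" then fails against "c",
;; and the alt is retried with "a" so that "b" can match.
(define result (match-ab buffer))
(define next-char (peek-parser-buffer-char buffer))
```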
Note that if there are no mexp operands, the alt match will always fail.
The * expression matches zero or more occurrences of the mexp operand. (Consequently this match always succeeds.)
The * expression participates in backtracking; if it matches N occurrences of mexp, but the overall match fails, it will backtrack to N-1 occurrences and continue. If the overall match continues to fail, the * expression will continue to backtrack until there are no occurrences left.
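A sketch of the repetition and its backtracking, under the behaviour described above. The definition names are hypothetical.

```scheme
;; Assumption: the parser language is an optional feature that must be loaded.
(load-option '*parser)

;; The * first consumes all three a's, then gives one back so "ab" can match.
(define match-as-then-ab (*matcher (seq (* #\a) "ab")))
(define buffer (string->parser-buffer "aaab!"))
(define result (match-as-then-ab buffer))
(define next-char (peek-parser-buffer-char buffer))

;; Zero occurrences is still a successful match; the pointer does not move.
(define match-as (*matcher (* #\a)))
(define no-a-buffer (string->parser-buffer "bbb"))
(define zero-result (match-as no-a-buffer))
(define zero-next (peek-parser-buffer-char no-a-buffer))
```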
The + expression matches one or more occurrences of the mexp operand. It is equivalent to
(seq mexp (* mexp))
The ? expression matches zero or one occurrence of the mexp operand. It is equivalent to
(alt mexp (seq))
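As a sketch combining the last two combinators, the following matches an optionally signed run of digits. It assumes the combinators are spelled + and ? in the matcher language; match-integer and try are hypothetical names.

```scheme
;; Assumption: the parser language is an optional feature that must be loaded.
(load-option '*parser)

;; Optional sign, followed by at least one digit.
(define match-integer
  (*matcher (seq (? (alt #\+ #\-))
                 (+ (char-set char-set:numeric)))))

;; Hypothetical helper: run a matcher against a fresh buffer over TEXT.
(define (try matcher text)
  (matcher (string->parser-buffer text)))

(define signed-result    (try match-integer "-42"))
(define unsigned-result  (try match-integer "42"))
(define sign-only-result (try match-integer "-"))
(define letter-result    (try match-integer "x1"))
```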
The sexp expression allows arbitrary Scheme code to be embedded inside a matcher. The expression operand must evaluate to a matcher procedure at run time; the procedure is called to match the parser buffer. For example,
(*matcher (seq "a" (sexp parse-foo) "b"))
expands to
(lambda (#[b1])
  (let ((#[p1] (get-parser-buffer-pointer #[b1])))
    (and (match-parser-buffer-char #[b1] #\a)
         (if (parse-foo #[b1])
             (if (match-parser-buffer-char #[b1] #\b)
                 #t
                 (begin
                   (set-parser-buffer-pointer! #[b1] #[p1])
                   #f))
             (begin
               (set-parser-buffer-pointer! #[b1] #[p1])
               #f)))))
The case in which expression is a symbol is so common that it has an abbreviation: ‘(sexp symbol)’ may be abbreviated as just symbol.
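The following sketch embeds a hand-written matcher procedure both with sexp and with the symbol abbreviation. The procedure match-vowel and the other definition names are hypothetical; it relies on match-parser-buffer-char-in-set from the parser-buffer interface.

```scheme
;; Assumption: the parser language is an optional feature that must be loaded.
(load-option '*parser)

;; A matcher procedure written by hand: it takes a parser buffer and
;; returns a boolean, advancing the pointer only on success.
(define vowels (string->char-set "aeiou"))
(define (match-vowel buffer)
  (match-parser-buffer-char-in-set buffer vowels))

(define match-explicit (*matcher (seq "b" (sexp match-vowel) "t")))
(define match-abbrev   (*matcher (seq "b" match-vowel "t")))

(define explicit-hit  (match-explicit (string->parser-buffer "bat")))
(define abbrev-hit    (match-abbrev   (string->parser-buffer "bit")))
(define explicit-miss (match-explicit (string->parser-buffer "bxt")))
```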
The with-pointer expression fetches the parser buffer’s internal pointer (using get-parser-buffer-pointer), binds it to identifier, and then matches the pattern specified by mexp. Identifier must be a symbol.
This is meant to be used in conjunction with sexp, as a way to capture a pointer to a part of the input stream that is outside the sexp expression. An example of the use of with-pointer appears above (see with-pointer example).
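As a further sketch, here the pointer bound by with-pointer is used inside an embedded sexp procedure to record where the inner pattern started. It assumes the form is written (with-pointer identifier mexp) and that parser-buffer-pointer-index reports the character index of a pointer; captured-index and match-tail are hypothetical names.

```scheme
;; Assumption: the parser language is an optional feature that must be loaded.
(load-option '*parser)

(define captured-index #f)

;; After "ab" is matched, START is bound to the current pointer. The embedded
;; procedure records its index and always succeeds, then "cd" is matched.
(define match-tail
  (*matcher
   (seq "ab"
        (with-pointer start
          (seq (sexp (lambda (buffer)
                       (set! captured-index
                             (parser-buffer-pointer-index start))
                       #t))
               "cd")))))

(define result (match-tail (string->parser-buffer "abcd")))
```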