

14.14.2 *Parser

The parser language is a declarative language for specifying a parser procedure. A parser procedure is a procedure that accepts a single parser-buffer argument and parses some of the input from the buffer. If the parse is successful, the procedure returns a vector of objects that are the result of the parse, and the internal pointer of the parser buffer is advanced past the input that was parsed. If the parse fails, the procedure returns #f and the internal pointer is unchanged. This interface is much like that of a matcher procedure, except that on success the parser procedure returns a vector of values rather than #t.

The *parser special form is the interface between the parser language and Scheme.

special form: *parser pexp

The operand pexp is an expression in the parser language. The *parser expression expands into Scheme code that implements a parser procedure.
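
As a minimal sketch of this interface, assuming MIT/GNU Scheme with the parser language loaded as an option, the following defines a parser procedure and calls it twice on the same buffer. The name parse-word is hypothetical and used only for illustration.

```scheme
;; Assumption: MIT/GNU Scheme, where the parser language is an optional feature.
(load-option '*parser)

;; parse-word is a hypothetical name chosen for this illustration.
(define parse-word
  (*parser (match (+ (char-set char-set:alphabetic)))))

(define buffer (string->parser-buffer "hello world"))

;; Success: the result is a vector, and the internal pointer
;; advances past "hello".
(define first-result (parse-word buffer))

;; The next character is a space, so this parse fails: it returns #f
;; and leaves the internal pointer unchanged.
(define second-result (parse-word buffer))
```

Here first-result should be a one-element vector holding "hello", and second-result should be #f.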

There are several primitive expressions in the parser language. The first two provide a bridge to the matcher language (see *Matcher):

parser expression: match mexp

The match expression performs a match on the parser buffer. The match to be performed is specified by mexp, which is an expression in the matcher language. If the match is successful, the result of the match expression is a vector of one element: a string containing the matched text.

parser expression: noise mexp

The noise expression performs a match on the parser buffer. The match to be performed is specified by mexp, which is an expression in the matcher language. If the match is successful, the result of the noise expression is a vector of zero elements. (In other words, the text is matched and then thrown away.)

The mexp operand is often a known character or string, so in the case that mexp is a character or string literal, the noise expression can be abbreviated as the literal. In other words, ‘(noise "foo")’ can be abbreviated just ‘"foo"’.
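
The difference between match, noise, and the literal abbreviation can be sketched as follows, assuming MIT/GNU Scheme. The names keep-foo, drop-foo, and skip-foo-keep-bar are hypothetical.

```scheme
;; Assumption: MIT/GNU Scheme with the parser language option available.
(load-option '*parser)

;; match keeps the matched text as a one-element vector.
(define keep-foo (*parser (match "foo")))

;; noise matches the same text but returns no values.
(define drop-foo (*parser (noise "foo")))

;; Inside seq, the literal "foo" abbreviates (noise "foo").
(define skip-foo-keep-bar (*parser (seq "foo" (match "bar"))))

(define kept    (keep-foo (string->parser-buffer "foobar")))
(define dropped (drop-foo (string->parser-buffer "foobar")))
(define mixed   (skip-foo-keep-bar (string->parser-buffer "foobar")))
```

Under the semantics described above, kept holds "foo", dropped is an empty vector, and mixed holds only "bar".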

parser expression: values expression …

Sometimes it is useful to be able to insert arbitrary values into the parser result. The values expression supports this. The expression arguments are arbitrary Scheme expressions that are evaluated at run time and returned in a vector. The values expression always succeeds and never modifies the internal pointer of the parser buffer.
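
For instance, a parser can discard a keyword and substitute Scheme values for it. This is a sketch assuming MIT/GNU Scheme; parse-true is a hypothetical name.

```scheme
;; Assumption: MIT/GNU Scheme with the parser language option available.
(load-option '*parser)

;; parse-true is a hypothetical name. The literal "true" is matched and
;; discarded; values then inserts two Scheme objects into the result.
(define parse-true
  (*parser (seq "true" (values #t 'boolean))))

(define result (parse-true (string->parser-buffer "true")))
```

The result should be a two-element vector containing #t and the symbol boolean.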

parser expression: discard-matched

The discard-matched expression always succeeds, returning a vector of zero elements. In all other respects it is identical to the discard-matched expression in the matcher language.

Next there are several combinator expressions. Parameters named pexp are arbitrary expressions in the parser language. The first few combinators are direct equivalents of those in the matcher language.

parser expression: seq pexp …

The seq expression parses each of the pexp operands in order. If all of the pexp operands successfully match, the result is the concatenation of their values (by vector-append).

parser expression: alt pexp …

The alt expression attempts to parse each pexp operand in order from left to right. The first one that successfully parses produces the result for the entire alt expression.

Like the alt expression in the matcher language, this expression participates in backtracking.
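
A small sketch combining seq and alt, assuming MIT/GNU Scheme. The parser name parse-assignment and its input format are hypothetical.

```scheme
;; Assumption: MIT/GNU Scheme with the parser language option available.
(load-option '*parser)

;; parse-assignment is a hypothetical parser for inputs such as "x=42".
;; seq concatenates the values of its operands; alt tries a numeric
;; right-hand side first and falls back to an alphabetic one.
(define parse-assignment
  (*parser
   (seq (match (+ (char-set char-set:alphabetic)))
        "="
        (alt (match (+ (char-set char-set:numeric)))
             (match (+ (char-set char-set:alphabetic)))))))

(define r1 (parse-assignment (string->parser-buffer "x=42")))
(define r2 (parse-assignment (string->parser-buffer "x=y")))
(define r3 (parse-assignment (string->parser-buffer "x=")))
```

Here r1 and r2 should each be a two-element vector of strings, while r3 should be #f because neither alternative parses.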

parser expression: * pexp

The * expression parses zero or more occurrences of pexp. The results of the parsed occurrences are concatenated together (by vector-append) to produce the expression’s result.

Like the * expression in the matcher language, this expression participates in backtracking.
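
As a sketch of repetition, assuming MIT/GNU Scheme, the following parses each digit as a separate value. The name parse-digits is hypothetical.

```scheme
;; Assumption: MIT/GNU Scheme with the parser language option available.
(load-option '*parser)

;; parse-digits is a hypothetical name. Each occurrence contributes a
;; one-element vector, and the occurrences are appended together.
(define parse-digits
  (*parser (* (match (char-set char-set:numeric)))))

(define some-digits (parse-digits (string->parser-buffer "123abc")))

;; Zero occurrences still count as success, giving an empty vector.
(define no-digits (parse-digits (string->parser-buffer "abc")))
```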

parser expression: + pexp

The + expression parses one or more occurrences of pexp. It is equivalent to

(seq pexp (* pexp))

parser expression: ? pexp

The ? expression parses zero or one occurrence of pexp. It is equivalent to

(alt pexp (seq))
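
The + and ? expressions can be sketched together, assuming MIT/GNU Scheme. The name parse-signed-digits is hypothetical.

```scheme
;; Assumption: MIT/GNU Scheme with the parser language option available.
(load-option '*parser)

;; parse-signed-digits is a hypothetical name: an optional sign
;; followed by at least one digit.
(define parse-signed-digits
  (*parser
   (seq (? (match (alt "+" "-")))
        (+ (match (char-set char-set:numeric))))))

(define with-sign    (parse-signed-digits (string->parser-buffer "-12")))
(define without-sign (parse-signed-digits (string->parser-buffer "7")))

;; A sign with no digits fails, because + requires one occurrence.
(define sign-only    (parse-signed-digits (string->parser-buffer "-")))
```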

The next three expressions do not have equivalents in the matcher language. Each accepts a single pexp argument, which is parsed in the usual way. These expressions perform transformations on the returned values of a successful match.

parser expression: transform expression pexp

The transform expression performs an arbitrary transformation of the values returned by parsing pexp. Expression is a Scheme expression that must evaluate to a procedure at run time. If pexp is successfully parsed, the procedure is called with the vector of values as its argument, and must return a vector or #f. If it returns a vector, the parse is successful, and those are the resulting values. If it returns #f, the parse fails and the internal pointer of the parser buffer is returned to what it was before pexp was parsed.

For example:

(transform (lambda (v) (if (= 0 (vector-length v)) #f v)) …)
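
One way to complete the fragment above, as a sketch assuming MIT/GNU Scheme, is to wrap a * expression so that an empty result becomes a failure. The name parse-nonempty-digits is hypothetical.

```scheme
;; Assumption: MIT/GNU Scheme with the parser language option available.
(load-option '*parser)

;; parse-nonempty-digits is a hypothetical name. The procedure returns
;; #f for an empty vector, which makes the whole parse fail.
(define parse-nonempty-digits
  (*parser
   (transform (lambda (v) (if (= 0 (vector-length v)) #f v))
     (* (match (char-set char-set:numeric))))))

(define accepted (parse-nonempty-digits (string->parser-buffer "42")))
(define rejected (parse-nonempty-digits (string->parser-buffer "abc")))
```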

parser expression: encapsulate expression pexp

The encapsulate expression transforms the values returned by parsing pexp into a single value. Expression is a Scheme expression that must evaluate to a procedure at run time. If pexp is successfully parsed, the procedure is called with the vector of values as its argument, and may return any Scheme object. The result of the encapsulate expression is a vector of length one containing that object. (And consequently encapsulate doesn’t change the success or failure of pexp, only its value.)

For example:

(encapsulate vector->list …)
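
A complete version of this fragment might look like the following sketch, assuming MIT/GNU Scheme. The name parse-digit-list is hypothetical.

```scheme
;; Assumption: MIT/GNU Scheme with the parser language option available.
(load-option '*parser)

;; parse-digit-list is a hypothetical name. All values parsed by the
;; inner expression are collected into one list, which becomes the
;; single element of the result vector.
(define parse-digit-list
  (*parser
   (encapsulate vector->list
     (* (match (char-set char-set:numeric))))))

(define digit-list (parse-digit-list (string->parser-buffer "42")))
```

The result should be a vector of length one whose element is the list of the strings "4" and "2".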

parser expression: map expression pexp

The map expression performs a per-element transform on the values returned by parsing pexp. Expression is a Scheme expression that must evaluate to a procedure at run time. If pexp is successfully parsed, the procedure is mapped (by vector-map) over the values returned from the parse. The mapped values are returned as the result of the map expression. (And consequently map doesn’t change the success or failure of pexp, nor the number of values returned.)

For example:

(map string->symbol …)
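
Filled in with a concrete pexp, the fragment above might read as follows. This is a sketch assuming MIT/GNU Scheme; parse-symbol-pair and its hyphen-separated input are hypothetical.

```scheme
;; Assumption: MIT/GNU Scheme with the parser language option available.
(load-option '*parser)

;; parse-symbol-pair is a hypothetical name. The inner seq yields two
;; strings; map converts each one to a symbol.
(define parse-symbol-pair
  (*parser
   (map string->symbol
     (seq (match (+ (char-set char-set:alphabetic)))
          "-"
          (match (+ (char-set char-set:alphabetic)))))))

(define symbols (parse-symbol-pair (string->parser-buffer "foo-bar")))
```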

Finally, as in the matcher language, we have sexp and with-pointer to support embedding Scheme code in the parser.

parser expression: sexp expression

The sexp expression allows arbitrary Scheme code to be embedded inside a parser. The expression operand must evaluate to a parser procedure at run time; the procedure is called to parse the parser buffer. This is the parser-language equivalent of the sexp expression in the matcher language.

The case in which expression is a symbol is so common that it has an abbreviation: ‘(sexp symbol)’ may be abbreviated as just symbol.
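
The following sketch, assuming MIT/GNU Scheme, uses both the explicit sexp form and its symbol abbreviation to reuse one parser inside another. The names parse-word and parse-pair are hypothetical.

```scheme
;; Assumption: MIT/GNU Scheme with the parser language option available.
(load-option '*parser)

;; parse-word and parse-pair are hypothetical names.
(define parse-word
  (*parser (match (+ (char-set char-set:alphabetic)))))

;; The first operand uses the explicit sexp form; the last uses the
;; abbreviation, where a bare symbol stands for (sexp symbol).
(define parse-pair
  (*parser (seq (sexp parse-word) ":" parse-word)))

(define pair (parse-pair (string->parser-buffer "key:value")))
```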

parser expression: with-pointer identifier pexp

The with-pointer expression fetches the parser buffer’s internal pointer (using get-parser-buffer-pointer), binds it to identifier, and then parses the pattern specified by pexp. Identifier must be a symbol. This is the parser-language equivalent of the with-pointer expression in the matcher language.
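
As a sketch, assuming MIT/GNU Scheme, the bound pointer can be used to record where a parse began. The name parse-word-with-position is hypothetical, and the call to parser-buffer-position-string assumes that the parser-buffer interface provides it to describe a pointer's location.

```scheme
;; Assumption: MIT/GNU Scheme with the parser language option available.
(load-option '*parser)

;; parse-word-with-position is a hypothetical name. The pointer bound
;; to start is captured before the word is parsed, then turned into a
;; human-readable position string (assumed to come from
;; parser-buffer-position-string) and appended to the result.
(define parse-word-with-position
  (*parser
   (with-pointer start
     (seq (match (+ (char-set char-set:alphabetic)))
          (values (parser-buffer-position-string start))))))

(define located
  (parse-word-with-position (string->parser-buffer "hello")))
```

The result should be a two-element vector: the matched word followed by a string describing the position at which the parse started.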

