

6.2.1 Regular S-Expressions

A regular s-expression is either a character or a string, which matches itself, or one of the following forms.

Examples in this section use the following definitions for brevity:

(define (try-match pattern string)
  (regsexp-match-string (compile-regsexp pattern) string))

(define (try-search pattern string)
  (regsexp-search-string-forward (compile-regsexp pattern) string))

These forms match specific characters or strings, or a single character drawn from a set:

regsexp: char-ci char

Matches char without considering case.

regsexp: string-ci string

Matches string without considering case.
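The two case-insensitive forms have no examples of their own. The short sketch below shows them next to their case-sensitive counterparts. It repeats the `try-match` helper from the start of this section so that the block stands alone. The results are inferred from the descriptions above rather than copied from a session.

```scheme
;; Same helper as defined at the start of this section.
(define (try-match pattern string)
  (regsexp-match-string (compile-regsexp pattern) string))

;; A plain character or string matches only with the same case...
(try-match '(seq #\a "bc") "ABC")             ; ⇒ #f
;; ...while char-ci and string-ci ignore case.
(try-match '(seq (char-ci #\a) "BC") "ABC")   ; ⇒ (0 3)
(try-match '(string-ci "abc") "AbCd")         ; ⇒ (0 3)
(try-match '(string-ci "abc") "ab")           ; ⇒ #f
```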

regsexp: any-char

Matches one character other than #\newline.

(try-match '(any-char) "") ⇒ #f
(try-match '(any-char) "a") ⇒ (0 1)
(try-match '(any-char) "\n") ⇒ #f
(try-search '(any-char) "") ⇒ #f
(try-search '(any-char) "ab") ⇒ (0 1)
(try-search '(any-char) "\na") ⇒ (1 2)

regsexp: char-in datum …
regsexp: char-not-in datum …

Matches one character in (not in) the character set specified by (char-set datum …).

(try-match '(seq "a" (char-in "ab") "c") "abc") ⇒ (0 3)
(try-match '(seq "a" (char-not-in "ab") "c") "abc") ⇒ #f
(try-match '(seq "a" (char-not-in "ab") "c") "adc") ⇒ (0 3)
(try-match '(seq "a" (+ (char-in numeric)) "c") "a019c") ⇒ (0 5)

These forms match no characters, but only at specific locations in the input string:

regsexp: line-start
regsexp: line-end

Matches no characters at the start (end) of a line.

(try-match '(seq (line-start)
                 (* (any-char))
                 (line-end))
           "abc") ⇒ (0 3)
(try-match '(seq (line-start)
                 (* (any-char))
                 (line-end))
           "ab\nc") ⇒ (0 2)
(try-search '(seq (line-start)
                  (* (char-in alphabetic))
                  (line-end))
            "1abc") ⇒ #f
(try-search '(seq (line-start)
                  (* (char-in alphabetic))
                  (line-end))
            "1\nabc") ⇒ (2 5)

regsexp: string-start
regsexp: string-end

Matches no characters at the start (end) of the string.

(try-match '(seq (string-start)
                 (* (any-char))
                 (string-end))
           "abc") ⇒ (0 3)
(try-match '(seq (string-start)
                 (* (any-char))
                 (string-end))
           "ab\nc") ⇒ #f
(try-search '(seq (string-start)
                  (* (char-in alphabetic))
                  (string-end))
            "1abc") ⇒ #f
(try-search '(seq (string-start)
                  (* (char-in alphabetic))
                  (string-end))
            "1\nabc") ⇒ #f
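The search examples above for these anchors only show failures. The following sketch contrasts `line-end` with `string-end` during a search. It repeats the `try-search` helper from the start of this section, and the expected results follow from the definitions given here.

```scheme
;; Same helper as defined at the start of this section.
(define (try-search pattern string)
  (regsexp-search-string-forward (compile-regsexp pattern) string))

;; line-end succeeds just before the #\newline, so the first run of digits matches.
(try-search '(seq (+ (char-in numeric)) (line-end)) "12\n34")    ; ⇒ (0 2)
;; string-end succeeds only at the very end, so the search moves on to "34".
(try-search '(seq (+ (char-in numeric)) (string-end)) "12\n34")  ; ⇒ (3 5)
```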

These forms match repetitions of a given regsexp. Most of them come in two variants, one greedy and the other shy. The greedy variant first matches as many repetitions as it can. If the rest of the match then fails, it backtracks, giving up one repetition at a time. The shy variant first matches the minimum number of repetitions. If the rest of the match then fails, it backtracks, adding one repetition at a time. The shy variant's keyword is the greedy variant's keyword with a ? appended, as with * and *?.
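The following minimal sketch in portable R7RS Scheme makes the backtracking order concrete. It is not how MIT/GNU Scheme implements regsexps. The procedures `char-matcher`, `seq-matcher`, `greedy-star`, `shy-star` and `match-end` are hypothetical names invented for this illustration. Each matcher receives a success continuation, and returning #f means "backtrack". The greedy and shy repetition matchers differ only in which choice they try first.

```scheme
(import (scheme base) (scheme char))

;; Hypothetical continuation-passing matchers (illustration only).
;; A matcher takes the string S, a start index I, and a success
;; continuation K. It calls K with the index just past what it matched.
;; Returning #f signals failure, so the caller tries its next choice.

(define (char-matcher pred)        ; one character satisfying PRED
  (lambda (s i k)
    (and (< i (string-length s))
         (pred (string-ref s i))
         (k (+ i 1)))))

(define (seq-matcher m1 m2)        ; M1, then M2 from where M1 stopped
  (lambda (s i k)
    (m1 s i (lambda (j) (m2 s j k)))))

(define (greedy-star m)            ; models (* m)
  (define (loop s i k)
    ;; Try one more repetition first; stop here only if that path fails.
    ;; The (> j i) guard prevents looping forever on empty repetitions.
    (or (m s i (lambda (j) (and (> j i) (loop s j k))))
        (k i)))
  loop)

(define (shy-star m)               ; models (*? m)
  (define (loop s i k)
    ;; Try stopping here first; add a repetition only if that path fails.
    (or (k i)
        (m s i (lambda (j) (and (> j i) (loop s j k))))))
  loop)

(define (match-end m s)            ; anchored at index 0; end index or #f
  (m s 0 (lambda (j) j)))

(define m-any (char-matcher (lambda (c) (not (char=? c #\newline)))))
(define m-alpha (char-matcher char-alphabetic?))
(define m-digit (char-matcher char-numeric?))

(match-end (seq-matcher m-alpha (seq-matcher (greedy-star m-digit) m-any)) "a123a") ; ⇒ 5
(match-end (seq-matcher m-alpha (seq-matcher (shy-star m-digit) m-any)) "a123a")    ; ⇒ 2
```

The two results correspond to the end indices of the * and *? examples for "a123a" given below, (0 5) and (0 2). The continuation-passing style is one common way to obtain backtracking without an explicit stack. A failure anywhere after the repetition simply returns #f into the loop, which then tries its other choice.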

regsexp: ? regsexp
regsexp: ?? regsexp

Matches regsexp zero or one time.

(try-search '(seq (char-in alphabetic)
                  (? (char-in numeric)))
            "a") ⇒ (0 1)
(try-search '(seq (char-in alphabetic)
                  (?? (char-in numeric)))
            "a") ⇒ (0 1)
(try-search '(seq (char-in alphabetic)
                  (? (char-in numeric)))
            "a1") ⇒ (0 2)
(try-search '(seq (char-in alphabetic)
                  (?? (char-in numeric)))
            "a1") ⇒ (0 1)
(try-search '(seq (char-in alphabetic)
                  (? (char-in numeric)))
            "1a2") ⇒ (1 3)
(try-search '(seq (char-in alphabetic)
                  (?? (char-in numeric)))
            "1a2") ⇒ (1 2)

regsexp: * regsexp
regsexp: *? regsexp

Matches regsexp zero or more times.

(try-match '(seq (char-in alphabetic)
                 (* (char-in numeric))
                 (any-char))
           "aa") ⇒ (0 2)
(try-match '(seq (char-in alphabetic)
                 (*? (char-in numeric))
                 (any-char))
           "aa") ⇒ (0 2)
(try-match '(seq (char-in alphabetic)
                 (* (char-in numeric))
                 (any-char))
           "a123a") ⇒ (0 5)
(try-match '(seq (char-in alphabetic)
                 (*? (char-in numeric))
                 (any-char))
           "a123a") ⇒ (0 2)

regsexp: + regsexp
regsexp: +? regsexp

Matches regsexp one or more times.

(try-match '(seq (char-in alphabetic)
                 (+ (char-in numeric))
                 (any-char))
           "aa") ⇒ #f
(try-match '(seq (char-in alphabetic)
                 (+? (char-in numeric))
                 (any-char))
           "aa") ⇒ #f
(try-match '(seq (char-in alphabetic)
                 (+ (char-in numeric))
                 (any-char))
           "a123a") ⇒ (0 5)
(try-match '(seq (char-in alphabetic)
                 (+? (char-in numeric))
                 (any-char))
           "a123a") ⇒ (0 3)

regsexp: ** n m regsexp
regsexp: **? n m regsexp

The n argument must be an exact nonnegative integer. The m argument must be either an exact integer greater than or equal to n, or else #f.

Matches regsexp at least n times and at most m times; if m is #f then there is no upper limit.

(try-match '(seq (char-in alphabetic)
                 (** 0 2 (char-in numeric))
                 (any-char))
           "aa") ⇒ (0 2)
(try-match '(seq (char-in alphabetic)
                 (**? 0 2 (char-in numeric))
                 (any-char))
           "aa") ⇒ (0 2)
(try-match '(seq (char-in alphabetic)
                 (** 0 2 (char-in numeric))
                 (any-char))
           "a123a") ⇒ (0 4)
(try-match '(seq (char-in alphabetic)
                 (**? 0 2 (char-in numeric))
                 (any-char))
           "a123a") ⇒ (0 2)

regsexp: ** n regsexp

This is an abbreviation for (** n n regsexp). This matcher is neither greedy nor shy since it matches a fixed number of repetitions.
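The following sketch shows the fixed-count abbreviation alongside the open-ended form, where m is #f. It repeats the `try-match` helper from the start of this section. The results follow from the definitions of ** given above.

```scheme
;; Same helper as defined at the start of this section.
(define (try-match pattern string)
  (regsexp-match-string (compile-regsexp pattern) string))

;; (** 3 r) means (** 3 3 r): exactly three digits, however many follow.
(try-match '(** 3 (char-in numeric)) "12345")      ; ⇒ (0 3)
(try-match '(** 3 3 (char-in numeric)) "12345")    ; ⇒ (0 3)
(try-match '(** 3 (char-in numeric)) "12a45")      ; ⇒ #f
;; With m = #f there is no upper limit, so the greedy form takes all five.
(try-match '(** 2 #f (char-in numeric)) "12345")   ; ⇒ (0 5)
```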

These forms implement alternatives and sequencing:

regsexp: alt regsexp …

Matches one of the regsexp arguments, trying each in order from left to right.

(try-match '(alt #\a (char-in numeric)) "a") ⇒ (0 1)
(try-match '(alt #\a (char-in numeric)) "b") ⇒ #f
(try-match '(alt #\a (char-in numeric)) "1") ⇒ (0 1)
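Because alt tries its arguments from left to right, their order can change the result. The sketch below shows that backtracking still reaches a later alternative when the rest of the pattern fails. It repeats the `try-match` helper from the start of this section.

```scheme
;; Same helper as defined at the start of this section.
(define (try-match pattern string)
  (regsexp-match-string (compile-regsexp pattern) string))

;; The first alternative that succeeds is used...
(try-match '(alt "a" "ab") "ab")             ; ⇒ (0 1)
;; ...but if the rest of the pattern then fails, the next one is tried.
(try-match '(seq (alt "a" "ab") "c") "abc")  ; ⇒ (0 3)
```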

regsexp: seq regsexp …

Matches the first regsexp, then continues the match with the next regsexp, and so on until all of the arguments are matched.

(try-match '(seq #\a #\b) "a") ⇒ #f
(try-match '(seq #\a #\b) "aa") ⇒ #f
(try-match '(seq #\a #\b) "ab") ⇒ (0 2)

These forms implement named registers, which store matched segments of the input string:

regsexp: group key regsexp

The key argument must be a fixnum, a character, or a symbol.

Matches regsexp. If the match succeeds, the matched segment is stored in the register named key.

(try-match '(seq (group a (any-char))
                 (group b (any-char))
                 (any-char))
           "radar") ⇒ (0 3 (a . "r") (b . "a"))
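Registers also work with searching, which makes them useful for extracting fields. The sketch below assumes that `regsexp-search-string-forward` reports registers in the same format that the example above shows for matching. The keys `key` and `val` are arbitrary symbols chosen for this illustration.

```scheme
;; Same helper as defined at the start of this section.
(define (try-search pattern string)
  (regsexp-search-string-forward (compile-regsexp pattern) string))

;; Capture a name and a number from "name=number" somewhere in the input.
(try-search '(seq (group key (+ (char-in alphabetic)))
                  "="
                  (group val (+ (char-in numeric))))
            "x: width=80")
;; ⇒ (3 11 (key . "width") (val . "80"))
```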

regsexp: group-ref key

The key argument must be a fixnum, a character, or a symbol.

Matches the characters stored in the register named key. It is an error if that register has not been initialized with a corresponding group expression.

(try-match '(seq (group a (any-char))
                 (group b (any-char))
                 (any-char)
                 (group-ref b)
                 (group-ref a))
           "radar") ⇒ (0 5 (a . "r") (b . "a"))
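A common use of group-ref is finding repeated text. The following sketch searches for a doubled word. As in the previous group example, it assumes that searching reports registers in the same format as matching. The key `w` is an arbitrary symbol.

```scheme
;; Same helper as defined at the start of this section.
(define (try-search pattern string)
  (regsexp-search-string-forward (compile-regsexp pattern) string))

;; Capture a word, then require a space followed by the same text.
(try-search '(seq (group w (+ (char-in alphabetic)))
                  " "
                  (group-ref w))
            "say the the end")
;; ⇒ (4 11 (w . "the"))
```

Nothing here checks for a word boundary after (group-ref w), so this pattern would also match "the then" by pairing "the" with the first three letters of "then".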
