Next: Regsexp Procedures, Previous: Regular Expressions, Up: Regular Expressions [Contents][Index]
A regular s-expression is either a character or a string, which matches itself, or one of the following forms.
Examples in this section use the following definitions for brevity:
(define (try-match pattern string)
  (regsexp-match-string (compile-regsexp pattern) string))

(define (try-search pattern string)
  (regsexp-search-string-forward (compile-regsexp pattern) string))
These forms match one or more characters literally:
(char-ci char) matches char without considering case.
(string-ci string) matches string without considering case.
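Neither case-insensitive form has an example in this section, so the following is a minimal sketch. It repeats the try-match helper defined above so that it stands alone, and it assumes the forms are written (char-ci char) and (string-ci string). The comments state what the descriptions imply, not output captured from a running system.

```scheme
;; Same helper as defined at the top of this section.
(define (try-match pattern string)
  (regsexp-match-string (compile-regsexp pattern) string))

;; Assumed spelling of the forms: (char-ci char) and (string-ci string).
(try-match '(char-ci #\a) "A")         ; expected to match, as it would for "a"
(try-match '(char-ci #\a) "b")         ; expected to fail
(try-match '(string-ci "abc") "aBc")   ; expected to match all three characters

;; A plain string still matches itself exactly, so only the
;; string-ci part of this pattern ignores case.
(try-match '(seq "x" (string-ci "abc")) "XABC")  ; expected to fail on "X"
```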
(any-char) matches one character other than #\newline.
(try-match '(any-char) "") ⇒ #f
(try-match '(any-char) "a") ⇒ (0 1)
(try-match '(any-char) "\n") ⇒ #f
(try-search '(any-char) "") ⇒ #f
(try-search '(any-char) "ab") ⇒ (0 1)
(try-search '(any-char) "\na") ⇒ (1 2)
(char-in datum …) matches one character in, and (char-not-in datum …) one character not in, the character set specified by (char-set datum …).
(try-match '(seq "a" (char-in "ab") "c") "abc") ⇒ (0 3)
(try-match '(seq "a" (char-not-in "ab") "c") "abc") ⇒ #f
(try-match '(seq "a" (char-not-in "ab") "c") "adc") ⇒ (0 3)
(try-match '(seq "a" (+ (char-in numeric)) "c") "a019c") ⇒ (0 5)
These forms match no characters, but only at specific locations in the input string:
(line-start) and (line-end) match no characters at the start and end of a line, respectively.
(try-match '(seq (line-start) (* (any-char)) (line-end)) "abc") ⇒ (0 3)
(try-match '(seq (line-start) (* (any-char)) (line-end)) "ab\nc") ⇒ (0 2)
(try-search '(seq (line-start) (* (char-in alphabetic)) (line-end)) "1abc") ⇒ #f
(try-search '(seq (line-start) (* (char-in alphabetic)) (line-end)) "1\nabc") ⇒ (2 5)
(string-start) and (string-end) match no characters at the start and end of the string, respectively.
(try-match '(seq (string-start) (* (any-char)) (string-end)) "abc") ⇒ (0 3)
(try-match '(seq (string-start) (* (any-char)) (string-end)) "ab\nc") ⇒ #f
(try-search '(seq (string-start) (* (char-in alphabetic)) (string-end)) "1abc") ⇒ #f
(try-search '(seq (string-start) (* (char-in alphabetic)) (string-end)) "1\nabc") ⇒ #f
These forms match repetitions of a given regsexp. Most of them come
in two forms, one of which is greedy and the other shy.
The greedy form matches as many repetitions as it can, then uses
failure backtracking to reduce the number of repetitions one at a
time. The shy form matches the minimum number of repetitions, then
uses failure backtracking to increase the number of repetitions one at
a time. The shy form is written like the greedy form, except that a
? is appended to the form’s keyword.
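The backtracking direction is easiest to see when the repetition is followed by something that can match in more than one place. The sketch below repeats the try-match helper so that it stands alone; the results in the comments follow from the description above and are not captured output.

```scheme
;; Same helper as defined at the top of this section.
(define (try-match pattern string)
  (regsexp-match-string (compile-regsexp pattern) string))

;; Greedy: (* (any-char)) first takes all six characters, then backs
;; off one repetition at a time until #\c can match. The last "c" is
;; the first place that works.
(try-match '(seq (* (any-char)) #\c) "abcabc")    ; expected (0 6)

;; Shy: (*? (any-char)) first takes no characters, then adds one
;; repetition at a time until #\c can match. The first "c" is the
;; first place that works.
(try-match '(seq (*? (any-char)) #\c) "abcabc")   ; expected (0 3)
```

Both patterns accept the same strings; the two forms differ only in which of the possible matches is found first.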
(? regsexp) and the shy (?? regsexp) match regsexp zero or one time.
(try-search '(seq (char-in alphabetic) (? (char-in numeric))) "a") ⇒ (0 1)
(try-search '(seq (char-in alphabetic) (?? (char-in numeric))) "a") ⇒ (0 1)
(try-search '(seq (char-in alphabetic) (? (char-in numeric))) "a1") ⇒ (0 2)
(try-search '(seq (char-in alphabetic) (?? (char-in numeric))) "a1") ⇒ (0 1)
(try-search '(seq (char-in alphabetic) (? (char-in numeric))) "1a2") ⇒ (1 3)
(try-search '(seq (char-in alphabetic) (?? (char-in numeric))) "1a2") ⇒ (1 2)
(* regsexp) and the shy (*? regsexp) match regsexp zero or more times.
(try-match '(seq (char-in alphabetic) (* (char-in numeric)) (any-char)) "aa") ⇒ (0 2)
(try-match '(seq (char-in alphabetic) (*? (char-in numeric)) (any-char)) "aa") ⇒ (0 2)
(try-match '(seq (char-in alphabetic) (* (char-in numeric)) (any-char)) "a123a") ⇒ (0 5)
(try-match '(seq (char-in alphabetic) (*? (char-in numeric)) (any-char)) "a123a") ⇒ (0 2)
(+ regsexp) and the shy (+? regsexp) match regsexp one or more times.
(try-match '(seq (char-in alphabetic) (+ (char-in numeric)) (any-char)) "aa") ⇒ #f
(try-match '(seq (char-in alphabetic) (+? (char-in numeric)) (any-char)) "aa") ⇒ #f
(try-match '(seq (char-in alphabetic) (+ (char-in numeric)) (any-char)) "a123a") ⇒ (0 5)
(try-match '(seq (char-in alphabetic) (+? (char-in numeric)) (any-char)) "a123a") ⇒ (0 3)
In (** n m regsexp) and the shy (**? n m regsexp), the n argument must be an exact nonnegative integer. The m argument must be either an exact integer greater than or equal to n, or else #f.
These forms match regsexp at least n times and at most m times; if m is #f then there is no upper limit.
(try-match '(seq (char-in alphabetic) (** 0 2 (char-in numeric)) (any-char)) "aa") ⇒ (0 2)
(try-match '(seq (char-in alphabetic) (**? 0 2 (char-in numeric)) (any-char)) "aa") ⇒ (0 2)
(try-match '(seq (char-in alphabetic) (** 0 2 (char-in numeric)) (any-char)) "a123a") ⇒ (0 4)
(try-match '(seq (char-in alphabetic) (**? 0 2 (char-in numeric)) (any-char)) "a123a") ⇒ (0 2)
(** n regsexp) is an abbreviation for (** n n regsexp). This matcher is neither greedy nor shy since it
matches a fixed number of repetitions.
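The examples above cover only bounded ranges, so the sketch below illustrates the two remaining cases: a fixed count and an open upper limit (m given as #f). It repeats the try-match helper so that it stands alone and assumes the fixed-count form is written (** n regsexp). The results in the comments follow from the descriptions and are not captured output.

```scheme
;; Same helper as defined at the top of this section.
(define (try-match pattern string)
  (regsexp-match-string (compile-regsexp pattern) string))

;; Fixed count: exactly three digits after the letter.
(try-match '(seq (char-in alphabetic) (** 3 (char-in numeric))) "a123")
;; expected (0 4)
(try-match '(seq (char-in alphabetic) (** 3 (char-in numeric))) "a12")
;; expected #f, since only two digits are available

;; No upper limit: at least two digits, as many as are present.
(try-match '(seq (char-in alphabetic) (** 2 #f (char-in numeric))) "a12345")
;; expected (0 6)
(try-match '(seq (char-in alphabetic) (** 2 #f (char-in numeric))) "a1")
;; expected #f, since the minimum of two digits is not met
```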
These forms implement alternatives and sequencing:
(alt regsexp …) matches one of the regsexp arguments, trying each in order from left to right.
(try-match '(alt #\a (char-in numeric)) "a") ⇒ (0 1)
(try-match '(alt #\a (char-in numeric)) "b") ⇒ #f
(try-match '(alt #\a (char-in numeric)) "1") ⇒ (0 1)
(seq regsexp …) matches the first regsexp, then continues the match with the next regsexp, and so on until all of the arguments are matched.
(try-match '(seq #\a #\b) "a") ⇒ #f
(try-match '(seq #\a #\b) "aa") ⇒ #f
(try-match '(seq #\a #\b) "ab") ⇒ (0 2)
These forms implement named registers, which store matched segments of the input string:
In (group key regsexp), the key argument must be a fixnum, a character, or a symbol.
This form matches regsexp. If the match succeeds, the matched segment is stored in the register named key.
(try-match '(seq (group a (any-char)) (group b (any-char)) (any-char)) "radar") ⇒ (0 3 (a . "r") (b . "a"))
In (group-ref key), the key argument must be a fixnum, a character, or a symbol.
This form matches the characters stored in the register named key. It is an error if that register has not been initialized with a corresponding group expression.
(try-match '(seq (group a (any-char)) (group b (any-char)) (any-char) (group-ref b) (group-ref a)) "radar") ⇒ (0 5 (a . "r") (b . "a"))