MIT/GNU Scheme’s character-set abstraction is used to represent groups of characters, such as the letters or digits. A character set may contain any character. Alternatively, a character set can be treated as a set of code points.
Implementation note: MIT/GNU Scheme allows any “bitless” character to be stored in a character set; operations that accept characters automatically strip their bucky bits.
Returns #t if object is a character set, otherwise it returns #f.
Returns #t if char is in char-set, otherwise it returns #f.
Returns #t if code-point is in char-set, otherwise it returns #f.
Returns a procedure of one argument that returns #t if its argument is a character in char-set, otherwise it returns #f.
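As a small illustration of the membership tests described above, the following sketch assumes that the procedures are named char-set?, char-in-set?, and char-set-predicate. Those names do not appear in the text here, so check them against the installed release before relying on them. The constructor char-set is described later in this section.

```scheme
;; Sketch only. The names char-set?, char-in-set?, and char-set-predicate
;; are assumptions; verify them in your MIT/GNU Scheme release.
(define vowels (char-set "aeiou"))      ; a string contributes its characters

(char-set? vowels)                      ; vowels is a character set
(char-in-set? #\a vowels)               ; membership test for one character

;; A predicate closed over the set, convenient for higher-order procedures.
(define vowel? (char-set-predicate vowels))
(vowel? #\e)                            ; e is a member
(vowel? #\z)                            ; z is not a member
```

The predicate form is mainly useful when a procedure expects a one-argument test rather than a character set.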
Calls predicate once on each Unicode code point, and returns a character set containing exactly the code points for which predicate returns a true value.
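A minimal sketch of building a set from a predicate, assuming the procedure described above is named compute-char-set (the name is not given in the text here). Note that the predicate receives a code point, that is, an integer, not a character object.

```scheme
;; Sketch only. compute-char-set is an assumed name; verify it in your
;; MIT/GNU Scheme release. The predicate is applied to integers.
(define ascii-upper
  (compute-char-set
   (lambda (code-point)
     ;; True exactly for the code points of A through Z.
     (<= #x41 code-point #x5A))))
```

Because the predicate is called once for every Unicode code point, it is best kept cheap and free of side effects.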
The next procedures represent a character set as a code-point list, which is a list of code-point range elements. A code-point range is either a Unicode code point, or a pair (start . end) that specifies a contiguous range of code points. Both start and end must be exact nonnegative integers less than or equal to #x110000, and start must be less than or equal to end. The range specifies all of the code points greater than or equal to start and strictly less than end.
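Because the end of a range is exclusive, a range covering the decimal digits (code points #x30 through #x39) must end at #x3A. The sketch below illustrates this using the char-set and char-set* constructors described next; the variable names are only illustrative.

```scheme
;; The digits occupy #x30 through #x39, so the half-open range ends at #x3A.
;; #x5F is the code point of the underscore.
(define digit-or-underscore
  (char-set* '((#x30 . #x3A) #x5F)))

;; The same set, with the elements passed as separate arguments.
(define same-set
  (char-set '(#x30 . #x3A) #x5F))
```

Under the half-open rule, the colon at #x3A is not part of either set.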
Returns a new character set consisting of the characters specified by elements. The procedure char-set takes these elements as multiple arguments, while char-set* takes them as a single list-valued argument; in all other respects these procedures are identical.
An element can take several forms, each of which specifies one or more characters to include in the resulting character set: a character includes itself; a string includes all of the characters it contains; a character set includes its members; or a code-point range includes the corresponding characters.
In addition, an element may be a symbol from the following table, which represents the characters as shown:
Name | Unicode character specification |
---|---|
alphabetic | Alphabetic = True |
alphanumeric | Alphabetic = True \| Numeric_Type = Decimal |
cased | Cased = True |
lower-case | Lowercase = True |
numeric | Numeric_Type = Decimal |
unicode | General_Category != (Cs \| Cn) |
upper-case | Uppercase = True |
whitespace | White_Space = True |
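The element forms can be mixed freely in one call. The following sketch combines a symbol from the table with a string and two individual characters; the variable names are hypothetical.

```scheme
;; Identifier characters: every alphabetic character (symbol element),
;; the decimal digits (string element), and two individual characters.
(define identifier-chars
  (char-set 'alphabetic "0123456789" #\_ #\-))

;; The list-taking variant builds the same set from a single list argument.
(define identifier-chars*
  (char-set* (list 'alphabetic "0123456789" #\_ #\-)))
```

The list form is convenient when the elements are computed at run time rather than written out literally.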
Returns a code-point list specifying the contents of char-set. The returned list consists of numerically sorted, disjoint, and non-abutting code-point ranges.
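Since the result is itself a code-point list, it can be passed back to char-set* to rebuild the set. The sketch below assumes that the procedure described above is named char-set->code-points and that the equality test described next is named char-set=?; neither name appears in the text here.

```scheme
;; Sketch only. char-set->code-points and char-set=? are assumed names;
;; verify them in your MIT/GNU Scheme release.
(define letters (char-set "cab" #\z))

;; a, b, and c are adjacent code points and z stands apart, so per the
;; description the list should describe two separate ranges, in order.
(define ranges (char-set->code-points letters))

;; Round trip: rebuilding from the list should give an equal set.
(define rebuilt (char-set* ranges))
(char-set=? letters rebuilt)
```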
Returns #t if char-set-1 and char-set-2 contain exactly the same characters, otherwise it returns #f.
Returns a character set that’s the inverse of char-set. That is, the returned character set contains exactly those characters that aren’t in char-set.
These procedures compute the respective set union, set intersection, and set difference of their arguments.
These procedures correspond to char-set-union and char-set-intersection but take a single argument that’s a list of character sets rather than multiple character-set arguments.
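A brief sketch of the set operations. char-set-union and char-set-intersection are named in the text; char-set-difference and char-set-union* are assumed names for the difference procedure and the list-taking union, so verify them before use.

```scheme
;; Sketch only. char-set-difference and char-set-union* are assumed names.
(define vowels  (char-set "aeiouAEIOU"))
(define letters (char-set 'alphabetic))

;; Alphabetic characters that are not vowels.
(define consonants (char-set-difference letters vowels))

;; Vowels that are also lower case.
(define lower-vowels
  (char-set-intersection vowels (char-set 'lower-case)))

;; Letters, decimal digits, and the underscore.
(define word-chars
  (char-set-union letters (char-set 'numeric) (char-set #\_)))

;; The same union computed from a list of character sets.
(define word-chars*
  (char-set-union* (list letters (char-set 'numeric) (char-set #\_))))
```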
These constants are the character sets corresponding to char-alphabetic?, char-numeric?, char-whitespace?, char-upper-case?, char-lower-case?, and char-alphanumeric?, respectively.
Returns #t if char-set contains only 8-bit code points (i.e., ISO 8859-1 characters), otherwise it returns #f.