Characters (MIT/GNU Scheme Pucked Reference Manual)

5 Characters

Characters are objects that represent printed characters such as letters and digits. MIT/GNU Scheme supports the full Unicode character repertoire.

Characters are written using the notation #\character or #\character-name or #\xhex-scalar-value.

The following standard character names are supported:

#\alarm                 ; U+0007
#\backspace             ; U+0008
#\delete                ; U+007F
#\escape                ; U+001B
#\newline               ; the linefeed character, U+000A
#\null                  ; the null character, U+0000
#\return                ; the return character, U+000D
#\space                 ; the preferred way to write a space, U+0020
#\tab                   ; the tab character, U+0009

Here are some additional examples:

#\a                     ; lowercase letter
#\A                     ; uppercase letter
#\(                     ; left parenthesis
#\                      ; the space character

Case is significant in #\character, and in #\character-name, but not in #\xhex-scalar-value. If character in #\character is alphabetic, then any character immediately following character cannot be one that can appear in an identifier. This rule resolves the ambiguous case where, for example, the sequence of characters ‘#\space’ could be taken to be either a representation of the space character or a representation of the character ‘#\s’ followed by a representation of the symbol ‘pace’.

Characters written in the #\ notation are self-evaluating. That is, they do not have to be quoted in programs.

Some of the procedures that operate on characters ignore the difference between upper case and lower case. The procedures that ignore case have ‘-ci’ (for “case insensitive”) embedded in their names.

MIT/GNU Scheme allows a character name to include one or more bucky bit prefixes to indicate that the character includes one or more of the keyboard shift keys Control, Meta, Super, or Hyper (note that the Control bucky bit prefix is not the same as the ASCII control key). The bucky bit prefixes and their meanings are as follows (case is not significant):

Key             Bucky bit prefix        Bucky bit
---             ----------------        ---------

Meta            M- or Meta-                 1
Control         C- or Control-              2
Super           S- or Super-                4
Hyper           H- or Hyper-                8

For example,

#\c-a                   ; Control-a
#\meta-b                ; Meta-b
#\c-s-m-h-A             ; Control-Meta-Super-Hyper-A

procedure: char->name char

Returns a string corresponding to the printed representation of char. This is the character, character-name, or xhex-scalar-value component of the external representation, combined with the appropriate bucky bit prefixes.

(char->name #\a)                        ⇒  "a"
(char->name #\space)                    ⇒  "space"
(char->name #\c-a)                      ⇒  "C-a"
(char->name #\control-a)                ⇒  "C-a"

procedure: name->char string

Converts a string that names a character into the character specified. If string does not name any character, name->char signals an error.

(name->char "a")                        ⇒  #\a
(name->char "space")                    ⇒  #\space
(name->char "SPACE")                    ⇒  #\space
(name->char "c-a")                      ⇒  #\C-a
(name->char "control-a")                ⇒  #\C-a

standard procedure: char? object: Returns #t if object is a character, otherwise returns #f.

standard procedure: char=? char1 char2 char3 …

standard procedure: char<? char1 char2 char3 …

standard procedure: char>? char1 char2 char3 …

standard procedure: char<=? char1 char2 char3 …

standard procedure: char>=? char1 char2 char3 …

These procedures return #t if the results of passing their arguments to char->integer are respectively equal, monotonically increasing, monotonically decreasing, monotonically non-decreasing, or monotonically non-increasing.

These predicates are transitive.

char library procedure: char-ci=? char1 char2 char3 …

char library procedure: char-ci<? char1 char2 char3 …

char library procedure: char-ci>? char1 char2 char3 …

char library procedure: char-ci<=? char1 char2 char3 …

char library procedure: char-ci>=? char1 char2 char3 …

These procedures are similar to char=? et cetera, but they treat upper case and lower case letters as the same. For example, (char-ci=? #\A #\a) returns #t.

Specifically, these procedures behave as if char-foldcase were applied to their arguments before they were compared.

char library procedure: char-alphabetic? char

char library procedure: char-numeric? char

char library procedure: char-whitespace? char

char library procedure: char-upper-case? char

char library procedure: char-lower-case? char

These procedures return #t if their arguments are alphabetic, numeric, whitespace, upper case, or lower case characters respectively, otherwise they return #f.

Specifically, they return #t when applied to characters with the Unicode properties Alphabetic, Numeric_Decimal, White_Space, Uppercase, or Lowercase respectively, and #f when applied to any other Unicode characters. Note that many Unicode characters are alphabetic but neither upper nor lower case.

procedure: char-alphanumeric? char: Returns #t if char is either alphabetic or numeric, otherwise it returns #f.

char library procedure: digit-value char

This procedure returns the numeric value (0 to 9) of its argument if it is a numeric digit (that is, if char-numeric? returns #t), or #f on any other character.

(digit-value #\3) ⇒ 3
(digit-value #\x0664) ⇒ 4
(digit-value #\x0AE6) ⇒ 0
(digit-value #\x0EA6) ⇒ #f

standard procedure: char->integer char

standard procedure: integer->char n

Given a Unicode character, char->integer returns an exact integer between 0 and #xD7FF or between #xE000 and #x10FFFF which is equal to the Unicode scalar value of that character. Given a non-Unicode character, it returns an exact integer greater than #x10FFFF.

Given an exact integer that is the value returned by a character when char->integer is applied to it, integer->char returns that character.

Implementation note: MIT/GNU Scheme allows any Unicode code point, not just scalar values.

Implementation note: If the argument to char->integer or integer->char is a constant, the MIT/GNU Scheme compiler will constant-fold the call, replacing it with the corresponding result. This is a very useful way to denote unusual character constants or ASCII codes.

char library procedure: char-upcase char

char library procedure: char-downcase char

char library procedure: char-foldcase char

The char-upcase procedure, given an argument that is the lowercase part of a Unicode casing pair, returns the uppercase member of the pair. Note that language-sensitive casing pairs are not used. If the argument is not the lowercase member of such a pair, it is returned.

The char-downcase procedure, given an argument that is the uppercase part of a Unicode casing pair, returns the lowercase member of the pair. Note that language-sensitive casing pairs are not used. If the argument is not the uppercase member of such a pair, it is returned.

The char-foldcase procedure applies the Unicode simple case-folding algorithm to its argument and returns the result. Note that language-sensitive folding is not used. See UAX #44 (part of the Unicode Standard) for details.

Note that many Unicode lowercase characters do not have uppercase equivalents.

procedure: char->digit char [radix]

If char is a character representing a digit in the given radix, returns the corresponding integer value. If radix is specified (which must be an exact integer between 2 and 36 inclusive), the conversion is done in that base, otherwise it is done in base 10. If char doesn’t represent a digit in base radix, char->digit returns #f.

Note that this procedure is insensitive to the alphabetic case of char.

(char->digit #\8)                       ⇒  8
(char->digit #\e 16)                    ⇒  14
(char->digit #\e)                       ⇒  #f

procedure: digit->char digit [radix]

Returns a character that represents digit in the radix given by radix. The radix argument, if given, must be an exact integer between 2 and 36 (inclusive); it defaults to 10. The digit argument must be an exact non-negative integer strictly less than radix.

(digit->char 8)                         ⇒  #\8
(digit->char 14 16)                     ⇒  #\E

• Character implementation:
• Unicode:
• Character Sets: