@findex #\U+
In addition to the standard character syntax, MIT Scheme also supports a
-general syntax that denotes any Unicode character by its code point.
-This notation is @code{#\U+@var{code-point}}, where @var{code-point} is
-a sequence of hexadecimal digits for a valid code point. So the above
+general syntax that denotes any Unicode character by its scalar value.
+This notation is @code{#\U+@var{scalar-value}}, where @var{scalar-value} is
+a sequence of hexadecimal digits for a valid scalar value. So the above
examples could also be written like this:
@example
@code{#\C-SOH}.}
MIT/GNU Scheme uses a 21-bit character code with 4 bucky bits. The
-character code contains the Unicode code point for the character. This
+character code contains the Unicode scalar value for the character. This
is a change from earlier versions of the system, which used the
-@acronym{ISO-8859-1} code point, but it is upwards compatible with
+@acronym{ISO-8859-1} scalar value, but it is upwards compatible with
previous usage, since @acronym{ISO-8859-1} is a proper subset of
Unicode.
@end example
Note that in MIT/GNU Scheme, the value of @code{char-code} is the
-Unicode code point for @var{char}.
+Unicode scalar value for @var{char}.
@end deffn
@defvr variable char-code-limit
This implies that @code{char->integer} and @code{char-code} produce
identical results for characters that have no bucky bits set, and that
-characters are ordered according to their Unicode code points.
+characters are ordered according to their Unicode scalar values.
Note: If the argument to @code{char->integer} or @code{integer->char} is
a constant, the compiler will constant-fold the call, replacing it with
The range of @code{char->integer} is defined to be the exact
non-negative integers that are less than the value of this variable
(exclusive). Note, however, that there are some holes in this range,
-because the character code must be a valid Unicode code point.
+because the character code must be a valid Unicode scalar value.
@end defvr
@node ISO-8859-1 Characters, Character Sets, Internal Representation of Characters, Characters
Unicode as its base character set, and any @acronym{XML}
implementation @emph{must} support Unicode.
-@cindex Code point, Unicode
-@cindex Wide character
-@cindex Character, wide
-The basic unit in a Unicode implementation is the @dfn{code point}. The
-character equivalent of a code point is a @dfn{wide character}.
+@cindex Scalar value, Unicode
+@cindex Unicode character
+@cindex Character, Unicode
+The basic unit in a Unicode implementation is the @dfn{scalar value}. The
+character equivalent of a scalar value is a @dfn{Unicode character}.
-@deffn procedure unicode-code-point? object
-Returns @code{#t} if @var{object} is a Unicode code point. Code points
-are implemented as exact non-negative integers. They are further
+@deffn procedure unicode-scalar-value? object
+Returns @code{#t} if @var{object} is a Unicode scalar value. Scalar
+values are implemented as exact non-negative integers. They are further
limited, by the Unicode standard, to be strictly less than
@code{#x110000}, with the values @code{#xD800} through @code{#xDFFF},
@code{#xFFFE}, and @code{#xFFFF} excluded.
@end deffn
-@deffn procedure wide-char? object
-Returns @code{#t} if @var{object} is a wide character, specifically if
-@var{object} is a character with no bucky bits and whose code satisfies
-@code{unicode-code-point?}.
+@deffn procedure unicode-char? object
+Returns @code{#t} if @var{object} is a Unicode character, specifically
+if @var{object} is a character with no bucky bits and whose code
+satisfies @code{unicode-scalar-value?}.
@end deffn
The Unicode implementation consists of three parts:
@item
An @dfn{alphabet} abstraction, which is an efficient implementation of
-sets of Unicode code points (similar to the @code{char-set}
+sets of Unicode scalar values (similar to the @code{char-set}
abstraction).
@end itemize
Returns @code{#t} if @var{object} is a wide string.
@end deffn
-@deffn procedure make-wide-string k [wide-char]
+@deffn procedure make-wide-string k [unicode-char]
Returns a newly allocated wide string of length @var{k}. If @var{char}
is specified, all elements of the returned string are initialized to
@var{char}; otherwise the contents of the string are unspecified.
@end deffn
-@deffn procedure wide-string wide-char @dots{}
+@deffn procedure wide-string unicode-char @dots{}
Returns a newly allocated wide string consisting of the specified
characters.
@end deffn
index of @var{string}.
@end deffn
-@deffn procedure wide-string-set! wide-string k wide-char
+@deffn procedure wide-string-set! wide-string k unicode-char
Stores @var{char} in element @var{k} of @var{wide-string} and returns an
unspecified value. @var{K} must be a valid index of @var{wide-string}.
@end deffn
@end deffn
@deffn procedure open-wide-output-string
-Returns an output port that accepts wide characters and strings and
+Returns an output port that accepts Unicode characters and strings and
accumulates them in a buffer. Call @code{get-output-string} on the
returned port to get a wide string containing the accumulated
characters.
Applications often need to manipulate sets of characters, such as the
set of alphabetic characters or the set of whitespace characters. The
@dfn{alphabet} abstraction provides an efficient implementation of
-sets of Unicode code points.
+sets of Unicode scalar values.
@deffn procedure alphabet? object
Returns @code{#t} if @var{object} is a Unicode alphabet, otherwise
returns @code{#f}.
@end deffn
-@deffn procedure alphabet wide-char @dots{}
-Returns a Unicode alphabet containing the wide characters passed as
+@deffn procedure alphabet unicode-char @dots{}
+Returns a Unicode alphabet containing the Unicode characters passed as
arguments.
@end deffn
-@deffn procedure code-points->alphabet items
-Returns a Unicode alphabet containing the code points described by
+@deffn procedure scalar-values->alphabet items
+Returns a Unicode alphabet containing the scalar values described by
@var{items}. @var{Items} must satisfy
-@code{well-formed-code-points-list?}.
+@code{well-formed-scalar-values-list?}.
@end deffn
-@deffn procedure alphabet->code-points alphabet
-Returns a well-formed code-points list that describes the code points
+@deffn procedure alphabet->scalar-values alphabet
+Returns a well-formed scalar-values list that describes the scalar values
represented by @var{alphabet}.
@end deffn
-@deffn procedure well-formed-code-points-list? object
-Returns @code{#t} if @var{object} is a well-formed code-points list,
-otherwise returns @code{#f}. A well-formed code-points list is a
-proper list, each element of which is either a code point or a pair of
-code points. A pair of code points represents a contiguous range of
-code points. The @sc{car} of the pair is the lower limit, and the
-@sc{cdr} is the upper limit. Both limits are inclusive, and the lower
-limit must be strictly less than the upper limit.
+@deffn procedure well-formed-scalar-values-list? object
+Returns @code{#t} if @var{object} is a well-formed scalar-values list,
+otherwise returns @code{#f}. A well-formed scalar-values list is a
+proper list, each element of which is either a unicode scalar value or a
+pair of unicode scalar values. A pair of scalar values represents a
+contiguous range of scalar values. The @sc{car} of the pair is the
+lower limit, and the @sc{cdr} is the upper limit. Both limits are
+inclusive, and the lower limit must be less than or equal to the upper
+limit.
@end deffn
@deffn procedure char-in-alphabet? char alphabet
@end deffn
Character sets and alphabets can be converted to one another, provided
-that the alphabet contains only 8-bit code points. This is true
-because 8-bit code points in Unicode map directly to
+that the alphabet contains only 8-bit scalar values. This is true
+because 8-bit scalar values in Unicode map directly to
@acronym{ISO-8859-1} characters, which is what character sets contain.
@deffn procedure char-set->alphabet char-set
-Returns a Unicode alphabet containing the code points that correspond
+Returns a Unicode alphabet containing the scalar values that correspond
to characters that are members of @var{char-set}.
@end deffn
@deffn procedure alphabet->char-set alphabet
Returns a character set containing the characters that correspond to
-8-bit code points that are members of @var{alphabet}. (Code points
+8-bit scalar values that are members of @var{alphabet}. (Scalar values
outside the 8-bit range are ignored.)
@end deffn
@deffn procedure string->alphabet string
-Returns a Unicode alphabet containing the code points corresponding to
+Returns a Unicode alphabet containing the scalar values corresponding to
the characters in @var{string}. Equivalent to
@example
@end deffn
@deffn procedure alphabet->string alphabet
-Returns a newly-allocated string containing the characters
-corresponding to the 8-bit code points in @var{alphabet}. (Code
-points outside the 8-bit range are ignored.)
+Returns a newly-allocated string containing the characters corresponding
+to the 8-bit scalar values in @var{alphabet}. (Scalar values outside
+the 8-bit range are ignored.)
@end deffn
@deffn procedure 8-bit-alphabet? alphabet
-Returns @code{#t} if @var{alphabet} contains only 8-bit code points,
+Returns @code{#t} if @var{alphabet} contains only 8-bit scalar values,
otherwise returns @code{#f}.
@end deffn
@deffn procedure alphabet+ alphabet @dots{}
-Returns a Unicode alphabet that contains each code point that is a
+Returns a Unicode alphabet that contains each scalar value that is a
member of any of the @var{alphabet} arguments.
@end deffn
@deffn procedure alphabet- alphabet1 alphabet2
-Returns a Unicode alphabet that contains each code point that is a
+Returns a Unicode alphabet that contains each scalar value that is a
member of @var{alphabet1} and is not a member of @var{alphabet2}.
@end deffn
(* (get-byte bv bs 1) #x10000)
(* (get-byte bv bs 2) #x100)
(get-byte bv bs 3))))
- (if (unicode-code-point? cp)
+ (if (unicode-scalar-value? cp)
(begin
(set-input-buffer-start! ib (fix:+ bs 4))
cp)
(* (get-byte bv bs 2) #x10000)
(* (get-byte bv bs 1) #x100)
(get-byte bv bs 0))))
- (if (unicode-code-point? cp)
+ (if (unicode-scalar-value? cp)
(begin
(set-input-buffer-start! ib (fix:+ bs 4))
cp)
(files "unicode")
(parent (runtime))
(export ()
+ (error:not-wide-char error:not-unicode-char)
+ (guarantee-wide-char guarantee-unicode-char)
+ (wide-char? unicode-char?)
(wide-string->utf16-be-string string->utf16-be-string)
(wide-string->utf16-le-string string->utf16-le-string)
(wide-string->utf16-string string->utf16-string)
alphabet+
alphabet-
alphabet->char-set
- alphabet->code-points
+ alphabet->scalar-values
alphabet->string
alphabet-predicate
alphabet?
call-with-utf8-output-string
char-in-alphabet?
char-set->alphabet
- code-points->alphabet
combine-utf16-surrogates
error:not-8-bit-alphabet
error:not-alphabet
- error:not-unicode-code-point
+ error:not-unicode-char
+ error:not-unicode-scalar-value
error:not-utf16-be-string
error:not-utf16-high-surrogate
error:not-utf16-le-string
error:not-utf32-le-string
error:not-utf32-string
error:not-utf8-string
- error:not-well-formed-code-point-list
- error:not-wide-char
+ error:not-well-formed-scalar-value-list
error:not-wide-string
for-all-chars-in-string?
for-any-char-in-string?
guarantee-8-bit-alphabet
guarantee-alphabet
- guarantee-unicode-code-point
+ guarantee-unicode-char
+ guarantee-unicode-scalar-value
guarantee-utf16-be-string
guarantee-utf16-high-surrogate
guarantee-utf16-le-string
guarantee-utf32-le-string
guarantee-utf32-string
guarantee-utf8-string
- guarantee-well-formed-code-point-list
- guarantee-wide-char
+ guarantee-well-formed-scalar-value-list
guarantee-wide-string
guarantee-wide-string-index
guarantee-wide-substring
open-utf32-output-string
open-utf8-input-string
open-utf8-output-string
+ scalar-values->alphabet
split-into-utf16-surrogates
string->alphabet
string->utf16-be-string
string->utf8-string
string->utf8-string
string->wide-string
- unicode-code-point?
+ unicode-char?
+ unicode-scalar-value?
utf16-be-string->wide-string
utf16-be-string-length
utf16-be-string-valid?
utf8-string-length
utf8-string-valid?
utf8-string?
- well-formed-code-point-list?
- wide-char?
+ well-formed-scalar-value-list?
wide-string
wide-string->string
wide-string-index?
(cond ((string? object) (string->utf8-string object))
((symbol? object) (symbol-name object))
((wide-string? object) (wide-string->utf8-string object))
- ((wide-char? object) (wide-string->utf8-string (wide-string object)))
+ ((unicode-char? object)
+ (wide-string->utf8-string (wide-string object)))
(else (%->string object 'UTF8-STRING))))
(define (%->string object caller)
\f
;;;; Unicode characters
-(define (wide-char? object)
+(define (unicode-char? object)
(and (char? object)
(legal-code-32? (char->integer object))))
-(define-guarantee wide-char "a Unicode character")
+(define-guarantee unicode-char "a Unicode character")
-(define (unicode-code-point? object)
- (and (%unicode-code-point? object)
- (not (illegal? object))))
-
-(define (%unicode-code-point? object)
+(define (unicode-scalar-value? object)
(and (index-fixnum? object)
- (fix:< object char-code-limit)))
+ (fix:< object char-code-limit)
+ (not (illegal? object))))
-(define-guarantee unicode-code-point "a Unicode code point")
+(define-guarantee unicode-scalar-value "a Unicode scalar value")
(define-integrable (legal-code-32? pt)
(and (fix:< pt char-code-limit)
(define-integrable (make-alphabet-low)
(make-string #x100 (integer->char 0)))
-(define-integrable (alphabet-low-ref low code-point)
- (not (fix:= (fix:and (vector-8b-ref low (fix:lsh code-point -3))
- (fix:lsh 1 (fix:and code-point 7)))
+(define-integrable (alphabet-low-ref low scalar-value)
+ (not (fix:= (fix:and (vector-8b-ref low (fix:lsh scalar-value -3))
+ (fix:lsh 1 (fix:and scalar-value 7)))
0)))
-(define-integrable (alphabet-low-set! low code-point)
+(define-integrable (alphabet-low-set! low scalar-value)
(vector-8b-set! low
- (fix:lsh code-point -3)
- (fix:or (vector-8b-ref low (fix:lsh code-point -3))
- (fix:lsh 1 (fix:and code-point 7)))))
+ (fix:lsh scalar-value -3)
+ (fix:or (vector-8b-ref low (fix:lsh scalar-value -3))
+ (fix:lsh 1 (fix:and scalar-value 7)))))
(define null-alphabet
(make-alphabet (make-alphabet-low) '#() '#()))
(define (char-in-alphabet? char alphabet)
- (guarantee-wide-char char 'CHAR-IN-ALPHABET?)
+ (guarantee-unicode-char char 'CHAR-IN-ALPHABET?)
(guarantee-alphabet alphabet 'CHAR-IN-ALPHABET?)
- (%code-point-in-alphabet? (char-code char) alphabet))
+ (%scalar-value-in-alphabet? (char-code char) alphabet))
-(define (%code-point-in-alphabet? pt alphabet)
+(define (%scalar-value-in-alphabet? pt alphabet)
(if (fix:< pt #x800)
(alphabet-low-ref (alphabet-low alphabet) pt)
(let ((high1 (alphabet-high1 alphabet))
(loop (fix:+ index 1) upper))
(else #t))))))))
-(define (well-formed-code-point-list? items)
- (if (pair? items)
- (and (well-formed-item? (car items))
- (let loop ((a (car items)) (items (cdr items)))
- (if (pair? items)
- (let ((b (car items))
- (items (cdr items)))
- (and (well-formed-item? b)
- (fix:< (if (pair? a) (cdr a) a)
- (if (pair? b) (car b) b))
- (loop b items)))
- (null? items))))
- (null? items)))
+(define (well-formed-scalar-value-list? items)
+ (list-of-type? items well-formed-item?))
(define (well-formed-item? item)
(if (pair? item)
- (and (%unicode-code-point? (car item))
- (%unicode-code-point? (cdr item))
- (fix:< (car item) (cdr item)))
- (%unicode-code-point? item)))
+ (and (unicode-scalar-value? (car item))
+ (unicode-scalar-value? (cdr item))
+ (fix:<= (car item) (cdr item)))
+ (unicode-scalar-value? item)))
-(define-guarantee well-formed-code-point-list "a Unicode code-point list")
+(define-guarantee well-formed-scalar-value-list "a Unicode scalar-value list")
\f
-(define (code-points->alphabet items)
- (guarantee-well-formed-code-point-list items 'CODE-POINTS->ALPHABET)
- (%code-points->alphabet items))
+(define (scalar-values->alphabet items)
+ (guarantee-well-formed-scalar-value-list items 'SCALAR-VALUES->ALPHABET)
+ (%scalar-values->alphabet items))
+
+(define (alphabet . chars)
+ (for-each (lambda (char)
+ (guarantee-unicode-char char 'ALPHABET))
+ chars)
+ (%scalar-values->alphabet (map char->integer chars)))
-(define (%code-points->alphabet items)
+(define (%scalar-values->alphabet items)
(receive (low-items high-items)
- (split-list (canonicalize-code-point-list items) #x800)
+ (split-list (canonicalize-scalar-value-list items) #x800)
(let ((low (make-alphabet-low)))
(for-each (lambda (item)
(if (pair? item)
(vector-set! high2 i (car items)))))
(make-alphabet low high1 high2))))))
-(define (canonicalize-code-point-list items)
- (if (pair? items)
- (let ((a (car items)))
- (let loop
- ((al (if (pair? a) (car a) a))
- (ah (if (pair? a) (cdr a) a))
- (items (cdr items)))
- (if (pair? items)
- (let ((b (car items))
- (items (cdr items)))
- (let ((bl (if (pair? b) (car b) b))
- (bh (if (pair? b) (cdr b) b)))
- (if (fix:= (fix:+ ah 1) bl)
- (loop al bh items)
- (cons (if (fix:= al ah) al (cons al ah))
- (loop bl bh items)))))
- (list (if (fix:= al ah) al (cons al ah))))))
- '()))
-
-(define (split-list items limit)
- (let loop ((items items) (low '()))
- (if (pair? items)
- (let ((item (car items)))
- (cond ((not (pair? item))
- (if (fix:< item limit)
- (loop (cdr items) (cons item low))
- (values low items)))
- ((fix:< (cdr item) limit)
- (loop (cdr items) (cons item low)))
- ((fix:<= limit (car item))
- (values low items))
- (else
- (values (cons (cons (car item) (fix:- limit 1)) low)
- (cons (cons limit (cdr item)) items)))))
- (values low '()))))
-\f
-(define (alphabet . chars)
- (%code-points->alphabet (chars->wfcp-list (remove-duplicate-chars chars))))
-
-(define (remove-duplicate-chars chars)
- (let ((table (make-eq-hash-table)))
- (for-each (lambda (char)
- (guarantee-wide-char char 'REMOVE-DUPLICATE-CHARS)
- (hash-table/put! table char #t))
- chars)
- (hash-table/key-list table)))
-
-(define (chars->wfcp-list chars)
- (let ((chars (sort chars char<?)))
- (if (pair? chars)
- (let loop
- ((pt (char->integer (car chars)))
- (chars (cdr chars))
- (items '()))
- (if (pair? chars)
- (let ((pt* (char->integer (car chars)))
- (chars (cdr chars)))
- (if (fix:= pt* (fix:+ pt 1))
- (let find-max ((pt* pt*) (chars chars))
- (if (pair? chars)
- (let ((pt** (char->integer (car chars)))
- (chars (cdr chars)))
- (if (fix:= pt** (fix:+ pt* 1))
- (find-max pt** chars)
- (loop pt** chars (cons (cons pt pt*) items))))
- (reverse! (cons (cons pt pt*) items))))
- (loop pt* chars (cons pt items))))
- (reverse! (cons pt items))))
- '())))
-
-(define (alphabet->code-points alphabet)
- (guarantee-alphabet alphabet 'ALPHABET->CODE-POINTS)
- (append! (alphabet-low->code-points (alphabet-low alphabet))
- (alphabet-high->code-points (alphabet-high1 alphabet)
+(define (alphabet->scalar-values alphabet)
+ (guarantee-alphabet alphabet 'ALPHABET->SCALAR-VALUES)
+ (append! (alphabet-low->scalar-values (alphabet-low alphabet))
+ (alphabet-high->scalar-values (alphabet-high1 alphabet)
(alphabet-high2 alphabet))))
-(define (alphabet-low->code-points low)
+(define (alphabet-low->scalar-values low)
(let find-lower ((i 0) (result '()))
(if (fix:< i #x800)
(if (alphabet-low-ref low i)
(find-lower (fix:+ i 1) result))
(reverse! result))))
-(define (alphabet-high->code-points lower upper)
+(define (alphabet-high->scalar-values lower upper)
(let ((n (vector-length lower)))
(let loop ((i 0) (result '()))
(if (fix:< i n)
(loop (fix:+ i 1)
- (cons (cons (vector-ref lower i) (vector-ref upper i))
+ (cons (if (fix:< (vector-ref lower i) (vector-ref upper i))
+ (cons (vector-ref lower i) (vector-ref upper i))
+ (vector-ref lower i))
result))
(reverse! result)))))
\f
+(define (canonicalize-scalar-value-list items)
+ (if (pair? items)
+ (let ((items
+ (sort items
+ (lambda (a b)
+ (let ((al (if (pair? a) (car a) a))
+ (ah (if (pair? a) (cdr a) a))
+ (bl (if (pair? b) (car b) b))
+ (bh (if (pair? b) (cdr b) b)))
+ (or (fix:< al bl)
+ (and (fix:= al bl)
+ (fix:< ah bh)))))))
+ (make-item
+ (lambda (l h)
+ (if (fix:= l h)
+ l
+ (cons l h)))))
+ (let loop
+ ((al (if (pair? (car items)) (caar items) (car items)))
+ (ah (if (pair? (car items)) (cdar items) (car items)))
+ (items (cdr items)))
+ (if (pair? items)
+ (let ((bl (if (pair? (car items)) (caar items) (car items)))
+ (bh (if (pair? (car items)) (cdar items) (car items)))
+ (items (cdr items)))
+ (if (fix:< (fix:+ ah 1) bl)
+ (cons (make-item al ah)
+ (loop bl bh items))
+ (loop al (fix:max ah bh) items)))
+ (list (make-item al ah)))))
+ items))
+
+(define (split-list items limit)
+ (let loop ((items items) (low '()))
+ (if (pair? items)
+ (let ((item (car items)))
+ (cond ((not (pair? item))
+ (if (fix:< item limit)
+ (loop (cdr items) (cons item low))
+ (values low items)))
+ ((fix:< (cdr item) limit)
+ (loop (cdr items) (cons item low)))
+ ((fix:<= limit (car item))
+ (values low items))
+ (else
+ (values (cons (cons (car item) (fix:- limit 1)) low)
+ (cons (cons limit (cdr item)) items)))))
+ (values low '()))))
+\f
+#|
+
+(define (test-canonicalize-scalar-value-list n-items n-iter)
+ (run-cpl-test n-items n-iter canonicalize-scalar-value-list))
+
+(define (test-alphabet->scalar-values n-items n-iter)
+ (run-cpl-test n-items n-iter
+ (lambda (cpl)
+ (alphabet->scalar-values (scalar-values->alphabet cpl)))))
+
+(define (run-cpl-test n-items n-iter procedure)
+ (do ((i 0 (+ i 1))
+ (failures '()
+ (let ((cpl (make-test-cpl n-items)))
+ (guarantee-well-formed-scalar-value-list cpl)
+ (let ((cpl* (procedure cpl)))
+ (if (canonical-scalar-value-list? cpl*)
+ failures
+ (cons (cons cpl cpl*) failures))))))
+ ((not (< i n-iter))
+ (let ((n-failures (length failures)))
+ (if (> n-failures 0)
+ (begin
+ (write-string "Got ")
+ (write n-failures)
+ (write-string " failure")
+ (if (> n-failures 1)
+ (write-string "s"))
+ (write-string " out of ")
+ (write n-iter)
+ (newline)
+ (pp failures)))))))
+
+(define (make-test-cpl n-items)
+ (make-initialized-list n-items
+ (lambda (i)
+ (let loop ()
+ (let ((n (random #x10000)))
+ (if (unicode-scalar-value? n)
+ (let ((m (random #x100)))
+ (if (fix:= m 0)
+ n
+ (if (unicode-scalar-value? (fix:+ n m))
+ (fix:+ n m)
+ (loop))))
+ (loop)))))))
+
+(define (canonical-scalar-value-list? items)
+ (and (well-formed-scalar-value-list? items)
+ (if (pair? items)
+ (let loop ((a (car items)) (items (cdr items)))
+ (if (pair? items)
+ (let ((b (car items))
+ (items (cdr items)))
+ (and (fix:< (fix:+ (if (pair? a) (cdr a) a) 1)
+ (if (pair? b) (car b) b))
+ (loop b items)))
+ #t))
+ #t)))
+|#
+\f
(define (8-bit-alphabet? alphabet)
(and (fix:= (vector-length (alphabet-high1 alphabet)) 0)
(let ((low (alphabet-low alphabet)))
(let loop ((i 0) (chars '()))
(if (fix:< i #x100)
(loop (fix:+ i 1)
- (if (%code-point-in-alphabet? i alphabet)
+ (if (%scalar-value-in-alphabet? i alphabet)
(cons (integer->char i) chars)
chars))
(apply string (reverse! chars)))))
(make-vector length
(if (if (default-object? char) #f char)
(begin
- (guarantee-wide-char char 'MAKE-WIDE-STRING)
+ (guarantee-unicode-char char 'MAKE-WIDE-STRING)
char)
(integer->char 0)))))
(define (wide-string . chars)
- (for-each (lambda (char) (guarantee-wide-char char 'WIDE-STRING)) chars)
+ (for-each (lambda (char) (guarantee-unicode-char char 'WIDE-STRING)) chars)
(%make-wide-string (list->vector chars)))
(define (wide-string-length string)
(define (wide-string-set! string index char)
(guarantee-wide-string string 'WIDE-STRING-SET!)
(guarantee-wide-string-index index string 'WIDE-STRING-SET!)
- (guarantee-wide-char char 'WIDE-STRING-SET!)
+ (guarantee-unicode-char char 'WIDE-STRING-SET!)
(%wide-string-set! string index char))
(define-integrable (%wide-string-set! string index char)
#x10000))
(define (split-into-utf16-surrogates n)
- (guarantee-unicode-code-point n 'split-into-utf16-surrogates)
+ (guarantee-unicode-scalar-value n 'split-into-utf16-surrogates)
(let ((n (fix:- n #x10000)))
(values (fix:or (fix:and (fix:lsh n -10) #x03FF) #xD800)
(fix:or (fix:and n #x03FF) #xDC00))))
char-set:turtle-digit))
(define alphabet:name-start-char
- (code-points->alphabet
+ (scalar-values->alphabet
'((#x0041 . #x005A)
#x005F
(#x0061 . #x007A)
(define alphabet:name-char
(alphabet+ alphabet:name-start-char
- (code-points->alphabet
+ (scalar-values->alphabet
'(#x002D
(#x0030 . #x0039)
#x00B7
(alphabet- alphabet:name-start-char (alphabet #\_)))
(define alphabet:character
- (code-points->alphabet '((#x20 . #x5B) (#x5D . #x10FFFF))))
+ (scalar-values->alphabet '((#x20 . #x5B) (#x5D . #x10FFFF))))
(define alphabet:ucharacter
(alphabet- alphabet:character (alphabet #\>)))
(declare (usual-integrations))
(define alphabet:xml-base-char
- (code-points->alphabet
+ (scalar-values->alphabet
'((#x0041 . #x005A)
(#x0061 . #x007A)
(#x00C0 . #x00D6)
(#xAC00 . #xD7A3))))
(define alphabet:xml-ideographic
- (code-points->alphabet
+ (scalar-values->alphabet
'(#x3007
(#x3021 . #x3029)
(#x4E00 . #x9FA5))))
(define alphabet:xml-combining-char
- (code-points->alphabet
+ (scalar-values->alphabet
'((#x0300 . #x0345)
(#x0360 . #x0361)
(#x0483 . #x0486)
#x309A)))
(define alphabet:xml-digit
- (code-points->alphabet
+ (scalar-values->alphabet
'((#x0030 . #x0039)
(#x0660 . #x0669)
(#x06F0 . #x06F9)
(#x0F20 . #x0F29))))
(define alphabet:xml-extender
- (code-points->alphabet
+ (scalar-values->alphabet
'(#x00B7
#x02D0
#x02D1
(#x30FC . #x30FE))))
(define alphabet:xml-char
- (code-points->alphabet
+ (scalar-values->alphabet
'(#x0009
#x000A
#x000D
(define (emit-string string ctx)
(let ((port (ctx-port ctx)))
- (for-each-wide-char string
+ (for-each-unicode-char string
(lambda (char)
(write-char char port)))))
(define (xml-string-columns string)
(let ((n 0))
- (for-each-wide-char string
+ (for-each-unicode-char string
(lambda (char)
(set! n
(fix:+ n
(emit-char #\space ctx)))
(define (write-escaped-string string escapes ctx)
- (for-each-wide-char string
+ (for-each-unicode-char string
(lambda (char)
(cond ((assq char escapes)
=> (lambda (e)
(else
(emit-char char ctx))))))
-(define (for-each-wide-char string procedure)
+(define (for-each-unicode-char string procedure)
(let ((port (open-utf8-input-string string)))
(let loop ()
(let ((char (read-char port)))
(let ((make-ref
(lambda (s r p)
(let ((n (string->number s r)))
- (if (not (unicode-code-point? n))
+ (if (not (unicode-scalar-value? n))
(perror p "Invalid code point" n))
(let ((char (integer->char n)))
(if (not (char-in-alphabet? char alphabet:xml-char))
(value canonicalize canonicalize-char-data))
(define (xml-char-data? object)
- (or (wide-char? object)
+ (or (unicode-char? object)
(and (or (wide-string? object)
(and (string? object)
(utf8-string-valid? object)))
'UTF-8))
(define (canonicalize-char-data object)
- (cond ((wide-char? object)
+ (cond ((unicode-char? object)
(call-with-utf8-output-string
(lambda (port)
(write-char object port))))