The I/O subsystem has once again been redesigned. The primary goal of
this large change is to integrate support for Unicode and character
coding directly into the I/O subsystem. Secondary goals are to
improve I/O performance, to simplify the design, and to provide
flexibility for future enhancement.
This change set has received cursory testing, and no doubt a number of
problems remain. Additionally, there are several unfinished aspects
to the change. But this version works well enough to run Edwin.
Detailed changes
----------------
The term "line translation" is everywhere replaced with "line ending".
A line ending is now specified by a symbol, such as 'crlf or 'lf;
previously it was a string. I/O files now support a single line
ending for both input and output sides; previously there were two
independent line translations.
The I/O buffers have been completely redesigned. They now operate in
three stages: one stage does byte-stream I/O, the second manages
coding (e.g. UTF-8), and the third manages line endings. Only bytes
are buffered. As a consequence, READ-CHAR and WRITE-CHAR will now
handle any Unicode character, provided the port's coding is set to an
appropriate value.
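The three-stage layout can be modeled in a few lines. The following is a minimal Python sketch, not the Scheme implementation: the function names, the LINE_ENDINGS table, and the coding names are assumptions chosen for illustration. It shows characters passing through the line-ending stage and the coding stage before reaching a buffer that holds only bytes.

```python
# Hypothetical model of the three stages; all names here are illustrative.
LINE_ENDINGS = {"lf": "\n", "crlf": "\r\n"}

def encode_output(text, coding="utf-8", line_ending="lf"):
    """Line-ending stage, then coding stage: characters in, bytes out."""
    translated = text.replace("\n", LINE_ENDINGS[line_ending])
    return translated.encode(coding)

def decode_input(data, coding="utf-8", line_ending="lf"):
    """Coding stage, then line-ending stage: bytes in, characters out."""
    text = data.decode(coding)
    if line_ending != "lf":
        text = text.replace(LINE_ENDINGS[line_ending], "\n")
    return text

# Byte-stream stage: the buffer never holds characters, only bytes.
byte_buffer = bytearray()
byte_buffer += encode_output("\u03bb\n", coding="utf-8", line_ending="crlf")
round_trip = decode_input(bytes(byte_buffer), coding="utf-8", line_ending="crlf")
```

Because only the coding stage knows about characters, switching the coding changes which characters can pass through without touching the byte buffer.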
The READ-SUBSTRING port operation can now assume that its START
argument is strictly less than its END argument. Likewise for the new
operations READ-WIDE-SUBSTRING and READ-EXTERNAL-SUBSTRING.
The WRITE-SUBSTRING port operation now returns either #F or a
non-negative integer. It can also now assume that its START argument
is strictly less than its END argument. Both of these properties are
true for the new WRITE-WIDE-SUBSTRING and WRITE-EXTERNAL-SUBSTRING.
The WRITE-CHAR port operation now returns #F, 0, or 1, as if it
were a call to WRITE-SUBSTRING with a one-char string.
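The return-value convention can be illustrated with a small Python model. This is a sketch under stated assumptions: the text above does not say what #F or the integer mean, so the model assumes #F (None here) signals that the write could not proceed and the integer counts the characters accepted. The class SketchPort and its capacity parameter are hypothetical.

```python
class SketchPort:
    """Hypothetical output port with a bounded byte buffer (illustration only)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.buffer = bytearray()

    def write_substring(self, string, start, end):
        # Callers guarantee start < end, so no empty-range check is made here.
        room = self.capacity - len(self.buffer)
        if room == 0:
            return None  # models #F: assumed to mean the write could not proceed
        chunk = string[start:end][:room]
        self.buffer += chunk.encode("iso-8859-1")
        return len(chunk)  # number of characters accepted

    def write_char(self, char):
        # Same contract as write_substring applied to a one-character string.
        return self.write_substring(char, 0, 1)


port = SketchPort(capacity=3)
accepted = port.write_substring("hello", 0, 5)  # only part of the range fits
blocked = port.write_char("x")                  # buffer is now full
```

In this model write_char never returns 0; that value is part of the contract described above but its trigger is not specified there.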
The CHAR-READY? port operation and the INPUT-PORT/CHAR-READY?
procedure no longer accept a second "interval" argument. Handling of
the timeout interval is instead implemented directly in the
CHAR-READY? procedure.
Strings are always considered to be encoded using ISO-8859-1.
The parser-buffer datatype has been widened to handle all Unicode
characters.
All ports now support the FRESH-LINE operation, which is implemented
as a layer on top of the supplied operations. Similarly, the
PEEK-CHAR, DISCARD-CHAR, and new UNREAD-CHAR operations are
implemented for all ports.
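One possible layering is sketched below in Python. It is an assumption about how such a layer could work, not a description of the Scheme code: PEEK-CHAR and DISCARD-CHAR are derived from read and unread, and FRESH-LINE is derived from tracking the last character written. The class names are hypothetical.

```python
class LayeredInput:
    """Hypothetical input port: peek/discard derived from read/unread."""

    def __init__(self, text):
        self._chars = list(text)
        self._pos = 0

    def read_char(self):
        if self._pos >= len(self._chars):
            return None  # models the end-of-file object
        char = self._chars[self._pos]
        self._pos += 1
        return char

    def unread_char(self):
        # Push back the most recently read character.
        if self._pos > 0:
            self._pos -= 1

    def peek_char(self):
        char = self.read_char()
        if char is not None:
            self.unread_char()
        return char

    def discard_char(self):
        self.read_char()


class LayeredOutput:
    """Hypothetical output port: fresh_line derived from write_char."""

    def __init__(self):
        self.written = []
        self._at_line_start = True

    def write_char(self, char):
        self.written.append(char)
        self._at_line_start = (char == "\n")

    def fresh_line(self):
        # Emit a newline only if the output is not already at a line start.
        if not self._at_line_start:
            self.write_char("\n")
```

A port type then only has to supply the primitive operations, and every port gets the derived ones for free.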
End-of-file objects now have an associated port.
RUN-SHELL-COMMAND and RUN-SYNCHRONOUS-SUBPROCESS now accept a keyword
argument LINE-ENDING, which replaces the old options
INPUT-LINE-TRANSLATION and OUTPUT-LINE-TRANSLATION.
Transcript support has been moved into the core port abstraction.
Consequently, it is no longer necessary to encapsulate a port in order
to get transcript support. Encapsulated ports have been eliminated,
as this was their only use.
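The effect of moving transcripts into the port itself can be modeled as follows. This Python sketch is illustrative only; the class TranscriptingPort and its method names are hypothetical and do not correspond to Scheme procedures. The point is that the port carries an optional transcript slot, so no wrapper object is needed.

```python
import io


class TranscriptingPort:
    """Hypothetical port with a built-in, optional transcript destination."""

    def __init__(self, sink):
        self.sink = sink
        self.transcript = None  # no transcript until one is attached

    def set_transcript(self, transcript):
        self.transcript = transcript

    def write_string(self, string):
        self.sink.write(string)
        # Echo to the transcript, if any, without wrapping the port.
        if self.transcript is not None:
            self.transcript.write(string)


console = io.StringIO()
log = io.StringIO()
port = TranscriptingPort(console)
port.write_string("before ")
port.set_transcript(log)
port.write_string("after")
```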
The procedures OPEN-TCP-STREAM-SOCKET, OPEN-UNIX-STREAM-SOCKET,
SUBPROCESS-I/O-PORT, and TCP-SERVER-CONNECTION-ACCEPT have changed
their argument structure. All arguments dealing with buffer size and
line translation have been eliminated. In the new implementation, the
buffer size is fixed, and handling of line endings is changed by
calling PORT/SET-LINE-ENDING.
The following variables have been eliminated:
CHANNEL-WRITE-CHAR-BLOCK
CHANNEL-WRITE-STRING-BLOCK
ENCAPSULATED-PORT/PORT
ENCAPSULATED-PORT/STATE
ENCAPSULATED-PORT?
GUARANTEE-ENCAPSULATED-PORT
INPUT-PORT/CHANNEL
INPUT-PORT/COPY
INPUT-PORT/CUSTOM-OPERATION
INPUT-PORT/OPERATION
INPUT-PORT/OPERATION-NAMES
INPUT-PORT/STATE
MAKE-ENCAPSULATED-PORT
MAKE-GENERIC-INPUT-PORT
MAKE-GENERIC-OUTPUT-PORT
MAKE-I/O-PORT
MAKE-INPUT-PORT
MAKE-OUTPUT-PORT
MATCH-UTF8-CHAR-IN-ALPHABET
OUTPUT-PORT/CHANNEL
OUTPUT-PORT/COPY
OUTPUT-PORT/CUSTOM-OPERATION
OUTPUT-PORT/OPERATION
OUTPUT-PORT/OPERATION-NAMES
OUTPUT-PORT/STATE
PATHNAME-END-OF-LINE-STRING
PATHNAME-NEWLINE-TRANSLATION
SET-ENCAPSULATED-PORT/STATE!
SET-INPUT-PORT/STATE!
SET-OUTPUT-PORT/STATE!
The following port operations have been eliminated:
BUFFERED-INPUT-CHARS
BUFFERED-OUTPUT-CHARS
CHARS-REMAINING
DISCARD-CHAR
DISCARD-CHARS
FRESH-LINE
INPUT-BUFFER-SIZE
OUTPUT-BUFFER-SIZE
PEEK-CHAR
READ-STRING
REST->STRING
SET-INPUT-BUFFER-SIZE
SET-OUTPUT-BUFFER-SIZE
To do:
* locking
* column tracking
* convert parser from peek/discard to read/unread
* [?] integrate parser-buffer support (port.scm/input.scm)
* change buffer I/O ports to handle line endings as needed
Changed argument structure:
char-ready? port operation
input-port/char-ready?
make-generic-i/o-port
make-input-buffer
make-output-buffer
open-tcp-stream-socket
open-unix-stream-socket
subprocess-i/o-port
tcp-server-connection-accept
Renamed variables:
os/default-end-of-line-translation => default-line-ending
os/file-end-of-line-translation => file-line-ending
New variables:
channel-has-input?
channel-write-byte-block
condition-type:char-decoding-error
condition-type:char-encoding-error
condition-type:not-8-bit-char
console-i/o-port?
eof-object-port
error:char-decoding
error:char-encoding
error:not-8-bit-char
guarantee-wide-substring
input-port/read-external-substring
input-port/read-wide-substring
input-port/unread-char
match-parser-buffer-char-in-alphabet
match-parser-buffer-char-in-alphabet-no-advance
match-parser-buffer-char-not-in-alphabet
match-parser-buffer-char-not-in-alphabet-no-advance
match-parser-buffer-char-not-in-set
match-parser-buffer-char-not-in-set-no-advance
output-port/write-external-substring
output-port/write-wide-substring
port/coding
port/line-ending
port/set-coding
port/set-line-ending
port=?
set-channel-port!
unread-char
wide-string->parser-buffer
wide-substring
wide-substring->parser-buffer
New port operations:
coding
line-ending
read-external-substring
read-wide-substring
set-coding
set-line-ending
write-external-substring
write-wide-substring
30 files changed: