Chris Hanson [Tue, 14 Feb 2017 05:17:52 +0000 (21:17 -0800)]
Major refactor to minimize size of character sets.
Chris Hanson [Mon, 13 Feb 2017 10:12:36 +0000 (02:12 -0800)]
Eliminate unused binding.
Chris Hanson [Mon, 13 Feb 2017 10:12:24 +0000 (02:12 -0800)]
Fix typos in previous change.
Chris Hanson [Sun, 12 Feb 2017 22:12:59 +0000 (14:12 -0800)]
Change is-X-of from compound to parametric predicates.
Chris Hanson [Sun, 12 Feb 2017 20:13:32 +0000 (12:13 -0800)]
Rewrite unparser to pass context rather than use parameters.
Also eliminate unparser-table abstraction.
Chris Hanson [Sun, 12 Feb 2017 09:25:56 +0000 (01:25 -0800)]
Reduce the size of character sets by computing the old format on demand.
Chris Hanson [Sun, 12 Feb 2017 06:06:50 +0000 (22:06 -0800)]
Change printer to be smarter about when quoting is needed.
Chris Hanson [Sun, 12 Feb 2017 05:51:34 +0000 (21:51 -0800)]
Add some additional useful character sets.
Chris Hanson [Sun, 12 Feb 2017 05:50:52 +0000 (21:50 -0800)]
Fix bug: missed package name change in cold load.
Chris Hanson [Sun, 12 Feb 2017 05:31:04 +0000 (21:31 -0800)]
Allow conjoin and disjoin to be used with unregistered predicates.
Chris Hanson [Sun, 12 Feb 2017 01:21:13 +0000 (17:21 -0800)]
Add tables for CWCF, CWL, and CWU.
Chris Hanson [Sun, 12 Feb 2017 01:20:17 +0000 (17:20 -0800)]
Change code generator for boolean sets to use standard names.
Chris Hanson [Sun, 12 Feb 2017 00:41:07 +0000 (16:41 -0800)]
Rename ucd-table-glue to ucd-glue.
Chris Hanson [Sun, 12 Feb 2017 00:37:10 +0000 (16:37 -0800)]
Change pattern-white-space to pattern-whitespace for consistency.
Chris Hanson [Sat, 11 Feb 2017 23:42:52 +0000 (15:42 -0800)]
Rename port/char-set to textual-port-char-set.
Make it work on all textual ports and default to iso-8859-1.
Chris Hanson [Sat, 11 Feb 2017 23:37:47 +0000 (15:37 -0800)]
Add character sets to textual ports.
This will help the printer decide what characters it should emit.
Chris Hanson [Sat, 11 Feb 2017 22:41:01 +0000 (14:41 -0800)]
Implement char-set:unicode.
Chris Hanson [Sat, 11 Feb 2017 22:40:18 +0000 (14:40 -0800)]
Implement unicode-char-code?.
Chris Hanson [Sat, 11 Feb 2017 22:39:47 +0000 (14:39 -0800)]
Clean up char->digit and digit->char.
Chris Hanson [Sat, 11 Feb 2017 21:56:03 +0000 (13:56 -0800)]
Implement digit-value.
Chris Hanson [Sat, 11 Feb 2017 21:03:44 +0000 (13:03 -0800)]
Change generated tables to use characters instead of integers.
Chris Hanson [Sat, 11 Feb 2017 21:02:57 +0000 (13:02 -0800)]
Rename "WSpace" full name to "whitespace".
Chris Hanson [Sat, 11 Feb 2017 20:39:25 +0000 (12:39 -0800)]
Remove timestamp from generated files.
It forces a new check-in when nothing else has changed.
Chris Hanson [Sat, 11 Feb 2017 08:32:54 +0000 (00:32 -0800)]
Change implementation of #\<char> to show all "graphic" characters.
This isn't quite right -- it doesn't support Unicode very well -- but it will do
for now.
Chris Hanson [Sat, 11 Feb 2017 08:32:12 +0000 (00:32 -0800)]
Fix bug: use atom delimiters instead of symbol-constituents.
Proper handling of parser character sets needs review.
Chris Hanson [Sat, 11 Feb 2017 07:52:59 +0000 (23:52 -0800)]
Implement proper handling of symbol quoting and case folding in parser.
Disallows use of | in symbols except at beginning and end.
Disallows use of \ in symbols unless in ||.
Chris Hanson [Sat, 11 Feb 2017 07:52:19 +0000 (23:52 -0800)]
Implement char-{down,fold,up}case-full and use in ustring.
Chris Hanson [Sat, 11 Feb 2017 06:42:30 +0000 (22:42 -0800)]
Use correct case-folding algorithm for symbols.
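Full Unicode case folding can map one character to several. For example, ß folds to "ss". Simple lowercasing does not do this, so the choice of algorithm matters for symbols. The sketch below uses Python's built-in str.casefold(), which implements full case folding, as a stand-in. fold_symbol_name is a hypothetical helper, not the MIT Scheme procedure.

```python
def fold_symbol_name(name: str) -> str:
    # Hypothetical helper: fold a symbol's name for case-insensitive reading.
    # str.casefold() applies Unicode full case folding, so the result may be
    # longer than the input ("ß" -> "ss"), unlike str.lower().
    return name.casefold()

# Under full folding these two spellings read as the same symbol...
print(fold_symbol_name("STRASSE") == fold_symbol_name("Straße"))  # True
# ...while plain lowercasing keeps them distinct.
print("STRASSE".lower() == "Straße".lower())  # False
```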
Chris Hanson [Sat, 11 Feb 2017 06:40:58 +0000 (22:40 -0800)]
Change ustring implementation to simplify to 8-bit legacy strings.
This was happening anyway given the previous definition of char-ascii?.
Chris Hanson [Sat, 11 Feb 2017 06:06:34 +0000 (22:06 -0800)]
Fix char-ascii? to be 7-bit instead of 8.
Also create char-8-bit?.
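The two predicates differ only in their upper bound:

- 7-bit (ASCII) characters have code points below #x80.
- 8-bit characters, the ISO 8859-1 repertoire, have code points below #x100.

Below is a minimal Python sketch over code points. It ignores MIT Scheme's bucky bits, and the function names are hypothetical stand-ins for char-ascii? and char-8-bit?.

```python
def char_ascii_p(ch):
    # 7-bit: code points 0..#x7F.
    return ord(ch) < 0x80

def char_8bit_p(ch):
    # 8-bit: code points 0..#xFF, i.e. the ISO 8859-1 repertoire.
    return ord(ch) < 0x100

for ch in ("A", "\u00e9", "\u03bb"):   # A, e-acute, lambda
    print(repr(ch), char_ascii_p(ch), char_8bit_p(ch))
```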
Chris Hanson [Sat, 11 Feb 2017 05:20:28 +0000 (21:20 -0800)]
Fix bug: typo meant value of utfX->string was wrong.
Also, consistently use the char decoding procedures.
Chris Hanson [Sat, 11 Feb 2017 04:54:35 +0000 (20:54 -0800)]
Character case mappers should preserve the bits.
Chris Hanson [Sat, 11 Feb 2017 04:40:57 +0000 (20:40 -0800)]
Fix parser case-folding to use ustring-foldcase.
Chris Hanson [Sat, 11 Feb 2017 04:40:46 +0000 (20:40 -0800)]
Implement char-foldcase and ustring-foldcase.
Also fix implementations of ustring-{up,down}case.
Chris Hanson [Sat, 11 Feb 2017 04:39:03 +0000 (20:39 -0800)]
Add tables and support for case folding and string case conversion.
Chris Hanson [Fri, 10 Feb 2017 08:14:02 +0000 (00:14 -0800)]
Use non-pointer hash tables for UCD tables.
Chris Hanson [Fri, 10 Feb 2017 08:11:39 +0000 (00:11 -0800)]
Implement non-pointer hash tables.
These are like strong eq? hash tables, but they don't rehash after GC.
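The following toy Python model shows why such a table can skip rehashing. It assumes, as is typical for eq? hashing under a moving collector, that ordinary eq? tables derive bucket indices from object addresses. A table restricted to non-pointer keys (fixnums, characters) hashes the key's own value instead, and GC never changes that value. This is not MIT Scheme's implementation; NonPointerTable, address_index, and immediate_index are hypothetical names.

```python
NBUCKETS = 8

def address_index(address):
    # eq?-style pointer hashing: the bucket depends on where the object lives,
    # so a moving GC that changes the address invalidates the bucket.
    return address % NBUCKETS

def immediate_index(value):
    # Non-pointer hashing: the bucket depends only on the key's own value
    # (a fixnum or a character's code point), which GC never changes.
    return value % NBUCKETS

class NonPointerTable:
    """Toy hash table restricted to non-pointer keys (Python ints here)."""

    def __init__(self):
        self.buckets = [[] for _ in range(NBUCKETS)]

    def put(self, key, value):
        bucket = self.buckets[immediate_index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)   # overwrite the existing entry
                return
        bucket.append((key, value))

    def get(self, key, default=None):
        for k, v in self.buckets[immediate_index(key)]:
            if k == key:
                return v
        return default

# An object moved from address 16 to 21 by a GC lands in a different bucket,
# so an address-hashed table must rehash; a code-point key never moves.
print(address_index(16), address_index(21))  # 0 5
table = NonPointerTable()
table.put(0x41, "Lu")    # code point of "A" -> general category
print(table.get(0x41))   # Lu
```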
Chris Hanson [Fri, 10 Feb 2017 08:03:24 +0000 (00:03 -0800)]
Implement much smarter code generation for UCD tables.
New generator generates character sets for binary-valued properties.
For code-point valued properties, it uses fixnum hash tables.
It also uses fixnum hash tables for the numeric-type property.
The end result of this is a considerable reduction in code size.
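One compact char-set representation for a binary-valued property is a sorted list of half-open code-point ranges, tested with binary search. The Python sketch below assumes that representation; the log does not say exactly what the generator emits. code_points_to_ranges and RangeCharSet are hypothetical names, and the sample uses only the ASCII code points that have the White_Space property.

```python
from bisect import bisect_right

def code_points_to_ranges(code_points):
    """Collapse code points into sorted half-open [start, end) ranges."""
    ranges = []
    for cp in sorted(set(code_points)):
        if ranges and ranges[-1][1] == cp:
            ranges[-1][1] = cp + 1          # cp extends the current run
        else:
            ranges.append([cp, cp + 1])     # cp starts a new run
    return [tuple(r) for r in ranges]

class RangeCharSet:
    """Membership test over ranges by binary search on the range starts."""

    def __init__(self, code_points):
        self.ranges = code_points_to_ranges(code_points)
        self.starts = [start for start, _ in self.ranges]

    def __contains__(self, cp):
        i = bisect_right(self.starts, cp) - 1   # last range starting <= cp
        return i >= 0 and cp < self.ranges[i][1]

# ASCII code points with the White_Space property: TAB..CR and SPACE.
ascii_white_space = RangeCharSet({0x09, 0x0A, 0x0B, 0x0C, 0x0D, 0x20})
print(ascii_white_space.ranges)  # [(9, 14), (32, 33)]
```

A membership test then costs one binary search over the range starts, no matter how many code points the property covers.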
Chris Hanson [Fri, 10 Feb 2017 06:18:45 +0000 (22:18 -0800)]
Add header and explanatory comment to names.
Chris Hanson [Fri, 10 Feb 2017 06:14:53 +0000 (22:14 -0800)]
Add metadata to all of the XML properties.
Chris Hanson [Thu, 9 Feb 2017 08:12:52 +0000 (00:12 -0800)]
Correctly implement character case conversions and R7RS char sets.
Chris Hanson [Thu, 9 Feb 2017 08:10:50 +0000 (00:10 -0800)]
Optimize the ucd tables a bit.
Need to reconsider the boolean tables, which will be smaller and might be faster
as char sets.
Chris Hanson [Thu, 9 Feb 2017 07:47:57 +0000 (23:47 -0800)]
Change the ucd converter to store raw prop files in a standard place.
These files are being checked in, so it shouldn't be necessary to regenerate
them until the UCD is updated to a new version.
Chris Hanson [Wed, 8 Feb 2017 08:27:07 +0000 (00:27 -0800)]
Fix typo in previous change.
Chris Hanson [Wed, 8 Feb 2017 08:21:45 +0000 (00:21 -0800)]
Implement "computed" character sets.
Also define Unicode symbol characters.
Chris Hanson [Wed, 8 Feb 2017 06:29:17 +0000 (22:29 -0800)]
Add value conversions to the UCD property code generator.
This translates the string values into something more sensible for Scheme.
Chris Hanson [Wed, 8 Feb 2017 04:39:08 +0000 (20:39 -0800)]
Implement char-general-category.
Chris Hanson [Wed, 8 Feb 2017 04:35:19 +0000 (20:35 -0800)]
Add in the first Unicode property table: gc.
Chris Hanson [Wed, 8 Feb 2017 04:34:37 +0000 (20:34 -0800)]
Change the way boot inits work to accommodate packages with multiple files.
Chris Hanson [Wed, 8 Feb 2017 04:30:02 +0000 (20:30 -0800)]
Refactor both the stratifier and the code generator.
The stratifier now avoids the use of bit strings and just manipulates the ranges
appropriately as it groups them. At the end it expands all the ranges so that
the nodes have minimum structure. The code generator was modified to accept the
new input form.
The code generator has been changed to put all the terminal nodes at the
beginning of the table, and to hash-cons new non-terminal nodes. It turns out
that there was a lot of duplication in the nodes, so this saves a bunch of
space.
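Hash-consing here means looking up each new non-terminal node by its contents before allocating it, so identical subtrees collapse into one table entry. Below is a minimal Python sketch under that reading. Terminals are placed at the start of the table, as described above, and nodes are modeled as tuples of child indices. NodeTable and its methods are hypothetical names.

```python
class NodeTable:
    """Toy dispatch table: terminal values occupy the first slots, and
    non-terminal nodes (tuples of child indices) are hash-consed."""

    def __init__(self, terminals):
        self.entries = list(terminals)                      # terminals first
        self.terminal_index = {t: i for i, t in enumerate(self.entries)}
        self.interned = {}                                  # children -> index

    def terminal(self, value):
        return self.terminal_index[value]

    def node(self, children):
        """Return the index of a node with these children, reusing an
        existing entry when an identical node was already created."""
        key = tuple(children)
        index = self.interned.get(key)
        if index is None:
            index = len(self.entries)
            self.entries.append(key)
            self.interned[key] = index
        return index

table = NodeTable(["Lu", "Ll"])
lu, ll = table.terminal("Lu"), table.terminal("Ll")
a = table.node([lu, ll, lu, ll])
b = table.node([lu, ll, lu, ll])   # identical contents: same index as a
root = table.node([a, b])
print(table.entries)  # ['Lu', 'Ll', (0, 1, 0, 1), (2, 2)]
```

The interning key is the tuple of child indices, so sharing propagates upward. Once two subtrees get the same index, their parents become identical too and are also shared.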
Chris Hanson [Wed, 8 Feb 2017 04:23:41 +0000 (20:23 -0800)]
Fix nasty bug: modifying a hash table could scramble its buckets.
Chris Hanson [Tue, 7 Feb 2017 05:49:15 +0000 (21:49 -0800)]
Fix bug: typo broke linear dispatch coding.
Chris Hanson [Mon, 6 Feb 2017 05:39:36 +0000 (21:39 -0800)]
Some efficiency and layout improvements.
Chris Hanson [Mon, 6 Feb 2017 05:38:02 +0000 (21:38 -0800)]
Change pp to treat all define-FOO symbols like define.
Chris Hanson [Mon, 6 Feb 2017 04:50:22 +0000 (20:50 -0800)]
Fix bug: root definition had wrong arguments.
Chris Hanson [Mon, 6 Feb 2017 03:49:17 +0000 (19:49 -0800)]
A bunch of cleanups to code generator.
Chris Hanson [Mon, 6 Feb 2017 02:59:11 +0000 (18:59 -0800)]
Initial implementation of UCD converter.
Chris Hanson [Sat, 4 Feb 2017 21:39:29 +0000 (13:39 -0800)]
Fix bug in ttyio that causes premature exit on pipe/file input.
Matt Birkholz [Sat, 4 Feb 2017 00:38:41 +0000 (17:38 -0700)]
Use a large heap to build the system with LIAR/svm on a 32-bit host.
The default heap (4096Kw) is exhausted compiling xml-parser.bin.
Matt Birkholz [Sat, 4 Feb 2017 00:17:13 +0000 (17:17 -0700)]
svm: Quiet warnings about access.
Matt Birkholz [Sat, 4 Feb 2017 00:13:49 +0000 (17:13 -0700)]
compiler/base/crsend.scm: Use a compiled compress procedure ASAP.
The interpreted compress is terribly slow.
Matt Birkholz [Fri, 3 Feb 2017 23:56:48 +0000 (16:56 -0700)]
Exit with non-zero status when Aborting!: out of memory...
...in --batch-mode. This is basically 93d3d5c, which was mistakenly
undone by 85c1fb4 because it assumed the abort resulted in an error
that would stop the REPL. Signaling an error after the restart and
cleanup is... tricky... so just %exit.
Matt Birkholz [Fri, 3 Feb 2017 20:23:28 +0000 (13:23 -0700)]
Undo 4e9e832; choose fixnum/bignum ops for u32s at compile-time.
This avoids irritating LIAR/i386, which signals an obscure error when
compiling (fix:<= object #xFFFFFFFF).
Matt Birkholz [Fri, 3 Feb 2017 18:51:24 +0000 (11:51 -0700)]
microcode/boot.c (BLOCKS_TO_BYTES): Incorrect name.
Chris Hanson [Fri, 3 Feb 2017 01:38:33 +0000 (17:38 -0800)]
Merge branch 'master' of git.sv.gnu.org:/srv/git/mit-scheme
Chris Hanson [Fri, 3 Feb 2017 01:37:59 +0000 (17:37 -0800)]
Reorganize and curate standard Scheme indentation rules.
Matt Birkholz [Thu, 2 Feb 2017 17:11:36 +0000 (10:11 -0700)]
Close-binary-input-port did not close its input buffer.
Matt Birkholz [Wed, 1 Feb 2017 07:30:11 +0000 (00:30 -0700)]
tests/unit-testing.scm: Add expectation to assert-error failure.
Recently bytevector-u8-ref did not signal a range error but returned a
random value. The failure report only said "value <random>". Now it
also includes the expected condition type(s).
Matt Birkholz [Wed, 1 Feb 2017 07:27:45 +0000 (00:27 -0700)]
Suppress 100+ useless pass 1 warnings about missing externs files.
Matt Birkholz [Wed, 1 Feb 2017 07:17:13 +0000 (00:17 -0700)]
svm: Make fixnum->integer instruction work with TC_FALSE fixnums.
Bytevectors store their length with TC_FALSE(?). Use the
FIXNUM_TO_LONG from liarc.h which does not assume TC_FIXNUM.
Chris Hanson [Tue, 31 Jan 2017 05:20:12 +0000 (21:20 -0800)]
Update XML code to use Unicode strings throughout.
I need this to be able to read the Unicode Character Database.
Chris Hanson [Tue, 31 Jan 2017 03:15:43 +0000 (19:15 -0800)]
Fix bug: ranges aren't necessarily code points.
Matt Birkholz [Tue, 31 Jan 2017 01:39:32 +0000 (18:39 -0700)]
svm: typo
Matt Birkholz [Tue, 31 Jan 2017 00:33:40 +0000 (17:33 -0700)]
Undo d7f390f now that LIAR/svm is compiling constants properly(?).
Matt Birkholz [Tue, 31 Jan 2017 00:31:22 +0000 (17:31 -0700)]
svm: Fix handling of machine-constants that are larger than 32 bits.
Matt Birkholz [Tue, 31 Jan 2017 00:26:39 +0000 (17:26 -0700)]
svm: Stub out bogus rtl:constant-cost copied from i386.
Matt Birkholz [Tue, 31 Jan 2017 00:21:19 +0000 (17:21 -0700)]
svm: Remove imports from (cross-reference).
Matt Birkholz [Mon, 30 Jan 2017 18:47:27 +0000 (11:47 -0700)]
Replace unbound ascii-char? with char->... stolen from LIAR/x86-64.
Matt Birkholz [Mon, 30 Jan 2017 17:52:00 +0000 (10:52 -0700)]
Fix infinite string input ports; add missing increment.
Chris Hanson [Mon, 30 Jan 2017 09:42:20 +0000 (01:42 -0800)]
Rework the UTF-8 codecs:
* Allow any scalar value to be used, as required by Unicode.
* Implement strict decoding as described in the Unicode document.
* Change test cases to match new behavior.
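Strict decoding follows the Unicode Standard's table of well-formed UTF-8 byte sequences. It rejects:

- overlong forms,
- encoded surrogates (U+D800..U+DFFF),
- values above U+10FFFF,
- truncated sequences.

It still accepts every scalar value, non-characters included. Below is a Python sketch of such a decoder. It illustrates the rules and is not a transcription of the MIT Scheme codec; decode_utf8_strict is a hypothetical name.

```python
def decode_utf8_strict(data):
    """Decode bytes to a list of scalar values, raising ValueError on any
    ill-formed sequence (overlong, surrogate, > U+10FFFF, truncated)."""
    out = []
    i, n = 0, len(data)
    while i < n:
        b0 = data[i]
        if b0 < 0x80:                        # 1 byte: U+0000..U+007F
            out.append(b0)
            i += 1
            continue
        # The lead byte fixes the length and the allowed range of byte 2.
        if 0xC2 <= b0 <= 0xDF:
            length, lo, hi = 2, 0x80, 0xBF
        elif b0 == 0xE0:
            length, lo, hi = 3, 0xA0, 0xBF   # no overlong 3-byte forms
        elif b0 == 0xED:
            length, lo, hi = 3, 0x80, 0x9F   # no surrogates U+D800..U+DFFF
        elif 0xE1 <= b0 <= 0xEF:
            length, lo, hi = 3, 0x80, 0xBF
        elif b0 == 0xF0:
            length, lo, hi = 4, 0x90, 0xBF   # no overlong 4-byte forms
        elif 0xF1 <= b0 <= 0xF3:
            length, lo, hi = 4, 0x80, 0xBF
        elif b0 == 0xF4:
            length, lo, hi = 4, 0x80, 0x8F   # nothing above U+10FFFF
        else:                                # 80..C1 and F5..FF never lead
            raise ValueError(f"invalid lead byte {b0:#04x} at offset {i}")
        if i + length > n:
            raise ValueError(f"truncated sequence at offset {i}")
        cp = b0 & (0xFF >> (length + 1))     # payload bits of the lead byte
        for k in range(1, length):
            b = data[i + k]
            low, high = (lo, hi) if k == 1 else (0x80, 0xBF)
            if not low <= b <= high:
                raise ValueError(f"invalid byte {b:#04x} at offset {i + k}")
            cp = (cp << 6) | (b & 0x3F)
        out.append(cp)
        i += length
    return out

# Non-characters are scalar values, so they decode without complaint.
print(decode_utf8_strict("\uffff".encode("utf-8")))  # [65535]
```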
Chris Hanson [Mon, 30 Jan 2017 09:41:13 +0000 (01:41 -0800)]
Change bucky-bit prefixes to prefer upper-case for output.
Also make sure that upper-case is accepted when case-folding is off.
Chris Hanson [Mon, 30 Jan 2017 09:40:19 +0000 (01:40 -0800)]
Implement char->scalar-value.
Chris Hanson [Mon, 30 Jan 2017 04:42:28 +0000 (20:42 -0800)]
Update documentation for param:parser-fold-case?.
Chris Hanson [Mon, 30 Jan 2017 04:41:20 +0000 (20:41 -0800)]
Implement #!fold-case and #!no-fold-case.
Chris Hanson [Mon, 30 Jan 2017 03:16:35 +0000 (19:16 -0800)]
Fix bug: ustrings may be equal but still have different type codes.
Also simplify implementations of eqv? and equal?, and remove eqv? handling of
empty vectors.
Chris Hanson [Mon, 30 Jan 2017 03:12:05 +0000 (19:12 -0800)]
Change string printer to generate R7RS-compatible strings.
Chris Hanson [Mon, 30 Jan 2017 03:08:41 +0000 (19:08 -0800)]
Change parser to respect fold-case? in various places.
Chris Hanson [Mon, 30 Jan 2017 03:00:38 +0000 (19:00 -0800)]
Change some of the parser's parameter names:
* Rename param:parser-canonicalize-symbols? to param:parser-fold-case?.
* Rename param:parser-enable-file-attributes-parsing? to
param:parser-enable-attributes?.
* Eliminate unnecessary *parser-enable-file-attributes-parsing?*
and *parser-keyword-style*.
* Change port properties to eliminate *...* and use new names.
Chris Hanson [Mon, 30 Jan 2017 02:40:53 +0000 (18:40 -0800)]
Refactor the character set abstraction:
* Clarify the use of "code point" versus "scalar value".
* Rename well-formed-scalar-value-list? to code-point-list? and broaden its
scope to allow characters, strings, and character sets.
* Rename scalar-values->char-set to char-set* and broaden its domain to include
any code-point-list?.
* Rename char-set->scalar-values to char-set->code-points.
* Implement char-in-set? which is char-member? with the args reversed. This
makes it consistent with scalar-value-in-char-set?. Deprecate char-member?.
* Implement char-set-union* and char-set-intersection*.
* Eliminate all of the "alphabet" names which are obsolete.
* Eliminate guarantee-char-set and error:not-char-set.
Chris Hanson [Mon, 30 Jan 2017 02:39:57 +0000 (18:39 -0800)]
Add substring indices to prefix/suffix tests.
Also simplify the implementations and fix a thinko in the suffix
implementations.
Chris Hanson [Mon, 30 Jan 2017 02:06:21 +0000 (18:06 -0800)]
Rewrite the character-name support to handle Unicode and case folding.
Also simplify the code a bit.
Chris Hanson [Mon, 30 Jan 2017 02:06:02 +0000 (18:06 -0800)]
Use boot inits in char.scm.
Chris Hanson [Mon, 30 Jan 2017 02:02:38 +0000 (18:02 -0800)]
Adjust tests to match changes to unicode-scalar-value?.
Also add checks of unicode-code-point?.
Chris Hanson [Mon, 30 Jan 2017 01:56:53 +0000 (17:56 -0800)]
Fix implementation of unicode-scalar-value? to not exclude non-characters.
Also implement unicode-code-point?.
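The two predicates differ by the surrogate range:

- A code point is any integer from 0 to #x10FFFF.
- A scalar value is a code point outside #xD800..#xDFFF.

Non-characters such as U+FFFE and U+FFFF remain scalar values, which is why the fix stops excluding them. Below is a Python sketch with hypothetical names mirroring the Scheme predicates.

```python
def unicode_code_point_p(n):
    # Any integer in the Unicode codespace, 0..#x10FFFF.
    return isinstance(n, int) and 0 <= n <= 0x10FFFF

def unicode_scalar_value_p(n):
    # A code point outside the surrogate range #xD800..#xDFFF.
    # Non-characters (e.g. #xFFFE, #xFFFF, #xFDD0) are still scalar values.
    return unicode_code_point_p(n) and not 0xD800 <= n <= 0xDFFF

for n in (0x41, 0xFFFF, 0xD800, 0x110000):
    print(hex(n), unicode_code_point_p(n), unicode_scalar_value_p(n))
```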
Chris Hanson [Mon, 30 Jan 2017 01:53:36 +0000 (17:53 -0800)]
Implement \x<hex>; syntax for strings.
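This is the R7RS string escape \x<hex scalar value>; in which a semicolon terminates the hex digits, so "\x41;" denotes "A". Below is a Python sketch that expands such escapes in the body of a string literal. expand_hex_escapes is a hypothetical name. It handles only \x...;, \\ and \"; the other R7RS escapes are left out.

```python
import string

def expand_hex_escapes(body):
    r"""Expand \x<hex>; escapes (plus \\ and \") in a string literal's body.

    Other R7RS escapes such as \n and line continuations are not handled."""
    out = []
    i = 0
    while i < len(body):
        c = body[i]
        if c != "\\":
            out.append(c)
            i += 1
            continue
        if i + 1 == len(body):
            raise ValueError("dangling backslash")
        kind = body[i + 1]
        if kind in '\\"':
            out.append(kind)
            i += 2
        elif kind == "x":
            end = body.find(";", i + 2)
            if end < 0:
                raise ValueError("\\x escape missing its terminating ;")
            digits = body[i + 2:end]
            if not digits or any(d not in string.hexdigits for d in digits):
                raise ValueError(f"bad hex digits {digits!r}")
            value = int(digits, 16)
            # The value must be a Unicode scalar value (no surrogates).
            if value > 0x10FFFF or 0xD800 <= value <= 0xDFFF:
                raise ValueError(f"not a scalar value: {digits}")
            out.append(chr(value))
            i = end + 1
        else:
            raise ValueError(f"escape \\{kind} not handled in this sketch")
    return "".join(out)

print(expand_hex_escapes(r"caf\xe9; \x3bb;"))  # café λ
```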
Chris Hanson [Sun, 29 Jan 2017 08:50:20 +0000 (00:50 -0800)]
Implement #\x... syntax for characters.
Chris Hanson [Sun, 29 Jan 2017 08:42:13 +0000 (00:42 -0800)]
Eliminate char->ascii and ascii->char, which were misnomers.
Change char-ascii? to be true only for 7-bit chars. Also change char-ascii? to
return a boolean and implement ascii-char?.
Chris Hanson [Sun, 29 Jan 2017 06:00:21 +0000 (22:00 -0800)]
Fix bug: would-block value only returned if nothing has been read.
Chris Hanson [Sun, 29 Jan 2017 04:26:35 +0000 (20:26 -0800)]
Simplify logic for printing generic I/O ports.
Chris Hanson [Sat, 28 Jan 2017 23:38:50 +0000 (15:38 -0800)]
Upgrade compound-predicate implementation with latest from book.
Also clean up the initialization sequence.