Chris Hanson [Sun, 23 Apr 2017 04:18:21 +0000 (21:18 -0700)]
Update to reflect minor changes to string operations.
Much more work is needed to recraft this chapter to cover immutability.
Chris Hanson [Sun, 23 Apr 2017 04:18:04 +0000 (21:18 -0700)]
Export mutable/immutable predicates.
Chris Hanson [Sun, 23 Apr 2017 04:15:24 +0000 (21:15 -0700)]
In substring, only return arg string if it's in NFC.
Chris Hanson [Sun, 23 Apr 2017 04:12:59 +0000 (21:12 -0700)]
Change substring? to call string->nfc on its arguments.
Chris Hanson [Sun, 23 Apr 2017 04:08:26 +0000 (21:08 -0700)]
Change string-match and string-search to require NFC inputs.
This is because comparison requires that the strings be in the same
normalization form, and these procedures return indices into the strings. We
can't have these procedures normalize their arguments internally, because then
the returned indices would refer to the normalized copies, which may differ from
the strings that were passed in.
Since nearly all strings are in NFC by default, this should not be a serious
drawback.
Additionally, the -ci versions of these procedures have been eliminated,
for essentially the same reason. Callers that need that functionality should
call string-foldcase themselves.
Note that this doesn't affect comparisons that don't return indices.
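The following Python illustration shows the index problem described above. Python's `unicodedata.normalize` stands in for MIT Scheme's string->nfc, and `find_after_normalizing` is a hypothetical helper, not an MIT Scheme procedure. It normalizes inside the search, and the index it returns then points at the wrong character of the original argument:

```python
import unicodedata


def find_after_normalizing(haystack: str, needle: str) -> int:
    """Hypothetical search that normalizes both arguments internally.

    The index it returns points into the NFC copy of `haystack`,
    not into `haystack` itself.
    """
    return unicodedata.normalize("NFC", haystack).find(
        unicodedata.normalize("NFC", needle)
    )


# "cafe\u0301 x": the accented e is spelled as "e" + COMBINING ACUTE ACCENT,
# so this string is not in NFC.
haystack = "cafe\u0301 x"
i = find_after_normalizing(haystack, "x")

print(i)                    # 5: position of "x" in the NFC copy
print(repr(haystack[i]))    # ' ': index 5 of the original string is the space
print(haystack.find("x"))   # 6: the true position in the original string
```

Requiring NFC inputs avoids this mismatch, because a string that is already in NFC is unchanged by normalization and its indices stay valid.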
Chris Hanson [Sun, 23 Apr 2017 03:41:11 +0000 (20:41 -0700)]
Change default result of string-builder to be NFC.
* Eliminate string-canonical-foldcase since string-foldcase now returns NFC.
* Don't return NFC strings from list->string and vector->string; instead, return
verbatim strings.
Chris Hanson [Sun, 23 Apr 2017 03:01:15 +0000 (20:01 -0700)]
Merge branch 'master' of git.sv.gnu.org:/srv/git/mit-scheme
Chris Hanson [Sun, 23 Apr 2017 01:45:49 +0000 (18:45 -0700)]
Redefine substring as different from string-copy.
They are different in only one respect: string-copy always returns a mutable
string, while substring always returns an immutable string.
Chris Hanson [Sun, 23 Apr 2017 01:17:37 +0000 (18:17 -0700)]
Convert list->string, vector->string to use string-builder.
Chris Hanson [Sun, 23 Apr 2017 01:14:39 +0000 (18:14 -0700)]
Fix call to string-builder that was missed.
Chris Hanson [Sun, 23 Apr 2017 00:54:10 +0000 (17:54 -0700)]
Simplify string, string*, string-append, string-append*.
Chris Hanson [Sun, 23 Apr 2017 00:53:53 +0000 (17:53 -0700)]
Fix typo causing memory corruption.
Taylor R Campbell [Sat, 22 Apr 2017 14:27:44 +0000 (14:27 +0000)]
Fix typo.
XXX Obviously this needs an automatic test!
From mejja.
Chris Hanson [Sat, 22 Apr 2017 07:20:30 +0000 (00:20 -0700)]
Change string-copy to return legacy string only if arg is also legacy.
Chris Hanson [Sat, 22 Apr 2017 07:17:19 +0000 (00:17 -0700)]
Move NFC marking from canonical-composition to string->nfc.
Chris Hanson [Sat, 22 Apr 2017 07:05:56 +0000 (00:05 -0700)]
Significantly simplify string-builder.
* Eliminate options; now just optional buffer-length.
* Result type is specified at build rather than up front.
* Eliminate never-exported make-string-builder.
Chris Hanson [Fri, 21 Apr 2017 23:48:44 +0000 (16:48 -0700)]
Change string->nfc to return immutable value, and optimize a bit.
Chris Hanson [Fri, 21 Apr 2017 23:48:03 +0000 (16:48 -0700)]
Support TEST environment variable in "make check".
Also clean up output slightly.
Chris Hanson [Fri, 21 Apr 2017 23:22:11 +0000 (16:22 -0700)]
string->nfd: also convert mutable strings already in NFD.
Chris Hanson [Fri, 21 Apr 2017 23:03:18 +0000 (16:03 -0700)]
Change string->nfd to return immutable value.
Chris Hanson [Fri, 21 Apr 2017 22:33:19 +0000 (15:33 -0700)]
Change builder options to distinguish between mutable and legacy results.
Chris Hanson [Fri, 21 Apr 2017 22:04:17 +0000 (15:04 -0700)]
Rearrange and optimize. Also make ustring1 be zero-terminated.
Chris Hanson [Fri, 21 Apr 2017 22:03:49 +0000 (15:03 -0700)]
Mark ignored binding.
Chris Hanson [Fri, 21 Apr 2017 07:22:29 +0000 (00:22 -0700)]
Change Edwin's implementation of strings to work for all "string-ish" types.
Chris Hanson [Fri, 21 Apr 2017 07:21:41 +0000 (00:21 -0700)]
Add tagging support for unicode-string.
Also generate better error for unknown type codes.
Chris Hanson [Fri, 21 Apr 2017 07:21:14 +0000 (00:21 -0700)]
Change string primitives to uniformly support all "string-ish" types.
Chris Hanson [Fri, 21 Apr 2017 05:32:27 +0000 (22:32 -0700)]
Change string-builder to generate immutable strings by default.
Also fix bug in string->list that assumed mutable inputs.
Chris Hanson [Thu, 20 Apr 2017 06:00:54 +0000 (23:00 -0700)]
Now that legacy string has the same layout as ustring1, merge handling of both.
Chris Hanson [Thu, 20 Apr 2017 00:44:44 +0000 (17:44 -0700)]
Allow string operations to take Unicode strings with 1 byte per CP.
Chris Hanson [Wed, 19 Apr 2017 05:18:24 +0000 (22:18 -0700)]
Change string comparisons to normalize to NFC prior to comparing.
The procedures that return index values have not been updated since it's not
obvious what to do with them. Comparison is meaningless for non-normalized
strings, so it's necessary that all comparisons be done between normalized
strings. This means either (a) requiring compared strings to be normalized
before calling the comparator, or (b) having the comparator normalize its
arguments. If (b) is chosen, then the returned index value will be wrong when
the arguments aren't normalized, as it will refer to the normalized strings,
not the arguments.
I'm considering choosing (b) and changing the definitions of these procedures to
return a slice into the normalized strings instead of an index. However, the
upcoming implementation of immutable strings may make it simple for every
immutable string to be normalized, which may make (a) feasible.
For now I'm going to ignore this, which is fine as long as only ASCII strings
are compared.
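Options (a) and (b) can be contrasted in a short Python sketch. This is for illustration only and is not MIT Scheme code. `unicodedata.is_normalized` requires Python 3.8 or later, and both function names are hypothetical. Raw code-point comparison treats canonically equivalent strings as unequal. Option (a) rejects non-NFC input, while option (b) normalizes its arguments first:

```python
import unicodedata

precomposed = "\u00e9"   # "é" as a single code point (already NFC)
decomposed = "e\u0301"   # "e" + COMBINING ACUTE ACCENT (NFD)

# Comparing raw code points treats canonically equivalent strings as unequal.
print(precomposed == decomposed)   # False


def equal_option_a(a: str, b: str) -> bool:
    """Option (a): callers must pass NFC strings; the comparator only checks."""
    if not (unicodedata.is_normalized("NFC", a)
            and unicodedata.is_normalized("NFC", b)):
        raise ValueError("arguments must be in NFC")
    return a == b


def equal_option_b(a: str, b: str) -> bool:
    """Option (b): the comparator normalizes its own arguments."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(equal_option_b(precomposed, decomposed))   # True
```

For ASCII-only strings both options agree with plain comparison, because ASCII text is already in every normalization form.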
Chris Hanson [Wed, 19 Apr 2017 04:57:52 +0000 (21:57 -0700)]
Rewrite string-builder for performance.
Chris Hanson [Wed, 19 Apr 2017 04:25:03 +0000 (21:25 -0700)]
Rewrite string copying for performance.
Chris Hanson [Wed, 19 Apr 2017 03:17:47 +0000 (20:17 -0700)]
More refactoring of unicode-string layout.
Taylor R Campbell [Tue, 18 Apr 2017 18:59:01 +0000 (18:59 +0000)]
Teach top-level clean target to clean tools too.
Chris Hanson [Mon, 17 Apr 2017 04:49:40 +0000 (21:49 -0700)]
A round of small changes in preparation for supporting immutable strings.
Chris Hanson [Mon, 17 Apr 2017 03:17:43 +0000 (20:17 -0700)]
Implement compiler support for new primitives.
Chris Hanson [Mon, 17 Apr 2017 02:08:22 +0000 (19:08 -0700)]
Change Unicode strings to store flag in type bits of length.
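The idea of keeping flags in the otherwise unused type bits of a length word can be sketched with plain bit arithmetic. This is a hypothetical layout, not MIT Scheme's actual one. It assumes a 64-bit word with a 6-bit type code on top and a 58-bit datum below. The commit mentions a single flag; the two flag bits here (known-NFC and known-NFD, echoing the NFC/NFD memoization mentioned elsewhere in this log) are invented for illustration:

```python
from typing import Tuple

# Assumed layout (illustration only): a 64-bit word whose top 6 bits
# are the type code and whose low 58 bits are the datum.
WORD_BITS = 64
TYPE_BITS = 6
DATUM_BITS = WORD_BITS - TYPE_BITS
DATUM_MASK = (1 << DATUM_BITS) - 1

# Hypothetical flag assignments stored in the type-code field.
FLAG_NFC = 0b01   # string is known to be in NFC
FLAG_NFD = 0b10   # string is known to be in NFD


def pack_length(length: int, flags: int) -> int:
    """Store the length in the datum bits and the flags in the type bits."""
    if not 0 <= length <= DATUM_MASK:
        raise ValueError("length does not fit in the datum field")
    if not 0 <= flags < (1 << TYPE_BITS):
        raise ValueError("flags do not fit in the type field")
    return (flags << DATUM_BITS) | length


def unpack_length(word: int) -> Tuple[int, int]:
    """Return (length, flags) from a packed length word."""
    return word & DATUM_MASK, word >> DATUM_BITS


word = pack_length(42, FLAG_NFC | FLAG_NFD)   # e.g. an ASCII string is both
print(unpack_length(word))                    # (42, 3)
```

The appeal of this layout is that the flags cost no extra storage, because the length field never needs the type bits.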
Chris Hanson [Mon, 17 Apr 2017 02:08:12 +0000 (19:08 -0700)]
D'oh! Hook up printer to new string type.
Chris Hanson [Mon, 17 Apr 2017 01:47:37 +0000 (18:47 -0700)]
Implement primitives to read and write type/datum of object in memory.
Chris Hanson [Mon, 17 Apr 2017 01:47:28 +0000 (18:47 -0700)]
Return end-index of TO from bytevector-copy!.
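Returning the end index of the destination makes consecutive copies easy to chain. Below is a rough Python analogue; `bytevector_copy_into` is a hypothetical name, and Python's `bytearray` stands in for a Scheme bytevector:

```python
def bytevector_copy_into(to: bytearray, at: int, frm: bytes,
                         start: int = 0, end=None) -> int:
    """Hypothetical analogue of bytevector-copy!: copy frm[start:end] into
    `to` at index `at`, and return the end index in `to`."""
    if end is None:
        end = len(frm)
    if not 0 <= start <= end <= len(frm):
        raise IndexError("source range out of bounds")
    count = end - start
    if not 0 <= at <= len(to) - count:
        raise IndexError("destination range out of bounds")
    to[at:at + count] = frm[start:end]
    return at + count


# Each call returns where the next copy should start.
buf = bytearray(8)
i = bytevector_copy_into(buf, 0, b"abc")
i = bytevector_copy_into(buf, i, b"xdefg", 1)
print(i, bytes(buf[:i]))   # 7 b'abcdefg'
```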
Taylor R Campbell [Sat, 15 Apr 2017 18:57:34 +0000 (18:57 +0000)]
No need for X in the liarc bootstrap build.
Taylor R Campbell [Sat, 15 Apr 2017 18:55:48 +0000 (18:55 +0000)]
Splice shell arguments with ${1+"$@"}.
Leave as "${@}" only where it is absolutely obvious there must be at
least one parameter anyway, e.g. because it is a full command line.
Chris Hanson [Fri, 14 Apr 2017 05:19:05 +0000 (22:19 -0700)]
Fix bug: primitive-byte-ref returns a fixnum, not a raw number.
Also clean up and reorganize open-coding of memory references.
Chris Hanson [Fri, 14 Apr 2017 05:18:57 +0000 (22:18 -0700)]
Fix typo.
Chris Hanson [Thu, 13 Apr 2017 06:21:29 +0000 (23:21 -0700)]
Change unicode string representation to be more compact and flexible.
The new design is more densely coded and provides for immutable strings with
different coding, as well as memoization of NFC/NFD status. However, in this
change only the standard 3-byte mutable representation is implemented.
Chris Hanson [Thu, 13 Apr 2017 05:24:20 +0000 (22:24 -0700)]
Implement select-on-bytes-per-word for generating word-length-specific code.
Chris Hanson [Thu, 13 Apr 2017 05:23:52 +0000 (22:23 -0700)]
Eliminate condition for open-coding integer->char.
Chris Hanson [Thu, 13 Apr 2017 05:23:28 +0000 (22:23 -0700)]
Make sure that unicode strings are self-evaluating.
Chris Hanson [Thu, 13 Apr 2017 04:18:27 +0000 (21:18 -0700)]
Strip down code generated for primitive memory references.
Chris Hanson [Wed, 12 Apr 2017 05:35:10 +0000 (22:35 -0700)]
Implement open-coding of byte-ref primitives.
Chris Hanson [Wed, 12 Apr 2017 05:34:32 +0000 (22:34 -0700)]
Implement more primitive refs, and restrict to pointers only.
Chris Hanson [Wed, 12 Apr 2017 04:46:43 +0000 (21:46 -0700)]
Fix compilation issue.
Chris Hanson [Wed, 12 Apr 2017 04:46:38 +0000 (21:46 -0700)]
Implement allocate-nm-vector.
Chris Hanson [Wed, 12 Apr 2017 04:21:07 +0000 (21:21 -0700)]
Allocate new type unicode-string.
Chris Hanson [Wed, 12 Apr 2017 04:20:41 +0000 (21:20 -0700)]
Implement bytes-per-object.
Chris Hanson [Mon, 10 Apr 2017 04:08:57 +0000 (21:08 -0700)]
Eliminate unused multi-byte procedures.
No need to support a bunch of code that may never be used.
Chris Hanson [Sat, 1 Apr 2017 05:17:20 +0000 (22:17 -0700)]
Add 'copy? option to string-builder.
Chris Hanson [Fri, 31 Mar 2017 04:31:39 +0000 (21:31 -0700)]
Merge branch 'master' of git.sv.gnu.org:/srv/git/mit-scheme
Chris Hanson [Fri, 31 Mar 2017 04:30:55 +0000 (21:30 -0700)]
Fix bug: string output port must copy input strings.
Chris Hanson [Thu, 30 Mar 2017 06:31:37 +0000 (23:31 -0700)]
Fix bugs: typos caught by the macos compiler.
Chris Hanson [Wed, 29 Mar 2017 05:17:35 +0000 (22:17 -0700)]
Add documentation for a few of the more recent string procedures.
Chris Hanson [Wed, 29 Mar 2017 05:02:22 +0000 (22:02 -0700)]
Fix string-for-primitive: it wasn't handling slices.
Chris Hanson [Wed, 29 Mar 2017 04:57:20 +0000 (21:57 -0700)]
Optimize string-in-nfX? since it's important that these be fast.
Chris Hanson [Wed, 29 Mar 2017 04:52:44 +0000 (21:52 -0700)]
Normalize strings prior to hashing so equivalent sequences hash the same.
I've arbitrarily chosen NFD because it's faster than NFC, but a case could be
made that NFC is preferable.
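The principle can be shown in a few lines of Python; the sketch is illustrative and is not MIT Scheme's hash code. NFC is defined as canonical decomposition followed by canonical composition, so NFD does strictly less work per string. The matching equality predicate must normalize the same way, or equal keys could land in different buckets. The function names are hypothetical:

```python
import unicodedata


def string_hash(s: str) -> int:
    """Hash the NFD form, so canonically equivalent strings hash the same."""
    return hash(unicodedata.normalize("NFD", s))


def string_equal(a: str, b: str) -> bool:
    """Equality predicate that agrees with string_hash."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


precomposed = "caf\u00e9"    # é as one code point
decomposed = "cafe\u0301"    # e + COMBINING ACUTE ACCENT
print(string_hash(precomposed) == string_hash(decomposed))   # True
print(string_equal(precomposed, decomposed))                 # True
```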
Chris Hanson [Wed, 29 Mar 2017 03:15:11 +0000 (20:15 -0700)]
Eliminate Hangul Jamo from canonical cm/dm tables.
This makes the bands about 1 MB smaller.
Chris Hanson [Wed, 29 Mar 2017 01:16:07 +0000 (18:16 -0700)]
Implement algorithmic Hangul Jamo compose/decompose.
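Precomposed Hangul syllables can be decomposed and recomposed arithmetically instead of by table lookup, which is why dropping them from the tables saves so much space. The commit does not show MIT Scheme's code. The following Python sketch follows the Hangul syllable algorithms and constants given in the Unicode Standard (Section 3.12), which is presumably what "algorithmic" refers to here:

```python
from typing import List, Optional

# Constants from the Hangul syllable algorithms in the Unicode Standard.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
N_COUNT = V_COUNT * T_COUNT    # 588
S_COUNT = L_COUNT * N_COUNT    # 11172


def hangul_decompose(cp: int) -> List[int]:
    """Decompose a precomposed Hangul syllable into its L, V (and T) jamo."""
    s_index = cp - S_BASE
    if not 0 <= s_index < S_COUNT:
        return [cp]    # not a precomposed syllable
    l_part = L_BASE + s_index // N_COUNT
    v_part = V_BASE + (s_index % N_COUNT) // T_COUNT
    t_index = s_index % T_COUNT
    if t_index == 0:
        return [l_part, v_part]
    return [l_part, v_part, T_BASE + t_index]


def hangul_compose_pair(first: int, second: int) -> Optional[int]:
    """Compose an L+V pair or an LV+T pair; return None if not composable."""
    l_index = first - L_BASE
    v_index = second - V_BASE
    if 0 <= l_index < L_COUNT and 0 <= v_index < V_COUNT:
        return S_BASE + (l_index * V_COUNT + v_index) * T_COUNT
    s_index = first - S_BASE
    t_index = second - T_BASE
    if (0 <= s_index < S_COUNT and s_index % T_COUNT == 0
            and 0 < t_index < T_COUNT):
        return first + t_index
    return None


print([hex(c) for c in hangul_decompose(0xD55C)])   # ['0x1112', '0x1161', '0x11ab']
```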
Chris Hanson [Tue, 28 Mar 2017 06:47:03 +0000 (23:47 -0700)]
Fix code-generation bug in fast-division.
Apparently this code was insufficiently tested.
Chris Hanson [Mon, 27 Mar 2017 03:59:27 +0000 (20:59 -0700)]
Change NFC_QC to be a boolean-valued table and exploit that.
Chris Hanson [Mon, 27 Mar 2017 03:46:57 +0000 (20:46 -0700)]
Have string builder track max code point written.
This is used for two distinct purposes in the finisher.
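The commit does not say what the two purposes are. Below is a hypothetical Python sketch of two plausible uses for a tracked maximum code point. The first picks the narrowest per-code-point width, in the spirit of the 1-byte and 3-byte string representations mentioned in this log. The second skips normalization for pure-ASCII text, which is already in every normalization form. All names and the width policy are assumptions:

```python
from typing import List, Tuple


def bytes_per_code_point(max_cp: int) -> int:
    """Hypothetical width policy: 1 byte when every code point is below
    0x100, otherwise 3 bytes (21 bits cover U+0000..U+10FFFF)."""
    return 1 if max_cp < 0x100 else 3


class TinyStringBuilder:
    """Hypothetical builder that records the largest code point appended."""

    def __init__(self) -> None:
        self._chars: List[str] = []
        self.max_cp = 0

    def append(self, ch: str) -> None:
        self._chars.append(ch)
        self.max_cp = max(self.max_cp, ord(ch))

    def needs_normalization(self) -> bool:
        # ASCII-only text is already in every normalization form,
        # so a finisher could skip the NFC pass in that case.
        return self.max_cp >= 0x80

    def build(self) -> Tuple[int, bytes]:
        """Encode the characters at the narrowest width that fits."""
        width = bytes_per_code_point(self.max_cp)
        data = bytearray()
        for ch in self._chars:
            data += ord(ch).to_bytes(width, "little")
        return width, bytes(data)


b = TinyStringBuilder()
for ch in "abc":
    b.append(ch)
print(b.build(), b.needs_normalization())   # (1, b'abc') False
```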
Chris Hanson [Sun, 26 Mar 2017 23:12:04 +0000 (16:12 -0700)]
Change string-builder to normalize to NFC by default.
Chris Hanson [Sun, 26 Mar 2017 20:50:46 +0000 (13:50 -0700)]
Change symbols to be in NFC.
Chris Hanson [Sun, 26 Mar 2017 20:45:13 +0000 (13:45 -0700)]
Working NFC implementation.
Chris Hanson [Sat, 25 Mar 2017 22:19:56 +0000 (15:19 -0700)]
Initial draft of NFC support; still need to write composition.
Chris Hanson [Sat, 25 Mar 2017 22:19:21 +0000 (15:19 -0700)]
Add NFC_QC and Comp_EX tables.
Chris Hanson [Mon, 20 Mar 2017 03:22:29 +0000 (20:22 -0700)]
Synthesize canonical-dm table and use it to speed up decomposition.
Chris Hanson [Mon, 20 Mar 2017 00:53:51 +0000 (17:53 -0700)]
Fix bug in canonical-ordering algorithm.
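For reference, here is a Python sketch of the Unicode Standard's Canonical Ordering Algorithm, independent of MIT Scheme's implementation. Within each maximal run of characters with a non-zero canonical combining class (ccc), the characters are sorted stably by ccc, and starters (ccc 0) never move. The sketch assumes its input is already fully decomposed:

```python
import unicodedata
from typing import List


def canonical_order(chars: List[str]) -> List[str]:
    """Stably sort each run of non-starters by canonical combining class.
    Starters (ccc == 0) never move. Assumes fully decomposed input."""
    result = list(chars)
    i, n = 0, len(result)
    while i < n:
        if unicodedata.combining(result[i]) == 0:
            i += 1
            continue
        j = i
        while j < n and unicodedata.combining(result[j]) != 0:
            j += 1
        # sorted() is stable, so marks with equal ccc keep their relative order.
        result[i:j] = sorted(result[i:j], key=unicodedata.combining)
        i = j
    return result


# U+0301 COMBINING ACUTE ACCENT has ccc 230; U+0323 COMBINING DOT BELOW has ccc 220.
print([hex(ord(c)) for c in canonical_order(list("a\u0301\u0323"))])   # ['0x61', '0x323', '0x301']
```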
Chris Hanson [Mon, 20 Mar 2017 00:53:25 +0000 (17:53 -0700)]
Refactor test to make it easier to see the failures.
Chris Hanson [Mon, 20 Mar 2017 00:52:38 +0000 (17:52 -0700)]
Boost default stack size -- I'm tired of blowing out the stack.
Chris Hanson [Sun, 19 Mar 2017 20:20:31 +0000 (13:20 -0700)]
D'oh! String normalization tests were broken, which explains why they passed.
Chris Hanson [Sun, 19 Mar 2017 08:16:22 +0000 (01:16 -0700)]
Squeeze a little more space out of the tables.
Chris Hanson [Sun, 19 Mar 2017 08:03:54 +0000 (01:03 -0700)]
Implement decomposition-type table and use it for correct NFD conversion.
Chris Hanson [Sun, 19 Mar 2017 03:49:04 +0000 (20:49 -0700)]
Further compress the size of the UCD tables.
As of this latest set of changes the total size seems in the range of a megabyte
or so, which is much better than the 4-5 megabytes of earlier revisions.
Chris Hanson [Sun, 19 Mar 2017 03:46:59 +0000 (20:46 -0700)]
Add a bunch of converters to/from bytevectors.
Chris Hanson [Sun, 19 Mar 2017 02:47:29 +0000 (19:47 -0700)]
Fix some bugs in vector->string.
Chris Hanson [Sun, 19 Mar 2017 02:34:17 +0000 (19:34 -0700)]
Add hack to force printing chars in old format; can be eliminated after 9.3.
Chris Hanson [Sun, 19 Mar 2017 02:13:29 +0000 (19:13 -0700)]
More simplification.
Chris Hanson [Sun, 19 Mar 2017 02:08:25 +0000 (19:08 -0700)]
Simplify parse-atom to not fold case.
Chris Hanson [Sun, 19 Mar 2017 00:08:31 +0000 (17:08 -0700)]
Use ucd-X-value directly in ustring.
Chris Hanson [Sat, 18 Mar 2017 21:34:38 +0000 (14:34 -0700)]
Convert all of the UCD tables to use bitwise tries.
Chris Hanson [Sat, 18 Mar 2017 21:34:15 +0000 (14:34 -0700)]
Rework the character parser to handle backslash reasonably.
Chris Hanson [Sat, 18 Mar 2017 04:41:18 +0000 (21:41 -0700)]
Add u16/u32 equivalents to bytevector.
Chris Hanson [Wed, 15 Mar 2017 05:49:00 +0000 (22:49 -0700)]
Add draft of inversion-map code generator.
Chris Hanson [Mon, 13 Mar 2017 01:57:45 +0000 (18:57 -0700)]
Update explanation of HIGH range.
Chris Hanson [Mon, 13 Mar 2017 01:53:53 +0000 (18:53 -0700)]
Rename "signal" to "inversion list" since that's the accepted name.
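An inversion list stores a set of code points as a sorted list of boundaries at which membership flips. Membership is then a binary search: a code point is in the set exactly when an odd number of boundaries are less than or equal to it. The following is a generic Python sketch of the data structure, not MIT Scheme's code, and the helper names are hypothetical:

```python
from bisect import bisect_right
from typing import List, Tuple


def inversion_list(ranges: List[Tuple[int, int]]) -> List[int]:
    """Build an inversion list from sorted, disjoint, inclusive ranges:
    each range [lo, hi] contributes the boundaries lo and hi + 1."""
    result: List[int] = []
    for lo, hi in ranges:
        result.extend([lo, hi + 1])
    return result


def contains(inv: List[int], cp: int) -> bool:
    """A code point is in the set iff an odd number of boundaries are <= it."""
    return bisect_right(inv, cp) % 2 == 1


# Digits and uppercase ASCII letters: [0x30, 0x39] and [0x41, 0x5A].
DIGITS_AND_UPPER = inversion_list([(0x30, 0x39), (0x41, 0x5A)])
print(DIGITS_AND_UPPER)   # [48, 58, 65, 91]
print(contains(DIGITS_AND_UPPER, ord("Q")), contains(DIGITS_AND_UPPER, ord("q")))   # True False
```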
Chris Hanson [Sat, 11 Mar 2017 09:12:25 +0000 (01:12 -0800)]
Change normalization test to use characters instead of integers.
Chris Hanson [Sat, 11 Mar 2017 09:10:01 +0000 (01:10 -0800)]
Speed up reading of #\x... characters.
Chris Hanson [Sat, 11 Mar 2017 08:42:21 +0000 (00:42 -0800)]
Use string-builder instead of call-with-output-string.
Chris Hanson [Sat, 11 Mar 2017 08:34:39 +0000 (00:34 -0800)]
Implement test case for string->nfd.
Chris Hanson [Fri, 10 Mar 2017 07:37:19 +0000 (23:37 -0800)]
Fix symbols using now-illegal syntax.
Chris Hanson [Fri, 10 Mar 2017 07:07:23 +0000 (23:07 -0800)]
Rewrite parser so that it supports Unicode input.