Pull Unicode segmentation support out of string and rewrite.
author Chris Hanson <org/chris-hanson/cph>
Tue, 3 Dec 2019 07:44:41 +0000 (23:44 -0800)
committer Chris Hanson <org/chris-hanson/cph>
Mon, 9 Dec 2019 09:49:28 +0000 (01:49 -0800)
commit e376bc50cc0464783ca6eee480d091b1e5993bfb
tree 4067eca2f8d26194fbaba0ce9a9932f8e1342af3
parent 6f599cc8227737d533b1cf44e9dbd05d20b3a864

This is a nearly complete reimplementation, with a simpler and faster DFA,
providing a fold-like interface.

The segmentation rules are written in a form nearly identical to that used in
UAX #29, which makes them much easier to write and understand.  There is also
a debugging feature that shows how the DFA evolves for a given string.
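
As a rough illustration of what a fold-like segmentation interface driven by a
DFA can look like, here is a minimal sketch in Python.  It is not the code in
this commit: the names segment_fold, char_class and NO_BREAK are hypothetical,
the character classes are a toy stand-in for the Grapheme_Cluster_Break
property, and only two UAX #29-style rules are modeled (no break between CR
and LF, no break before a combining mark that follows a non-control
character).  The optional trace list loosely mirrors the idea of watching the
DFA evolve over a string.

```python
import unicodedata

def char_class(ch):
    """Toy classifier (assumption); the real rules use Grapheme_Cluster_Break."""
    if ch == "\r":
        return "CR"
    if ch == "\n":
        return "LF"
    if unicodedata.combining(ch) != 0:
        return "Extend"
    return "Other"

# (previous class, next class) pairs between which there is NO break.
NO_BREAK = {
    ("CR", "LF"),          # in the spirit of GB3: CR x LF
    ("Other", "Extend"),   # in the spirit of GB9: x Extend
    ("Extend", "Extend"),
}

def segment_fold(string, kons, knil, trace=None):
    """Fold kons over the segments of string, left to right.

    kons is called as kons(segment, accumulator) and returns the new
    accumulator.  The DFA state is simply the class of the previous
    character.  If trace is a list, one tuple
    (index, state, next class, break?) is appended per character.
    """
    acc = knil
    start = 0
    state = "Start"
    for i, ch in enumerate(string):
        cls = char_class(ch)
        is_break = i > 0 and (state, cls) not in NO_BREAK
        if trace is not None:
            trace.append((i, state, cls, is_break))
        if is_break:
            acc = kons(string[start:i], acc)
            start = i
        state = cls
    if start < len(string):
        acc = kons(string[start:], acc)
    return acc

# Usage: collect segments into a list, or just count them.
sample = "e\u0301a\r\nb"
segments = segment_fold(sample, lambda seg, acc: acc + [seg], [])
count = segment_fold(sample, lambda seg, n: n + 1, 0)
```

With a fold, the caller decides what to build (a list, a count, a set of
break indices) without the segmenter allocating an intermediate structure.
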
src/runtime/make.scm
src/runtime/runtime.pkg
src/runtime/string.scm
src/runtime/ucd-grapheme.scm [new file with mode: 0644]
src/runtime/ucd-segmentation.scm [new file with mode: 0644]
src/runtime/ucd-word.scm [new file with mode: 0644]
tests/check.scm
tests/runtime/test-string.scm
tests/runtime/test-ucd-data/segmentation-support.scm [new file with mode: 0644]
tests/runtime/test-ucd-grapheme.scm [new file with mode: 0644]
tests/runtime/test-ucd-word.scm [new file with mode: 0644]