birchwood-abbey.net Git - mit-scheme.git/commit

author	Chris Hanson <org/chris-hanson/cph>
	Tue, 16 Mar 2021 05:05:25 +0000 (22:05 -0700)
committer	Chris Hanson <org/chris-hanson/cph>
	Sat, 10 Apr 2021 21:42:40 +0000 (14:42 -0700)
commit	23756ac2eb18256f8b8c6c3ec45310ea70073132
tree	3bcb2bf275f2e0fe58655ba562ca40fca0268199
parent	146d6bd372b2f31abde89b2fc0859e8cb68b186b

Implement grapheme/word-break changes for UCD 13.

This is a complete reimplementation of the segmentation code, since the old
model wasn't able to cope with the recent changes.  There are a couple of
problems remaining:

1. The evolver interface was designed to do incremental generation of breaks.
The new design doesn't permit that, since it implements an NFA with speculative
branches.  It could be changed to do the breaks in batches when speculations
collapse into certainties, but it is certainly simpler to accept all the breaks
at once.

2. The speculative branches are somewhat wasteful: many of them have identical
prefixes, which means we're updating several branches in parallel rather than
having a shared prefix and splitting branches only when necessary.  I'm working
on an optimization that will take care of this.
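
As a rough illustration of the two points above, here is a minimal sketch in
Python. It is not the code in this commit and not the planned optimization: the
names Node, extend, breaks and agreed_breaks are hypothetical, and each
speculative branch is assumed to be nothing more than the ordered break
positions it has proposed so far. Branches are stored as linked nodes so that a
fork shares its prefix with its siblings, and the breaks that are already
certain are the ones every live branch agrees on.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class Node:
    """One proposed break; `parent` links back to the earlier breaks."""
    position: int
    parent: Optional["Node"]


def extend(tip: Optional[Node], position: int) -> Node:
    """Add a break to a branch without copying the breaks before it."""
    return Node(position, tip)


def breaks(tip: Optional[Node]) -> List[int]:
    """Walk back from a branch tip and return its breaks in text order."""
    out: List[int] = []
    while tip is not None:
        out.append(tip.position)
        tip = tip.parent
    out.reverse()
    return out


def agreed_breaks(tips: Sequence[Optional[Node]]) -> List[int]:
    """Return the leading breaks shared by every live branch.

    These are the speculations that have collapsed into certainties, so
    they could be emitted as one batch (hypothetical batching scheme).
    """
    paths = [breaks(tip) for tip in tips]
    agreed: List[int] = []
    for column in zip(*paths):  # stops at the shortest branch
        if len(set(column)) != 1:
            break
        agreed.append(column[0])
    return agreed


# Two speculative branches that fork after the break at position 3.
shared = extend(extend(None, 0), 3)
branch_a = extend(shared, 5)
branch_b = extend(shared, 6)
certain = agreed_breaks([branch_a, branch_b])
```

Here branch_a and branch_b point at the same `shared` node, so the common
prefix is stored once and only the diverging tails are per-branch. A real
implementation would also need to share the matching state, not just the
recorded breaks.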

src/runtime/ucd-grapheme.scm
src/runtime/ucd-segmentation.scm
src/runtime/ucd-word.scm