[squeak-dev] MultiCharacterScanner>addCharToPresentation: and conversion to pre-composed unicode code points

Sun Sep 22 17:50:08 UTC 2013

As I understand it, MultiCharacterScanner transforms a String of
decomposed Unicode into a String of pre-composed Unicode code points, with
the help of UnicodeCompositionStream.
It stores the result in presentation.

As I understand it, this was necessary because some keyboards/VMs produce
such decomposed sequences.
I presume this once helped with measuring and displaying those codes with
fonts that have only pre-composed glyphs.
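
For illustration, here is a minimal sketch of what "decomposed" versus
"pre-composed" means. It is written in Python rather than Squeak and uses
the standard unicodedata module. It assumes NFC is the target form, which
is an assumption and not necessarily what UnicodeCompositionStream does.

```python
import unicodedata

# 'e' followed by U+0301 COMBINING ACUTE ACCENT: two code points.
decomposed = "e\u0301"

# NFC composes the pair into the single code point U+00E9.
composed = unicodedata.normalize("NFC", decomposed)

print(len(decomposed), len(composed))      # 2 1
print([hex(ord(c)) for c in composed])     # ['0xe9']
```

A font that only has a glyph for U+00E9 can display the composed string
but not necessarily the decomposed one.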

First remark: it is a pity that the base character comes first, before
the diacritical mark.
This forces the composition algorithm to look ahead.
We can't change it, it's a standard, but I wonder about the motivation for
such an ordering...
Ref: http://www.unicode.org/standard/principles.html
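
The look-ahead can be sketched as follows, again in Python and only as an
illustration. The function name compose_stream is hypothetical. The sketch
assumes that every mark to be composed has a non-zero canonical combining
class, so it ignores cases such as Hangul jamo where that does not hold.

```python
import unicodedata

def compose_stream(chars):
    """Yield NFC-composed chunks from an iterable of characters.

    A base character cannot be emitted as soon as it is read: it is held
    in `pending` until the next character proves not to be a combining
    mark (or the input ends).
    """
    pending = ""
    for ch in chars:
        if pending and unicodedata.combining(ch) != 0:
            pending += ch      # mark belongs to the pending base
            continue
        if pending:
            yield unicodedata.normalize("NFC", pending)
        pending = ch           # start a new sequence, still unemitted
    if pending:
        yield unicodedata.normalize("NFC", pending)

print(list(compose_stream("e\u0301a")))
```

If the mark came before its base, each chunk could be emitted as soon as
the base arrived, with no buffered character left at end of input.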

Second remark: transforming a Unicode sequence to a canonical form is not
only useful for measuring/displaying text.
It's also useful for comparing strings (for equality, for collation, ...).
So the transformation could happen somewhere other than at display time.
Unicode defines standard ways to do it (the normalization forms), and the
bad news is that UnicodeCompositionStream is not conforming.
Ref: https://en.wikipedia.org/wiki/Unicode_equivalence
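
As a sketch of why normalization matters for comparison, the Python
snippet below compares strings after converting both to NFD. The helper
name canonically_equal is hypothetical, and NFD is just one possible
choice; NFC would serve equally well for an equality test.

```python
import unicodedata

def canonically_equal(a, b):
    """Compare two strings for canonical equivalence via NFD."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

precomposed = "\u00e9"      # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"      # 'e' + COMBINING ACUTE ACCENT

print(precomposed == decomposed)                   # False
print(canonically_equal(precomposed, decomposed))  # True

# Marks with different combining classes may arrive in either order;
# normalization puts them into one canonical order before comparing.
dot_then_acute = "a\u0323\u0301"   # dot below (class 220), acute (class 230)
acute_then_dot = "a\u0301\u0323"
print(canonically_equal(dot_then_acute, acute_then_dot))  # True
```

A composer that only merges adjacent pairs, without the canonical
reordering step, would treat the last two strings as different.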

Third remark: I wonder if this composition is really necessary at all for
measuring/displaying.
Don't Unicode fonts provide special kerning pairs for those diacritical
marks?
I couldn't find good references on this one...