[squeak-dev] MultiCharacterScanner>addCharToPresentation: and conversion to pre-composed unicode code points

Bert Freudenberg bert at freudenbergs.de
Mon Sep 23 12:41:43 UTC 2013


On 2013-09-23, at 14:10, Bert Freudenberg <bert at freudenbergs.de> wrote:

> 
> On 2013-09-22, at 19:50, Nicolas Cellier <nicolas.cellier.aka.nice at gmail.com> wrote:
> 
>> As I understand it, MultiCharacterScanner transforms a String of decomposed Unicode into a String of precomposed Unicode code points, with the help of UnicodeCompositionStream.
>> It stores the result in presentation.
>> 
>> As I understand it, this was necessary because some keyboards/VMs do produce such decomposed sequences.
>> I presume this once helped with measuring and displaying these code points using fonts that have only precomposed glyphs.
>> 
>> First remark: it is a pity that the base character comes first, before the diacritical mark.
>> This forces the composition algorithm to look ahead.
>> We can't change it, since it's a standard, but I wonder what the motivation for such an ordering was...
>> Ref: http://www.unicode.org/standard/principles.html
>> 
>> Second remark: transforming a Unicode sequence into a canonical form is useful not only for measuring/displaying text.
>> It's also useful for comparing strings (for equality, for collation, ...).
>> So the transformation could happen somewhere other than at display time.
>> Unicode defines standard ways to do it (the normalization forms NFC, NFD, NFKC and NFKD), and, bad news, UnicodeCompositionStream does not conform to them.
>> Ref: https://en.wikipedia.org/wiki/Unicode_equivalence
> 
> Yep.
> 
>> Third remark: I wonder whether this composition is really necessary at all for measuring/displaying.
>> Don't Unicode fonts provide special kerning pairs for these diacriticals?
>> I couldn't find good references on this one...
> 
> 
> This would work if our fonts contained the diacriticals and if glyph rendering took kerning information into account. Neither is currently the case, so the next-best thing was composition, which lets us use the precomposed Latin-1 characters.
> 
> Just paste this into Squeak:
> 
> 	A + combining diaeresis: Ä
> 	Precomposed: Ä
> 
> Both look the same in my email client, but in Squeak I get:
>
> [inline screenshot, not preserved in the archive]
>
> which indicates that the presentation mechanism is currently not working. In case this doesn't make it through email, the combining diaeresis is Character value: 16r0308.
> 
> - Bert -


... and it appears my email client normalizes text before sending it. Anyway, try this instead:

	{$A. Character value: 16r0308} as: String

If you then copy the result into a word processor, it looks okay again.
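To make the look-ahead from the first remark concrete, here is a minimal workspace sketch that composes a string like the one above. Assumptions: the composition table, its integer key encoding and the `compose` block are hypothetical and hand-made with only four pairs; only one combining mark per base character is handled, and marks are not reordered by canonical combining class. It is neither the UnicodeCompositionStream implementation nor a conforming Unicode normalization (NFC).

```smalltalk
"Illustrative sketch only: a tiny hypothetical composition table and a
 one-character look-ahead loop. NOT the UnicodeCompositionStream code
 and NOT Unicode normalization (NFC)."
| table compose |
table := Dictionary new.
"Hypothetical key: base code point shifted left 16 bits plus the combining
 mark's code point. Value: the precomposed code point. Four pairs only."
table at: (16r41 bitShift: 16) + 16r0308 put: 16rC4.	"A + combining diaeresis -> A with diaeresis"
table at: (16r61 bitShift: 16) + 16r0308 put: 16rE4.	"a + combining diaeresis -> a with diaeresis"
table at: (16r4F bitShift: 16) + 16r0308 put: 16rD6.	"O + combining diaeresis -> O with diaeresis"
table at: (16r65 bitShift: 16) + 16r0301 put: 16rE9.	"e + combining acute -> e with acute"
compose := [:aString | | in out |
	in := ReadStream on: aString.
	out := OrderedCollection new.
	[in atEnd] whileFalse: [| base precomposed |
		base := in next.
		"The mark follows its base, so peek at the next character before emitting the base."
		precomposed := in atEnd
			ifTrue: [nil]
			ifFalse: [table
				at: (base asInteger bitShift: 16) + in peek asInteger
				ifAbsent: [nil]].
		precomposed isNil
			ifTrue: [out add: base]
			ifFalse: [
				in next.	"consume the combining mark"
				out add: (Character value: precomposed)]].
	out as: String].
"Compare the composed two-character string with the precomposed character."
(compose value: ({$A. Character value: 16r0308} as: String))
	= (String with: (Character value: 16rC4)).
```

Print-it on the last expression compares the composed result with the precomposed Ä. A real implementation would derive the table from the Unicode Character Database rather than hard-coding it.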

- Bert -




More information about the Squeak-dev mailing list