[squeak-dev] Re: how to create an UTF-8 character

Andreas Raab andreas.raab at gmx.de
Mon Sep 29 22:56:23 UTC 2008


Bert Freudenberg wrote:
> A character also encodes a language-tag (a.k.a. leading char) but we all 
> seem to agree that's a bad idea, it was done to allow easier migration 
> of old code (for many eastern languages a code point and a font is not 
> enough for rendering, you also need to know the language).

I wouldn't necessarily call it a bad idea. It is incomplete, for sure,
but it is one of the ways one can deal with this problem. Even though I
prefer having language information in text attributes, the language tag
per se wouldn't cause problems if the code were able to deal with its
absence. E.g., if one could use strings with "just unicode", I wouldn't
mind having the ability to add the language tag for disambiguation where
necessary (issues of equality etc. notwithstanding, which is why I think
using text attributes is the better way to go).
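
To make the equality issue concrete, here is a minimal sketch, written in
Python rather than Squeak. It assumes a layout in which the language tag
(leading char) is stored in the bits above a 22-bit code point, which is
my understanding of how Squeak packs a Character value. The function
names and the tag value 5 are made up for illustration.

```python
# Sketch only: assumes the language tag (leading char) sits in the bits
# above a 22-bit code point. All names here are hypothetical.
CODE_BITS = 22
CODE_MASK = (1 << CODE_BITS) - 1  # 0x3FFFFF


def make_char(code_point, language_tag=0):
    """Pack a code point and an optional language tag into one integer."""
    if not 0 <= code_point <= CODE_MASK:
        raise ValueError("code point does not fit in 22 bits")
    return (language_tag << CODE_BITS) | code_point


def code_point_of(value):
    """Return the code point with the language tag stripped."""
    return value & CODE_MASK


def language_tag_of(value):
    """Return the language tag, or 0 if none was set."""
    return value >> CODE_BITS


def same_code_point(a, b):
    """Compare two packed characters while ignoring their language tags."""
    return code_point_of(a) == code_point_of(b)


plain = make_char(0x6F22)       # "just unicode", no tag
tagged = make_char(0x6F22, 5)   # same code point, illustrative tag 5

print(plain == tagged)                  # False: the tag takes part in equality
print(same_code_point(plain, tagged))   # True: tags ignored
```

With the tag kept in a text attribute instead, the two characters would
be the same value and plain equality would do the right thing.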

The problem is that too much code relies on both the presence of the
language tag and particular values of it for certain code points, and
simply breaks if the tag isn't filled in "correctly". As such the
language tag seems to be mostly redundant with certain code points. I
guess one way to get over this is to add a preference that leaves out
the language tag and just try running that way to see what breaks and
where.
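
A rough sketch of what such an experiment could look like, in Python and
with purely hypothetical names: a preference decides whether newly
created characters carry a tag, and a deliberately fragile lookup stands
in for code that depends on particular tag values. The 22-bit layout, the
tag value 5, and the font names are assumptions for illustration, not
anything taken from the image.

```python
# Sketch only: a hypothetical preference that leaves out the language tag.
CODE_BITS = 22
CODE_MASK = (1 << CODE_BITS) - 1

PREFERENCES = {"use_language_tag": True}  # the hypothetical preference


def make_char(code_point, language_tag):
    """Pack the tag above the code point unless the preference disables it."""
    tag = language_tag if PREFERENCES["use_language_tag"] else 0
    return (tag << CODE_BITS) | (code_point & CODE_MASK)


# Stand-in for code that relies on particular tag values (made-up table).
FONT_FOR_TAG = {5: "japanese-font"}


def font_for_strict(value):
    """Raises KeyError when the tag is absent or unknown."""
    return FONT_FOR_TAG[value >> CODE_BITS]


def font_for_tolerant(value, default="unicode-font"):
    """Copes with an absent tag by falling back to a default."""
    return FONT_FOR_TAG.get(value >> CODE_BITS, default)
```

Switching `PREFERENCES["use_language_tag"]` to `False` and running the
strict variant is the "see what breaks" step; the tolerant variant is
what code that can deal with the tag's absence would look like.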

Cheers,
   - Andreas


