[squeak-dev] Re: how to create an UTF-8 character
Andreas Raab
andreas.raab at gmx.de
Mon Sep 29 22:56:23 UTC 2008
Bert Freudenberg wrote:
> A character also encodes a language tag (a.k.a. leading char), but we all
> seem to agree that's a bad idea; it was done to allow easier migration
> of old code (for many eastern languages a code point and a font are not
> enough for rendering, you also need to know the language).
I wouldn't necessarily call it a bad idea. It is incomplete, for sure,
but it is one of the ways one can deal with this problem. Even though I
prefer having language information in text attributes, the language tag
per se wouldn't cause problems if the code were able to deal with
its absence. E.g., if one could use strings with "just Unicode", I
wouldn't mind having the ability to add the language tag for
disambiguation where necessary (issues of equality etc. notwithstanding,
which is why I think using text attributes is the better way to go).
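To make the equality issue concrete, here is a minimal sketch in Python
(not Squeak code) of a character value that packs a language tag above
the code point. The 22-bit split, the helper names, and the tag value 5
are assumptions made for illustration only; they are not taken from the
actual implementation.

```python
# Assumed layout (illustrative): the low 22 bits hold the Unicode code
# point, and the bits above them hold the language tag ("leading char").
CODE_BITS = 22
CODE_MASK = (1 << CODE_BITS) - 1


def make_char(leading_char: int, code_point: int) -> int:
    """Pack a language tag and a code point into one integer value."""
    if not 0 <= code_point <= CODE_MASK:
        raise ValueError("code point does not fit in the assumed 22 bits")
    if leading_char < 0:
        raise ValueError("language tag must be non-negative")
    return (leading_char << CODE_BITS) | code_point


def leading_char(value: int) -> int:
    """Extract the language tag from a packed value."""
    return value >> CODE_BITS


def char_code(value: int) -> int:
    """Extract the plain Unicode code point from a packed value."""
    return value & CODE_MASK


def same_code_point(a: int, b: int) -> bool:
    """Compare two packed values while ignoring the language tag."""
    return char_code(a) == char_code(b)


# The same CJK code point, once as "just Unicode" (tag 0) and once with a
# hypothetical language tag of 5.
untagged = make_char(0, 0x76F4)
tagged = make_char(5, 0x76F4)

print(untagged == tagged)                 # plain equality sees two different values
print(same_code_point(untagged, tagged))  # tag-insensitive comparison
```

Any code that compares the packed values directly treats the two
characters as different even though they name the same code point. That
is the kind of question that goes away if the language lives in a text
attribute instead of in the character itself.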
The problem is that too much code relies on both the presence of the
language tag and on particular tag values for certain code points, and
simply breaks if the tag isn't filled in "correctly". As such, the
language tag seems to be mostly redundant with certain code points. I
guess one way to get over this is to add a preference that leaves out
the language tag, and then just try running that way to see what breaks
and where.
Cheers,
- Andreas