Unicode support

Lawson English english at primenet.com
Wed Sep 22 21:56:47 UTC 1999


agree at carltonfields.com said:

>What does this have to do with Strings?
>
>> What we do here is use utf-8/unicode for internal storage and use that
>> as a pivot to convert among all the other encodings.  You can generally
>> resolve the issues with unicode code point conflicts by then specifying
>> a font with the correct glyphs for the language you are trying to
>> render.  How that information might be associated with a given String
>> still has me thinking and so far I don't have a good solution.
>
>And you can continue to do that with GeneralStrings, or define yourself a
>subclass that does this automatically for you.
>

Thereby making Squeak useful only where roman characters or at least
mono-directional characters are the norm.

Take Hebrew, which is actually a *BI*-directional language: alphabetic
characters go right to left, while the standard Hindu-Arabic numerals
still go left to right. Requiring each individual programmer to localize
Squeak for Israel would be silly.
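
To make the mixed directions concrete, here is a minimal Python sketch
that splits a string into runs by Unicode bidirectional class. The
function name bidi_runs and the sample string are illustrative
assumptions, and this is only the classification step, not the full
Unicode Bidirectional Algorithm:

```python
import unicodedata

def bidi_runs(text):
    """Group consecutive characters that share a Unicode bidi class.

    Hypothetical helper for illustration: it only classifies characters
    and does not reorder them for display.
    """
    runs = []
    for ch in text:
        cls = unicodedata.bidirectional(ch)
        if runs and runs[-1][0] == cls:
            runs[-1][1] += ch          # extend the current run
        else:
            runs.append([cls, ch])     # start a new run
    return [(cls, chars) for cls, chars in runs]

# Hebrew letters SHIN, NUN, TAV, then a space and a year in
# Hindu-Arabic digits, stored in logical (reading) order.
sample = "\u05e9\u05e0\u05ea 1999"

for cls, chars in bidi_runs(sample):
    print(cls, repr(chars))
```

The Hebrew letters come back with class R (right-to-left) and the
digits with class EN (European number), so a renderer has to lay out
the two runs in opposite directions within a single string.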

At the very least, a meta-string convention must be defined to provide
bi-directional string-handling, and better still, a convention must be
defined to allow arbitrary character handling.

Consider languages where *TRI*-glyphs are common, such as Korean, or where
the visual ordering of the printed word differs from the textual
ordering, such as Sanskrit and Hindi-Urdu (I believe).
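
Both cases can be seen with a few lines of Python and the standard
unicodedata module. This is a sketch under my own choice of sample
characters (the Hangul syllable HAN and the Devanagari sequence KA plus
VOWEL SIGN I), not anything specific to Squeak:

```python
import unicodedata

# One precomposed Hangul syllable (U+D55C) is built from three jamo.
syllable = "\ud55c"
jamo = unicodedata.normalize("NFD", syllable)
print([f"U+{ord(c):04X}" for c in jamo])

# Recomposing the three jamo gives back the single syllable.
recomposed = unicodedata.normalize("NFC", jamo)
print(recomposed == syllable)

# Devanagari: the text stores the consonant KA first and the vowel
# sign I second, although the vowel sign is drawn to the left of
# the consonant when rendered.
ki = "\u0915\u093f"
print([unicodedata.name(c) for c in ki])
```

The string order is the textual (logical) order in both cases. Turning
it into glyphs in the right visual positions is left to the rendering
layer, which is exactly the part a plain String class does not cover.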

The multi-language issues have been hashed out several times: GX Typography
led to Taligent Typography and ATSUI. Taligent Typography led to the
international text-handling routines in Java.

Check out 

<http://developer.apple.com/techpubs/mac/GXTypography/GXTypography-2.html>

for more info on GX typography.

It's a rather large manual, but it addresses just about EVERY international
text issue that you can imagine, from the simplest to the most complicated
DTP application.


While it might be a tad ambitious to expect Squeak text-handling to attain
the level of GX typography, certainly an attempt has to be made to
understand the issues, and the GX typography manual covers them better than
anything else that I have seen.

-------------------------------------------------------------------------
Lawson English. Squeak, snore, etc.
Check out <http://www.squeak.org>
-------------------------------------------------------------------------




