Unicode support

agree at carltonfields.com
Wed Sep 22 20:59:53 UTC 1999


> Thereby making Squeak useful only where roman characters or at least
> mono-directional characters are the norm.

This has been said several times already. Why would it be so? I didn't suggest that Squeak should not support bi-directional languages, only that the GeneralString interface should not provide semantic, word-based functionality. If its operations are not sufficient to make bi-directional support possible, it is clearly inadequate. But no one has said that; they have said only that it lacks a DoWhatIMeanPickTheWordsoutHoweverRepresentedAsIIntendThemToBePicked command. I consider that simply too hard to write (and to type :-)).
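To illustrate the claim that character-level operations are enough to build bi-directional support on top of a plain string, here is a minimal sketch in Python (not Squeak). It relies only on the standard library's `unicodedata.bidirectional`, which reports each character's Unicode bidi class; the helper names `strong_direction` and `directional_runs` are hypothetical. It merely groups text into left-to-right, right-to-left, and neutral runs. It is not the full Unicode Bidirectional Algorithm (UAX #9), which would live in a separate text-layout layer.

```python
import unicodedata
from itertools import groupby


def strong_direction(ch: str) -> str:
    """Collapse a character's Unicode bidi class to 'L', 'R', or 'N'.

    'L' is strong left-to-right; 'R' and 'AL' (Arabic letter) are strong
    right-to-left; every other class (digits, spaces, punctuation, ...)
    is treated as neutral here. This is a simplification, not UAX #9.
    """
    cls = unicodedata.bidirectional(ch)
    if cls == "L":
        return "L"
    if cls in ("R", "AL"):
        return "R"
    return "N"


def directional_runs(text: str) -> list[tuple[str, str]]:
    """Group consecutive characters sharing the same coarse direction.

    Uses nothing but iteration over characters plus a per-character
    property lookup, i.e. string-level operations with no word semantics.
    """
    return [(d, "".join(chars)) for d, chars in groupby(text, key=strong_direction)]
```

For example, `directional_runs("abc \u05d0\u05d1")` yields a left-to-right run `'abc'`, a neutral run `' '`, and a right-to-left run of the two Hebrew letters. A layout engine could reorder such runs for display without the string class knowing anything about words.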

As noted, the present String class has no English-language word operation, yet it remains substantially useful here in the U.S.A. and in other English-speaking nations. Why is a general-purpose class's failure to include protocol for an artificial intelligence capable of picking apart German compound words a failure of design?
 

> Consider languages where *TRI*-glyphs are common, such as Korean, or where
> the visual ordering of the printed word is different than the textual
> ordering, such as Sanskrit and Hindu-Urdi (I believe).
>
> The multi-language issues have been hashed out several times: GX Typography
> led to Taligent Typography and ATSUI. Taligent Typography led to the
> international text-handling routines in Java.

This discussion seems to have become too religious in nature. I have simply asked the question, "What is String?" Answers along the lines of "It ought to be this, but that's virtually impossible to do" seem to me nonresponsive to the question.

>While it might be a tad ambitious to expect Squeak text-handling to attain
>the level of GX-level typography, certainly an attempt has to be made to
>understand the issues, and the GX typography manual covers them better than
>anything else that I have seen.

The String type in Squeak isn't capable of English-language typography, yet it is highly usable nevertheless. It provides no word semantics (except for the spelling support). Certainly no one can responsibly say it is useless for English-language work. It is, of course, a fair criticism that ASCII cannot support most of the real needs of foreign languages, but that does not mean a generalized character representation and its string container object need to do more for those foreign languages than ASCII strings do for English.
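A minimal sketch of what such a generalized string might look like, written in Python for illustration. The class name `GeneralString` comes from the discussion, but everything else here is hypothetical: the methods are stand-ins modeled on Squeak's `size`, `at:`, `copyFrom:to:`, and `indexOf:` (1-based, inclusive bounds, 0 for "not found"). It stores Unicode code points and offers the same character-level protocol whether the text is English or Korean, with no word semantics at all.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneralString:
    """Hypothetical sketch: an immutable sequence of Unicode code points
    with character-level protocol only (no word or language semantics)."""

    code_points: tuple[int, ...]

    @classmethod
    def from_text(cls, text: str) -> "GeneralString":
        # Store one integer per Unicode code point.
        return cls(tuple(ord(ch) for ch in text))

    def size(self) -> int:
        return len(self.code_points)

    def at(self, index: int) -> str:
        # 1-based indexing, following Smalltalk's at: convention.
        return chr(self.code_points[index - 1])

    def copy_from_to(self, start: int, stop: int) -> "GeneralString":
        # Inclusive, 1-based bounds, like Smalltalk's copyFrom:to:.
        return GeneralString(self.code_points[start - 1:stop])

    def index_of(self, ch: str) -> int:
        # 1-based position of the first match, or 0 when absent,
        # mirroring Smalltalk's indexOf:.
        try:
            return self.code_points.index(ord(ch)) + 1
        except ValueError:
            return 0

    def __str__(self) -> str:
        return "".join(chr(cp) for cp in self.code_points)
```

The same calls work unchanged on `GeneralString.from_text("hello")` and `GeneralString.from_text("안녕하세요")`; each precomposed Hangul syllable counts as one element. Text stored in decomposed form would instead count each jamo separately, which is exactly the kind of distinction a higher-level layer, not the container, would resolve.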

By the way, English ALSO has collocations of words, which are semantically meaningless when taken solely as word tokens, yet no one is seriously suggesting that the String class must, or even should, look up words in a dictionary for collocations before tokenization.
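The layering argued for here can be sketched in Python: the string is only split on whitespace, and any collocation handling is a separate step driven by a caller-supplied set of word pairs. The function names and the pair-based collocation set are hypothetical; real collocation detection (longer phrases, statistical scoring, dictionaries) would be considerably more involved and would still sit outside the string class.

```python
def tokenize(text: str) -> list[str]:
    """Plain whitespace tokenization: no dictionary, no semantics."""
    return text.split()


def merge_collocations(tokens: list[str], collocations: set[tuple[str, str]]) -> list[str]:
    """Greedily join adjacent token pairs found in a caller-supplied set.

    This is a separate, optional layer; the tokenizer above knows
    nothing about it.
    """
    out: list[str] = []
    i = 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) in collocations:
            out.append(tokens[i] + " " + tokens[i + 1])
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out
```

With `{("New", "York")}` as the collocation set, the tokens of "I flew to New York yesterday" become `['I', 'flew', 'to', 'New York', 'yesterday']`; with an empty set, the tokens pass through unchanged.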




