Re: [squeak-dev] Re: how to create an UTF-8 character

28 Sep 2008


2008/9/28, Andreas Raab <andreas.raab@gmx.de>:
...
Yoshiki Ohshima wrote:
...
For the kind of web applications and servers that deal with stuff
outside of Squeak, it doesn't serve a good purpose, because editing,
displaying, etc. are out of scope.  Needless to say, the original idea
was to make Squeak a dynamic, interactive, multilingualized
environment, so there is a mismatch.  Web applications etc. historically
came after that goal.
Which wouldn't be a problem if the code was able to handle the data
properly. Unfortunately, the effects of an "invalid" leading char are
very, very strange (everything from crashing the scanner to raising
weird errors in comparisons, character access, etc.). As it stands, an
application that uses non-Latin characters off the web is best off
keeping everything in UTF-8.
BTW, one way to deal with this properly is by providing a leading char
upon input conversion (i.e., utf8ToSqueak would then insert the proper
leading chars for each character). As a matter of fact, I think this is
what Unicode class>>value: should do (instead of substituting the
environmental leading char).
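
To make the idea concrete, here is a rough sketch in Python (not Squeak code) of an input conversion that attaches a leading char to every decoded character instead of relying on an environment-wide default. It assumes the leading char sits in the bits above a 22-bit character code; the names pack, unpack, leading_char_for and utf8_to_tagged, and the tag numbers, are hypothetical placeholders and not Squeak's actual API or tables.

```python
CODE_BITS = 22                    # assumption: the low 22 bits hold the code point
CODE_MASK = (1 << CODE_BITS) - 1


def leading_char_for(code_point):
    """Hypothetical policy: choose a tag from the code point alone."""
    if code_point < 0x100:
        return 0                  # Latin-1 range: no tag needed
    return 255                    # placeholder tag meaning "language unspecified"


def pack(leading_char, code_point):
    """Combine a leading char and a code point into one integer value."""
    return (leading_char << CODE_BITS) | code_point


def unpack(value):
    """Split a packed value back into (leading_char, code_point)."""
    return value >> CODE_BITS, value & CODE_MASK


def utf8_to_tagged(data):
    """Decode UTF-8 bytes and tag each character on the way in."""
    return [pack(leading_char_for(ord(ch)), ord(ch))
            for ch in data.decode("utf-8")]
```

The point of the sketch is only that the tag is chosen per character at conversion time, so every character that reaches the rest of the system already carries a valid leading char.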
...
If you need to retain this extra information, sending the strings
without going through UTF-8 conversion makes more sense.
Or provide it via additional attributes. I still think that language
information would best be modeled by a text attribute - in which case we
have a plain Unicode implementation for strings as well as the ability
to provide the disambiguation in text where required.
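
A minimal sketch of that split, again in Python rather than Squeak: the string stays plain Unicode, and language information lives in separate attribute runs. The LanguageRun and TaggedText classes and their methods are hypothetical names for illustration only, not an existing Squeak or Python API.

```python
from dataclasses import dataclass, field


@dataclass
class LanguageRun:
    start: int        # index of the first character covered
    stop: int         # index one past the last character covered
    language: str     # e.g. "ja"; the tag format is an assumption


@dataclass
class TaggedText:
    string: str                               # plain Unicode, no per-character tags
    runs: list = field(default_factory=list)  # language attributes kept alongside

    def tag(self, start, stop, language):
        """Attach a language attribute to the half-open range [start, stop)."""
        self.runs.append(LanguageRun(start, stop, language))

    def language_at(self, index, default=None):
        """Return the most recently attached language covering index, if any."""
        for run in reversed(self.runs):
            if run.start <= index < run.stop:
                return run.language
        return default
```

Code that only moves data around (web applications, servers) can use the string field directly and round-trip it through UTF-8 unchanged; only code that displays or edits text needs to consult the runs.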
Cheers,

Andreas