[squeak-dev] Re: how to create an UTF-8 character
Andreas Raab
andreas.raab at gmx.de
Sun Sep 28 17:45:00 UTC 2008
Yoshiki Ohshima wrote:
> For web applications and servers that deal with data from outside
> Squeak, it doesn't serve a good purpose, because editing, displaying,
> etc. are out of scope. Needless to say, the original idea was to make
> Squeak a dynamic, interactive, multilingualized environment, so there
> is a mismatch. Web applications etc. historically came after that
> goal.
Which wouldn't be a problem if the code were able to handle the data
properly. Unfortunately, the effects of an "invalid" leading char are
very, very strange (everything from crashing the scanner to raising
weird errors in comparisons, character access, etc.). As it stands, an
application that uses non-Latin characters off the web is best off
keeping everything in UTF-8.
BTW, one way to deal with this properly is by providing a leading char
upon input conversion (i.e., utf8ToSqueak would then insert the proper
leading chars for each character). As a matter of fact, I think this is
what Unicode class>>value: should do (instead of substituting the
environmental leading char).
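
To make the idea concrete, here is a rough sketch in Python rather than Squeak, so it is an illustration and not the actual utf8ToSqueak code. It assumes the leading char occupies the 8 bits above a 22-bit code point; that layout, the function names, and the tag value 5 in the usage note are all assumptions made for this sketch.

```python
from typing import Callable, List

# Assumed layout for this sketch: 22-bit code point in the low bits,
# 8-bit leading char directly above it.
CODE_BITS = 22
CODE_MASK = (1 << CODE_BITS) - 1


def with_leading_char(code_point: int, leading_char: int) -> int:
    """Combine a code point and a leading char into one tagged value."""
    if not 0 <= code_point <= CODE_MASK:
        raise ValueError("code point out of range")
    if not 0 <= leading_char <= 0xFF:
        raise ValueError("leading char out of range")
    return (leading_char << CODE_BITS) | code_point


def leading_char_of(value: int) -> int:
    """Extract the leading char from a tagged value."""
    return value >> CODE_BITS


def code_point_of(value: int) -> int:
    """Extract the plain code point from a tagged value."""
    return value & CODE_MASK


def utf8_to_tagged(data: bytes, pick: Callable[[int], int]) -> List[int]:
    """Decode UTF-8 and tag each character during conversion.

    `pick` is a hypothetical caller-supplied policy that maps a code
    point to the leading char it should carry.
    """
    return [with_leading_char(ord(ch), pick(ord(ch)))
            for ch in data.decode("utf-8")]
```

A caller would pass its own policy, for example `utf8_to_tagged(data, lambda cp: 0 if cp < 256 else 5)`, so that every character leaves the converter with a leading char chosen at input time rather than taken from whatever environment happens to be active.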
> If you need to retain this extra information, sending the strings
> without going through UTF-8 conversion makes more sense.
Or provide it via additional attributes. I still think that language
information would best be modeled by a text attribute - in which case we
have a plain Unicode implementation for strings as well as the ability
to provide the disambiguation in text where required.
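
As a rough illustration of that split (again in Python, with made-up class names, and not a description of Squeak's Text or TextAttribute classes), the string holds plain Unicode and the language information lives in separate runs:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LanguageRun:
    """Hypothetical attribute: marks [start, stop) as a given language."""
    start: int
    stop: int  # exclusive
    language: str


@dataclass
class TaggedText:
    """Plain Unicode string plus optional language runs."""
    string: str
    runs: List[LanguageRun] = field(default_factory=list)

    def language_at(self, index: int) -> Optional[str]:
        """Return the language covering `index`, or None if untagged."""
        for run in self.runs:
            if run.start <= index < run.stop:
                return run.language
        return None
```

String comparison and character access then only ever see plain code points, and only code that needs the disambiguation (font selection for display, say) has to consult the runs.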
Cheers,
- Andreas