2008/9/28, Andreas Raab andreas.raab@gmx.de:
Yoshiki Ohshima wrote:
For the kind of web applications and servers that deal with stuff outside of Squeak, it doesn't serve a good purpose, because editing, displaying, etc. are out of scope. Needless to say, the original idea was to make Squeak a dynamic, interactive, multilingualized environment, so there is a mismatch. Web applications etc. historically came after that goal.
Which wouldn't be a problem if the code were able to handle the data properly. Unfortunately, the effects of an "invalid" leading char are very, very strange (everything from crashing the scanner to raising weird errors in comparisons, character access, etc.). As it stands, an application that uses non-Latin characters off the web is best off keeping everything in UTF-8.
BTW, one way to deal with this properly is by providing a leading char upon input conversion (i.e., utf8ToSqueak would then insert the proper leading chars for each character). As a matter of fact, I think this is what Unicode class>>value: should do (instead of substituting the environmental leading char).
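To make the idea concrete, here is a rough sketch in Python rather than Squeak, purely as illustration. It assumes the leading char is stored in the bits above a 22-bit code point. The names `utf8_to_tagged` and `example_leading_char_for`, the tag values, and the kana-based policy are all hypothetical and are not the actual utf8ToSqueak code.

```python
# Assumption: a character value packs a leading char (language tag)
# into the bits above a 22-bit code point.
LEADING_CHAR_SHIFT = 22
CODE_MASK = (1 << LEADING_CHAR_SHIFT) - 1

# Hypothetical tag values, chosen only for this example.
TAG_LATIN = 0
TAG_JAPANESE = 5


def with_leading_char(code_point, leading_char):
    """Combine a code point and a leading char into one tagged value."""
    return (leading_char << LEADING_CHAR_SHIFT) | code_point


def example_leading_char_for(code_point):
    """Hypothetical policy: kana is tagged Japanese, everything else Latin."""
    if 0x3040 <= code_point <= 0x30FF:
        return TAG_JAPANESE
    return TAG_LATIN


def utf8_to_tagged(data, leading_char_for):
    """Decode UTF-8 bytes, choosing a leading char for each character
    instead of substituting one environment-wide value."""
    return [
        with_leading_char(ord(ch), leading_char_for(ord(ch)))
        for ch in data.decode("utf-8")
    ]


tagged = utf8_to_tagged("aあ".encode("utf-8"), example_leading_char_for)
```

The point is only that the tag is decided per character at conversion time. A real policy would need more than the code point to disambiguate unified Han characters.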
If you need to retain this extra information, sending the strings without going through UTF-8 conversion makes more sense.
Or provide it via additional attributes. I still think that language information would best be modeled by a text attribute - in which case we have a plain Unicode implementation for strings as well as the ability to provide the disambiguation in text where required.
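As a rough illustration of what I mean, again in Python and not tied to any existing Squeak class, the string stays plain Unicode and the language lives in separate runs over index ranges. `LanguageRun`, `TaggedText`, and `language_at` are hypothetical names made up for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class LanguageRun:
    start: int      # inclusive index into the string
    stop: int       # exclusive index into the string
    language: str   # e.g. "ja" or "zh"


@dataclass
class TaggedText:
    string: str                                 # plain Unicode, no per-character tags
    runs: list = field(default_factory=list)    # language attributes, only where needed

    def language_at(self, index, default=None):
        """Return the language covering index, or default if no run applies."""
        for run in self.runs:
            if run.start <= index < run.stop:
                return run.language
        return default


text = TaggedText("abc 漢字", [LanguageRun(4, 6, "ja")])
```

Code that only compares or scans strings never sees the language information, while rendering can ask `language_at` wherever a glyph choice depends on it.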
Cheers,
- Andreas