[squeak-dev] Re: how to create an UTF-8 character

Andreas Raab andreas.raab at gmx.de
Sun Sep 28 17:45:00 UTC 2008


Yoshiki Ohshima wrote:
>   For that kind of web applications and servers that deal with stuff
> outside of Squeak, it doesn't serve a good purpose, because editing,
> displaying, etc. are out of scope.  Needless to say, the original idea
> was to make Squeak a dynamic, interactive, multilingualized
> environment, so there is a mismatch.  Web applications etc. historically
> came after that goal.

Which wouldn't be a problem if the code were able to handle the data
properly. Unfortunately, the effects of an "invalid" leading char are
very, very strange (everything from crashing the scanner to raising
weird errors in comparisons, character access, etc.). As it stands, an
application that takes non-Latin characters off the web is best off
keeping everything in UTF-8.

BTW, one way to deal with this properly is to provide a leading char
upon input conversion (i.e., utf8ToSqueak would then insert the proper
leading char for each character). As a matter of fact, I think this is
what Unicode class>>value: should do (instead of substituting the
environmental leading char).
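A minimal sketch of that kind of input conversion follows. It is written in Python rather than Squeak Smalltalk, purely for illustration. It assumes the leading char is packed into the bits above a 22-bit code point. The leading-char values and the range table are hypothetical placeholders, not Squeak's actual tables. Unified CJK ideographs cannot be assigned a language from their code point alone, which is exactly where extra language information would be needed. In this sketch they fall through to a generic Unicode tag.

```python
# Hypothetical leading-char values (placeholders, not Squeak's real numbers).
LATIN = 0
JAPANESE = 5
KOREAN = 7
UNICODE_DEFAULT = 255

# Hypothetical range table: (first code point, last code point, leading char).
RANGES = [
    (0x3040, 0x30FF, JAPANESE),  # Hiragana and Katakana
    (0xAC00, 0xD7A3, KOREAN),    # Hangul syllables
]

# Assumed packing: leading char stored above a 22-bit code point.
CODE_BITS = 22
CODE_MASK = (1 << CODE_BITS) - 1


def leading_char_for(code_point: int) -> int:
    """Pick a leading char from the code point alone."""
    if code_point < 0x100:
        return LATIN
    for first, last, lead in RANGES:
        if first <= code_point <= last:
            return lead
    # Han-unified ideographs and everything else land here: the code point
    # alone does not say which language they belong to.
    return UNICODE_DEFAULT


def utf8_to_tagged(data: bytes) -> list[int]:
    """Decode UTF-8 and tag every character with a leading char."""
    text = data.decode("utf-8")
    return [(leading_char_for(ord(ch)) << CODE_BITS) | ord(ch) for ch in text]


def untag(value: int) -> tuple[int, int]:
    """Split a packed value back into (leading char, code point)."""
    return value >> CODE_BITS, value & CODE_MASK


if __name__ == "__main__":
    for v in utf8_to_tagged("a\u3042\ud55c\u4e00".encode("utf-8")):
        print(hex(v), untag(v))
```

Because every character leaves the decoder with a leading char already chosen, later code never has to substitute the environment's leading char after the fact.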

>   If you need to retain this extra information, sending the strings
> without going through UTF-8 conversion makes more sense.

Or provide it via additional attributes. I still think that language
information would best be modeled by a text attribute, in which case we
have a plain Unicode implementation for strings as well as the ability
to provide disambiguation in text where required.
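A rough sketch of the text-attribute approach follows, again in Python for illustration only. `AttributedText` and `LanguageRun` are hypothetical stand-ins for Squeak's Text and a language TextAttribute. The string itself stays plain Unicode, so comparison and character access behave as usual. Language runs supply the disambiguation only over the ranges that need it.

```python
from dataclasses import dataclass, field


@dataclass
class LanguageRun:
    start: int     # inclusive index into the string
    stop: int      # exclusive index into the string
    language: str  # e.g. "ja" or "zh"


@dataclass
class AttributedText:
    string: str  # plain Unicode, no per-character language tags
    runs: list[LanguageRun] = field(default_factory=list)

    def add_language(self, start: int, stop: int, language: str) -> None:
        """Attach a language attribute to string[start:stop]."""
        if not (0 <= start <= stop <= len(self.string)):
            raise IndexError("language run lies outside the string")
        self.runs.append(LanguageRun(start, stop, language))

    def language_at(self, index: int):
        """Return the language covering index, or None if unspecified.

        Later runs take precedence over earlier ones.
        """
        for run in reversed(self.runs):
            if run.start <= index < run.stop:
                return run.language
        return None


if __name__ == "__main__":
    t = AttributedText("\u6f22\u5b57 \u6c49\u5b57")
    t.add_language(0, 2, "ja")
    t.add_language(3, 5, "zh")
    print([t.language_at(i) for i in range(len(t.string))])
```

Since the language lives in the runs rather than in each character, two texts with identical strings still compare equal on `string` even when their language runs differ.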

Cheers,
   - Andreas

