[squeak-dev] Re: how to create an UTF-8 character

Philippe Marschall philippe.marschall at gmail.com
Sun Sep 28 17:59:48 UTC 2008


2008/9/28, Andreas Raab <andreas.raab at gmx.de>:
> Yoshiki Ohshima wrote:
>>   For that kind of web application and server that deals with stuff
>> outside of Squeak, it doesn't serve a good purpose, because editing,
>> displaying etc. are out of scope.  Needless to say, the original idea
>> was to make Squeak the dynamic, interactive, multilingualized
>> environment, so there is a mismatch.  Web applications etc. historically
>> came after that goal.
>
> Which wouldn't be a problem if the code was able to handle the data
> properly. Unfortunately, the effects of an "invalid" leading char are
> very, very strange (everything from crashing the scanner to raising
> weird errors in comparisons, character access etc). As it stands, an
> application that uses non-Latin characters off the web is best off
> keeping everything in UTF-8.
>
> BTW, one way to deal with this properly is by providing a leading char
> upon input conversion (i.e., utf8ToSqueak would then insert the proper
> leading chars for each character). As a matter of fact, I think this is
> what Unicode class>>value: should do (instead of substituting the
> environmental leading char).
>
>>   If you need to retain this extra information, sending the strings
>> without going through UTF-8 conversion makes more sense.
>
> Or provide it via additional attributes. I still think that language
> information would best be modeled by a text attribute - in which case we
> have a plain Unicode implementation for strings as well as the ability
> to provide the disambiguation in text where required.

+1
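
To make the "leading char upon input conversion" idea concrete, here is a
rough sketch in Python rather than Squeak. It assumes the leading char sits
in the bits above a 22-bit character code, which is how I understand the
current Character encoding; treat that layout as an assumption. The names
tag, split, utf8_to_tagged and leading_char_for are made up for
illustration, and the leading char value 5 used below is only a placeholder.

```python
# Sketch only: the bit layout is an assumption, not a specification.
LEADING_CHAR_SHIFT = 22                      # assumed position of the leading char
CODE_MASK = (1 << LEADING_CHAR_SHIFT) - 1    # low bits hold the character code


def tag(code_point, leading_char):
    """Combine a code point with a leading char; Latin-1 stays untagged."""
    if code_point < 256:
        return code_point
    return (leading_char << LEADING_CHAR_SHIFT) | code_point


def split(value):
    """Return (leading_char, code_point) for a tagged value."""
    return value >> LEADING_CHAR_SHIFT, value & CODE_MASK


def utf8_to_tagged(data, leading_char_for):
    """Decode UTF-8 bytes and tag every character during input conversion.

    leading_char_for is a hypothetical callback that picks the leading
    char per code point, instead of taking it from the environment.
    """
    return [tag(ord(ch), leading_char_for(ord(ch)))
            for ch in data.decode("utf-8")]


# Usage: tag every non-Latin-1 character with placeholder leading char 5.
tagged = utf8_to_tagged("a\u3042".encode("utf-8"), lambda cp: 5)
```

The point is only that the decision is made per character at conversion
time, so no character ever enters the image with a leading char it was
not explicitly given.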

Cheers
Philippe


