[squeak-dev] Re: how to create an UTF-8 character

Yoshiki Ohshima yoshiki at vpri.org
Sat Sep 27 19:39:50 UTC 2008


At Sat, 27 Sep 2008 10:14:39 -0700,
Andreas Raab wrote:
> 
> Philippe Marschall wrote:
> > 2008/9/27 stephane ducasse <stephane.ducasse at free.fr>:
> >> do I understand correctly that such a aString is a sequence of unicode
> >> codepoints?
> > 
> > Plus leading char. If you look at UTF8TextConverter it will give every
> > incoming character with an index higher than 255 the language of the
> > image. I don't need to explain why this is problematic in the context
> > of a web application, do I?
> 
> Actually, it *is* worthwhile to explain this. The problem is that since 
> UTF-8 doesn't have the notion of a leading char there is no way to tag 
> incoming data correctly. The leading char will be taken from the running 
> image, so an image running in the US (like our servers) will tag input 
> coming from Chinese browsers as Latin1. In these situations the leading 
> char isn't just useless, it is actively misleading.

  For web applications and servers of that kind, which deal with data
outside of Squeak, the leading char doesn't serve a good purpose, because
editing, displaying etc. are out of scope.  Needless to say, the original
idea was to make Squeak a dynamic, interactive, multilingualized
environment, so there is a mismatch.  Web applications etc. historically
came after that goal.

  If you need to retain this extra information, sending the strings
without going through UTF-8 conversion makes more sense.
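
  To illustrate why the tag cannot survive a UTF-8 round trip, here is
a rough Python model (not Squeak code).  It assumes a character value
packs the leading char into the bits above a 22-bit code point; the
names tag, to_utf8 and from_utf8 and the tag numbers 0 and 2 are made
up for the sketch and are not the actual UTF8TextConverter API.

```python
# Rough model of a "leading char" tagged character value.
# Assumption: value = (leading_char << 22) | code_point.
LEADING_SHIFT = 22
CODE_MASK = (1 << LEADING_SHIFT) - 1


def tag(code_point, leading_char):
    """Pack a language tag and a Unicode code point into one integer."""
    return (leading_char << LEADING_SHIFT) | code_point


def leading_char(value):
    """Extract the language tag from a packed value."""
    return value >> LEADING_SHIFT


def code_point(value):
    """Extract the Unicode code point from a packed value."""
    return value & CODE_MASK


def to_utf8(values):
    """Encode packed values as UTF-8. Only code points fit; tags are dropped."""
    return "".join(chr(code_point(v)) for v in values).encode("utf-8")


def from_utf8(data, image_leading_char):
    """Decode UTF-8. Characters above 255 get the tag of the running image."""
    result = []
    for ch in data.decode("utf-8"):
        cp = ord(ch)
        result.append(tag(cp, image_leading_char) if cp > 255 else cp)
    return result


# Hypothetical tag numbers, for illustration only.
IMAGE_DEFAULT = 0   # the language of the image doing the decoding
SENDER_TAG = 2      # the language the sender intended

original = [tag(0x4E2D, SENDER_TAG)]
wire = to_utf8(original)
received = from_utf8(wire, IMAGE_DEFAULT)
```

  In this model received[0] has the same code point as original[0] but
carries the receiving image's tag, since the bytes on the wire have
nowhere to hold the sender's tag.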

-- Yoshiki


