[squeak-dev] Re: how to create an UTF-8 character

Andreas Raab andreas.raab at gmx.de
Sat Sep 27 17:14:39 UTC 2008


Philippe Marschall wrote:
> 2008/9/27 stephane ducasse <stephane.ducasse at free.fr>:
>> do I understand correctly that such an aString is a sequence of unicode
>> codepoints?
> 
> Plus leading char. If you look at UTF8TextConverter it will give every
> incoming character with an index higher than 255 the language of the
> image. I don't need to explain why this is problematic in the context
> of a web application, do I?

Actually, it *is* worthwhile to explain this. The problem is that UTF-8 
has no notion of a leading char, so there is no way to tag incoming data 
correctly. The leading char is taken from the running image instead, so 
an image running in the US (like our servers) will tag input coming from 
Chinese browsers as Latin1. In these situations the leading char isn't 
just useless, it is actively misleading.
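
To make the mechanism concrete, here is a minimal sketch in Python (not 
Squeak code). It assumes a character value packs the leading char above a 
22-bit code point, which is believed to match Squeak's layout but should 
be checked against the image. The tag numbers and all function names are 
hypothetical, chosen only for illustration.

```python
# Assumption: a character value is (leading char << 22) | Unicode code point.
CODE_BITS = 22
CODE_MASK = (1 << CODE_BITS) - 1

# Hypothetical tag numbers; the real values live in the image.
US_IMAGE_TAG = 0      # stands in for "Latin1"
OTHER_IMAGE_TAG = 5   # stands in for some other language environment


def decode_with_image_tag(data, image_tag):
    """Decode UTF-8 bytes and tag each code point above 255 with the
    tag of the *receiving* image. UTF-8 itself carries no language
    information, so the sender's language cannot be recovered here."""
    values = []
    for ch in data.decode("utf-8"):
        cp = ord(ch)
        tag = image_tag if cp > 255 else 0
        values.append((tag << CODE_BITS) | cp)
    return values


def leading_char(value):
    """Extract the language tag from a packed character value."""
    return value >> CODE_BITS


def code_point(value):
    """Extract the plain Unicode code point from a packed character value."""
    return value & CODE_MASK


# The same bytes, sent by the same browser, decoded in two different images.
incoming = "\u4e2d".encode("utf-8")  # one CJK character
in_us_image = decode_with_image_tag(incoming, US_IMAGE_TAG)[0]
in_other_image = decode_with_image_tag(incoming, OTHER_IMAGE_TAG)[0]
```

Both results carry the same code point, but the tag differs and depends 
only on where the bytes were decoded, never on who sent them.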

Cheers,
   - Andreas


