Philippe Marschall wrote:
2008/9/27 stephane ducasse stephane.ducasse@free.fr:
Do I understand correctly that such an aString is a sequence of Unicode code points?
Plus the leading char. If you look at UTF8TextConverter, it tags every incoming character whose code is higher than 255 with the language of the image. I don't need to explain why this is problematic in the context of a web application, do I?
Actually, it *is* worthwhile to explain this. The problem is that, since UTF-8 has no notion of a leading char, there is no way to tag incoming data correctly. The leading char is taken from the running image instead, so an image running in the US (like our servers) will tag input coming from Chinese browsers as Latin1. In these situations the leading char isn't just useless, it is actively misleading.
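To make the effect concrete, here is a rough model in Python (not Squeak code) of what happens to the same request body on two differently configured images. The bit layout (a tag stored above the low 22 bits), the tag values, and the function names are all assumptions made for illustration; only the "tag everything above 255 with the image's language" rule comes from the discussion above.

```python
# Illustrative model only; this is not the UTF8TextConverter implementation.
LEADING_CHAR_SHIFT = 22                      # assumption: tag sits above the low 22 bits
CODE_MASK = (1 << LEADING_CHAR_SHIFT) - 1    # mask that recovers the plain code point

# Hypothetical tag values, chosen only for this sketch.
LATIN1 = 0
SIMPLIFIED_CHINESE = 6


def decode_utf8_tagged(data, image_leading_char):
    """Decode UTF-8 bytes and tag every code above 255 with the image's leading char."""
    tagged = []
    for ch in data.decode("utf-8"):
        value = ord(ch)
        if value > 255:
            # UTF-8 carries no language information, so the only tag
            # available is the one configured in the running image.
            value |= image_leading_char << LEADING_CHAR_SHIFT
        tagged.append(value)
    return tagged


def leading_char(value):
    """Return the language tag stored in the high bits."""
    return value >> LEADING_CHAR_SHIFT


def code_point(value):
    """Return the Unicode code point stored in the low bits."""
    return value & CODE_MASK


# The same bytes, as they might arrive from a Chinese browser.
request_body = "中文".encode("utf-8")

on_us_image = decode_utf8_tagged(request_body, LATIN1)
on_cn_image = decode_utf8_tagged(request_body, SIMPLIFIED_CHINESE)
```

In this model both images recover the same code points, but the internal character values differ, and the tag reflects where the server runs rather than where the text came from.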
Cheers, - Andreas