[squeak-dev] UTF-8

Philippe Marschall philippe.marschall at gmail.com
Sun Mar 29 14:49:44 UTC 2009


2009/3/29 Janko Mivšek <janko.mivsek at eranova.si>:
> Philippe Marschall pravi:
>
>>> Look at this Aida/Scribo  multilingual demo served from Squeak image:
>>> http://demo.bioskop.fr/wiki/wiki.html, see specially Japanese and
>>> Russian  text. Even Japanese urls are working correctly:
>>> http://demo.bioskop.fr/wiki/%E3%83%86%E3%82%B9%E3%83%88.html
>>
>> That's just external representation, that tells absolutely nothing
>> about internal representation and the implementation. I could easily
>> get the same result on a Squeak 3.7.
>
> For this you need WideStrings and proper UTF-8 converter.

No, you don't. You just need to emit the right bytes. The simplest way
to achieve this is to return 1:1 what was inserted. This works well as
long as you don't need any String semantics. This is, for example, what
DabbleDB does.
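
To make the pass-through point concrete, here is a minimal sketch in
Python (not Squeak code, and not how Aida or DabbleDB are actually
implemented). It takes the percent-encoded path segment from the demo
URL quoted above and echoes the bytes unchanged. The name echo is
hypothetical and exists only for this illustration.

```python
from urllib.parse import unquote_to_bytes

# Path segment taken from the demo URL quoted above.
ENCODED = "%E3%83%86%E3%82%B9%E3%83%88"


def echo(raw: bytes) -> bytes:
    """Hypothetical pass-through: return the bytes exactly as received."""
    return raw


raw = unquote_to_bytes(ENCODED)   # the UTF-8 bytes the browser sent
echoed = echo(raw)                # what the server writes back out

# Output is byte-for-byte identical, so the page renders correctly.
round_trip_ok = echoed == raw

# But without decoding, "size" counts bytes, not characters.
byte_count = len(echoed)
char_count = len(echoed.decode("utf-8"))

print(round_trip_ok, byte_count, char_count)
```

The round trip succeeds without any decoding step, but the byte count
and the character count differ, which is where String semantics start
to matter.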

> Does Squeak 3.7 has that?
>
>>> About leading character, I even don't know what is that, except in
>>> theory. That is, I never encounter this character as a problem when
>>> porting Aida and its i8n support to Squeak.
>>
>> How can you seriously say everything is working fine when in practice
>> you can't say what is happening and don't know how Strings and
>> Characters work in Squeak? I find that quite dubious hyping.
>
> Not hype at all but pure reality. And coming from country where we
> already need Unicode characters above 256, you can be sure that I know
> what I'm talking about.

Then tell us what leadingChar you use. And tell us how you address the
issue that #= takes the leadingChar into account.
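
A toy model of the #= concern, again in Python rather than Squeak. It
assumes a character value that packs a leadingChar tag into the bits
above the code point. The class ToyChar, the 22-bit shift and the tag
values 0 and 5 are assumptions chosen for illustration only; check them
against your own image before relying on them.

```python
from dataclasses import dataclass

SHIFT = 22                 # assumed position of the leadingChar tag
MASK = (1 << SHIFT) - 1    # low bits hold the code point


@dataclass(frozen=True)
class ToyChar:
    """Hypothetical character: one integer holding tag and code point."""
    value: int

    @classmethod
    def make(cls, leading_char: int, code_point: int) -> "ToyChar":
        return cls((leading_char << SHIFT) | code_point)

    @property
    def leading_char(self) -> int:
        return self.value >> SHIFT

    @property
    def code_point(self) -> int:
        return self.value & MASK


# Same code point (U+30C6), two arbitrary leadingChar tags.
a = ToyChar.make(0, 0x30C6)
b = ToyChar.make(5, 0x30C6)

same_code_point = a.code_point == b.code_point
equal = a == b             # compares the full value, tag included

print(same_code_point, equal)
```

In this model, equality on the full value distinguishes two characters
that share a code point, which is the kind of behaviour the question
about #= is asking you to account for.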

> If there would be some problem, I would be the
> first encountering it.

No. As I said, as long as you're just outputting the input, you won't.

> But there are no problems with Unicode strings
> prepared by Aida, so why should I bother? This is like a premature
> optimization for me.

What, getting the semantics of #= right is premature optimization?
Having a working String protocol is premature optimization?

> Note also that Masashi Umezawa, a Japanese guy, made a preview and few
> modifications to Aida to work well with Japanese writing, in all aspects
> from Urls to the content. Because of his work I'm therefore even more
> sure that we did the Unicode support right!

Then tell us how it works and how it addresses the leadingChar issues
outlined in this thread.

Cheers
Philippe


