Yoshiki Ohshima wrote:
Hm ... lemme try this ... ah, interesting. It appears that I can make the Umlauts work on Unix correctly if and only if:
- I fix the above method to return UTF8TextConverter in every case [*1]
- I use -pathenc MacRoman -textenc MacRoman
Which makes no sense to me since neither the path nor the text encoding is MacRoman but it appears to work. Huh?
Yes, on the Unix VM another historical mishap caused it: "MacRoman" still means "no conversion", so if the image passes a UTF-8 string, that UTF-8 string is passed unchanged to system calls.
Playing around a little, it appears that the Unix VM always converts path names on the assumption that Squeak uses MacRoman in the image, and that only -pathenc affects the translation between the file system and the image (i.e., -textenc has *no* effect on path name translation whatsoever). Can someone confirm this? It would explain why -pathenc MacRoman works (since, like you say, it's really the "no conversion" flag) if combined with a proper file name converter in the image.
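
To make the pass-through argument concrete, here is a minimal sketch in Python. It is a model of the behaviour described above, not VM code: the function names are hypothetical, and it assumes the image side encodes file names as UTF-8 (as UTF8TextConverter would) and that "MacRoman" on the VM side means the bytes are left untouched. The second VM function shows, for contrast, what would happen if the VM really did treat the incoming bytes as MacRoman and re-encoded them.

```python
def image_to_vm(name: str) -> bytes:
    # Assumption: the in-image file name converter produces UTF-8.
    return name.encode("utf-8")

def vm_no_conversion(path_bytes: bytes) -> bytes:
    # Assumption: "-pathenc MacRoman" behaves as "no conversion",
    # so the bytes reach the system call unchanged.
    return path_bytes

def vm_macroman_to_utf8(path_bytes: bytes) -> bytes:
    # Hypothetical contrast case: the VM believes the image bytes are
    # MacRoman and converts them to UTF-8 for the file system.
    return path_bytes.decode("mac_roman").encode("utf-8")

name = "Grüße.txt"
wire = image_to_vm(name)

passed = vm_no_conversion(wire)        # what the file system would see
doubled = vm_macroman_to_utf8(wire)    # double-encoded variant

print(passed.decode("utf-8") == name)   # pass-through keeps the name intact
print(doubled.decode("utf-8") == name)  # a real conversion mangles the umlauts
```

In this model the pass-through case hands the file system the correct UTF-8 name, while a genuine MacRoman conversion encodes the umlauts a second time, which would match the observation that the setup only works when the flag acts as "no conversion".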
[*1] And that of course reminds me that nobody has really made any comment on why the hell we still deal with all of these nonsensical legacy encodings and don't just go straight to UTF-8 in the VM interface which would simplify *lots* of cruft in the code.
Well, nobody tried to change stuff on all the platforms at once. Windows is doing ok with the 3.10 VM and the OLPC Etoys image (there is still code that deals with older VMs... the typical installation for people is to install stuff from squeakland.org and then use the Etoys image).
What encoding options are being used on OLPC? Do non-ASCII file names, clipboard, drag and drop etc. work on OLPC?
Cheers, - Andreas