Unix VM path encodings

Andreas Raab andreas.raab at gmx.de
Sun Dec 30 12:09:16 UTC 2007


Yoshiki Ohshima wrote:
>> Hm ... lemme try this ... ah, interesting. It appears 
>> that I can make the Umlauts work on Unix correctly if and only if:
>> * I fix the above method to return UTF8TextConverter in every case [*1]
>> * I use -pathenc MacRoman -textenc MacRoman
>> Which makes no sense to me since neither the path nor the text encoding 
>> is MacRoman but it appears to work. Huh?
> 
>   Yes, on the Unix VM another historical mishap caused it: "MacRoman"
> still means "no conversion", so that if the image passes a UTF-8 string,
> that UTF-8 string is passed to the system calls unchanged.

Playing around a little, it appears that the Unix VM always converts 
path names on the assumption that Squeak uses MacRoman in the image, 
and that only -pathenc affects the translation between the file system 
and the image (i.e., -textenc has *no* effect on path name translation 
whatsoever). Can someone confirm this? It would explain why -pathenc 
MacRoman works (since, as you say, it's really the "no conversion" flag) 
when combined with a proper file name converter in the image.
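
To make the hypothesis concrete, here is a rough Python model of the 
behaviour I think I am seeing. It is only a sketch of the guess above, not 
the VM's actual code: the function name image_to_filesystem and the 
constant IMAGE_ENCODING are made up for illustration, and the model 
deliberately ignores -textenc.

```python
import codecs

# Assumption: the VM treats every path string coming from the image as MacRoman.
IMAGE_ENCODING = "mac_roman"


def image_to_filesystem(path_bytes: bytes, pathenc: str) -> bytes:
    """Hypothetical model of the path translation done for a given -pathenc."""
    # Same encoding on both sides: the bytes are handed to the OS untouched,
    # which is why "MacRoman" behaves like a "no conversion" flag.
    if codecs.lookup(pathenc).name == codecs.lookup(IMAGE_ENCODING).name:
        return path_bytes
    # Otherwise reinterpret the image bytes as MacRoman and re-encode them.
    return path_bytes.decode(IMAGE_ENCODING).encode(pathenc)


name = "Müller.txt"
utf8_from_image = name.encode("utf-8")       # image-side converter emits UTF-8
macroman_from_image = name.encode("mac_roman")  # legacy image-side encoding

# -pathenc MacRoman: UTF-8 bytes from the image reach the file system intact.
passthrough = image_to_filesystem(utf8_from_image, "MacRoman")

# -pathenc UTF-8 with a UTF-8 image converter: the bytes are misread as
# MacRoman and encoded a second time, so the umlaut is mangled.
double_encoded = image_to_filesystem(utf8_from_image, "UTF-8")

# -pathenc UTF-8 with a genuinely MacRoman image string: correct conversion.
converted = image_to_filesystem(macroman_from_image, "UTF-8")
```

If the model is right, an image-side UTF8TextConverter only works together 
with -pathenc MacRoman, because any other setting makes the VM convert the 
already-converted bytes a second time.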

>> [*1] And that of course reminds me that nobody has really made any 
>> comment on why the hell we still deal with all of these nonsensical 
>> legacy encodings and don't just go straight to UTF-8 in the VM interface 
>> which would simplify *lots* of cruft in the code.
> 
> Well, nobody tried to change stuff on all platforms at once.
> Windows is doing ok with the 3.10 VM and the OLPC Etoys image (there is
> still code that deals with older VMs... the typical installation for
> people is to install stuff from squeakland.org and then use the Etoys image).

What encoding options are being used on OLPC? Do non-ASCII file names, 
clipboard, drag and drop, etc. work on OLPC?

Cheers,
   - Andreas


