[Vm-dev] Unix VM path encodings

Andreas Raab andreas.raab at gmx.de
Sun Dec 30 07:32:04 UTC 2007


Hi -

Due to a bug reported against Qwaq Forums I needed to look into how the 
Unix VM encodes file and path names and got terribly confused. My test 
case was to create a file with an umlaut in its name ("Jürgen") and to 
see what both Squeak and the Unix shell report with varying settings of 
-pathenc and -textenc.

I started with the assumption that since the file system I was running 
this on is UTF-8, the default settings (-textenc MacRoman and -pathenc 
UTF-8) ought to be correct. However, the result was very surprising. The 
file name was reported incorrectly both in the file list and by the OS: 
the file list reported "J?" (truncated after the question mark) 
and the Unix shell reported "J?rgen" but with a "funky ?" (the glyph is 
hard to describe without a screenshot; it was neither an umlaut nor a 
regular question mark).
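
To make the symptom concrete, here is a small Python sketch of how the 
same name looks when MacRoman and UTF-8 bytes are confused. It is only an 
illustration of one possible mismatch, not a description of what the VM 
actually does; the helper names are made up for this example, and the 
assumption that MacRoman bytes reach a UTF-8 file system is a guess, not 
a diagnosis.

```python
# Illustration only: hypothetical helpers, not code from the Unix VM.
name = "Jürgen"

utf8_bytes = name.encode("utf-8")          # ü is two bytes: C3 BC
macroman_bytes = name.encode("mac_roman")  # ü is one byte: 9F


def shown_by_utf8_shell(raw: bytes) -> str:
    """Decode as a UTF-8 terminal would; invalid bytes become U+FFFD."""
    return raw.decode("utf-8", errors="replace")


def shown_as_macroman(raw: bytes) -> str:
    """Decode the raw bytes as if they were MacRoman text."""
    return raw.decode("mac_roman")


# ascii() keeps the output printable on any terminal encoding.
print(utf8_bytes.hex(), macroman_bytes.hex())
print(ascii(shown_by_utf8_shell(macroman_bytes)))  # MacRoman bytes read as UTF-8
print(ascii(shown_as_macroman(utf8_bytes)))        # UTF-8 bytes read as MacRoman
```

If MacRoman bytes were written where UTF-8 is expected, a UTF-8 shell 
would show a replacement character in place of the ü, which would at 
least resemble the "funky ?" above. The opposite confusion turns the ü 
into two unrelated characters instead.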

Playing with the settings I could not find any combination that resulted 
in a consistent representation for all the different views - either the 
Unix shell was off or Squeak's view was off no matter how I set those 
encodings. Can someone explain to me how I need to set these values to 
get a consistent view of file names from both Squeak and Unix?

Cheers,
   - Andreas

