[Vm-dev] Mac and Unix file/directory/clipboard interface?

John M McIntosh johnmci at smalltalkconsulting.com
Sun Jun 3 05:00:33 UTC 2007

OK, the Mac Carbon VM, and I believe the Unix OS X VM as well, let you
specify what encoding the file/directory/drag-and-drop information is in.

By default the OS X Carbon VM uses MacRoman, because of issues with
the file list dialog and how it assumes it knows how file and
directory names should be translated in various versions of Squeak.

For Sophie we use UTF-8; Plopp, I think, uses UTF-8; Scratch, I
believe, uses MacRoman.
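
To make the MacRoman/UTF-8 difference concrete, here is a minimal Python sketch (not VM code; the file name is made up for illustration). It shows the same name encoded both ways, and what happens when UTF-8 bytes are misread as MacRoman:

```python
# Hypothetical file name containing a precomposed e-acute (U+00E9).
name = "caf\u00e9.txt"

utf8_bytes = name.encode("utf-8")          # e-acute becomes two bytes: C3 A9
macroman_bytes = name.encode("mac_roman")  # e-acute becomes one byte: 8E

# Decoding UTF-8 bytes with the wrong assumption (MacRoman) yields
# garbage characters instead of the accented letter.
misread = utf8_bytes.decode("mac_roman")

print(utf8_bytes)
print(macroman_bytes)
print(misread)
```

An image that assumes one encoding while the VM delivers the other will show this kind of garbling in file lists.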

I'll note from http://en.wikipedia.org/wiki/UTF-8

The Mac OS X Operating System uses canonically decomposed Unicode,  
encoded using UTF-8 for file names in the filesystem.
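So saying "it's UTF-8" is, well, not quite the whole picture: the same
name can be encoded as different UTF-8 byte sequences depending on
whether it is precomposed or decomposed.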
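
As a small illustration of that point, the Python sketch below (using the standard unicodedata module, not anything from the VM) shows that the precomposed and canonically decomposed forms of the same character produce different UTF-8 bytes:

```python
import unicodedata

# Precomposed form (NFC): a single code point, U+00E9.
precomposed = "\u00e9"

# Canonically decomposed form (NFD): "e" followed by U+0301
# (combining acute accent).
decomposed = unicodedata.normalize("NFD", precomposed)

nfc_bytes = precomposed.encode("utf-8")  # C3 A9
nfd_bytes = decomposed.encode("utf-8")   # 65 CC 81

# Both are valid UTF-8 for the same visible character, but a plain
# byte or string comparison treats them as different.
same_text = precomposed == decomposed
print(nfc_bytes, nfd_bytes, same_text)
```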

In early May I applied some fixes to the Mac Carbon VM to address
issues with precomposed versus canonically decomposed Unicode in the
UTF-8 translation, based on suggestions from Tetsuya Hayashi and
further testing.

> sqMacUnixFileInterface.c  Tetsuya HAYASHI, tetha at st.rim.or.jp,
> tetha at mac.com
> I've found the latest mac vm (or recent version) fails to normalize
> UTF file name. It seems to be the function convertChars() of
> sqMacUnixFileInterface.c, which normalizes only decompose when
> converting squeak string to unix, but I think it needs pre-combined
> when unix string to squeak, and I noticed normalization form should
> be canonical (exactly should be kCFStringNormalizationFormC) for
> pre-combined.

I cannot say if this is also an issue with the unix VM.
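
The direction-dependent normalization described in the quoted note can be sketched roughly as follows. This is a Python analogue under stated assumptions, not the actual convertChars() code: the function names are hypothetical, and plain NFD is used as a stand-in for whatever decomposition the file system really applies (HFS+ uses its own variant).

```python
import unicodedata

def squeak_to_fs(name: str) -> bytes:
    """Hypothetical helper: image string -> file system bytes.

    Decompose (NFD) before encoding, since the file system is
    assumed to store decomposed names.
    """
    return unicodedata.normalize("NFD", name).encode("utf-8")

def fs_to_squeak(raw: bytes) -> str:
    """Hypothetical helper: file system bytes -> image string.

    Recompose (NFC) after decoding, so the image sees precomposed
    characters.
    """
    return unicodedata.normalize("NFC", raw.decode("utf-8"))

# Round trip: a precomposed name survives the trip through the
# decomposed on-disk form.
original = "caf\u00e9"
on_disk = squeak_to_fs(original)
back = fs_to_squeak(on_disk)
print(on_disk, back == original)
```

If only the first direction is normalized, names read back from the file system stay decomposed and no longer compare equal to the precomposed strings the image holds.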

As for the clipboard, the old primitives assume MacRoman. The extended
OS X clipboard plugin lets you pass any format you wish, selected by
MIME type. Should that be text, UTF-8, UTF-32, UTF-16 or RTF? Hmm, no,
perhaps TIFF, PNG or JPEG.

On Jun 2, 2007, at 9:34 PM, Andreas Raab wrote:

> Hi Folks -
> Since I just went through all of this, can someone explain to me  
> what string encoding the Unix and Mac VMs use for interfacing the  
> file, directory and clipboard functions? If these are all UTF-8  
> based (which I suspect) then should we just define that *all*  
> strings passed to the VM are to be interpreted as UTF-8 and any VM  
> or function that doesn't deal with UTF-8 correctly is considered  
> broken and needs fixing? It strikes me as a nice, elegant solution  
> to solve this problem once and forever.
> Comments, anyone?
> Cheers,
>   - Andreas

John M. McIntosh <johnmci at smalltalkconsulting.com>
Corporate Smalltalk Consulting Ltd.  http://www.smalltalkconsulting.com

