[Vm-dev] Mac and Unix file/directory/clipboard interface?
John M McIntosh
johnmci at smalltalkconsulting.com
Sun Jun 3 05:00:33 UTC 2007
OK, the Mac Carbon VM, and I believe the Unix VM on OS X as well, let you
specify which encoding the file, directory, and drag-and-drop information is in.
By default the OS X Carbon VM uses MacRoman because of issues with
the file list dialog, which in various versions of Squeak assumes it
knows how file and directory names should be translated.
For Sophie we use UTF-8; Plopp, I think, uses UTF-8; Scratch, I
believe, uses MacRoman.
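To make the MacRoman vs UTF-8 choice concrete, here is a small sketch (Python, using only the standard codecs; the file name is a made-up example) showing that the same name produces different bytes under each setting, and that decoding with the wrong one yields garbage rather than an error:

```python
# Hypothetical example name; \u00e9 is a precomposed "e with acute".
name = "Caf\u00e9.txt"

mac_roman = name.encode("mac_roman")  # bytes a MacRoman-configured VM would use
utf8 = name.encode("utf-8")           # bytes a UTF-8-configured VM would use

print(mac_roman.hex(" "))
print(utf8.hex(" "))

# Reading UTF-8 bytes as MacRoman does not fail; it silently produces
# the wrong characters, so an image/VM mismatch is easy to miss.
misread = utf8.decode("mac_roman")
print(misread)
```

Since both encodings agree on ASCII, the mismatch only shows up once someone uses a non-ASCII file name.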
I'll note from http://en.wikipedia.org/wiki/UTF-8:
"The Mac OS X operating system uses canonically decomposed Unicode,
encoded using UTF-8, for file names in the filesystem."
So saying it's UTF-8 is not quite the whole picture.
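A quick sketch of why "it's UTF-8" is incomplete: the same visible name has two canonical Unicode forms, precomposed (NFC) and decomposed (NFD), and they encode to different UTF-8 bytes. This uses Python's `unicodedata` module; note that HFS+ stores a variant of NFD, so Python's NFD is a close approximation of what the file system returns, not an exact model of it.

```python
import unicodedata

# Escape used deliberately so the source literal itself is unambiguous.
name = "Caf\u00e9.txt"

precomposed = unicodedata.normalize("NFC", name)  # e-acute as U+00E9
decomposed = unicodedata.normalize("NFD", name)   # "e" + U+0301 combining acute

print(precomposed == decomposed)            # False: different code points
print(len(precomposed), len(decomposed))    # 8 9
print(precomposed.encode("utf-8").hex(" ")) # ... c3 a9 ...
print(decomposed.encode("utf-8").hex(" "))  # ... 65 cc 81 ...
```

Both strings are valid UTF-8, yet a naive byte or character comparison treats them as different names.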
In early May I applied some fixes to the Mac Carbon VM to address
issues with precomposed versus canonically decomposed Unicode in UTF-8
translation, based on suggestions from Tetsuya Hayashi and further
testing.
> sqMacUnixFileInterface.c (Tetsuya HAYASHI, tetha at st.rim.or.jp,
> tetha at mac.com): I've found that the latest Mac VM (or a recent
> version) fails to normalize UTF file names.
> It seems to be the function convertChars() in
> sqMacUnixFileInterface.c, which only normalizes to decomposed form
> when converting a Squeak string to Unix,
> but I think it also needs to precompose when converting a Unix string
> to Squeak, and I noticed the normalization form for precomposing
> should be canonical (exactly, kCFStringNormalizationFormC).
I cannot say if this is also an issue with the unix VM.
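The rule Tetsuya describes (decompose on the way out to the file system, precompose on the way back into the image) can be sketched as a pair of conversion helpers. The function names below are hypothetical and this is Python rather than the VM's C; in the Carbon VM the equivalent step would be CoreFoundation's `CFStringNormalize` with `kCFStringNormalizationFormD` going out and `kCFStringNormalizationFormC` coming back. This is an illustration of the idea, not the actual convertChars() code.

```python
import unicodedata

def squeak_to_unix(name: str) -> bytes:
    """Hypothetical helper: image string -> bytes passed to the file system.
    Decompose (NFD) to match how Mac OS X stores file names."""
    return unicodedata.normalize("NFD", name).encode("utf-8")

def unix_to_squeak(raw: bytes) -> str:
    """Hypothetical helper: file-system bytes -> image string.
    Precompose (NFC) so the image sees one code point per accented letter."""
    return unicodedata.normalize("NFC", raw.decode("utf-8"))

typed = "Caf\u00e9.txt"            # what a user types in the image (precomposed)
on_disk = squeak_to_unix(typed)    # decomposed UTF-8 bytes
listed = unix_to_squeak(on_disk)   # what a directory listing hands back
print(listed == typed)
```

Without the NFC step on the way back, the directory listing would return a nine-character string that never compares equal to the eight-character name the user typed, which is the symptom the fix addresses.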
As for the clipboard, the old primitives assume MacRoman. The extended
OS X clipboard plugin lets you pass data in any format you wish, based
on MIME type. Should that be text in UTF-8, UTF-16, or UTF-32, or RTF?
Hmm, no, perhaps TIFF, PNG, or JPEG.
On Jun 2, 2007, at 9:34 PM, Andreas Raab wrote:
> Hi Folks -
>
> Since I just went through all of this, can someone explain to me
> what string encoding the Unix and Mac VMs use for interfacing the
> file, directory and clipboard functions? If these are all UTF-8
> based (which I suspect) then should we just define that *all*
> strings passed to the VM are to be interpreted as UTF-8 and any VM
> or function that doesn't deal with UTF-8 correctly is considered
> broken and needs fixing? It strikes me as a nice, elegant solution
> to solve this problem once and forever.
>
> Comments, anyone?
>
> Cheers,
> - Andreas
--
===========================================================================
John M. McIntosh <johnmci at smalltalkconsulting.com>
Corporate Smalltalk Consulting Ltd. http://www.smalltalkconsulting.com
===========================================================================