Oh, how interesting. I had no idea that there is UTF-8 and UTF-8. So much for my proposal, I guess ;-)
Cheers, - Andreas
John M McIntosh wrote:
OK, the Mac Carbon VM, and I believe the Unix OS X VM as well, lets you specify what encoding the file/directory/drag-drop information is in.
By default the OS X Carbon VM uses MacRoman, because of issues with the file list dialog and its assumptions about how file/directory names should be translated in various versions of Squeak.
For Sophie we use UTF-8; Plopp, I think, uses UTF-8; Scratch, I believe, is MacRoman.
I'll note from http://en.wikipedia.org/wiki/UTF-8:
The Mac OS X Operating System uses canonically decomposed Unicode, encoded using UTF-8 for file names in the filesystem. So saying it's UTF-8 is, well, not quite the whole picture.
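To make the "UTF-8 and UTF-8" point concrete, here is a minimal Python sketch showing that the same visible file name has two different UTF-8 byte sequences depending on whether it is precomposed (NFC) or canonically decomposed (NFD). The file name is a made-up example, and standard Unicode NFD is used as a stand-in; the decomposition the Mac filesystem actually applies may differ in detail.

```python
import unicodedata

# Hypothetical file name containing a precomposed e-acute (U+00E9).
name = "caf\u00e9.txt"

# Precomposed form: e-acute is a single code point.
composed = unicodedata.normalize("NFC", name)
# Canonically decomposed form: "e" followed by combining acute (U+0301).
decomposed = unicodedata.normalize("NFD", name)

composed_bytes = composed.encode("utf-8")
decomposed_bytes = decomposed.encode("utf-8")

print(composed_bytes)          # b'caf\xc3\xa9.txt'
print(decomposed_bytes)        # b'cafe\xcc\x81.txt'
print(composed == decomposed)  # False: a plain string compare does not match
```

Both byte strings are valid UTF-8 and render identically, which is why a file name lookup can fail even though both sides "use UTF-8".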
In early May I applied some fixes to the Mac Carbon VM to address issues with precomposed versus canonically decomposed Unicode in the UTF-8 translation, based on suggestions from Tetsuya Hayashi and further testing.
sqMacUnixFileInterface.c
Tetsuya HAYASHI, tetha@st.rim.or.jp, tetha@mac.com
I've found the latest Mac VM (or a recent version) fails to normalize UTF-8 file names. The cause seems to be the function convertChars() in sqMacUnixFileInterface.c, which only normalizes to decomposed form when converting a Squeak string to Unix. I think it also needs to normalize to precomposed form when converting a Unix string to Squeak, and the normalization form for precomposed should be canonical (to be exact, kCFStringNormalizationFormC).
I cannot say if this is also an issue with the unix VM.
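The two-way normalization described in Tetsuya's note can be sketched roughly as follows. This is only an illustration in Python, not the VM's C code: the function names squeak_to_unix and unix_to_squeak are hypothetical, and standard NFD/NFC from Python's unicodedata module stand in for the CoreFoundation normalization the VM would use.

```python
import unicodedata


def squeak_to_unix(name: str) -> bytes:
    """Image -> filesystem: decompose, then encode as UTF-8 (hypothetical helper)."""
    return unicodedata.normalize("NFD", name).encode("utf-8")


def unix_to_squeak(raw: bytes) -> str:
    """Filesystem -> image: decode UTF-8, then recompose canonically (hypothetical helper)."""
    return unicodedata.normalize("NFC", raw.decode("utf-8"))


# Made-up file name with a precomposed e-acute (U+00E9).
original = "Ren\u00e9e.st"
on_disk = squeak_to_unix(original)
back = unix_to_squeak(on_disk)

print(on_disk)           # b'Rene\xcc\x81e.st'
print(back == original)  # True: the name survives the round trip
```

Without the recomposition step on the way back, the image would see the decomposed name and a comparison against the original string would fail.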
As for the clipboard, the old primitives assume MacRoman. The extended OS X clipboard plugin lets you pass any format you wish, based on MIME type. Should that be text, UTF-8, UTF-32, UTF-16 or RTF? Mmm, no, perhaps TIFF/PNG or JPEG.
On Jun 2, 2007, at 9:34 PM, Andreas Raab wrote:
Hi Folks -
Since I just went through all of this, can someone explain to me what string encoding the Unix and Mac VMs use for interfacing the file, directory and clipboard functions? If these are all UTF-8 based (which I suspect) then should we just define that *all* strings passed to the VM are to be interpreted as UTF-8 and any VM or function that doesn't deal with UTF-8 correctly is considered broken and needs fixing? It strikes me as a nice, elegant solution to solve this problem once and forever.
Comments, anyone?
Cheers,
- Andreas
--
John M. McIntosh johnmci@smalltalkconsulting.com Corporate Smalltalk Consulting Ltd. http://www.smalltalkconsulting.com ===========================================================================