[Vm-dev] Mac and Unix file/directory/clipboard interface?

Andreas Raab andreas.raab at gmx.de
Sun Jun 3 05:48:16 UTC 2007


Oh, how interesting. I had no idea that there is UTF-8 and UTF-8. So 
much for my proposal, I guess ;-)

Cheers,
   - Andreas

John M McIntosh wrote:
> 
> OK, the Mac Carbon VM (and, I believe, the Unix OS X VM) lets you
> specify what encoding the file/directory/drag-and-drop information is in.
> 
> By default the OS X Carbon VM uses MacRoman, because of issues with the
> file list dialog, which in various versions of Squeak assumes it knows how
> file/directory names should be translated.
> 
> For Sophie we use UTF-8; Plopp, I think, uses UTF-8; Scratch, I believe,
> uses MacRoman.
> 
> I'll note from http://en.wikipedia.org/wiki/UTF-8:
> 
> The Mac OS X Operating System uses canonically decomposed Unicode, 
> encoded using UTF-8 for file names in the filesystem.
> So saying it's UTF-8 is, well, not quite the whole picture.
> 
> In early May I applied some fixes to the Mac Carbon VM to address issues
> with precomposed versus canonically decomposed Unicode in the UTF-8
> translation, based on suggestions from Tetsuya Hayashi and further testing.
> 
>> sqMacUnixFileInterface.c    Tetsuya HAYASHI, tetha at st.rim.or.jp, tetha at mac.com
>>     I've found that the latest Mac VM (or a recent version) fails to
>>     normalize UTF file names. It seems to be the function convertChars()
>>     of sqMacUnixFileInterface.c, which only normalizes to decomposed form
>>     when converting a Squeak string to Unix, but I think it needs
>>     precomposed form when converting a Unix string to Squeak, and I noticed
>>     the normalization form should be canonical (exactly
>>     kCFStringNormalizationFormC) for precomposed.
> 
> 
> I cannot say whether this is also an issue with the Unix VM.
> 
> 
> As for the clipboard, the old primitives assume MacRoman. The extended
> OS X clipboard plugin lets you pass data in any format you wish, identified
> by MIME type. Should that be text, UTF-8, UTF-32, UTF-16 or RTF? Hmm, no,
> perhaps TIFF/PNG or JPEG.
> 
> 
> On Jun 2, 2007, at 9:34 PM, Andreas Raab wrote:
> 
>> Hi Folks -
>>
>> Since I just went through all of this, can someone explain to me what 
>> string encoding the Unix and Mac VMs use for interfacing the file, 
>> directory and clipboard functions? If these are all UTF-8 based (which 
>> I suspect) then should we just define that *all* strings passed to the 
>> VM are to be interpreted as UTF-8 and any VM or function that doesn't 
>> deal with UTF-8 correctly is considered broken and needs fixing? It 
>> strikes me as a nice, elegant solution to solve this problem once and 
>> forever.
>>
>> Comments, anyone?
>>
>> Cheers,
>>   - Andreas
> 
> -- 
> ===========================================================================
> John M. McIntosh <johnmci at smalltalkconsulting.com>
> Corporate Smalltalk Consulting Ltd.  http://www.smalltalkconsulting.com
> ===========================================================================
> 

