[Vm-dev] Re: Unix VM path encodings

John M McIntosh johnmci at smalltalkconsulting.com
Sun Dec 30 10:45:48 UTC 2007


On Dec 30, 2007, at 2:11 AM, Yoshiki Ohshima wrote:

>
>> Hm ... lemme try this ... ah, interesting. It appears
>> that I can make the Umlauts work on Unix correctly if and only if:
>> * I fix the above method to return UTF8TextConverter in every case [*1]
>> * I use -pathenc MacRoman -textenc MacRoman
>> Which makes no sense to me since neither the path nor the text encoding
>> is MacRoman but it appears to work. Huh?
>
>  Yes, on the Unix VM another historical mishap caused it: "MacRoman"
> still means "no conversion", so if the image passes a UTF-8 string,
> that UTF-8 string is passed to system calls.

Er, well, I'm not sure that's quite accurate. Looking at
sqUnixCharConv.c, it seems to say that if the text encoding is MacRoman
and the path encoding is MacRoman, the translation from Unix path to
Squeak would be MacRoman to MacRoman, so nothing would happen.

Convert(sq,ux, Path, sqTextEncoding, uxPathEncoding, 1, 0);	// normalised paths for HFS+
Convert(ux,sq, Path, uxPathEncoding, sqTextEncoding, 0, 0);

in
sqInt dir_Lookup(char *pathString, sqInt pathStringLength, sqInt index,
		/* outputs: */ char *name, sqInt *nameLength, sqInt *creationDate,
		sqInt *modificationDate, sqInt *isDirectory,
		squeakFileOffsetType *sizeIfFile)

we find
  *nameLength= ux2sqPath(dirEntry->d_name, nameLen, name, MAXPATHLEN, 0);
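
To make the "MacRoman to MacRoman, so nothing would happen" reading concrete, here is a minimal sketch. The function convert_path and its signature are hypothetical and are not taken from sqUnixCharConv.c; it only assumes that a converter asked to go between two identical encodings hands the bytes through untouched, which is why a UTF-8 path from the image would reach the system call as-is.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for the VM's path conversion; the name and the
   signature are illustrative only. Returns the number of bytes written,
   or -1 when a real transcoding step would be required. */
static int convert_path(const char *from, int fromLen,
                        char *to, int toMax,
                        const char *fromEnc, const char *toEnc)
{
    int n;

    assert(toMax > 0);
    n = fromLen < toMax - 1 ? fromLen : toMax - 1;

    if (strcmp(fromEnc, toEnc) == 0) {
        /* Same encoding on both sides: the bytes pass through unchanged,
           whatever encoding they are really in. */
        memcpy(to, from, (size_t)n);
        to[n] = '\0';
        return n;
    }

    /* A real implementation would transcode here (for example via iconv). */
    return -1;
}
```

Calling convert_path with the UTF-8 bytes of "für" and "MacRoman" for both encodings copies the four bytes verbatim, even though the data is not MacRoman at all.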


However I note the carbon vm does
   if (norm) // HFS+ imposes Unicode2.1 decomposed UTF-8 encoding on all path elements
     CFStringNormalize(str, kCFStringNormalizationFormD); // canonical decomposition
   else
     CFStringNormalize(str, kCFStringNormalizationFormC); // pre-combined

but the unix VM does not do this, which I think is an error based on:

See
	From: 	tetha at st.rim.or.jp
	Subject: 	[Vm-dev] Patch for filename normalization of mac vm
	Date: 	March 11, 2007 8:21:06 PM PDT (CA)
	To: 	vm-dev at lists.squeakfoundation.org

> Hi,
>
> I've found that the latest Mac VM (or a recent version) fails to
> normalize UTF file names.
> The cause seems to be the function convertChars() in
> sqMacUnixFileInterface.c, which normalizes only to decomposed form when
> converting a Squeak string to Unix. I think it also needs pre-combined
> form when converting a Unix string to Squeak, and I noticed that the
> normalization form should be canonical (exactly,
> kCFStringNormalizationFormC) for pre-combined.
>
> A patch (in the diff format of the Xcode tool) for this problem is
> attached to this mail.
>
> Regards,
> --
> Tetsuya HAYASHI, tetha at st.rim.or.jp, tetha at mac.com
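
To illustrate why the direction of normalization matters, here is a small sketch in plain C. It spells out "ü" in both canonical forms as UTF-8 bytes and shows that a byte-wise comparison treats them as different names. The helper compose_u_umlaut is a toy written for this note: it handles only this one character pair and merely stands in for a full normalizer such as CFStringNormalize with kCFStringNormalizationFormC.

```c
#include <assert.h>
#include <string.h>

/* "ü" in the two canonical forms, written out as UTF-8 bytes. */
static const char U_UMLAUT_NFC[] = "\xC3\xBC";   /* U+00FC, pre-combined       */
static const char U_UMLAUT_NFD[] = "u\xCC\x88";  /* U+0075 U+0308, decomposed  */

/* Byte-wise equality, which is what a naive file name comparison sees. */
static int same_bytes(const char *a, const char *b)
{
    return strlen(a) == strlen(b) && memcmp(a, b, strlen(a)) == 0;
}

/* Toy stand-in for NFC normalization: composes ONLY u + U+0308 into U+00FC
   and copies every other byte unchanged. Returns the output length. */
static size_t compose_u_umlaut(const char *in, char *out, size_t outMax)
{
    size_t i = 0, o = 0, len = strlen(in);

    assert(outMax > 0);
    while (i < len && o + 2 < outMax) {
        if (i + 2 < len && in[i] == 'u'
            && (unsigned char)in[i + 1] == 0xCC
            && (unsigned char)in[i + 2] == 0x88) {
            out[o++] = (char)0xC3;   /* emit the pre-combined form */
            out[o++] = (char)0xBC;
            i += 3;
        } else {
            out[o++] = in[i++];
        }
    }
    out[o] = '\0';
    return o;
}
```

A name read back from an HFS+ directory would arrive in the decomposed form, so without the composing step on the way into the image it would not compare equal to the same name typed in pre-combined form.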




PS: I note that if you feed CFStringCreateWithBytes bad data, it
returns NULL, and then the lurking CFStringCreateMutableCopy core dumps
the VM. That's why I check for it in the Carbon VM. Normally you won't
see this issue unless you get creative...
--
===========================================================================
John M. McIntosh <johnmci at smalltalkconsulting.com>
Corporate Smalltalk Consulting Ltd.  http://www.smalltalkconsulting.com
===========================================================================




