[Vm-dev] Re: Unix VM path encodings
John M McIntosh
johnmci at smalltalkconsulting.com
Sun Dec 30 10:45:48 UTC 2007
On Dec 30, 2007, at 2:11 AM, Yoshiki Ohshima wrote:
>
>> Hm ... lemme try this ... ah, interesting. It appears
>> that I can make the Umlauts work on Unix correctly if and only if:
>> * I fix the above method to return UTF8TextConverter in every case
>> [*1]
>> * I use -pathenc MacRoman -textenc MacRoman
>> Which makes no sense to me since neither the path nor the text
>> encoding
>> is MacRoman but it appears to work. Huh?
>
> Yes, on Unix VM, another historical mishap caused it; "MacRoman"
> still means "no conversion" so that if the image passes UTF-8 string,
> the UTF-8 string is passed to system calls.
Er, well, I'm not sure that's quite accurate. Looking at
sqUnixCharConv.c, it seems to say that if both the text encoding and
the path encoding are MacRoman, the translation from Unix path to
Squeak would be MacRoman to MacRoman, so nothing would happen:
    Convert(sq,ux, Path, sqTextEncoding, uxPathEncoding, 1, 0); // normalised paths for HFS+
    Convert(ux,sq, Path, uxPathEncoding, sqTextEncoding, 0, 0);
In
    sqInt dir_Lookup(char *pathString, sqInt pathStringLength, sqInt index,
        /* outputs: */ char *name, sqInt *nameLength, sqInt *creationDate,
        sqInt *modificationDate, sqInt *isDirectory,
        squeakFileOffsetType *sizeIfFile)
we find
    *nameLength= ux2sqPath(dirEntry->d_name, nameLen, name, MAXPATHLEN, 0);
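To illustrate what I mean by "nothing would happen", here is a minimal sketch. It is not the code in sqUnixCharConv.c; the function name convertPathSketch and its signature are made up for illustration, and it only assumes that a converter given identical source and target encoding names has nothing to translate.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for the VM's path converter, NOT the actual
 * sqUnixCharConv.c code: when the source and target encoding names
 * match, the bytes are copied through untouched. */
static int convertPathSketch(const char *from, int fromLen,
                             char *to, int toMax,
                             const char *fromEnc, const char *toEnc)
{
    assert(from != NULL && to != NULL && toMax > 0);
    if (strcmp(fromEnc, toEnc) == 0) {
        /* Same encoding on both sides: plain copy, truncated to fit. */
        int n = fromLen < toMax - 1 ? fromLen : toMax - 1;
        memcpy(to, from, (size_t)n);
        to[n] = '\0';
        return n;
    }
    /* A real converter would transcode here; the sketch just reports
     * that it cannot. */
    return -1;
}
```

With both encodings given as "MacRoman", a UTF-8 byte sequence such as C3 BC comes out byte for byte as it went in, whatever the bytes actually mean.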
However I note the carbon vm does
    if (norm) // HFS+ imposes Unicode2.1 decomposed UTF-8 encoding on all path elements
        CFStringNormalize(str, kCFStringNormalizationFormD); // canonical decomposition
    else
        CFStringNormalize(str, kCFStringNormalizationFormC); // pre-combined
but the Unix VM does not do this, which I think is an error, based on
this earlier message:
From: tetha at st.rim.or.jp
Subject: [Vm-dev] Patch for filename normalization of mac vm
Date: March 11, 2007 8:21:06 PM PDT (CA)
To: vm-dev at lists.squeakfoundation.org
> Hi,
>
> I've found the latest mac vm (or recent version) fails to normalize
> UTF file name.
> It seems to be the function convertChars() of
> sqMacUnixFileInterface.c, which normalizes only decompose when
> converting squeak string to unix, but I think it needs pre-combined
> when unix string to squeak, and I noticed normalization form should
> be canonical (exactly should be kCFStringNormalizationFormC) for
> pre-combined.
>
> Patch (diff format of xcode tool) for this problem is attached to
> this mail.
>
> Regards,
> --
> Tetsuya HAYASHI, tetha at st.rim.or.jp, tetha at mac.com
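To make the normalization issue concrete, here is a small sketch of why the two forms matter for file names. It uses plain C rather than CoreFoundation, and the names uumlNFC, uumlNFD and sameName are made up for illustration. The same visible character "ü" has different UTF-8 bytes in precomposed (form C) and decomposed (form D) form, so a byte-wise comparison treats them as different names.

```c
#include <assert.h>
#include <string.h>

/* "ü" precomposed (form C): U+00FC, two bytes in UTF-8. */
static const char uumlNFC[] = "\xC3\xBC";
/* "ü" decomposed (form D): 'u' followed by U+0308 COMBINING DIAERESIS,
 * three bytes in UTF-8. */
static const char uumlNFD[] = "u\xCC\x88";

/* Byte-wise name comparison, the way a naive lookup would do it. */
static int sameName(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    return strcmp(a, b) == 0;
}
```

Here sameName(uumlNFC, uumlNFD) is false even though both strings display as "ü". That is the mismatch you would get if a decomposed name read from an HFS+ directory reached the image without being recomposed.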
PS: I note that if you feed CFStringCreateWithBytes bad data, it
returns NULL, and the CFStringCreateMutableCopy lurking after it then
core dumps the VM. That's why I check for it in the Carbon VM. Normally
you won't see this issue unless you get creative...
--
===========================================================================
John M. McIntosh <johnmci at smalltalkconsulting.com>
Corporate Smalltalk Consulting Ltd. http://www.smalltalkconsulting.com
===========================================================================