FFI wide-character type?

John M McIntosh johnmci at smalltalkconsulting.com
Fri Aug 4 07:09:21 UTC 2006


Sigh, too much backward/forward compatibility here.

a) In Sophie, most of the interesting, testable conversion calls
convert to and from UTF-16 for access to the Mac system API. That
appears to work, and of course in reviewing this I see the converter
uses nextPut:toStream: to deal with the BOM and to stuff 16 bits into
the stream as needed for each character.

b) When I use foo convertToWithConverter:
(MacRomanUnicodeTextConverter new), where foo contains characters
that map to values > 255 after conversion, I get a WideString.

Ah, but when I say 'abc' convertToWithConverter:
(MacRomanUnicodeTextConverter new), why, I get the ByteString 'abc'.

I'll mutter things at this point.

OK, it seems the answer is in
WriteStream>>nextPut:
	<primitive: 66>
	((collection class == ByteString) and: [
		anObject isCharacter and:[anObject isOctetCharacter not]]) ifTrue: [
			collection _ (WideString from: collection).
			^self nextPut: anObject.
	].
	"... rest of the method not shown ..."

Oh, how clever: if the primitive fails, it checks whether the
collection we're writing to is a ByteString. If so, and the object is
a character > 255, why, it converts the whole collection to a
WideString. So of course my testing, which used data that maps to a
character > 255, always produced a WideString.
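
A workspace sketch of that promotion, based only on a reading of the
method quoted above. The code point 960 is an arbitrary value > 255
chosen for illustration, not something taken from Sophie:

	| ws |
	ws _ WriteStream on: String new.
	ws nextPut: $a.
	ws contents class.	"ByteString is expected here"
	ws nextPut: (Character value: 960).
	ws contents class.	"WideString is expected here, since the
				 stream swapped its collection"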

Mmm, OK, what if I add
String>>convertFromToWideStringWithConverter: converter

	| readStream writeStream c |
	readStream _ self readStream.
	writeStream _ WideString new writeStream.
	converter ifNil: [^ self].
	[readStream atEnd] whileFalse: [
		c _ converter nextFromStream: readStream.
		c ifNotNil: [writeStream nextPut: c] ifNil: [^ writeStream contents]
	].
	^ writeStream contents

Fine, let's test. Oops, it fails... I get a String back.

How curious, let's see:
writeStream contents invokes

WideString>>copyFrom: start to: stop

	| n |
	n _ super copyFrom: start to: stop.
	n isOctetString ifTrue: [^ n asOctetString].
	^ n.

Which invokes isOctetString, which cheerfully scans the entire string
to see if any character values are > 255. If none are, why, it
converts the WideString we have into a String and returns that. How
clever, but it totally breaks what I want to happen.
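
The same squeeze should be visible without any streams or converters.
A workspace sketch, again going only by the copyFrom:to: method quoted
above:

	| w |
	w _ WideString from: 'abc'.
	w class.	"WideString"
	(w copyFrom: 1 to: 3) class.	"ByteString is expected, since no
					 character is > 255"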

Mmm

Fine, I'm sure there is a reason for all this, but I'd rather keep my
WideString as a WideString and not have it compressed to a String as
a side effect of working on it.

So create a class UTF32String for lack of a better name.
Add this method
isOctetString
	^false

Go back and change

convertFromToWideStringWithConverter:
to say
convertFromToUTF32StringWithConverter:
and alter one line
writeStream _ UTF32String new writeStream.

Then we get a UTF32 wide string that stays a wide string, and
sending asByteArray to the converted 'abcd' gets us a 16-byte object
(4 characters at 4 bytes each).
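
A minimal sketch of such a class, for anyone who wants to try it. It
assumes UTF32String is a subclass of WideString, so that it inherits
the copyFrom:to: shown earlier; the category name is made up for
illustration:

	WideString variableWordSubclass: #UTF32String
		instanceVariableNames: ''
		classVariableNames: ''
		poolDictionaries: ''
		category: 'Sophie-Experimental'

	UTF32String>>isOctetString
		"Answer false unconditionally, so that copyFrom:to: never
		squeezes the receiver down to a ByteString."
		^ false

With that in place,

	('abc' convertFromToUTF32StringWithConverter:
		MacRomanUnicodeTextConverter new) class

would be expected to answer UTF32String rather than ByteString.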

I'll ask for comments. I'm sure I'll now sit up thinking about the
side effects in Sophie of *thinking* I've converted things to a
WideString when, in most cases, it's silently just a String.


PS: MacRomanUnicodeTextConverter is a converter we added for Sophie
that does MacRoman to Unicode, versus the ill-named
MacRomanTextConverter, which does conversion from MacRoman to
something else (Latin-1?).


On 3-Aug-06, at 7:21 PM, Ron Teitelbaum wrote:

> I found a solution that allows me to use ascii characters.
>
> The problem I was having is that I had a squeak string 'MY' that I  
> needed to
> send to an FFI call which was expecting a wide character.  In C++ I  
> just
> needed to do L"MY" to get it to work (MY represents the specific  
> store that
> I want opened in this case my personal certificates).  When I tried  
> the code
> wideStringMangled := string convertFromWithConverter:
> (Latin1TextConverter new).  I received a string that was the same  
> as the
> regular 'MY' string in squeak.  So that didn't help any.  My  
> attempts to
> fake the system out didn't work either.  I tried:
>
> 'L\"MY\"'
> 'L"MY"'
> 'LMLY'
> "LMLYL\0"
>
> But nothing satisfied the dll that I was passing in a wide string.  I
> finally found a define in the dll that allowed me to use an ascii  
> string.
> That was a long battle.
>
> What is really needed is a patch to FFI for wide characters.  At  
> least this
> problem is solved for now (unless you want to access a store that  
> has a
> Japanese name!).
>
> Thanks for your help!!
>
> Ron Teitelbaum
--
===========================================================================
John M. McIntosh <johnmci at smalltalkconsulting.com>
Corporate Smalltalk Consulting Ltd.  http://www.smalltalkconsulting.com
===========================================================================




