[Vm-dev] Unicode clipboard

John M McIntosh johnmci at smalltalkconsulting.com
Tue May 22 14:56:11 UTC 2007

You're welcome to look at the source in the mac os tree, under
plugins/ClipboardExtended, to see how we extended the clipboard
logic for Sophie.

Higher up, the extended clipboard class uses mimetype information to
indicate the data type. At the lower level it's up to the plugin to
determine what, for example,
ioReadClipboardData: clipboard format: format
means, where clipboard is a 32-bit value (an address) and format is a string.

Likely the method that is not clear is
ioGetClipboardFormat: clipboard formatNumber: formatNumber

On the Macintosh you can have an item on the clipboard in many
formats, such as a string in UTF-8, UTF-16, ASCII, or MacRoman.
ioGetClipboardFormat:formatNumber: returns each format type
based on the index number formatNumber.

We converted those results back to mimetypes and used them to decide
the best format for reading the clipboard.
Each platform has helper methods to convert the platform format data
to a mimetype, so for example on Windows we had

		at: 49510 put: 'text/rtf' asMIMEType;
		at: 1 put: 'text/plain' asMIMEType; "CF_TEXT"
		at: 2 put: 'image/bmp' asMIMEType; "CF_BITMAP"
		at: 12 put: 'audio/wave' asMIMEType; "CF_WAVE"
		at: 13 put: 'text/unicode' asMIMEType; "CF_UNICODETEXT"
		at: 16 put: 'CF_LOCALE'; "CF_LOCALE"

I will note for Windows we used FFI to make the required calls and  
did not build a plugin.

So, for example, for textual data we would process mime types of
rtf, utf8, unicode, or plain.
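
As a rough illustration of that selection step (not the Sophie code itself), here is a small C sketch. The table entries are copied from the Windows mapping above; the function names mime_for_format and best_text_mime and the preference order are my own assumptions, and utf8 is left out because its mimetype string isn't given here.

```c
#include <stddef.h>
#include <string.h>

/* Windows format numbers and mimetypes, copied from the table in the post. */
typedef struct { unsigned format; const char *mime; } FormatEntry;

static const FormatEntry kFormatTable[] = {
    { 49510, "text/rtf" },
    { 1,     "text/plain" },    /* CF_TEXT */
    { 2,     "image/bmp" },     /* CF_BITMAP */
    { 12,    "audio/wave" },    /* CF_WAVE */
    { 13,    "text/unicode" },  /* CF_UNICODETEXT */
};

/* Assumed preference order for text, richest first (hypothetical). */
static const char *const kTextPreference[] = {
    "text/rtf", "text/unicode", "text/plain"
};

/* Return the mimetype for a platform format number, or NULL if unknown. */
const char *mime_for_format(unsigned format) {
    size_t i;
    for (i = 0; i < sizeof kFormatTable / sizeof kFormatTable[0]; i++)
        if (kFormatTable[i].format == format)
            return kFormatTable[i].mime;
    return NULL;
}

/* Given the format numbers enumerated from the clipboard, return the most
   preferred text mimetype that is available, or NULL if there is none. */
const char *best_text_mime(const unsigned *formats, size_t count) {
    size_t p, i;
    for (p = 0; p < sizeof kTextPreference / sizeof kTextPreference[0]; p++) {
        for (i = 0; i < count; i++) {
            const char *mime = mime_for_format(formats[i]);
            if (mime != NULL && strcmp(mime, kTextPreference[p]) == 0)
                return kTextPreference[p];
        }
    }
    return NULL;
}
```

With formats 1, 13 and 2 on the clipboard, best_text_mime picks 'text/unicode' over 'text/plain'; a clipboard holding only a bitmap yields NULL.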

Later you use the
ioReadClipboardData: clipboard format: format
to actually return the data object.

I'll note that when reading Unicode on the Mac it came across as UTF-16
with no byte order mark, so our read method that returned
WideString data did:

	| bytes |
	"utf16 plain text has no bom"

	bytes := self readClipboardData: 'public.utf16-plain-text'.
	^bytes ifNil: [bytes] ifNotNil:
		[bytes asString convertFromWithConverter:
			(UTF16TextConverter new useLittleEndian: (SmalltalkImage current endianness = #little))]
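
To show why the byte order has to be supplied when there is no BOM, here is a minimal C sketch of the same idea. It is not what UTF16TextConverter does internally; host_is_little_endian and decode_utf16 are hypothetical names, and the surrogate handling is my own assumption about what a reader would want.

```c
#include <stddef.h>
#include <stdint.h>

/* Report whether the host is little-endian (the check the Smalltalk makes). */
int host_is_little_endian(void) {
    const uint16_t probe = 1;
    return *(const unsigned char *)&probe == 1;
}

/* Read one 16-bit code unit from two bytes in the given byte order. */
static uint32_t read_unit(const unsigned char *p, int little_endian) {
    return little_endian ? ((uint32_t)p[0] | ((uint32_t)p[1] << 8))
                         : (((uint32_t)p[0] << 8) | (uint32_t)p[1]);
}

/* Decode BOM-less UTF-16 bytes into code points. The caller must say which
   byte order to use, because the data carries no mark. Returns the number of
   code points written (at most max_out). Unpaired surrogates are passed
   through unchanged; a trailing odd byte is ignored. */
size_t decode_utf16(const unsigned char *bytes, size_t byte_count,
                    int little_endian, uint32_t *out, size_t max_out) {
    size_t i = 0, n = 0;
    while (i + 1 < byte_count && n < max_out) {
        uint32_t unit = read_unit(bytes + i, little_endian);
        i += 2;
        /* Combine a high surrogate with a following low surrogate. */
        if (unit >= 0xD800 && unit <= 0xDBFF && i + 1 < byte_count) {
            uint32_t low = read_unit(bytes + i, little_endian);
            if (low >= 0xDC00 && low <= 0xDFFF) {
                unit = 0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00);
                i += 2;
            }
        }
        out[n++] = unit;
    }
    return n;
}
```

A caller would pass host_is_little_endian() as the little_endian argument to mirror the method above. Decoding the bytes 41 00 with the wrong order gives U+4100 instead of 'A', which is the kind of garbage you see when the guess is wrong.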

On writing we did the following and supplied a byte order mark.

addWideStringClipboardData: aString
	| ba  |

	self clearClipboard.
	ba := aString convertToWithConverter: (UTF16TextConverter new useByteOrderMark: true).
	self addClipboardData: ba dataFormat: 'public.utf16-plain-text'
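
For anyone wondering what supplying the byte order mark amounts to at the byte level, here is a hedged C sketch. It is not the converter's implementation: encode_utf16_with_bom is a hypothetical name, the byte order is a parameter here because I'm not asserting which order the converter picks, and the input is assumed to be valid code points.

```c
#include <stddef.h>
#include <stdint.h>

/* Store one 16-bit code unit at out[pos] in the given byte order and
   return the position just past it. */
static size_t put_unit(unsigned char *out, size_t pos, uint16_t unit,
                       int little_endian) {
    out[pos]     = (unsigned char)(little_endian ? (unit & 0xFF) : (unit >> 8));
    out[pos + 1] = (unsigned char)(little_endian ? (unit >> 8) : (unit & 0xFF));
    return pos + 2;
}

/* Encode code points as UTF-16, prefixed with the byte order mark U+FEFF so
   the reader can detect the byte order. Returns the number of bytes written,
   or 0 if the output buffer is too small. Assumes valid code points. */
size_t encode_utf16_with_bom(const uint32_t *code_points, size_t count,
                             int little_endian,
                             unsigned char *out, size_t capacity) {
    size_t i, pos = 0, needed = 2;   /* 2 bytes for the BOM itself */
    for (i = 0; i < count; i++)
        needed += (code_points[i] >= 0x10000) ? 4 : 2;
    if (needed > capacity)
        return 0;

    pos = put_unit(out, pos, 0xFEFF, little_endian);
    for (i = 0; i < count; i++) {
        uint32_t cp = code_points[i];
        if (cp >= 0x10000) {
            /* Split into a surrogate pair. */
            cp -= 0x10000;
            pos = put_unit(out, pos, (uint16_t)(0xD800 + (cp >> 10)), little_endian);
            pos = put_unit(out, pos, (uint16_t)(0xDC00 + (cp & 0x3FF)), little_endian);
        } else {
            pos = put_unit(out, pos, (uint16_t)cp, little_endian);
        }
    }
    return pos;
}
```

Encoding 'A' big-endian gives FE FF 00 41, and little-endian gives FF FE 41 00, so whoever pastes the data can tell the order from the first two bytes.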

On May 22, 2007, at 3:18 AM, Chris Petsos wrote:

>> Michael Rueger wrote:
>>> Chris Petsos wrote:
>>>> Any quick ideas on how we can handle unicode text from and to the
>>>> system clipboard with Squeak?
>>> There has been some work done in Sophie, currently being integrated with
>>> the OLPC image.
>> I'm working only for X11 (linux) with the OLPC.
>> If you want try on Mac or Win32 soon, see System-Clipboard-Extended
>> category in Sophie.
>> - Takashi
> From what i saw System-Clipboard-Extended package uses UTF Converters for
> the internal representation of the data.
> The thing is that we are trying to create a VM where the internal
> representation of the characters will be Unicode.
> This means that the VM we use is sending unicode charcodes to the image, we
> use unicode fonts etc...
> So, a UTF interpreted string will not display properly in our image. Unless,
> we use interpreters for our Unicode chars...
> I think we will have to patch the VM again so that the clipboard related
> methods send again unicode streams to the image.
> Don't know which solution of the two is more desirable...
> The related methods that are called when putting to or getting something
> from the clipboard are
>     int clipboardSize(void)
>     int clipboardWriteFromAt(int count, int byteArrayIndex, int startIndex)
>     int clipboardReadIntoAt(int count, int byteArrayIndex, int startIndex)
> in
>     sqWin32Window.c
> Any help on that Diomidis?
> Christos.

John M. McIntosh <johnmci at smalltalkconsulting.com>
Corporate Smalltalk Consulting Ltd.  http://www.smalltalkconsulting.com
