[HACK] Unicode keyboard input and fonts

danil osipchuk danil at mtsnet.ru
Mon Jun 19 13:29:19 UTC 2006


Yoshiki-san,

First, let me express my gratitude for all the work you have done for m17n.
It is certainly great work.

>>
>> As for me, stock image and VMs definitely are not enabled for Russian.
>
>   Please keep it in mind that "Unicodizing" and "enable language YYY"
> are different issues (not only in Squeak, but in any systems that
> try to deal with them.).  Current Squeak is Unicodized, but many
> languages are wanting to be implemented.

Yes, I understand this completely, and in fact my complaint originates from
an attempt to implement such support for Russian. I succeeded, but I'm not
happy with the things I had to do for it or with how I did them. It was
painful all the way down. I think most of the problems are VM-related.

What did I mean when asking whether we are fully 'unicodized'? I expected
that we would have a unified approach to text handling in Squeak: all text
inside Squeak is, well, in 'squeak' format (as in the idiom used by the
various converters). But it seems that we have a hybrid of the old
charset-aware implementation and the new Unicode one, and one must be very
careful when dealing with it. I'm not talking about access to external
resources; it is obviously the right thing to have multiple charset
representations for them.
But how about this one:

CP1251ClipboardInterpreter>>fromSystemClipboard: aString

	| result converter |
	
	result := WriteStream on: (String new: aString size).
	converter := CP1251TextConverter new.
	aString do: [:each |
		result nextPut: (converter toSqueak: each macToSqueak) asCharacter.
	].
	
	^ result contents.


Note the #macToSqueak in the above. One could argue that the clipboard is an
'external' resource which happens to be in Mac encoding and therefore
deserves special care, but why should we ever have to do things like this
if we are fully Unicode-compliant? This trick was copied from someone
else's language environment; I would probably never have come up with it
on my own. For some reason not all converters use this macToSqueak
conversion, which is another interesting point to consider.
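
For contrast with the two-step conversion above (#macToSqueak first, then
the CP1251 converter), here is a minimal sketch in C of a direct,
single-step mapping from CP1251 bytes to Unicode code points. It assumes
the incoming bytes really are CP1251 and covers only ASCII, the basic
Cyrillic letters and Ё/ё. The names cp1251_to_unicode and cp1251_decode are
hypothetical and are not part of Squeak or its VM.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of Squeak or its VM.
   Maps one CP1251 byte to a Unicode code point.
   Covers ASCII, the 64 basic Cyrillic letters and Ё/ё only;
   every other byte is reported as U+FFFD (replacement character). */
uint32_t cp1251_to_unicode(uint8_t b)
{
    if (b < 0x80) return b;                       /* ASCII is unchanged */
    if (b >= 0xC0) return 0x0410u + (b - 0xC0u);  /* 0xC0..0xFF -> U+0410..U+044F */
    if (b == 0xA8) return 0x0401u;                /* Ё */
    if (b == 0xB8) return 0x0451u;                /* ё */
    return 0xFFFDu;                               /* not handled in this sketch */
}

/* Decodes n CP1251 bytes into out (which must have room for n entries)
   and returns the number of code points written. */
size_t cp1251_decode(const uint8_t *in, size_t n, uint32_t *out)
{
    size_t i;
    assert(in != NULL && out != NULL);
    for (i = 0; i < n; i++)
        out[i] = cp1251_to_unicode(in[i]);
    return n;
}
```

With a mapping like this there is exactly one conversion at the boundary,
and nothing inside the image would need to know about Mac encoding.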

About the Unix VMs. Please correct me if I'm wrong, but hasn't everyone who
implemented support for his language modified the VM in one way or another?
I've found that the stock VM doesn't work for me (I've tried FreeBSD and
Kubuntu): when the language is Russian, key chars don't find their way into
the image (and I tried all reasonable combinations of command-line switches
and environment variables).
When I change x2sqKey from x2sqKeyPlain (the stock value) to x2sqKeyInput,
it starts working:

/* in sqUnixX11.c */
typedef int (*x2sqKey_t)(XKeyEvent *xevt);

static int x2sqKeyPlain(XKeyEvent *xevt);
static int x2sqKeyInput(XKeyEvent *xevt);

/* the stock source points this at x2sqKeyPlain */
static x2sqKey_t x2sqKey= x2sqKeyInput;

I didn't manage to make copy/paste between the Unix VM and the outside world
work (I just ran out of steam while experimenting with it).

On the Windows VM I had to modify sqWin32Window.c. When the user copies
Russian text from Squeak to the clipboard, the text is corrupted *unless*
the current keyboard layout is Russian. This happens because Windows doesn't
know anything about the locale of the data being copied. So the modification
in sqWin32Window.c is:

hLocale = GlobalAlloc(GMEM_MOVEABLE | GMEM_DDESHARE, sizeof(DWORD));
pLocale = (DWORD *) GlobalLock(hLocale);
*pLocale = GetUserDefaultLCID();
GlobalUnlock(hLocale);
SetClipboardData(CF_LOCALE, hLocale);

I'm not sure that everyone will be happy with it, but at least it works  
for Russian.
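
To illustrate why the locale tag matters: 8-bit clipboard text carries no
encoding of its own, so whoever reads it has to pick a code page, and the
same byte means different characters under different code pages. The
portable sketch below is not Windows API code; the enum and the function
decode_byte are hypothetical, and only ASCII and the byte range 0xC0..0xFF
are handled.

```c
#include <stdint.h>

/* Hypothetical stand-ins for "the code page the receiver assumes". */
enum sketch_codepage { SKETCH_CP1251, SKETCH_CP1252 };

/* Decodes one byte under the assumed code page.
   In CP1252 the bytes 0xC0..0xFF coincide with Latin-1 (U+00C0..U+00FF);
   in CP1251 the same bytes are the Cyrillic letters U+0410..U+044F.
   Bytes 0x80..0xBF are not handled here and yield U+FFFD. */
uint32_t decode_byte(enum sketch_codepage cp, uint8_t b)
{
    if (b < 0x80) return b;
    if (b >= 0xC0)
        return cp == SKETCH_CP1251 ? 0x0410u + (b - 0xC0u) : (uint32_t)b;
    return 0xFFFDu;
}
```

For example, the byte 0xCF decodes to U+041F (П) when the receiver assumes
CP1251, but to U+00CF (Ï) when it assumes CP1252, which is the kind of
corruption described above.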


>   In Squeak, each language requires a few methods get implemented, and
> basically the native speakers need to yell what fonts they want to use
> for their language.
>
>   (By definition of Unicode, there is no single font that can make
> everybody happy for Unicode.  Not only in Squeak, but in any systems
> that try to deal with them.)

This is an interesting point, because I certainly used to think of
Unicode fonts as something that suits all users at once.
I guess this is where leadingChar comes from (I still don't know what
leadingChar is needed for or how to use it correctly)?

>   For example, what font do you want to use Russian.  For performance
> reason, it would be nice that there is a set of bitmap fonts in
> different size that matches the Accuny fonts, and also a TT font for
> some other purposes.

Most of my time was spent battling with fonts. I did see
TTCFontReader, and it was obviously used by you and others, so it must be
useful. It seems that Bert somehow managed to 'hack' it (btw, it was the
word 'hack' in the heading of Bert's original message that triggered my
response, because this is the most hackish activity I have ever been
involved in :)). But I didn't manage to do anything with it, so I had to
almost completely dissect TTFontReader and reassemble it so that I could
read ttf fonts (hence
http://map.squeak.org/accountbyid/2bf29ca7-cb92-4c16-ae18-6b271117a660/package/2c1a81e1-4e86-40c8-90b5-824adc4263c5).
The TTC sub-hierarchies are gone as a result, so again my changes are not
compatible with the main distribution. At least twice I've seen people
asking on the list 'how do I read my Indian or whatever ttf font into the
image' with nobody answering, so I just did it myself.

The net effect of all the above is that I've managed to add support for
Russian, but I've ended up with a system which doesn't seem to be compatible
with the Japanese environment (for instance) at either the VM or the image
level. This may certainly indicate that I misunderstood the concepts, but
the fact is: adding support for another language is not just a matter of
adding a couple of LanguageEnvironment-derived classes and methods to the
system.

>
> -- Yoshiki
>
>


	Danil


