Multilingual Squeak

Mark Wai mwai at ibm.net
Tue Mar 23 05:02:43 UTC 1999


At 09:03 PM 3/21/99 +0900, you wrote:

>> > From: "Michael S. Klein" <mklein at alumni.caltech.edu>
>> >
>> > What about comparing two strings?
>> >
>> > If you have two strings, each with one character that is the same code
>> > point in Unicode, but the strings are in different encodings, are the
>> > strings equal or not?
>> >
>> > In other words, for example (from the first Han character in Unicode) 
>> >
>> > is U+4e00  =  G(5027) ?
>> > is G(5027)  = J(1676) ?
>> >
>> > U means Unicode 2.0
>> > G means GB 2312-80
>> > J means JIS X 0208-1990
>
>  They are 'not equal' to each other on the system which I
>think of.  I think this is reasonable.

If you compare the pure Unicode values (i.e. convert the characters before
comparison), then of course they are equal.  But in reality they are *not*
equal, and they should not be, because they represent two distinct
characters.  If you are curious about how these conversions work, get some
multilingual character-mapping software (e.g. TwinBridge) and a
multilingual input device (e.g. PenMaster).
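
A minimal Python sketch of the two views above. It assumes the G(5027) and
J(1676) numbers from the question are row/cell positions, and it uses the
EUC byte forms of those positions; the helper name euc_bytes is made up for
illustration.

```python
# Row/cell positions quoted in the thread for the first Han character.
GB_ROW, GB_CELL = 50, 27    # G(5027), GB 2312-80
JIS_ROW, JIS_CELL = 16, 76  # J(1676), JIS X 0208-1990


def euc_bytes(row: int, cell: int) -> bytes:
    """Return the two-byte EUC form of a 94x94 row/cell position."""
    return bytes([0xA0 + row, 0xA0 + cell])


gb_raw = euc_bytes(GB_ROW, GB_CELL)     # EUC-CN bytes
jis_raw = euc_bytes(JIS_ROW, JIS_CELL)  # EUC-JP bytes

# View 1: compare the native codes as they are stored.
raw_equal = gb_raw == jis_raw

# View 2: convert each code to Unicode first, then compare.
gb_char = gb_raw.decode("gb2312")
jis_char = jis_raw.decode("euc_jp")
unicode_equal = gb_char == jis_char

print("native codes equal:", raw_equal)
print("after conversion to Unicode equal:", unicode_equal)
```

Which answer is the "right" one is a design decision for the string class,
not something the conversion tables settle.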

I think Unicode would be a good start for a multilingual Squeak, because
languages like Russian, Japanese and Chinese have two-byte character
representations, so Unicode will come in handy.  However, all this buys you
is at the 'display' level.  If you want to go multilingual 'all the way'
(a Squeak system that runs entirely in a foreign language, from top to
bottom), you have to change the VM and the compiler so that things such as
Symbol, method source, CompiledMethod, MethodDictionary, etc. support at
least a two-byte encoding.  It could be done (although it is a lot of
work), and I think you might see something like this come out in the
reasonably near future.
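
To make the storage point concrete, here is a small Python sketch (not
Squeak code). The Japanese selector is a made-up example, and Latin-1
stands in for any one-byte-per-character representation.

```python
# Hypothetical selector written in Japanese; not taken from any Squeak image.
selector = "表示"


def fits_one_byte_per_char(text: str) -> bool:
    """Return True if every character fits in a single byte (Latin-1 range)."""
    try:
        text.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


ascii_ok = fits_one_byte_per_char("printOn:")
japanese_ok = fits_one_byte_per_char(selector)

# A two-byte representation holds both of these characters.
two_byte_form = selector.encode("utf-16-be")

print(ascii_ok, japanese_ok, len(two_byte_form))
```

Anything in the system that stores names one byte per character would hit
the same limit, which is why the change has to reach beyond display code.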

Mike and OHSHIMA, I guess most readers of this thread will not be
interested in this issue.  We can take it offline if you want, so that we
don't flood the mailing list.






--
Mark Wai
Wator InnoVision




