UTF8 Squeak

Janko Mivšek janko.mivsek at eranova.si
Thu Jun 7 21:49:29 UTC 2007


Hi Yoshiki,

Yoshiki Ohshima wrote:
>>>> 1. internally everything is in 16bit Unicode, without any additionally
>>>>     encoding info attached to strings
>>>   If they use 16 bits per char, how do they deal with surrogate pairs?
>> I looked once again and there is actually a FourByteString too. This 
>> probably answers your question.
> 
>   Probably, yes.
> 
>   So, the question to you is that if you have a system with 8-bit
> ByteString and 32-bit WideString in year 2007, would you add a class
> to represent 16-bit string to that system?

I would say yes, because for most countries 16 bits is enough and 32 bits 
is then just a waste of memory. And I just noticed that WideString is 
actually fixed to 4 bytes. I would therefore think about renaming it to 
FourByteString and adding a TwoByteString (or similar names). For the user 
these are always Strings anyway, just as SmallIntegers and LargeIntegers 
are always Integers.
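
To make the memory argument concrete, here is a minimal sketch in Python (not Squeak code). The function names bytes_per_char and storage_size are made up for illustration; they are not part of Squeak or VisualWorks. It assumes a fixed-width representation in which a string takes the narrowest of 1, 2 or 4 bytes per character that fits its highest code point, which is roughly what a ByteString / TwoByteString / FourByteString split would do:

```python
def bytes_per_char(text: str) -> int:
    """Narrowest fixed width (1, 2 or 4 bytes) that holds every code point.

    Hypothetical helper: mirrors the idea of ByteString, TwoByteString and
    FourByteString, not any actual Squeak or VisualWorks implementation.
    """
    highest = max(map(ord, text), default=0)
    if highest <= 0xFF:
        return 1      # fits an 8-bit string
    if highest <= 0xFFFF:
        return 2      # fits a 16-bit string (Basic Multilingual Plane)
    return 4          # needs a 32-bit string


def storage_size(text: str) -> int:
    """Bytes needed for the character data alone, ignoring object headers."""
    return len(text) * bytes_per_char(text)


if __name__ == "__main__":
    for sample in ["Squeak", "Mivšek", "日本語", "\U0001D11E"]:
        print(repr(sample), bytes_per_char(sample), storage_size(sample))
```

With only 8-bit and 32-bit classes, a string such as "日本語" would need 12 bytes of character data instead of 6. The caller never has to know which width was picked, the same way Integer arithmetic hides the SmallInteger / LargeInteger switch.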

> 
>> VW also support Japanese locale well.
> 
>   Oh, yes.  I know it.  In fact, the internationalization of
> VisualWorks was done by a company that is my former employer. (The
> work was done way before I joined, though).  I have seen some apps and
> developers of the system.
> 
>   However, there is a reason to call our stuff m17n, instead of i18n.
> It might still be only an aspiration, but supporting one language at
> a time (a sort of locale-based idea) is not enough for "real"
> multilingualization, where you would like to mix strings from
> different languages freely.

I strongly agree and therefore a well thought-out effort to solve i18n 
well in Squeak is a must. For me also, because I still need to find out 
how to port Aida/Web i18n support to Squeak ...

Best regards
Janko


-- 
Janko Mivšek
AIDA/Web
Smalltalk Web Application Server
http://www.aidaweb.si


