multilingual Squeak (Re: Must _ go like the Dodo?)

ohshima at is.titech.ac.jp ohshima at is.titech.ac.jp
Tue Mar 16 08:30:43 UTC 1999


  Hi,

> >  This could be another source of flames, but I want to
> >say that Unicode is almost useless for 'internationalization'
> >and 'multilingualization'.  It is just a compromise that makes
> >'localization' easier.
> 
> I'm not sure of the meaning of these terms - can you expand a little?

  I sent the last email in too much of a hurry.  Let me
explain a bit more.

> My understanding is that localisation is at its simplest where I can choose
> a default and get all of my menus in French; 'internationalisation' and
> 'multilingualisation' are much more...

  Ah yes, the problem is not only menus, but also input,
justification, file I/O, comparison, and more.

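  The biggest problem with Unicode is the existence of
'many-to-one mappings': characters from several national
standards are mapped to a single code.  The interpretation
of a 16-bit code therefore depends on external context that
is not encoded in the text itself.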
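
  A minimal sketch of the many-to-one mapping, using Python's
standard codecs purely for illustration (nothing here is
Squeak code).  The choice of the ideograph U+76F4 and of the
three legacy codecs is an assumption made for the example.

```python
# Illustration only: one ideograph as encoded by three national standards.
# The choice of U+76F4 is an assumption made for this example.
ch = "\u76f4"

legacy_codecs = ["shift_jis", "gb2312", "big5"]  # Japan, mainland China, Taiwan

# Encode the character with each national encoding.
encoded = {name: ch.encode(name) for name in legacy_codecs}

# Decoding every one of them yields the same single Unicode code point.
code_points = {name: ord(data.decode(name)) for name, data in encoded.items()}

for name in legacy_codecs:
    print(f"{name:10} {encoded[name].hex()} -> U+{code_points[name]:04X}")

# After decoding, nothing records which standard the character came from.
assert len(set(code_points.values())) == 1
```

  Once the text is held as Unicode, the originating standard
has to be supplied from outside, for example by a language
tag or a font choice.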

  Suppose there is an (imaginary) multilingual Smalltalk.
On that system, instances of Character should carry enough
information about the character itself.  But, as I wrote
above, if the internal representation were Unicode, this
could not be true.

  On a truly multilingualized platform, only the very
low-level routines are aware of the encoding/glyph/...
details.  The high-level routines don't have to care about
those details unless they are going to handle multilingual
issues themselves.

> As I understand it, Mule in Emacs can read and write various text files
> encoded in, eg, Latin-1 or Chinese-BIG5, but then translates those files to
> an internal multibyte format.  Is there any reason why that internal format
> shouldn't be Unicode? Is there a better alternative?

  The reason is the 'many-to-one mapping problem', which was
introduced in order to make everything fit into a (small!)
16-bit space.  Restricting to a 16-bit space did make sense
when memory was not cheap, but it does not now.

  The other problem is Unicode's 'monolithic' nature.
Some (non-European) countries, including Japan, are revising
their character set and/or encoding scheme standards, but
they have found it difficult to reflect the new standards in
Unicode.

  The better alternative is something like an aggregation of
local encodings, which is what Mule employs.  (Yes, Unicode
can be one of the local encodings.)
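
  A rough sketch of what 'an aggregation of local encodings'
could look like, again in Python for illustration only.  It
is not Mule's actual internal format: TaggedChar, CODECS,
and read_text are hypothetical names, and the charset tags
are assumptions.  Each character keeps a tag naming the
standard it came from, and only the low-level reader knows
about codecs.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical table: charset tag -> Python codec used at the I/O boundary.
# Unicode appears as just one more entry among the local encodings.
CODECS = {
    "jisx0208": "euc_jp",
    "gb2312": "gb2312",
    "big5": "big5",
    "latin1": "latin_1",
    "unicode": "utf_16_be",
}


@dataclass(frozen=True)
class TaggedChar:
    charset: str  # which local standard the character came from
    code: bytes   # its code within that standard


def read_text(data: bytes, charset: str) -> List[TaggedChar]:
    """Low-level routine: the only place that knows about codecs.

    Python's decoder is used here only to split the bytes into
    characters; each character is stored as its original local code.
    """
    codec = CODECS[charset]
    return [TaggedChar(charset, c.encode(codec)) for c in data.decode(codec)]


# High-level code compares characters without touching any codec.
ja = read_text("\u76f4".encode("euc_jp"), "jisx0208")[0]
zh = read_text("\u76f4".encode("gb2312"), "gb2312")[0]

print(ja.charset, zh.charset)
print(ja == zh)  # the two characters stay distinct because the tags differ
```

  Whether two tagged characters should ever compare equal is
a design decision this sketch leaves open; it only shows
that the information is still there to decide with.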

                                             OHSHIMA Yoshiki
                Dept. of Mathematical and Computing Sciences
                               Tokyo Institute of Technology 




