String

Fri Mar 17 06:56:55 UTC 2000

  Hi,

  Usually I don't like to write this kind of thing myself,
but I think I have to point out that there is already (at
least) one attempt to make Squeak capable of handling
multi-byte strings, along with an XML parser based on it.

  That includes:

  * Automatic conversion between Character and MultiCharacter
    (like the Number hierarchy).
  * Automatic conversion between String and MultiString
    (though String and MultiString don't share the same
    ancestor).
  * Adding a new encoding is (I hope) simple.
  * An XML parser (by John Dancun).
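
  The automatic conversion in the list above can be sketched
roughly as follows. This is only an illustration in Python, not
the Squeak code: the class name AdaptiveString and its methods
are hypothetical, and a single wrapper that swaps its storage
stands in for the separate String and MultiString classes. It
starts with one byte per character and promotes itself to
fixed-width cells the first time a character outside the 8-bit
range is stored, much as a small integer turns into a large one
on overflow.

```python
from array import array


class AdaptiveString:
    """Hypothetical stand-in for the String/MultiString pair."""

    def __init__(self, text=""):
        self._store = self._pack([ord(ch) for ch in text])

    @staticmethod
    def _pack(codes):
        # One byte per character when every code fits in 8 bits,
        # otherwise fixed-width cells of at least 32 bits.
        if all(code < 256 for code in codes):
            return bytearray(codes)
        return array("L", codes)

    def is_wide(self):
        return isinstance(self._store, array)

    def at(self, index):
        # Fixed-width cells in both representations, so this is O(1).
        return chr(self._store[index])

    def at_put(self, index, char):
        code = ord(char)
        if code > 255 and not self.is_wide():
            # Promote: copy the bytes into wide cells, element by element.
            self._store = array("L", list(self._store))
        self._store[index] = code

    def compact(self):
        # The "vice versa" direction: go back to bytes when possible.
        self._store = self._pack(list(self._store))

    def __len__(self):
        return len(self._store)

    def __str__(self):
        return "".join(chr(code) for code in self._store)


s = AdaptiveString("abc")
s.at_put(1, "\u3042")   # storing a wide character promotes the storage
print(s.is_wide(), len(s))
```

  Demotion is kept in a separate compact() step here so that
at_put stays cheap; whether the real implementation demotes
eagerly or lazily is not stated in this message.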

  As for Latin-1'nizing, several people (including me) have
done it.  As far as I know, the first attempt (by me) was done
by creating the several missing fonts by (my) hand, and I'd say
the quality of those fonts is not so good.  After that,
however, other people like Jay Carlson did it by reading BDF
fonts.  Henrik Gedenryd would probably provide much more
"beautiful" glyphs.

  I think there is now enough material to move to a Latin-1
based Squeak.

  Finally, note that even Unicoders don't think the
"16-bit == 1 character" abstraction works.

  If a multilingual system is built on such an abstraction,
the "string" would not be indexable.  Namely, #at: and
#at:put: can't be executed in O(1) time.
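
  A small sketch of why indexing stops being O(1), assuming the
problem meant here is that some characters need two 16-bit
units (surrogate pairs).  The helper names utf16_units and
char_at are made up for this example.  To find the n-th
character, the code has to walk over every unit before it.

```python
import struct


def utf16_units(text):
    """Return the 16-bit code units of text as a list of integers."""
    data = text.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(data) // 2), data))


def is_high_surrogate(unit):
    return 0xD800 <= unit <= 0xDBFF


def char_at(units, index):
    """Return the index-th character (0-based) by scanning from the start."""
    pos = 0
    for _ in range(index):
        # A high surrogate means this character occupies two units.
        pos += 2 if is_high_surrogate(units[pos]) else 1
    unit = units[pos]
    if is_high_surrogate(unit):
        low = units[pos + 1]
        return chr(0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00))
    return chr(unit)


units = utf16_units("a\U0001D11Eb")
print(len(units))               # four units for three characters
print(char_at(units, 2) == "b")
```

  With fixed-width cells the character at index 2 is simply
cell 2; here cell 2 is the second half of a surrogate pair, so
a scan (or an extra index table) is needed.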

  I think "30-bit <--> 8-bit" automatic conversion is the most
practical and feasible answer in Squeak.

  See
http://www.is.titech.ac.jp/~ohshima/squeak/squeak-multilingual-e.html
and
http://www.pitt.edu/~jddst19/

  -- Yoshiki