UTF8 Squeak

Yoshiki Ohshima yoshiki at squeakland.org
Thu Jun 7 19:49:37 UTC 2007


> Each String object should specify its encoding scheme.  UTF-8 should be the
> default, but all commonly-encountered encodings should be supported, and
> should all be useable at once (in different String instances.) When a
> Character is reified from a String, it should use the Unicode code point
> values (full 32-bit value.)  Ideally, the encoding of a String should be a
> function of an associated Strategy object, and not be based on having
> different subclasses of String.

  Is this better than using UTF-32 throughout the image for all Strings?
One reason would be that for some characters in domestic encodings, the
round-trip conversion is not guaranteed to be exact, so you can avoid that
problem this way.  But other than that, encodings only matter when
the system is interfacing with the outside world.  So the internal
representation can be uniform, I think.
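
  To make the "uniform inside, encodings only at the boundary" idea
concrete, here is a minimal sketch.  It is in Python rather than Squeak,
and the function names read_external and write_external are hypothetical,
chosen only for illustration.  It assumes every input can be decoded
exactly, which is precisely what is not guaranteed for some characters in
domestic encodings.

```python
from typing import List

def read_external(data: bytes, encoding: str) -> List[int]:
    """Decode bytes at the system boundary into a list of Unicode
    code points (the uniform, UTF-32-like internal form)."""
    return [ord(ch) for ch in data.decode(encoding)]

def write_external(code_points: List[int], encoding: str) -> bytes:
    """Encode the uniform internal form back into an external encoding."""
    return "".join(chr(cp) for cp in code_points).encode(encoding)

# The same text arriving in two different external encodings.
utf8_in = "日本語".encode("utf-8")
sjis_in = "日本語".encode("shift_jis")

a = read_external(utf8_in, "utf-8")
b = read_external(sjis_in, "shift_jis")

# Internally both are the same sequence of code points, so one
# comparison suffices regardless of where the text came from.
same_inside = (a == b)
```

  Once inside, the two strings are indistinguishable; the encoding is
chosen again only when the text is written back out.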

  Would you write all comparison methods for all combinations of
different encodings?
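
  A rough sketch of what that question implies, again in Python and with
hypothetical names (EncodedString, same_text, and the ENCODINGS list are
illustrative assumptions, not anything from Squeak): with N per-string
encodings there are N * N ordered pairs to compare, unless every
comparison first converts both sides to code points.

```python
from dataclasses import dataclass
from itertools import product
from typing import List

# A small, arbitrary set of encodings used only for illustration.
ENCODINGS = ["utf-8", "utf-16-le", "shift_jis", "euc_jp"]

@dataclass(frozen=True)
class EncodedString:
    """A string that carries its own encoding, as in the quoted proposal."""
    data: bytes
    encoding: str

    def code_points(self) -> List[int]:
        # Convert to the common form: a list of Unicode code points.
        return [ord(ch) for ch in self.data.decode(self.encoding)]

def same_text(left: EncodedString, right: EncodedString) -> bool:
    """Compare two strings of possibly different encodings by
    converting both to code points first."""
    return left.code_points() == right.code_points()

# Every ordered pair of encodings a direct comparison would need to handle.
pairs = list(product(ENCODINGS, repeat=2))

u = EncodedString("日本語".encode("utf-8"), "utf-8")
e = EncodedString("日本語".encode("euc_jp"), "euc_jp")
```

  Here same_text(u, e) avoids the per-pair methods only by going through
code points on every call, which is the conversion a uniform internal
representation would do once at the boundary.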

-- Yoshiki


