UTF8 Squeak

Yoshiki Ohshima yoshiki at squeakland.org
Thu Jun 7 22:06:34 UTC 2007


  Janko,

> >   So, the question to you is that if you have a system with 8-bit
> > ByteString and 32-bit WideString in year 2007, would you add a class
> > to represent 16-bit strings to that system?
> 
> I would say yes, because for most countries 16-bit is enough and 32-bit 
> is then just a waste of memory. And I just noticed that WideString is 
> actually fixed to 4 bytes. I would therefore think about renaming it to 
> FourByteString and add TwoByteString (or similar names). For the user these 
> are always Strings anyway, as SmallIntegers and LargeIntegers are always 
> Integers.

  It is a similar deal in Squeak, too.  The system automatically coerces
between WideString and ByteString, so most of the time the user
doesn't have to deal with the distinction.
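  To make the auto coercion concrete, here is a minimal sketch in
Python rather than Squeak.  The class AdaptiveString and its methods
are hypothetical and only assume the policy described above: store 8
bits per character, and widen the whole buffer to 32 bits the first
time a character above 255 is stored.

```python
from array import array

# Pick an unsigned typecode that is exactly 4 bytes on this platform
# ('I' on most platforms, 'L' on some).
_WIDE = next(t for t in "IL" if array(t).itemsize == 4)


class AdaptiveString:
    """Hypothetical sketch: a mutable string kept at 8 bits per character
    until a code point above 255 is stored, then widened to 32 bits.
    Loosely mirrors the ByteString/WideString coercion, not Squeak's code."""

    def __init__(self, text=""):
        codes = [ord(c) for c in text]
        typecode = "B" if all(c < 256 for c in codes) else _WIDE
        self._data = array(typecode, codes)

    @property
    def bytes_per_char(self):
        return self._data.itemsize

    def __len__(self):
        return len(self._data)

    def __getitem__(self, i):
        return chr(self._data[i])

    def __setitem__(self, i, ch):
        code = ord(ch)
        if code > 255 and self._data.typecode == "B":
            # Widen the whole buffer; the caller never sees the switch.
            self._data = array(_WIDE, self._data.tolist())
        self._data[i] = code

    def __str__(self):
        return "".join(chr(c) for c in self._data)


s = AdaptiveString("abc")
print(s.bytes_per_char)   # 1
s[1] = "\u03bb"           # GREEK SMALL LETTER LAMDA, code point 955
print(s.bytes_per_char)   # 4
```

The caller indexes and prints the string the same way before and after
the widening, which is the point: the representation switch stays
invisible.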

  Adding a 16-bit class is certainly an option.  At the same time,
there is a similar but different point of view: "for most users 8-bit
is enough, and the 32-bit version is not used very frequently anyway".
There is no "right" answer, only different trade-offs.  (That is why
this problem is interesting^^;)
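  The trade-off can be put in rough numbers.  The sketch below uses a
hypothetical helper, storage_bytes, which counts only the character
payload (ignoring object headers) for the narrowest fixed width that
fits each sample.  It compares the two-tier scheme (8/32-bit) with a
three-tier scheme that adds a 16-bit class.

```python
def storage_bytes(text, widths=(1, 4)):
    """Bytes needed to hold `text` at the narrowest fixed width (in bytes)
    that fits its largest code point. Hypothetical policy for illustration."""
    largest = max((ord(c) for c in text), default=0)
    for w in sorted(widths):
        if largest < 256 ** w:
            return w * len(text)
    raise ValueError("code point does not fit any available width")


samples = {
    "ascii": "hello",
    "greek": "\u03b1\u03b2\u03b3\u03b4\u03b5",  # 5 Greek letters
    "emoji": "ok\U0001F642",                    # 3 code points
}
for name, text in samples.items():
    two_tier = storage_bytes(text, (1, 4))       # 8-bit and 32-bit only
    three_tier = storage_bytes(text, (1, 2, 4))  # plus a 16-bit tier
    print(name, two_tier, three_tier)
# ascii 5 5
# greek 20 10
# emoji 12 12
```

A 16-bit tier halves the cost only for strings whose largest code point
lies between 256 and 65535.  Pure 8-bit text and text with characters
beyond 65535 cost the same either way.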

  In fact, it would eventually be better to add a more general
character object that doesn't rely on a particular bit representation
(and can therefore go beyond 32 bits), and to make strings arrays of
such characters.

-- Yoshiki


