3.7 moving to beta tomorrowish

Ned Konz ned at bike-nomad.com
Wed Mar 31 00:14:19 UTC 2004


On Tuesday 30 March 2004 3:57 pm, Bill Schwab wrote:
> An overly blunt way to look at unicode is that it offers us an opportunity
> to double the storage requirements for all of our text.  In fact, one
> device that I have encountered uses "unicode" (it likely predates the
> standards), and ends up doing precisely that - each character it sends is
> followed by a gratuitous zero, in a world where every byte truly counts
> thanks to bandwidth restrictions.

Anyone who sends UCS-2 (2 bytes per character) or UTF-16 over the wire is 
probably doing something wrong if most of the bytes are zero.

UTF-8 is the preferred character encoding for this. If your characters are all 
in the ASCII range, you don't spend any extra bytes; the remaining Latin-1 
characters (U+0080 through U+00FF) take two bytes each.
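
To make the byte counts concrete, here is a small sketch in Python (used only 
for illustration; it is not Squeak code, and the helper name encoded_sizes is 
made up for this example). It compares the encoded size of an ASCII-only 
string and of a string with one non-ASCII Latin-1 character.

```python
def encoded_sizes(text):
    """Return the byte count of text in UTF-8 and in UTF-16 (little-endian, no BOM)."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),
    }

# ASCII only: one byte per character in UTF-8, two in UTF-16.
print(encoded_sizes("Squeak"))

# U+00E9 is in Latin-1 but outside ASCII, so UTF-8 needs two bytes for it.
print(encoded_sizes("caf\u00e9"))

# For ASCII text, every second byte of the UTF-16 form is zero.
print("Squeak".encode("utf-16-le"))
```

For ASCII-only text the UTF-16 form is twice the size of the UTF-8 form and 
every second byte is zero, which is the situation the quoted device is in.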

> I understand the value of unicode, and want Squeak to embrace it.  However,
> is unicode something that many of us would want to disable most of the
> time?  I ask because, if true, we might want another solution to the
> underscore/:= collision.

We don't usually deal with strings made up of bytes in Squeak (that is, once 
we get past raw file data and stuff coming in from the operating system); we 
deal with Strings of Characters.

As a result, keeping most Strings just the way they are now (and choosing an 
encoding for their Characters) shouldn't cost anything (or at least not very 
much; there are some primitives, like those behind the CharacterScanner and 
font rendering, that would have to know about the encoding).
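
The bytes-versus-characters distinction can be sketched the same way, again in 
Python rather than Squeak and with a made-up helper name 
(char_and_byte_counts). The bytes are decoded once at the boundary; after 
that, the character count no longer depends on the encoding that was used on 
the wire.

```python
def char_and_byte_counts(raw_bytes, encoding="utf-8"):
    """Decode raw bytes once, then return (character count, byte count)."""
    text = raw_bytes.decode(encoding)
    return len(text), len(raw_bytes)

word = "na\u00efve"  # five characters, one of them outside ASCII

# The same five characters arrive as six bytes in UTF-8 and five in Latin-1.
print(char_and_byte_counts(word.encode("utf-8"), "utf-8"))
print(char_and_byte_counts(word.encode("latin-1"), "latin-1"))
```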

You should look at Yoshiki's work. He adds new kinds of String and Character 
(as I recall); these carry their encoding with them. Since he's also 
concerned about translation and about other potential problems with Unicode 
and Asian languages (look up "Han Unification" for some pointers), these 
Strings (and Characters?) can also carry more information about desired 
rendering, language of origin, etc. But most of the Strings and Characters in 
the Squeak image should remain untouched.

-- 
Ned Konz
http://bike-nomad.com
GPG key ID: BEEA7EFE


