3.7 moving to beta tomorrowish
Ned Konz
ned at bike-nomad.com
Wed Mar 31 00:14:19 UTC 2004
On Tuesday 30 March 2004 3:57 pm, Bill Schwab wrote:
> An overly blunt way to look at unicode is that it offers us an opportunity
> to double the storage requirements for all of our text. In fact, one
> device that I have encountered uses "unicode" (it likely predates the
> standards), and ends up doing precisely that - each character it sends is
> followed by a gratuitous zero, in a world where every byte truly counts
> thanks to bandwidth restrictions.
Anyone who sends UCS-2 (2 bytes per character) or UTF-16 over the wire is
probably doing something wrong if most of the bytes are zero.
UTF-8 is the preferred character encoding for this. If your characters are all
in the ASCII range, you don't spend any extra bytes; the rest of Latin-1
(characters above 127) costs two bytes each.
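
To put rough numbers on this, here is a small sketch in Python (not Squeak) that compares encoded sizes. The helper name encoded_sizes and the sample strings are made up for illustration, and UTF-16 is measured little-endian without a byte-order mark, which is an assumption about how such a device might send it.

```python
def encoded_sizes(text):
    """Return the byte length of text in UTF-8 and in UTF-16 (little-endian, no BOM)."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),
    }


ascii_text = "Squeak"        # pure ASCII
latin1_text = "caf\u00e9"    # "cafe" with e-acute (U+00E9), a Latin-1 character above 127

print(encoded_sizes(ascii_text))    # {'utf-8': 6, 'utf-16': 12}
print(encoded_sizes(latin1_text))   # {'utf-8': 5, 'utf-16': 8}

# Every second byte of the UTF-16 form of ASCII text is the "gratuitous zero".
print(ascii_text.encode("utf-16-le")[1::2])   # b'\x00\x00\x00\x00\x00\x00'
```

For ASCII text UTF-8 costs one byte per character and UTF-16 costs two; the e-acute takes two bytes in UTF-8, so UTF-8 is still no larger than UTF-16 here.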
> I understand the value of unicode, and want Squeak to embrace it. However,
> is unicode something that many of us would want to disable most of the
> time? I ask because, if true, we might want another solution to the
> underscore/:= collision.
We don't usually deal with strings made up of bytes in Squeak (that is, once
we get past raw file data and stuff coming in from the operating system); we
deal with Strings of Characters.
As a result, keeping most Strings just the way they are now (and choosing an
encoding for their Characters) shouldn't cost anything (or at least not very
much; there are some primitives, like the CharacterScanner and font rendering,
that would have to know about this).
You should look at Yoshiki's work. He adds new kinds of String and Character
(as I recall); these carry their encoding with them. Since he's also
concerned about translation and about other potential problems with Unicode
and Asian languages (look up "Han Unification" for some pointers), these
Strings (and Characters?) also can carry more information about desired
rendering, language of origin, etc. But most of the Strings and Characters in
the Squeak image should remain untouched.
--
Ned Konz
http://bike-nomad.com
GPG key ID: BEEA7EFE