3.7 moving to beta tomorrowish
Yoshiki Ohshima
Yoshiki.Ohshima at acm.org
Wed Mar 31 00:18:59 UTC 2004
Bill,
> An overly blunt way to look at unicode is that it offers us an
> opportunity to double the storage requirements for all of our text.
> In fact, one device that I have encountered uses "unicode" (it
> likely predates the standards), and ends up doing precisely that -
> each character it sends is followed by a gratuitous zero, in a world
> where every byte truly counts thanks to bandwidth restrictions.
Ah, this is... I don't know what to say...
First, Unicode is not a 16-bit character set.  It is a 21-bit character
set.
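The 21-bit figure follows from Unicode's code space topping out at
U+10FFFF; a quick check (Python here just for illustration, not part of
the Squeak code in question):

```python
# Unicode's code space runs from U+0000 to U+10FFFF, so the
# largest code point needs 21 bits to represent.
MAX_CODE_POINT = 0x10FFFF
print(MAX_CODE_POINT.bit_length())  # -> 21

# 16 bits only cover the Basic Multilingual Plane (U+0000..U+FFFF);
# code points above that need surrogate pairs in UTF-16.
print(MAX_CODE_POINT > 0xFFFF)  # -> True
```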
Second, let's clarify that the external representation and the
internal representation are different beasts.  My stuff mainly uses
UTF-8 for the external representation.  As long as your code is mostly
written in pure ASCII plus a few other characters such as the up arrow
and left arrow, the difference in size is negligible.
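That negligible overhead is because UTF-8 is backward compatible with
ASCII: every ASCII character encodes to exactly one byte, and only the
occasional non-ASCII glyph costs more.  A small illustration (Python,
for demonstration only):

```python
# Pure-ASCII source text is byte-for-byte the same size in UTF-8.
ascii_src = "x := y + 1"
assert len(ascii_src.encode("utf-8")) == len(ascii_src)

# A non-ASCII character such as the up arrow (U+2191), used as the
# Smalltalk return glyph, takes 3 bytes in UTF-8.
arrow = "\u2191"
print(len(arrow.encode("utf-8")))  # -> 3
```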
For the internal representation, my stuff uses 8 bits per character
for a String that contains only Latin-1 characters.  For Strings that
contain non-Latin-1 characters, it uses a 32-bit representation.  This
is similar to the implicit conversion between SmallInteger and
LargeInteger, and the user shouldn't have to care about the
difference.  The typical memory footprint surely will go up a bit but,
as you can imagine, not by much.
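The idea can be sketched roughly like this (a minimal illustration in
Python, not Squeak's actual implementation; the class name and layout
are invented for the example):

```python
# Sketch of content-dependent character width: Latin-1-only strings
# get 1 byte per character, anything else gets 4 bytes per character,
# analogous to the SmallInteger/LargeInteger implicit widening.
class FlexString:
    def __init__(self, chars):
        code_points = [ord(c) for c in chars]
        if all(cp <= 0xFF for cp in code_points):
            self.width = 1  # Latin-1 only: 8 bits per character
            self.data = bytes(code_points)
        else:
            self.width = 4  # non-Latin-1 present: 32 bits per character
            self.data = b"".join(cp.to_bytes(4, "little")
                                 for cp in code_points)

    def byte_size(self):
        return len(self.data)

print(FlexString("hello").width)            # -> 1
print(FlexString("hello\u2191").width)      # -> 4
print(FlexString("hello\u2191").byte_size())  # -> 24
```

The point of the implicit conversion is exactly that callers never pick
a representation; the object widens itself only when the content
requires it, so the common Latin-1 case pays nothing extra.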
> I understand the value of unicode, and want Squeak to embrace it.
> However, is unicode something that many of us would want to disable
> most of the time? I ask because, if true, we might want another
> solution to the underscore/:= collision.
Does this explanation add some perspective to your observation?
-- Yoshiki
More information about the Squeak-dev mailing list