String hierarchy (was: UTC-8 (was ...))

Mark van Gulik ghoul6 at home.com
Wed Mar 22 00:52:16 UTC 2000


>> (BTW, if you have immutability, why care one way or another about
>> uniqueness?
>
> O(1) Equality testing.  Very useful for some things (like selectors).

In my language Avail I have the best of both worlds.  Strings (and most
other structures, for that matter) compare by equality, which can require
O(n) to check, but if the strings are determined to be equal, one is turned
into an indirection to the other.  Do *not* try this in other languages:
Strings in Avail do not have identity (nor do any other immutable objects in
Avail, so this trick works for Avail's immutable tuples, immutable sets,
immutable maps, etc).  Oh yeah, these things all have nice stable hash
values which are computed as needed then cached in the objects.  Comparing
two strings (or sets, etc) first checks the objects' addresses, then checks
the hash values (if both are present or easily computed), and finally, if
the hashes were absent or equal, the strings are compared the slow way.  The
hashing is very good (32 bits, well distributed), so it's astronomically
unlikely that unequal strings have the same hash value (approximately one in
four billion unequal string comparisons, in fact), so it's almost *never*
that the characters of a string are compared to the characters of another
string (at least without a corresponding merge afterward).  Hashing takes
maybe a half-dozen instructions per character, so that's not an issue
either.

The reason I said it was the best of both worlds:  If you never use the
strings as keys in a map (Dictionary), nor compare strings to each other
extensively, the hash values never need to be computed.  If you *do* compare
strings a lot, they'll quickly merge together and have the same performance
as Symbols.


<plug>
You can see a (slightly aged) version of Avail with descriptive prose at:

    http://www.ericsworld.com/Mark/HTML/Avail.html

Avail project status:  The dynamic translator is reimplemented (supporting
simple inlining and primitive folding).  The incremental garbage collector
is working (except for a C++ bug that I've almost isolated with massive
assertions).  In the next few weeks (ok, maybe a month) I'll have the
dynamic translator itself translated to C++, at which point I'll create a
new release.  I'll probably release for Linux 386 at the same time or soon
after the Mac and Windows releases.

</plug>





More information about the Squeak-dev mailing list