[BUG] Unscalability in collections.

Lex Spoon lex at cc.gatech.edu
Tue Oct 9 04:42:16 UTC 2001


Hmmmm.  An alternative is to add a new method like #hashBits to query
the VM, and then for #identityHash to do the scaling up.  This would let
IdentitySet also benefit from the change.
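A minimal sketch of that alternative, in Smalltalk (the selectors #hashBits and #basicIdentityHash are hypothetical here, standing for "ask the VM" and "the raw stored hash"):

	Object >> identityHash
		"Hypothetical: query the VM once for how many bits the stored
		 hash actually has, then shift the raw hash up so it spreads
		 over the full SmallInteger range."
		^ self basicIdentityHash bitShift: 30 - Smalltalk hashBits

With this, IdentitySet and IdentityDictionary would pick up the wider spread for free, without each collection class scaling hashes itself.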

To rehash everything, you can do:

	HashedCollection allSubInstancesDo: [ :o | o rehash ]


Also, by a nice bit of prescience, sets already get rehashed whenever objects
are loaded from the network or from disk.

So am I missing anything?  I'd say go for it!


By the way, huge hash tables still aren't going to be *wonderful*.  But
they will at least be better. IMHO, a really nice way to do this, which
requires a lot of work, would be a combination of two things:

	1. Let objects have an optional extra word in their header with a large
32-bit (or so) hash.  This word is allocated lazily, only when it is
queried; the same forwarding pointers used for becomeForward: could be
used to do the lazy allocation.

	2. Change #hash to #hash:, and let the extra argument be the size of
the hash table.  By default this method could call #hash, but for
sophisticated objects, it can compute a different hash depending on the
size of the collection it is in.  Likewise for #identityHash and
#identityHash:.
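Sketched in Smalltalk, assuming a class (LargeKey, made up for the example) that can cheaply produce a wider hash when it is worth it:

	Object >> hash: tableSize
		"Default: ignore the table size and answer the ordinary hash."
		^ self hash

	LargeKey >> hash: tableSize
		"Hypothetical override: switch to a wider hash once the table
		 is big enough that the ordinary hash would cause clumping."
		^ tableSize > 4096
			ifTrue: [ self wideHash ]
			ifFalse: [ self hash ]

The point of passing the table size is that cheap objects pay nothing extra, while objects that live in huge collections can spend more effort on hashing only when it actually matters.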

Well, I think it's a neat solution, but, I'm not going to implement it. 
Not only is it tricky work, but there is a workaround: for each class
that gets used in huge collections (typically not many at all), add an
extra instance variable to hold a larger hash value that is randomly
generated.  The more often you have to do this, though, the less
appealing it looks.
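For concreteness, the workaround might look like this (BigRecord and its cachedHash instance variable are made up for the example):

	BigRecord >> hash
		"Answer a wide, randomly generated hash, computed lazily and
		 cached in an instance variable on first use."
		^ cachedHash ifNil: [ cachedHash := SmallInteger maxVal atRandom ]

Since the value is cached per instance, equal-by-identity lookups stay consistent for the lifetime of the object, which is all a hash table needs.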


-Lex




More information about the Squeak-dev mailing list