[BUG] Unscalability in collections.
Lex Spoon
lex at cc.gatech.edu
Tue Oct 9 04:42:16 UTC 2001
Hmmmm. An alternative is to add a new method like #hashBits to query
the VM, and then for #identityHash to do the scaling up. This would let
IdentitySet also benefit from the change.
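For concreteness, here is a rough sketch of what that could look like. Nothing here exists as-is: #hashBits, the primitive name, and the selector #basicIdentityHash are all hypothetical, and the shift amount is just an illustration.

```smalltalk
"Hypothetical: ask the VM how many bits of identity hash it provides."
Object >> hashBits
	<primitive: 'primitiveHashBits'>
	^12	"fall back to the historical 12-bit header field"

"Hypothetical: scale the narrow VM hash up so values spread across
a much larger range, instead of clumping in the low 12 bits."
Object >> identityHash
	^self basicIdentityHash bitShift: 30 - self hashBits
```

Any collection that hashes by identity would then pick up the wider spread for free, which is the point about IdentitySet.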
To rehash everything, you can do:
HashedCollection allSubInstancesDo: [ :c | c rehash ]
Also, as a nice bit of prescience, sets get rehashed whenever objects
are loaded from the network or from disk.
So am I missing anything? I'd say go for it!
By the way, huge hash tables still aren't going to be *wonderful*, but
they will at least be better. IMHO, a really nice way to do this, which
requires a lot of work, would be a combination of two things:
1. Let objects have an optional extra word in their header with a large
32-bit (or so) hash. This word is allocated lazily, only when it is
queried; the same forwarding pointers used for becomeForward: could be
used to do the lazy allocation.
2. Change #hash to #hash:, and let the extra argument be the size of
the hash table. By default this method could call #hash, but for
sophisticated objects, it can compute a different hash depending on the
size of the collection it is in. Likewise for #identityHash and
#identityHash:.
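A sketch of what idea 2 might look like. The selector #hash: is invented, and the probing method below only mirrors the shape of HashedCollection>>scanFor:, it is not the real code:

```smalltalk
"Hypothetical default: most objects just ignore the table size."
Object >> hash: tableSize
	^self hash

"A collection would pass its own capacity when probing, so a
sophisticated key can tailor its hash to the table it lives in."
Set >> scanFor: anObject
	| index |
	index := (anObject hash: array size) \\ array size + 1.
	"... probe linearly from index, as scanFor: does today ..."
	^index
```

The nice property is that the default keeps every existing #hash implementation working unchanged; only classes that care about huge tables would override #hash:.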
Well, I think it's a neat solution, but, I'm not going to implement it.
Not only is it tricky work, but there is a workaround: for each class
that gets used in huge collections (typically not many at all), add an
extra instance variable to hold a larger hash value that is randomly
generated. The more classes you have to do this to, the less appealing
it looks.
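For what it's worth, the workaround might look like this. The class and variable names are invented, and the range 2^30 - 1 is just one plausible choice:

```smalltalk
"Hypothetical key class that caches a wide random hash in an
instance variable, instead of relying on the 12-bit identity hash."
Object subclass: #HugeTableKey
	instanceVariableNames: 'cachedHash'
	classVariableNames: ''
	poolDictionaries: ''
	category: 'Examples-Hashing'.

HugeTableKey >> hash
	"Pick a random hash lazily, then keep it stable for the
	object's lifetime so collections stay consistent."
	cachedHash isNil ifTrue: [cachedHash := 1073741823 atRandom].
	^cachedHash
```

Since the value is drawn once and never changes, collections holding such keys never need rehashing on account of the hash itself.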
-Lex
More information about the Squeak-dev mailing list