Horrible unscalability, how do I change hashBitsOf: ?

Scott A Crosby crosby at qwes.math.cmu.edu
Sat Oct 6 23:36:26 UTC 2001


On Sat, 6 Oct 2001 danielv at netvision.net.il wrote:

> Anyway, two things to do when you care about the performance of big
> hashed collections, are use the Identity* variant, or the Pluggable*
> variant. The firsts' #scanFor: has a little built trick to get that
> O(n/2^b) behavior back, for the usual b(its) = 12 case. The second lets

Exactly, but by doing the bitshift as I suggest, that behavior will come
to Sets&Dictionaries automatically. BTW, its actually roughly:

 O(n/2^b * 1/(1-load))

Where load varies between .38, and .75, I believe?

> Either of these routes could serve someone whose application requires
> scalable speed. If you make dictionaries predictably fast, without
> having to mess with this sort thing, that'll be great. But note that
> keeping all the different cases working fast is not trivial.

Its a single bitshift, it fundamentally changes no code in any collection,
no annoying edge cases or anything else. I already submitted the change,
with the message that started this thread, but actually getting it in the
image is beyond my knowledge of the full squeak runtime.

Thanks for the Pluggable* info.

If I'm faced with this problem, I'll use Identity* as workarounds.

>
> Both routes I pointed out are slower for small, well hashed
> dictionaries, where calculating a hash might be as expensive as doing
> the one or two comparisons actually needed. In my current image, all but
> 1 are < 4096 items big, and the only rascal is the SymbolTable, whose
> hash seems to still function okay...

Correct. Strings/Symbols have their own custom hash function. Though for
small symbol dictionaries, it may be faster to use
identityset/identitydictionary instead of the sets/dictionaries. (Given
profiling of macrobenchmark2, and assumign that it is only using small
symbol dictionaries, that might possibly speed up that macrobenchmark by a
few percent.)

>
> BTW, I've looked around, and though Squeak spends some time scanning
> those Sets and Dictionaries and Strings, there doesn't seem to be any
> primitive for doing scanning. Maybe this could help the common case?
>

Doubt it, especially in the case of user-defined equality functions.

Thanks for the other stuff; I'm fairly new to squeak and smalltalk.

Scott


--
No DVD movie will ever enter the public domain, nor will any CD. The last CD
and the last DVD will have moldered away decades before they leave copyright.
This is not encouraging the creation of knowledge in the public domain.





More information about the Squeak-dev mailing list