Hashtable (was Re: Dictionaries broken in 3.9a)
Marcus Denker
denker at iam.unibe.ch
Mon Sep 19 10:20:38 UTC 2005
On 19.09.2005 at 09:26, Andreas Raab wrote:
> Avi Bryant wrote:
>
>> Seconded. It uses quite a bit more memory than a standard
>> Dictionary,
>> but the performance appears to be impressively flat while growing to
>> very large sizes, even when using the anemic #identityHash.
>>
>
> But you are aware that the current behavior is the result of a very
> particular set initialization that is easily fixed, yes? Changing
> the initial set capacity from 3 to 5 will *dramatically* improve
> the behavior ;-)
>
The problem Philippe solved is a different one: he made hash collisions
cheaper. In Squeak we have only 4096 distinct hash values, so once a
collection grows large enough, collisions are unavoidable. The current
implementation has a very simple but expensive collision resolution
scheme: it just takes the next free slot and puts the value there.
When searching, it goes to the slot computed from the hash, checks
whether the entry really is there, and if not, does a linear search
starting at that position.
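The actual implementation is Smalltalk, but the open-addressing scheme
described above (linear probing) can be sketched in Python roughly like
this; the class and its names are illustrative, not Squeak's code, and
resizing is omitted for brevity:

```python
# Minimal sketch of open addressing with linear probing, analogous to
# (not identical to) the Squeak Dictionary behavior described above.
class ProbingDict:
    def __init__(self, capacity=8):
        # Each slot is either None (free) or a (key, value) pair.
        self.slots = [None] * capacity

    def _scan(self, key):
        # Start at the hash-derived slot and walk forward (wrapping
        # around) until we find the key or hit a free slot.
        i = hash(key) % len(self.slots)
        while self.slots[i] is not None and self.slots[i][0] != key:
            i = (i + 1) % len(self.slots)
        return i

    def __setitem__(self, key, value):
        # No growing here: a real implementation must resize before
        # the table fills up, or _scan would loop forever.
        self.slots[self._scan(key)] = (key, value)

    def __getitem__(self, key):
        entry = self.slots[self._scan(key)]
        if entry is None:
            raise KeyError(key)
        return entry[1]
```

Note that both insertion and lookup share the same scan loop, which is
exactly where the cost shows up once the table contains long runs of
occupied slots.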
The problem is that resolving one collision this way turns another
slot into a collision as well. So dictionaries degenerate into a
linear search quite fast, even when the hash is nicely distributed
and spread correctly.
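This spill-over effect can be shown with a tiny self-contained demo
(the slot numbers are chosen by hand for clarity, standing in for the
hash): filling a colliding key into the next free slot occupies the
home slot of an unrelated key, so that key now collides too.

```python
# Demonstration of the spill-over effect: resolving a collision at
# slot 0 occupies slot 1, so a key whose home slot *is* 1 now
# collides as well, even though its hash was unique.
SIZE = 8
slots = [None] * SIZE

def insert(key, home):
    # 'home' is the slot the key's hash points at (fixed by hand here).
    i, steps = home, 1
    while slots[i] is not None:
        i = (i + 1) % SIZE
        steps += 1
    slots[i] = key
    return steps

insert('A', 0)            # 1 probe: slot 0 was free
insert('B', 0)            # 2 probes: collides with A, lands in slot 1
steps_c = insert('C', 1)  # 2 probes: slot 1 is C's home, but B took it
```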
Philippe instead uses a linked list for resolving collisions: when
there is a collision, the slot holds the head of a linked list. So
with a nicely, evenly distributed hash in a 40,000-element
dictionary, this leads to 4096 linked lists of about 10 elements
each, accessible via the hash. In that case we just need to do a
linear search through a 10-element linked list, which is still quite
fast. And even when the hash value or spread is not perfect, this
makes sure that we don't destroy other slots with each collision.
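The chaining scheme can be sketched as follows; again this is
illustrative Python, not Philippe's Smalltalk, and plain Python lists
stand in for the linked lists (the fixed bucket count mirrors Squeak's
4096 hash values):

```python
# Sketch of collision resolution by chaining: each bucket holds all
# entries whose hash maps to it, so a collision never touches
# neighboring buckets.
class ChainedDict:
    def __init__(self, num_buckets=4096):
        # Fixed bucket count, mirroring Squeak's 4096 hash values.
        self.buckets = [[] for _ in range(num_buckets)]

    def _bucket(self, key):
        return self.buckets[hash(key) % len(self.buckets)]

    def __setitem__(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # overwrite existing key
                return
        bucket.append((key, value))

    def __getitem__(self, key):
        # Linear search, but only within one (short) chain.
        for k, v in self._bucket(key):
            if k == key:
                return v
        raise KeyError(key)
```

With 40,000 evenly distributed keys over 4096 buckets, each lookup
scans a chain of about 10 entries, matching the figures above.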
Marcus
More information about the Squeak-dev
mailing list