Hashtable (was Re: Dictionaries broken in 3.9a)

Marcus Denker denker at iam.unibe.ch
Mon Sep 19 10:20:38 UTC 2005


On 19.09.2005 at 09:26, Andreas Raab wrote:

> Avi Bryant wrote:
>
>> Seconded.  It uses quite a bit more memory than a standard  
>> Dictionary,
>> but the performance appears to be impressively flat while growing to
>> very large sizes, even when using the anemic #identityHash.
>>
>
> But you are aware that the current behavior is the result of a very  
> particular set initialization that is easily fixed, yes? Changing  
> the initial set capacity from 3 to 5 will *dramatically* improve  
> the behavior ;-)
>

The problem Philippe solved is a different one: he made hash collisions
cheaper. In Squeak we have only 4096 different hash values, so as a
collection grows larger, collisions will occur. The current
implementation has a very simple but expensive way of resolving
collisions: it just takes the next free slot and puts the value there.
When searching, it takes the slot calculated from the hash and checks
whether the entry is really there; if not, it does a linear search
starting at the calculated position.
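The current scheme can be sketched like this (in Python rather than
Smalltalk, as a minimal illustration of open addressing with linear
probing, not Squeak's actual code):

```python
# A minimal sketch (not Squeak's actual code) of the scheme described
# above: on a collision, insertion takes the next free slot; lookup
# checks the hashed slot and, if another key sits there, searches
# linearly onward from it.

SIZE = 8  # illustrative table size; assumes the table never fills up

def probe_insert(table, key, value):
    i = key % SIZE                       # slot calculated from the hash
    while table[i] is not None and table[i][0] != key:
        i = (i + 1) % SIZE               # collision: try the next slot
    table[i] = (key, value)

def probe_lookup(table, key):
    i = key % SIZE
    while table[i] is not None:
        if table[i][0] == key:
            return table[i][1]
        i = (i + 1) % SIZE               # linear search past the slot
    return None

table = [None] * SIZE
probe_insert(table, 0, 'first')
probe_insert(table, 8, 'second')         # 8 % 8 == 0: collides with key 0
print(probe_lookup(table, 8))            # found one slot over -> 'second'
```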

The problem is that resolving one collision this way turns another slot
into a collision, too. So dictionaries degenerate to a linear search
quite fast, even if the hash is nicely distributed and spread correctly.
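A tiny illustration of that point, again as a Python sketch with
made-up keys: placing an overflow entry in the "next free slot" steals
that slot from a key that hashes there natively, so one collision
creates another.

```python
# With linear probing, resolving key 8's collision occupies slot 1, so
# key 1 (which had no hash collision of its own) now collides as well.

SIZE = 8
table = [None] * SIZE

def insert(table, key):
    """Linear-probing insert; returns the number of slots probed."""
    i = key % SIZE
    probes = 1
    while table[i] is not None:
        i = (i + 1) % SIZE
        probes += 1
    table[i] = key
    return probes

insert(table, 0)            # goes to its home slot 0
insert(table, 8)            # collides with key 0, overflows into slot 1
print(insert(table, 1))     # key 1's home slot is now taken: 2 probes
```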

Philippe uses a linked list for resolving collisions: if there is a
collision, the slot contains the head of a linked list. So with a
nicely, equally distributed hash in a 40,000-element dictionary, this
leads to 4096 linked lists of about 10 elements each, accessible via
the hash. In this case we only need to do a linear search over a
10-element linked list, which is still quite fast. And even if the hash
value or spread is not perfect, this makes sure that we don't destroy
other slots with each collision.
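The chaining scheme can be sketched like this (a Python illustration of
the general technique, not Philippe's actual Squeak code):

```python
# A minimal sketch of separate chaining as described above: each slot
# holds the head of a linked list of colliding entries, so a collision
# never spills into other slots.

class Node:
    def __init__(self, key, value, next=None):
        self.key, self.value, self.next = key, value, next

SIZE = 4096  # mirrors Squeak's 4096 distinct hash values

def chain_insert(table, key, value):
    i = key % SIZE
    table[i] = Node(key, value, table[i])   # prepend to this slot's list

def chain_lookup(table, key):
    node = table[key % SIZE]
    while node is not None:                 # linear search, but only
        if node.key == key:                 # over this slot's short list
            return node.value
        node = node.next
    return None

table = [None] * SIZE
for k in range(40000):                      # ~10 entries per slot on
    chain_insert(table, k, k * 2)           # average, as in the example
print(chain_lookup(table, 39999))           # -> 79998
```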

      Marcus





