On 19.09.2005 at 09:26, Andreas Raab wrote:
Avi Bryant wrote:
Seconded. It uses quite a bit more memory than a standard Dictionary, but the performance appears to be impressively flat while growing to very large sizes, even when using the anemic #identityHash.
But you are aware that the current behavior is the result of a very particular set initialization that is easily fixed, yes? Changing the initial set capacity from 3 to 5 will *dramatically* improve the behavior ;-)
The problem Philippe solved is a different one: he made hash collisions cheaper. In Squeak, we have only 4096 different hash values, so as a collection gets larger, collisions will occur. The current implementation has a very simple but expensive collision resolution: it just takes the next free slot and puts the value there. When searching, it takes the slot calculated from the hash and checks whether the entry is really there; if not, it does a linear search starting at the calculated position.
The problem is that resolving one collision in this way turns another slot into a collision, too. So dictionaries degenerate to a linear search quite fast, even if the hash values are nicely distributed and spread correctly.
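To make the old scheme concrete, here is a minimal sketch (in Python, not Squeak code; all names are hypothetical) of open addressing with linear probing as described above. Note how an insert that collides occupies a neighboring slot, which is exactly what later turns that neighbor's home slot into a collision as well:

```python
class LinearProbingDict:
    """Sketch of open addressing with linear probing.
    Resizing/growth is omitted for brevity, so the table
    must not be filled completely."""

    def __init__(self, capacity=8):
        self.slots = [None] * capacity  # each slot: None or (key, value)

    def _index(self, key):
        # Stand-in for Squeak's hash; there, only 4096 distinct
        # values exist, so collisions are unavoidable at scale.
        return hash(key) % len(self.slots)

    def put(self, key, value):
        i = self._index(key)
        # Collision resolution: walk forward to the next free slot
        # (or to the existing entry for this key).
        while self.slots[i] is not None and self.slots[i][0] != key:
            i = (i + 1) % len(self.slots)
        self.slots[i] = (key, value)

    def get(self, key):
        i = self._index(key)
        # Linear search starting at the calculated position.
        while self.slots[i] is not None:
            if self.slots[i][0] == key:
                return self.slots[i][1]
            i = (i + 1) % len(self.slots)
        raise KeyError(key)
```

Because entries displaced by a collision land in slots that "belong" to other hash values, probe chains from different home slots merge, and lookups degrade toward a linear scan as the table fills.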
Philippe uses a linked list for resolving collisions: if there is a collision, the slot contains the head of a linked list. So a nicely, equally distributed hash in a 40,000-element dictionary leads to 4096 ten-element linked lists, accessible via the hash. In that case we just need to do a linear search over a 10-element linked list, which is still quite fast. And even if the hash value or spread is not perfect, this makes sure that we don't destroy other slots with each collision.
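The chaining scheme can be sketched the same way (again an illustrative Python sketch, not Philippe's actual code; the names are made up). Each slot holds the head of a linked list, so a collision only grows that one slot's list and never disturbs any other slot:

```python
class Node:
    """One entry in a bucket's linked list."""
    def __init__(self, key, value, next=None):
        self.key, self.value, self.next = key, value, next

class ChainingDict:
    NUM_BUCKETS = 4096  # mirrors Squeak's 4096 distinct hash values

    def __init__(self):
        self.buckets = [None] * self.NUM_BUCKETS

    def put(self, key, value):
        i = hash(key) % self.NUM_BUCKETS
        node = self.buckets[i]
        while node is not None:      # update an existing entry in place
            if node.key == key:
                node.value = value
                return
            node = node.next
        # A collision simply prepends to this bucket's list;
        # other slots are never touched.
        self.buckets[i] = Node(key, value, self.buckets[i])

    def get(self, key):
        node = self.buckets[hash(key) % self.NUM_BUCKETS]
        while node is not None:      # linear search within one bucket only
            if node.key == key:
                return node.value
            node = node.next
        raise KeyError(key)
```

With 40,000 evenly spread elements, each of the 4096 buckets holds about ten nodes, so a lookup walks a roughly ten-element list, which matches the case described above.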
Marcus