[squeak-dev] Expanding hash tables (was: 4.1 - hashed collections still a problem)

Tom Rushworth tom_rushworth at mac.com
Mon Apr 5 00:57:45 UTC 2010


I don't know how to help with the choice of hash functions for  
different data domains, but assuming you can find a good function for  
your domain, there are much better ways to store the values than  
choosing a bucket based on the key modulo a prime.  Take a look at  
Split-Order-List hash tables as an example.  I'm a total Squeak noob,  
so this code probably has something to offend just about everyone's  
sensibilities :), but it does demonstrate the algorithm.  It could be  
further improved by using Igor's approach to key collisions (equal  
keys get a sublist of their own, which keeps them out of the way of  
searches for other keys), but it is probably much better for the code  
to be "polished" by better Squeakers than myself :).

This doesn't do anything to solve the hash function problem, but it  
does let you stop looking for primes...

I also have no idea how to put this on SqueakMap or SqueakSource, or  
even which one to put it on, so I've attached it here.  (I'd be happy  
for pointers on "best practice".)

-------------- next part --------------
A non-text attachment was scrubbed...
Name: Collections-SOLHashTables.st.zip
Type: application/zip
Size: 11182 bytes
Desc: not available
Url : http://lists.squeakfoundation.org/pipermail/squeak-dev/attachments/20100404/7900526b/Collections-SOLHashTables.st.zip
-------------- next part --------------



I've used the Objective-C version of this with a load factor of 6 to  
store English text:
188743 input words, 167752 duplicates, 20991 retained, elapsed time  
0.391754 seconds.
bucket size histogram, 4050 buckets, 20991 elements:
value:    count
     0:       15
     1:      106
     2:      311
     3:      505
     4:      684
     5:      737
     6:      638
     7:      444
     8:      297
     9:      185
    10:       77
    11:       36
    12:       12
    13:        2
    14:        1

--
Tom Rushworth





More information about the Squeak-dev mailing list