[squeak-dev] Expanding hash tables (was: 4.1 - hashed collections
still a problem)
Tom Rushworth
tom_rushworth at mac.com
Mon Apr 5 00:57:45 UTC 2010
I don't know how to help with the choice of hash functions for
different data domains, but assuming you can find a good function for
your domain, there are much better ways to store the values than
choosing a bucket based on the key modulo a prime. Take a look at
Split-Order-List hash tables as an example. I'm a total Squeak noob,
so this code probably has something to offend just about everyone's
sensibilities :), but it does demonstrate the algorithm. It could be
further improved by using Igor's approach to key collisions (equal
keys get a sublist of their own, which keeps them out of the way of
searches for other keys), but it is probably much better for the code
to be "polished" by better Squeakers than myself :).
This doesn't do anything to solve the hash function problem, but it
does let you stop looking for primes...
I also have no idea how to put this on SqueakMap or SqueakSource, or
even which one to put it on, so I've attached it here. (I'd be happy
for pointers on "best practice".)
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Collections-SOLHashTables.st.zip
Type: application/zip
Size: 11182 bytes
Desc: not available
Url : http://lists.squeakfoundation.org/pipermail/squeak-dev/attachments/20100404/7900526b/Collections-SOLHashTables.st.zip
-------------- next part --------------
I've used the Objective-C version of this with a load factor of 6 to
store English text:
188743 input words, 167752 duplicates, 20991 retained, elapsed time
0.391754 seconds.
bucket size histogram, 4050 buckets, 20991 elements:
value: count
0: 15
1: 106
2: 311
3: 505
4: 684
5: 737
6: 638
7: 444
8: 297
9: 185
10: 77
11: 36
12: 12
13: 2
14: 1
--
Tom Rushworth
More information about the Squeak-dev
mailing list
|