I don't know how to help with the choice of hash functions for different data domains, but assuming you can find a good function for your domain, there are much better ways to store the values than choosing a bucket based on the key modulo a prime. Take a look at Split-Order-List hash tables as an example. I'm a total Squeak noob, so this code probably has something to offend just about everyone's sensibilities :), but it does demonstrate the algorithm. It could be further improved by using Igor's approach to key collisions (equal keys get a sublist of their own, which keeps them out of the way of searches for other keys), but it is probably much better for the code to be "polished" by better Squeakers than myself :).
This doesn't do anything to solve the hash function problem, but it does let you stop looking for primes...
I also have no idea how to put this on SqueakMap or SqueakSource, or even which one to put it on, so I've attached it here. (I'd be happy for pointers on "best practice".)
I've used the Objective-C version of this with a load factor of 6 to store English text: 188743 input words, 167752 duplicates, 20991 retained, elapsed time 0.391754 seconds. bucket size histogram, 4050 buckets, 20991 elements: value: count 0: 15 1: 106 2: 311 3: 505 4: 684 5: 737 6: 638 7: 444 8: 297 9: 185 10: 77 11: 36 12: 12 13: 2 14: 1
-- Tom Rushworth