[Vm-dev] String hash function

Ralph Boland rpboland at gmail.com
Fri Apr 14 06:37:14 UTC 2017


I had an application where I needed to store arrays into sets
and performance depended heavily on this operation.
Standard array classes were unacceptable because hashing was either
too poor (then) or too expensive (now).

So I created my own array class (say HashArray, I forget the name now).
HashArray stored a hash value rather than compute one.
The hash value of an array was the XOR of its elements (which in my
application have good hash values).
This worked very well but came at a price:  each time an element was
added or removed or changed
for a HashArray instance the hash value had to be recomputed  (O(1) cost).
For my application this wasn't a problem so my solution worked well.
Notes:
   1) In my case two arrays containing the same elements should have
the same hash value.
   Your situation might be different.  If so you could also add the
the hash value some value
    associated with the array itself.
   2) While contained in a set the array couldn't be changed (didn't
happen in my application) but this
    is true for most objects anyway since its has value cannot change;
just something to be aware of.
   3) In my application the elements of the array also stored their
hash values.  The has values were generated using
   a random number generator.  This could be done in my application
but obviously can't always be done.
   This gave very good hashing for my arrays.

If you are having performance problems with array classes because of
hashing perhaps you should try a scheme
similar to mine.  Whether or not such a class should be part of
Squeak, I don't know.  Perhaps there should be
a package of HashArray like classes that could be loaded.

Ralph Boland


More information about the Vm-dev mailing list