[squeak-dev] hashing question

Chris Muller asqueaker at gmail.com
Wed Nov 3 18:16:59 UTC 2010


Hi all, assuming I have a special-kind of array that can grow without
enumerating the entire old array, is there any way to implement a
Dictionary that can grow without enumerating its internal array?  I'll
elaborate.

In our own Dictionary, when we grow there are two reasons why we must
enumerate the entire old array:

  1) because Array's in Squeak can't grow; for that we have
OrderedCollection.  So we must allocate a new Array and copy the
existing elements to the new-sized array.
  2) during this time, the elements are redistributed based on modulo
the new array's size.

I have a special kind of Array (let's call it SpecialArray) that can
grow without having to re-allocate (because it's on disk and files can
grow).  I want to use this as the basis for a new kind of Dictionary,
but I want to avoid the need to _redistribute_ the elements.

One idea is to pre-allocate as large a SpecialArray as I think I'll
ever need (say, 30 million) and simply don't grow.  At 6-bytes per
pointer, that's a 90MB file, but at least it won't ever get any
bigger.  "Disk space is cheap," and, in most cases, this actually is
fine for my needs.

Still, it prevents one from having a lot of instances of SpecialArray
if they wanted, and one doesn't always know what will be "more than
enough".

Is there a more elegant solution that could make use of SpecialArray's
ability to grow, but not redistribute?  It seems like not, since not
only we need to use the new "upper half" of the Array for new
elements, we still need to find elements in the "lower-half" (the
first half).  The size of the internal array is used in the hash key
calculation, so that would be a problem for finding the old elements..

 - Chris

PS - I thought I remember someone posting a book about hashing
algorithms, but I cannot find it..



More information about the Squeak-dev mailing list