[squeak-dev] hashing question

Tom Rushworth tom_rushworth at mac.com
Wed Nov 3 20:03:56 UTC 2010


Hi Chris,

Look at the ideas in a Split-Order-List hash table, which allows expansion without re-hashing.  I wrote a sample implementation in Squeak a while back:

-------- begin quote (of my own outgoing mail, a few months ago)

I've created a project on squeaksouce for it, and saved one further round of refactoring/development, but then I ran out of time to take it much further.  Look for the project:

http://www.squeaksource.com/SOLHashTable.html

The class comments contain a reference to the main paper I used to implement the algorithm.  There's a reasonable amount of testing code somewhere in there too, but my not knowing the ins and outs of the Squeak test framework means the test code is organized as "just a bunch of code" :).
------------- end quote

You'll want to look at the main paper (see the class comment) and then follow the the references back as far as you need to in order to understand the SOL.  Or just play with the squeak implementation :).  The main paper is all about using SOL for lock-free hashing, but you don't need the lock free stuff, just the non-enumeration for expansion. 

On 2010-Nov-3, at 11:16, Chris Muller wrote:

> Hi all, assuming I have a special-kind of array that can grow without
> enumerating the entire old array, is there any way to implement a
> Dictionary that can grow without enumerating its internal array?  I'll
> elaborate.
> 
> In our own Dictionary, when we grow there are two reasons why we must
> enumerate the entire old array:
> 
>  1) because Array's in Squeak can't grow; for that we have
> OrderedCollection.  So we must allocate a new Array and copy the
> existing elements to the new-sized array.
>  2) during this time, the elements are redistributed based on modulo
> the new array's size.
> 
> I have a special kind of Array (let's call it SpecialArray) that can
> grow without having to re-allocate (because it's on disk and files can
> grow).  I want to use this as the basis for a new kind of Dictionary,
> but I want to avoid the need to _redistribute_ the elements.
> 
> One idea is to pre-allocate as large a SpecialArray as I think I'll
> ever need (say, 30 million) and simply don't grow.  At 6-bytes per
> pointer, that's a 90MB file, but at least it won't ever get any
> bigger.  "Disk space is cheap," and, in most cases, this actually is
> fine for my needs.
> 
> Still, it prevents one from having a lot of instances of SpecialArray
> if they wanted, and one doesn't always know what will be "more than
> enough".
> 
> Is there a more elegant solution that could make use of SpecialArray's
> ability to grow, but not redistribute?  It seems like not, since not
> only we need to use the new "upper half" of the Array for new
> elements, we still need to find elements in the "lower-half" (the
> first half).  The size of the internal array is used in the hash key
> calculation, so that would be a problem for finding the old elements..
> 
> - Chris
> 
> PS - I thought I remember someone posting a book about hashing
> algorithms, but I cannot find it..
> 

--
Tom Rushworth






More information about the Squeak-dev mailing list