[squeak-dev] hashing question

Levente Uzonyi leves at elte.hu
Wed Nov 3 18:55:50 UTC 2010


On Wed, 3 Nov 2010, Chris Muller wrote:

> Hi all, assuming I have a special-kind of array that can grow without
> enumerating the entire old array, is there any way to implement a
> Dictionary that can grow without enumerating its internal array?  I'll
> elaborate.

I guess you are looking for something like this:
http://en.wikipedia.org/wiki/Linear_hashing
http://en.wikipedia.org/wiki/Consistent_hashing


Levente

>
> In our own Dictionary, when we grow there are two reasons why we must
> enumerate the entire old array:
>
>  1) because Array's in Squeak can't grow; for that we have
> OrderedCollection.  So we must allocate a new Array and copy the
> existing elements to the new-sized array.
>  2) during this time, the elements are redistributed based on modulo
> the new array's size.
>
> I have a special kind of Array (let's call it SpecialArray) that can
> grow without having to re-allocate (because it's on disk and files can
> grow).  I want to use this as the basis for a new kind of Dictionary,
> but I want to avoid the need to _redistribute_ the elements.
>
> One idea is to pre-allocate as large a SpecialArray as I think I'll
> ever need (say, 30 million) and simply don't grow.  At 6-bytes per
> pointer, that's a 90MB file, but at least it won't ever get any
> bigger.  "Disk space is cheap," and, in most cases, this actually is
> fine for my needs.
>
> Still, it prevents one from having a lot of instances of SpecialArray
> if they wanted, and one doesn't always know what will be "more than
> enough".
>
> Is there a more elegant solution that could make use of SpecialArray's
> ability to grow, but not redistribute?  It seems like not, since not
> only we need to use the new "upper half" of the Array for new
> elements, we still need to find elements in the "lower-half" (the
> first half).  The size of the internal array is used in the hash key
> calculation, so that would be a problem for finding the old elements..
>
> - Chris
>
> PS - I thought I remember someone posting a book about hashing
> algorithms, but I cannot find it..
>
>



More information about the Squeak-dev mailing list