[squeak-dev] hashing question

Andres Valloud avalloud at smalltalk.comcastbiz.net
Wed Nov 3 18:56:37 UTC 2010


Depending on the application, you may also want to use hash buckets.

On 11/3/10 11:55 , Levente Uzonyi wrote:
> On Wed, 3 Nov 2010, Chris Muller wrote:
>
>> Hi all, assuming I have a special-kind of array that can grow without
>> enumerating the entire old array, is there any way to implement a
>> Dictionary that can grow without enumerating its internal array?  I'll
>> elaborate.
>
> I guess you are looking for something like this:
> http://en.wikipedia.org/wiki/Linear_hashing
> http://en.wikipedia.org/wiki/Consistent_hashing
>
>
> Levente
>
>>
>> In our own Dictionary, when we grow there are two reasons why we must
>> enumerate the entire old array:
>>
>>   1) because Array's in Squeak can't grow; for that we have
>> OrderedCollection.  So we must allocate a new Array and copy the
>> existing elements to the new-sized array.
>>   2) during this time, the elements are redistributed based on modulo
>> the new array's size.
>>
>> I have a special kind of Array (let's call it SpecialArray) that can
>> grow without having to re-allocate (because it's on disk and files can
>> grow).  I want to use this as the basis for a new kind of Dictionary,
>> but I want to avoid the need to _redistribute_ the elements.
>>
>> One idea is to pre-allocate as large a SpecialArray as I think I'll
>> ever need (say, 30 million) and simply don't grow.  At 6-bytes per
>> pointer, that's a 90MB file, but at least it won't ever get any
>> bigger.  "Disk space is cheap," and, in most cases, this actually is
>> fine for my needs.
>>
>> Still, it prevents one from having a lot of instances of SpecialArray
>> if they wanted, and one doesn't always know what will be "more than
>> enough".
>>
>> Is there a more elegant solution that could make use of SpecialArray's
>> ability to grow, but not redistribute?  It seems like not, since not
>> only we need to use the new "upper half" of the Array for new
>> elements, we still need to find elements in the "lower-half" (the
>> first half).  The size of the internal array is used in the hash key
>> calculation, so that would be a problem for finding the old elements..
>>
>> - Chris
>>
>> PS - I thought I remember someone posting a book about hashing
>> algorithms, but I cannot find it..
>>
>>
>
> .
>



More information about the Squeak-dev mailing list