[squeak-dev] Re: Hash changes

Levente Uzonyi leves at elte.hu
Mon Nov 30 22:48:47 UTC 2009


On Mon, 30 Nov 2009, Andreas Raab wrote:

> I'll be happy to do this but can someone summarize the expected effects of 
> those changes? The main thing I don't understand is why it is advantageous to 
> simply scale the id-hash given that it doesn't actually increase the range of 
> hashes, i.e., id-hash is still limited to 12 bits so no matter how much you 
> multiply these values the result will still only be 4096 distinct values. Why 
> is it advantageous to make these numbers artificially larger without 
> increasing the number of hash bits?
>

The following test is a modified version of Martin McClure's test 
(microbenchmark) the original can be found here: 
http://lists.gforge.inria.fr/pipermail/pharo-project/2009-October/015070.html
Run the test before and after the changes.

| test array |
Transcript open; cr.
test := Dictionary new.
[ test size >= 10000 ] whileFalse: [
 	Smalltalk garbageCollect.
 	array := Array new: 100 streamContents: [ :stream |
 		100 timesRepeat: [ stream nextPut: Object new ] ]. "Create 100 new Objects"
 	Transcript
 		print: [ "Add the 100 new Objects to the Dictionary as keys with nil values"
 			1 to: array size do: [ :index |
 				test at: (array at: index) put: nil ] ] timeToRun;
 		tab;
 		print: test size;
 		tab;
 		print: [ "Access the added Objects 1000 times"
 			1 to: 1000 do: [ :iteration |
 				1 to: array size do: [ :index |
 					test at: (array at: index) ] ] ] timeToRun;
 		tab;
 		print: [ "This should be substracted from the previous value"
 			1 to: 1000 do: [ :iteration |
 				1 to: array size do: [ :index |
 					array at: index ] ] ] timeToRun;
 		cr;
 		flush ]

The most important are the second (number of elements in the 
dictionary) and third column (time to access the 100 last added element 
1000 times) of the result printed to Transcript.

(I'm about to upload 2 more packages because the inlining trick 
used in the implementation doesn't work with SmallIntegers (LargeIntegers 
can be created while calculating the hash) and the enhancement for WeakSet 
growing is missing)

Levente

> Cheers,
>  - Andreas
>
> David T. Lewis wrote:
>> I think that this is important work. Most of the background discussion
>> took place on the Pharo list, so some may not have seen it. I personally
>> am not qualified to do anything with this, but hopefully others can
>> take a look at it and incorporate it in trunk if appropriate.
>> 
>> Dave
>> 
>> On Mon, Nov 30, 2009 at 06:26:21AM +0100, Levente Uzonyi wrote:
>>> Hi,
>>> 
>>> I uploaded the second part of Andr?s' hash changes to the inbox. Every 
>>> package requires a new mcm. The load order is:
>>> 
>>> Collections-ul.217 "Initialization"
>>> Kernel-ul.308 "Object >> #hash"
>>> Kernel-ul.309
>>> Kernel-ul.310
>>> Collections-ul.218 "IdentityDictionary"
>>> Collections-ul.219
>>> Collections-ul.220 "KeyedIdentitySet"
>>> Collections-ul.221
>>> Collections-ul.222 "WeakIdentityKeyDictionary"
>>> Collections-ul.223
>>> Collections-ul.224 "IdentitySet"
>>> Collections-ul.225
>>> System-ul.185 "SystemDictionary"
>>> System-ul.186
>>> System-ul.187
>>> Collections-ul.226 "Cleanup"
>>> Collections-ul.227
>>> 
>>> These packages modify the following:
>>> - Object >> #hash returns an integer between 0 and 1073479680 instead of
>>>   0 and 4096, though the range is not continuous
>>> - identity-based collections use the same range extension method as
>>>   Object >> #hash
>>> - SystemDictionary uses #hash instead of #identityHash
>>> - all preambles and postscripts are removed from Collections, Kernel and
>>>   System packages
>>> 
>>> Cheers,
>>> Levente
>>> 
>> 
>> 
>> 
>
>
>



More information about the Squeak-dev mailing list