Faster dictionaries?

Thu Jul 19 20:42:49 UTC 2007

On 19/07/07, Ian Piumarta <piumarta at gmail.com> wrote:
> On Jul 19, 2007, at 5:10 AM, sig wrote:
>
> > its a scratch implementation of dictionaries using linked list of
> > associations.
>
> I think the two 'at:ifAbsent:' methods neglect to send #value to the
> errorBlock?
>
no they dont :)

its just a shortcut. instead of putting:
^ bucket ifNil: [ aBlock value ]

i can put:

^ bucket ifNil: aBlock

since IfNil expects block as argument, #value is sent to aBlock
anyways. No reason to encapsulate it into another block.

> Otherwise the lookup performance ain't too shabby for relatively
> small tallies.  (For anyone who can decode this sentence: BucketDict
> is almost as fast as Pepsi's SlotDictionary when used as the basis
> for lexical environments in the Jolt compiler.  This is my usual
> 'quasi-realistic' benchmark for a Dictionary.)
>
> One classic optimisation would be to use an ArrayedCollection of some
> kind (instead of a linked list) for the buckets.  When #findKeyOrNil:
> hits and the association is not the first element in the bucket then
> exchange it with the element immediately preceding it.  Associations
> related to popular keys will gradually migrate to the front of their
> buckets.

i tempted to implement something like:
#findKeyOrNil: key withPrevious: aBlock

so, when it finds association with corresponding key, it sends #value:
to aBlock with previous association in linked list just before
returning a result.
Then it can be easily reordered. But i'm not sure if this needed at all.

Dicts are very sensitive to rehash frequencies.

in given sample:

| sd md r |
Transcript clear.
r := OrderedCollection new.
ByteString allInstancesDo: [:each | r add: each ].
sd := BDictionary new.
Smalltalk garbageCollect.
Transcript cr; show: 'time to add all strings to bd: ',
	[ r do: [:each | sd at: each put: each ] ] timeToRun printString.
sd := Dictionary new.
Smalltalk garbageCollect.
Transcript cr; show: 'time to add all strings to sd: ',
	[ r do: [:each | sd at: each put: each ] ] timeToRun printString.

i got following results:
Making  'tally >= (sz *4)' in #checkForOverflow, i got:

time to add all strings to bd: 3184
time to add all strings to sd: 3010

Making  'tally >= (sz )' in #checkForOverflow, i got:

time to add all strings to bd: 6286
time to add all strings to sd: 2991

Making tally >= sz * 2

time to add all strings to bd: 3162
time to add all strings to sd: 2985

And please , note i made these dicts mainly for replacing Magma ones,
which working with huge sets and poorly hashed keys  taken
IdentityDictionaries:

time to add all strings to bd: 4146
time to add all strings to sd: 7151

PS. In attachment new version.. I will put it into mantis too.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Collections-Unordered-FastDicts.st
Type: application/octet-stream
Size: 28920 bytes
Desc: not available
Url : http://lists.squeakfoundation.org/pipermail/squeak-dev/attachments/20070719/47faacdf/Collections-Unordered-FastDicts.obj