[squeak-dev] [Pharo-dev] Why do we have SmallDictionary?

Chris Muller ma.chris.m at gmail.com
Sat Jun 30 20:19:41 UTC 2018


>> This discussion has limited its context to comparing *in-memory*
>> Dictionary implementations, where open-addressing has an advantage.
>> Using Magma reveals another context where chained Dictionary's can have
>> some advantages:  When they're large, persistent, and shared in a multi-user
>> system -- e.g., when grow rehashing is not an option.
>
>
> B-trees were designed for that use case.
> If growing is not an option, then your dictionary's performance will degrade
> as its size increases, won't it?

Yes, but only logarithmically.  But I was not making a
performance-based comparison, only a possibility-based one.

>> Not only do chained Dictionary's make that scenario possible, they improve
>> the concurrency -- different users can add/change/remove from the Dictionary
>> as long as they occur at different keys.  With regular
>> Dictionary's, concurrency guarantees a commit conflict.
>
> If you created a new, larger dictionary on grow and lazily copied the
> entries from the smaller to the larger upon access, then I think you could
> use open addressing hash tables.

That's an interesting idea, but what if it grows beyond "the larger"
again while still many entries in the smaller?  Add a third which is
doubled in size again?  That sounds like it starts to intrude into the
efficiency of open-addressing (both computation and space), but the
idea still sparks some interesting design thoughts in my head..

> Also, I don't see the concurrency issue with open addressing, because as
> long as different users modify different chains, the case of concurrency is
> the same as what you described, isn't it?

Magma has a class called "MagmaPreallocatedDictionary" which does
exactly this.  It's like a regular Dictionary, except for its internal
'array', it utilizes a MagmaArray, which is just like an Array except
large and persistent and can grow in place (so able to maintain its
identity).

However, this alone was not enough to implement your proposal because
of the 'tally'.  Clients would each be trying to propose their
specific value for that variable = conflict.  Thankfully, Magma has
yet another special class called MagmaCounter which fits that role
beautifully.  Designed for multi-user systems, a MagmaCounter provides
a concurrent counter.  Instances start at a #value of 0, after which
any number of sessions may simultaneously increment or decrement (even
by more than 1), but no one can actually hard set its value.

Best,
  Chris


More information about the Squeak-dev mailing list