[squeak-dev] [Pharo-dev] Why do we have SmallDictionary?

Clément Bera bera.clement at gmail.com
Sat Jun 30 07:39:48 UTC 2018


Thanks for the detailed answer. I now understand better why Smalltalk
dictionaries are this way.

Best,

On Sat, Jun 30, 2018 at 2:05 AM, Levente Uzonyi <leves at caesar.elte.hu>
wrote:

> On Fri, 29 Jun 2018, Clément Bera wrote:
>
>> Do you guys agree with this?
>>
>
> Mostly. I don't think open addressing's performance is proportional to
> loadFactor / (1 - loadFactor). That formula would be correct if you had
> an array filled up to loadFactor randomly, but linear probing creates
> longer chains due to clustering, so it's worse than that in practice.
> Here's a small table based on my measurements:
>
> load factor   formula   measured
> 0.5           1.0       ~1.5
> 0.55          1.222     ~1.96
> 0.6           1.5       ~2.62
> 0.65          1.857     ~3.56
> 0.7           2.333     ~5.06
> 0.75          3.0       ~7.5
> 0.8           4.0       ~12.0
>
> In Squeak, the load factor normally varies between 0.5 and 0.75, but it
> can be anything between 0.0 and 0.8 (only compaction can push the load
> factor above 0.75), so performance can differ quite a bit. Based on these
> numbers, 0.7 sounds like a reasonable upper limit - something we might
> want to consider.
> You might think that linear probing's clustering effect costs too much,
> but because of cache locality it is probably still faster than linked
> lists or double hashing.
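>
> A quick way to reproduce this kind of measurement is a workspace snippet
> like the following (a toy simulation with arbitrary constants, not the
> Squeak implementation): it fills a table to a 0.7 load factor with
> random keys using linear probing, then answers the average number of
> probes per successful lookup.
>
>   | capacity table keys totalProbes |
>   capacity := 4096.
>   table := Array new: capacity.
>   keys := (1 to: (capacity * 0.7) floor)
>       collect: [ :i | 1000000000 atRandom ].
>   "toy linear-probing insert"
>   keys do: [ :key |
>       | index |
>       index := key \\ capacity + 1.
>       [ (table at: index) isNil ]
>           whileFalse: [ index := index \\ capacity + 1 ].
>       table at: index put: key ].
>   "count the probes needed to find every key again"
>   totalProbes := 0.
>   keys do: [ :key |
>       | index |
>       index := key \\ capacity + 1.
>       totalProbes := totalProbes + 1.
>       [ (table at: index) = key ] whileFalse: [
>           index := index \\ capacity + 1.
>           totalProbes := totalProbes + 1 ] ].
>   (totalProbes / keys size) asFloat   "average probes per hit"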
>
> Whether you can store more elements than the size of the hash table is
> also irrelevant, because doing so breaks the precondition of hash
> tables, and therefore their contract: the promised O(1) runtime of
> operations is gone in that case.
>
> Finally, removals do not clog hash tables that use open addressing unless
> you use lazy deletion[1], which is something you should avoid.
>
>
>> Their open addressing implementation is slightly different from the
>> Smalltalk one (they leave deleted entries behind upon removal, while we
>> fix collisions, at least in Pharo 6).
>>
>
> Proactively removing the garbage - aka #fixCollisionsFrom: - takes
> O(chain length) time, which is amortized O(1) if your hash function is
> good enough.
> Lazy deletion degrades lookup performance over time, because lookups
> have to treat deleted entries as existing entries. It makes deletion
> somewhat quicker, because you can stop the lookup procedure as soon as
> you find your element, but all you save is amortized O(1) work.
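>
> For illustration, the proactive variant can be as short as this (the
> class ToyHashTable and the helpers #scanFor: and #basicInsert: are made
> up for the sketch; this shows the idea behind #fixCollisionsFrom:, not
> Squeak's actual code):
>
>   ToyHashTable >> removeKey: key
>       "Sketch only. Free the slot, then rehash everything up to the
>       next empty slot so later lookups never stop at a hole."
>       | index element |
>       index := self scanFor: key.          "slot of the victim"
>       array at: index put: nil.
>       [ index := index \\ array size + 1.  "next slot, wrapping around"
>         element := array at: index.
>         element notNil ] whileTrue: [
>           array at: index put: nil.        "take the follower out..."
>           self basicInsert: element ]      "...and rehash it into place"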
>
>> Their chaining implementation does not use a balanced binary tree when
>> a bucket becomes large, unlike Java's implementation.
>>
>
> Because hash tables in general don't do such a thing. During my time at
> the university, one of the tricky questions was why hash tables use
> linked lists instead of balanced binary trees for collision resolution.
> The answer is that you don't need them, because the contract of hash
> tables is that you'll (practically) never have long chains if your hash
> function is good enough.
> Using binary trees for collision resolution mitigates situations where
> your hash function is not good enough or where you don't have full
> control over your hash values. But those are fairly rare in practice,
> and you still lose the O(1) runtime of operations.
> Binary trees also require your elements to have a total order defined
> on them.
>
>
>> I don't really understand why an open addressing implementation is
>> trickier to implement. In high-level languages such as Smalltalk I may
>> agree that chaining is easier to implement, but in a low-level language,
>> such as in the VM, IMO open addressing is easier, since you've got only
>> one large chunk of memory to manage. Maybe it's because of hash
>> collisions: having a good hash and a good hash collection size (see
>> HashTableSizes) is not that easy.
>>
>
> It's easy to make indexing mistakes, especially when wrap-around is
> involved. Deletion is also tricky and often left as an exercise for the
> reader. Even the Wikipedia article[2] has quite complex pseudocode for
> it. The version you find in Squeak has a few hidden tricks too, but
> those are there for performance; the code would still work without them.
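>
> For reference, the lookup loop with the wrap-around spelled out can look
> like this (same made-up ToyHashTable as above; in the spirit of
> #scanFor:, but not the actual Squeak method):
>
>   ToyHashTable >> scanFor: key
>       "Sketch only. Answer the index of key, or of the first empty slot
>       along its probe path. Walk forward from the hash slot and wrap to
>       slot 1 past the end."
>       | start index element |
>       start := key hash \\ array size + 1.
>       index := start.
>       [ element := array at: index.
>         (element isNil or: [ element = key ]) ifTrue: [ ^ index ].
>         index := index \\ array size + 1.  "the easy-to-get-wrong part"
>         index = start ] whileFalse.
>       ^ 0   "table full, key absent"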
>
> Levente
>
> [1] https://en.wikipedia.org/wiki/Lazy_deletion
> [2] https://en.wikipedia.org/wiki/Open_addressing
>
>
>
>> I'm curious because I do more and more bare-metal programming and I end
>> up having to implement this kind of thing myself. I've always implemented
>> naive open addressing up until now, without really understanding the
>> details.
>>
>> [1] http://www.algolist.net/Data_structures/Hash_table/Open_addressing
>>
>> On Sat, Jun 16, 2018 at 8:49 PM, Max Leske <maxleske at gmail.com> wrote:
>>
>>
>>       On 8 Jun 2018, at 09:48, Stéphane Rollandin wrote:
>>
>>             FWIW it seems there is no SmallDictionary in Squeak.
>>
>>
>>       Oh... Thanks Stéf, I wasn't aware of that.
>>
>>
>>
>>       On 8 June 2018, at 15:13, John Brant wrote:
>>
>>                         Is anyone aware of a reason for hanging on to
>> SmallDictionary? I'm also curious to know how
>>                         SmallDictionary came to be. There must have been
>> some advantage over Dictionary at some point in
>>                         the past.
>>
>>
>>                   It came from RB… the idea was that there are (in the
>> Refactoring engine) a lot of very small dictionaries with
>>                   <10 elements.
>>                   The idea is that for such dictionaries, the overhead
>> of hashing is higher than that of a simple linear search.
>>
>>
>> I created its ancestor in VW some 20+ years ago (as RBSmallDictionary).
>> It was used when doing pattern matching. When it performs a pattern
>> match against an AST, it puts the potential value of the pattern
>> variable in the dictionary. If the value is used later in the pattern,
>> then we can get the previous value and make sure that we have an
>> equivalent AST. This allows you to write patterns like:
>>
>>         `@a = `@a
>>
>> to find where someone has the same expression on both sides of the #=
>> message. Since most patterns have very few pattern variables, these
>> dictionaries won't hold many entries. Furthermore, we are likely to
>> abort the match when we have 0 entries.
>>
>> The original RBSmallDictionary had an #empty method that "emptied" the
>> dictionary without actually removing any elements -- it just set the
>> size to 0. In a general dictionary this would lead to memory leaks since
>> the previous values are still held by the dictionary. However, these
>> dictionaries were only used during the matching process and went away
>> after the process completed.
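>>
>> A minimal sketch of that trick (keys, values and size are assumed to be
>> instance variables of a made-up ToySmallDictionary; this is the shape
>> of the idea, not the original RBSmallDictionary code):
>>
>>   ToySmallDictionary >> empty
>>       "Sketch only. O(1) reset: drop the logical size but keep the
>>       slots populated. The stale keys and values stay referenced,
>>       hence the leak mentioned above if the dictionary outlives the
>>       match."
>>       size := 0
>>
>>   ToySmallDictionary >> at: key ifAbsent: aBlock
>>       "Linear search over the first 'size' slots only; stale entries
>>       past the logical end are invisible."
>>       1 to: size do: [ :i |
>>           (keys at: i) = key ifTrue: [ ^ values at: i ] ].
>>       ^ aBlock value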
>>
>> Anyway, at the time when we converted our pattern matching code from
>> using the VW parser with our pattern matching extensions to using the
>> new RB parser with pattern matching, the time to run Smalllint on the
>> image was cut in half even though our parser was quite a bit slower than
>> VW parser. I don't remember everything that was done, but I think that
>> most of the speedup came from having special pattern AST nodes and the
>> small dictionary.
>>
>>
>> John Brant
>>
>>
>>
>> Very interesting! Thanks John!
>>
>> As Marcus has mentioned before in this thread, it would make a lot of
>> sense to run benchmarks again. Actually, I think it would be nice to
>> have a benchmark suite for these cases, one that would let us monitor
>> performance and ensure that changes to the codebase don't have a
>> detrimental effect. I'm not saying that it would be easy to make this
>> happen; writing proper benchmarks is hard (for me especially, as it
>> seems, given my utter failure to think of the edge cases before starting
>> this thread). Such a suite might also prevent these sorts of
>> questions on the mailing list in the future, or at least make it
>> easier to answer them.
>>
>>
>>
>>
>> On 8 June 2018, at 13:01, Andres Valloud wrote:
>>
>>       In addition, open addressing with linear probing has superior cache
>> line read behavior (no indirection / random traversal, and
>>       if the first probe misses the second one was likely cached by the
>> first one).
>>
>>
>>
>> Ah, nice catch! Although that would require frequent access to the
>> dictionary / repeated access to the same items to have an effect,
>> wouldn't it?
>>
>>
>> On 8 Jun 2018, at 10:01, Clément Bera wrote:
>>
>>       Hi Max,
>>
>>       Theoretically, for a small number of elements (usually somewhere
>> between 3 and 30, depending on the implementation) a linear search is
>> faster than a hash search, especially in the Pharo dictionary hash
>> search implementation.
>>
>>       Efficient dictionary implementations are usually bucket-based. The
>> dictionary holds a certain number of buckets, and the key hash determines
>> the bucket in which a key-value pair lives. Small buckets are linear
>> (arrays or linked lists); large buckets are typically balanced binary
>> trees (usually red-black trees). Below a certain number of elements there
>> is a single bucket, which means a linear search is performed, as in
>> SmallDictionary. When the dictionary grows, the search becomes a
>> combination of a hash search and a linear or tree search.
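>>
>>       For concreteness, the small-bucket case could look like this
>> (ToyBucketDictionary and its 'buckets' Array of OrderedCollections of
>> Associations are invented for the sketch):
>>
>>         ToyBucketDictionary >> at: key ifAbsent: aBlock
>>             "Sketch only. The key hash picks the bucket; a linear
>>             search runs inside it."
>>             | bucket |
>>             bucket := buckets at: key hash \\ buckets size + 1.
>>             bucket do: [ :assoc |
>>                 assoc key = key ifTrue: [ ^ assoc value ] ].
>>             ^ aBlock value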
>>
>>       Pharo dictionary search is first hash-based; then, because all the
>> buckets are laid out next to each other in the same array, a linear
>> search is performed there. This leads to many collisions and slower
>> search times (especially when the element is not found); sometimes the
>> code searches across multiple buckets because the dictionary is too full
>> or there are too many near-collisions. The memory consumption is
>> competitive with the advanced implementations, though (precise
>> measurements would need to be made).
>>
>>       Method dictionaries are represented differently to optimize the
>> look-up
>>       logic.
>>
>>       If you want to improve things and have one dictionary implementation
>> instead of two, implement or look for a bucket-based dictionary and put
>> it in the base image instead of Dictionary. This is quite some work,
>> since there are many APIs to port. You can look at the Pinnochio
>> implementation; it's quite good, but they've not implemented large
>> buckets.
>>
>>
>> Thanks for the detailed explanations Clément and Levente. I'll probably
>> not add a new dictionary implementation ;)
>>
>>
>>
>>
>>       On Fri, Jun 8, 2018 at 8:46 AM, Max Leske <maxleske at gmail.com>
>> wrote:
>>
>>             Hi,
>>
>>             I was messing around with SmallDictionary when I suddenly
>> realised that I
>>             can't find a single reason to use it over a normal
>> Dictionary. While its
>>             name and class comment imply that it is somehow an optimised
>> Dictionary, I
>>             don't see any measurement where that actually holds up. The
>> following was
>>             run in a Pharo 7 image on a recent VM (see below):
>>
>>             | d |
>>             d := SmallDictionary new.
>>             d sizeInMemory. "24"
>>             [100000 timesRepeat: [
>>                     1 to: 100 do: [ :i | d at: i put: i ] ] ] timeToRun.
>> "0:00:00:05.226"
>>
>>             [100000 timesRepeat: [
>>                     d at: 48 ifAbsent: [] ] ] timeToRun. "0:00:00:00.041"
>>
>>
>>
>>             | d |
>>             d := Dictionary new.
>>             d sizeInMemory. "16"
>>             [100000 timesRepeat: [
>>                     1 to: 100 do: [ :i | d at: i put: i ] ] ] timeToRun.
>> "0:00:00:00.385"
>>             [100000 timesRepeat: [
>>                     d at: 48 ifAbsent: [] ] ] timeToRun.  "0:00:00:00.006"
>>
>>
>>             As you can see, SmallDictionary is 8 bytes larger per
>> instance and significantly slower at both reading and writing (I know
>> that this isn't a good benchmark, but it suffices to make my point).
>>
>>
>>             Is anyone aware of a reason for hanging on to
>> SmallDictionary? I'm also
>>             curious to know how SmallDictionary came to be. There must
>> have been some
>>             advantage over Dictionary at some point in the past.
>>
>>
>>             Cheers,
>>             Max
>>
>>
>>
>>
>>
>>             Image version: Pharo 7.0
>>             Build information: Pharo-7.0+alpha.build.961.sha.
>>             a69e72a97136bc3f93831584b6efa2b1703deb84 (32 Bit)
>>
>>             VM version: CoInterpreter VMMaker.oscog- nice.2281 uuid:
>>             4beeaee7-567e-1a4b-b0fb-bd95ce302516 Nov 27 2017
>>             StackToRegisterMappingCogit VMMaker.oscog-nice.2283 uuid:
>>             2d20324d-a2ab-48d6-b0f6-9fc3d66899da Nov 27 2017
>>             VM: 201711262336
>>             https://github.com/OpenSmalltalk/opensmalltalk-vm.git $
>>             Date: Mon Nov 27 00:36:29 2017 +0100 $ Plugins: 201711262336
>>             https://github.com/OpenSmalltalk/opensmalltalk-vm.git $
>>
>>             OS: macOS 10.13.5
>>             Machine: MacBook Pro (13-inch, 2016, Four Thunderbolt 3 Ports)
>>
>>
>>
>>
>>       --
>>       Clément Béra
>>       https://clementbera.github.io/
>>       https://clementbera.wordpress.com/
>>
>>
>>
>>
>>
>>
>>
>> --
>> Clément Béra
>> https://clementbera.github.io/
>> https://clementbera.wordpress.com/
>>
>>


-- 
Clément Béra
https://clementbera.github.io/
https://clementbera.wordpress.com/

