[squeak-dev] [Pharo-dev] Why do we have SmallDictionary?

Chris Muller asqueaker at gmail.com
Sat Jun 30 16:31:25 UTC 2018


This discussion has limited its context to comparing *in-memory* Dictionary
implementations, where open addressing has an advantage.
Using Magma reveals another context where chained Dictionaries can have
some advantages: when they're large, persistent, and shared in a
multi-user system -- i.e., when rehashing on grow is not an option.

Not only do chained Dictionaries make that scenario possible, they improve
concurrency -- different users can add, change, or remove entries
concurrently, as long as the changes occur at different keys. With a
regular Dictionary, every entry lives in one shared array that any write
may probe or rehash, so concurrent writes guarantee a commit conflict.

 - Chris

On Fri, Jun 29, 2018 at 2:56 PM, Peter Crowther <peter at ozzard.org> wrote:

> In an environment with garbage collection, it's worth considering the
> overhead of one's representation on the memory allocator and the garbage
> collector.  Chaining adds objects - the exact number varies with the
> implementation - and hence adds GC overhead (= paying extra cycles for the
> life of the dictionary even if it never changes).  Each object also carries
> some bytes of header overhead, so chaining becomes less efficient as
> collisions cause extra object allocations.
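>
> To make that per-entry cost concrete, a chained table has to allocate
> something like this for every entry (a hypothetical ChainNode class, not
> any particular implementation's):
>
>     Object subclass: #ChainNode
>         instanceVariableNames: 'key value next'
>         classVariableNames: ''
>         package: 'Sketch-HashTable'
>
> Each entry then costs an object header plus three slots - all of which
> the GC must trace on every cycle - on top of the table slot holding the
> head of the chain.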
>
> Also, modern processors and memory systems are very good at reading
> forward in memory, as there's often a wide memory bus (and sometimes other
> optimisations).  This means it's often quick to read the next entry, as it's
> already been pulled in due to adjacency.  By contrast, chaining scatters
> collisions all over memory, reducing adjacency and increasing the memory
> bandwidth required.
>
> I'm unconvinced that chained dictionaries are sensible in modern
> garbage-collected languages on modern architectures, where memory
> bandwidth and cache space are both scarce resources.  The days of memory
> being truly random-access are long gone, and too few programmers realise
> the impact that can have.
>
> Cheers,
>
> - Peter
>
>
> On 29 June 2018 at 08:27, Clément Bera <bera.clement at gmail.com> wrote:
>
>> Hum, about dictionary implementations, I've just found this comparison
>> of open addressing vs. chaining [1]:
>>
>>   Collision resolution
>>     Chaining: uses an external data structure
>>     Open addressing: uses the hash table itself
>>   Memory waste
>>     Chaining: pointer-size overhead per entry (storing list heads in
>>       the table)
>>     Open addressing: no overhead (but see the note below)
>>   Performance dependence on table's load factor
>>     Chaining: directly proportional
>>     Open addressing: proportional to loadFactor / (1 - loadFactor)
>>   Can store more items than the table has slots
>>     Chaining: yes
>>     Open addressing: no; moreover, it's recommended to keep the load
>>       factor below 0.7
>>   Hash function requirements
>>     Chaining: uniform distribution
>>     Open addressing: uniform distribution, should avoid clustering
>>   Handling removals
>>     Chaining: removals are ok
>>     Open addressing: removals clog the hash table with "DELETED" entries
>>   Implementation
>>     Chaining: simple
>>     Open addressing: a correct implementation is quite tricky
>>
>> Note: hash tables with chaining can work efficiently even with a load
>> factor greater than 1. At the same time, tables based on the open
>> addressing scheme require the load factor not to exceed 0.7 to be
>> efficient. Hence, about 30% of slots remain empty, which leads to
>> obvious memory waste.
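>>
>> (For intuition about that load-factor formula: at the recommended
>> ceiling of 0.7 it gives 0.7 / (1 - 0.7) ~ 2.3, while at 0.9 it jumps
>> to 9 -- which is why open-addressed tables are grown well before they
>> fill up.)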
>>
>> Do you guys agree with this?
>>
>> Their open addressing implementation is slightly different from the
>> Smalltalk one (they mark entries as deleted upon removal, while we fix
>> up the collision chain instead, at least in Pharo 6).
>> Their chaining implementation does not use a balanced binary tree when a
>> bucket becomes large, as Java's implementation does.
>>
>> I don't really understand why an open addressing implementation is
>> trickier to implement. In high-level languages such as Smalltalk I may
>> agree that chaining is easier to implement, but in a low-level language
>> such as in the VM, IMO open addressing is easier, since you've got only
>> one large chunk of memory to manage. Maybe it's because of hash
>> collisions: getting a good hash function and good hash table sizes
>> (see HashTableSizes) is not that easy.
>>
>> I'm curious because I do more and more bare-metal programming and I end
>> up having to implement this kind of thing myself. I've always implemented
>> naive open addressing up until now without really understanding the
>> details.
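>>
>> For reference, a naive open-addressing insertion with linear probing
>> looks roughly like this (a minimal sketch, assuming an instance
>> variable 'array' holding associations with nil for empty slots, and
>> ignoring growing and removal):
>>
>>     at: key put: value
>>         "Start at the hash slot and walk forward, wrapping around,
>>         until we find the key or an empty slot."
>>         | index start |
>>         index := start := key hash \\ array size + 1.
>>         [ | slot |
>>           slot := array at: index.
>>           slot isNil
>>               ifTrue: [ array at: index put: key -> value. ^ value ].
>>           slot key = key
>>               ifTrue: [ slot value: value. ^ value ].
>>           (index := index \\ array size + 1) = start ] whileFalse.
>>         self error: 'table is full'
>>
>> The tricky parts are exactly what this sketch leaves out: growing at the
>> right load factor, handling removals, and picking table sizes that avoid
>> clustering.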
>>
>> [1] http://www.algolist.net/Data_structures/Hash_table/Open_addressing
>>
>> On Sat, Jun 16, 2018 at 8:49 PM, Max Leske <maxleske at gmail.com> wrote:
>>
>>>
>>>
>>> On 8 Jun 2018, at 09:48, Stéphane Rollandin wrote:
>>>
>>>> FWIW it seems there is no SmallDictionary in Squeak.
>>>>
>>>
>>> Oh... Thanks Stéf, I wasn't aware of that.
>>>
>>>
>>>
>>> On 8 June 2018, at 15:13, John Brant wrote:
>>>
>>>>>> Is anyone aware of a reason for hanging on to SmallDictionary? I'm also
>>>>>> curious to know how SmallDictionary came to be. There must have been some
>>>>>> advantage over Dictionary at some point in the past.
>>>>>>
>>>>>
>>>>> It came from RB… the idea was that the Refactoring engine creates a
>>>>> lot of very small dictionaries with fewer than 10 elements, and that
>>>>> for such small dictionaries the overhead of hashing is higher than
>>>>> that of a simple linear search.
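>>>>>
>>>>> The trade-off is visible in the code: SmallDictionary's lookup is
>>>>> essentially just a linear scan over parallel keys/values arrays,
>>>>> along these lines (a simplified sketch, not the exact source):
>>>>>
>>>>>     at: key ifAbsent: aBlock
>>>>>         1 to: size do: [ :i |
>>>>>             (keys at: i) = key ifTrue: [ ^ values at: i ] ].
>>>>>         ^ aBlock value
>>>>>
>>>>> No hash is computed at all, which is a win for a handful of entries
>>>>> and a loss as soon as the dictionary grows.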
>>>>>
>>>>
>>>> I created its ancestor in VW some 20+ years ago (as RBSmallDictionary).
>>>> It was used when doing pattern matching. When it performs a pattern
>>>> match against an AST, it puts the potential value of the pattern
>>>> variable in the dictionary. If the value is used later in the pattern,
>>>> then we can get the previous value and make sure that we have an
>>>> equivalent AST. This allows you to write patterns like:
>>>>
>>>>         `@a = `@a
>>>>
>>>> to find where someone has the same expression on both sides of the #=
>>>> message. Since most patterns have very few pattern variables, these
>>>> dictionaries won't hold many entries. Furthermore, we are likely to
>>>> abort the match when we have 0 entries.
>>>>
>>>> The original RBSmallDictionary had an #empty method that "emptied" the
>>>> dictionary without actually removing any elements -- it just set the
>>>> size to 0. In a general dictionary this would lead to memory leaks since
>>>> the previous values are still held by the dictionary. However, these
>>>> dictionaries were only used during the matching process and went away
>>>> after the process completed.
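>>>>
>>>> That trick is a one-liner (a hypothetical reconstruction, assuming the
>>>> same parallel keys/values arrays and a size instance variable):
>>>>
>>>>         empty
>>>>             "Logically clear the dictionary without nilling out the
>>>>             slots; old keys and values stay referenced, so this is
>>>>             only safe for short-lived dictionaries like these."
>>>>             size := 0
>>>>
>>>> One assignment instead of a loop over the slots, which pays off when
>>>> the dictionary is emptied on every attempted match.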
>>>>
>>>> Anyway, at the time when we converted our pattern matching code from
>>>> using the VW parser with our pattern matching extensions to use the new
>>>> RB parser with pattern matching, the time to run Smalllint on the image
>>>> was cut in half even though our parser was quite a bit slower than the
>>>> VW parser. I don't remember everything that was done, but I think that
>>>> most of the speedup came from having special pattern AST nodes and the
>>>> small dictionary.
>>>>
>>>>
>>>> John Brant
>>>>
>>>
>>>
>>> Very interesting! Thanks John!
>>>
>>> As Marcus has mentioned before in this thread, it would make a lot of
>>> sense to run benchmarks again. Actually, I think it would be nice to have
>>> a benchmark suite for these cases, one that would let us monitor
>>> performance and ensure that changes to the codebase don't degrade it.
>>> I'm not saying that it would be easy to make this happen; writing proper
>>> benchmarks is hard (for me especially, as it seems, given my utter failure
>>> to think of the edge cases before starting this thread). Such a suite
>>> might also prevent these sorts of questions on the mailing list in the
>>> future, or at least make it easier to answer them.
>>>
>>>
>>>
>>>
>>> On 8 June 2018, at 13:01, Andres Valloud wrote:
>>>
>>>> In addition, open addressing with linear probing has superior cache-line
>>>> read behavior (no indirection / random traversal, and if the first probe
>>>> misses, the second one has likely already been cached by the first).
>>>>
>>>
>>>
>>> Ah, nice catch! Although that would require frequent access to the
>>> dictionary / repeated access to the same items to have an effect, wouldn't
>>> it?
>>>
>>>
>>>
>>> On 8 Jun 2018, at 10:01, Clément Bera wrote:
>>>
>>>> Hi Max,
>>>>
>>>> Theoretically, for a small number of elements -- usually somewhere
>>>> between 3 and 30, depending on the implementation -- a linear search
>>>> is faster than a hash search, especially with the Pharo dictionary's
>>>> hash search implementation.
>>>>
>>>> Efficient dictionary implementations are usually bucket-based. The
>>>> dictionary holds a certain number of buckets, and the key's hash
>>>> determines the bucket in which that key and its value live. Small
>>>> buckets are linear (arrays or linked lists); large buckets are
>>>> typically balanced binary trees (usually red-black trees). Below a
>>>> certain number of elements there is a single bucket, which means a
>>>> linear search is performed, as in SmallDictionary. When the dictionary
>>>> grows, the search becomes a combination of a hash search and a linear
>>>> or tree search, as sketched below.
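>>>>
>>>> Roughly like this (a minimal sketch, assuming a hypothetical 'buckets'
>>>> Array whose slots are nil or small Arrays of associations; the
>>>> large-bucket tree case is omitted):
>>>>
>>>>     at: key ifAbsent: aBlock
>>>>         "Hash once to pick the bucket, then search it linearly."
>>>>         | bucket |
>>>>         bucket := buckets at: key hash \\ buckets size + 1.
>>>>         bucket ifNil: [ ^ aBlock value ].
>>>>         bucket do: [ :assoc |
>>>>             assoc key = key ifTrue: [ ^ assoc value ] ].
>>>>         ^ aBlock value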
>>>>
>>>> The Pharo dictionary search is hash-based at first, but all the
>>>> "buckets" sit next to each other in a single array, and a linear
>>>> search is performed from the hash position onwards. This leads to many
>>>> collisions and slower search times (especially when the element is not
>>>> found); sometimes the search runs across what would be several buckets
>>>> because the dictionary is too full or there are too many
>>>> near-collisions. The memory consumption is competitive with the
>>>> advanced implementations, though (precise measurements would need to
>>>> be made).
>>>>
>>>> Method dictionaries are represented differently to optimize the look-up
>>>> logic.
>>>>
>>>> If you want to improve things and have one dictionary implementation
>>>> instead of two, implement or look for a bucket-based dictionary and
>>>> put it in the base image instead of Dictionary. This is quite some
>>>> work, since there are many APIs to port. You can look at the Pinocchio
>>>> implementation; it's quite good, but they've not implemented large
>>>> buckets.
>>>>
>>>
>>> Thanks for the detailed explanations Clément and Levente. I'll probably
>>> not add a new dictionary implementation ;)
>>>
>>>
>>>
>>>>
>>>>
>>>> On Fri, Jun 8, 2018 at 8:46 AM, Max Leske <maxleske at gmail.com> wrote:
>>>>
>>>> Hi,
>>>>>
>>>>> I was messing around with SmallDictionary when I suddenly realised that
>>>>> I can't find a single reason to use it over a normal Dictionary. While
>>>>> its name and class comment imply that it is somehow an optimised
>>>>> Dictionary, I don't see any measurement where that actually holds up.
>>>>> The following was run in a Pharo 7 image on a recent VM (see below):
>>>>>
>>>>> | d |
>>>>> d := SmallDictionary new.
>>>>> d sizeInMemory. "24"
>>>>> [100000 timesRepeat: [
>>>>>         1 to: 100 do: [ :i | d at: i put: i ] ] ] timeToRun.
>>>>> "0:00:00:05.226"
>>>>>
>>>>> [100000 timesRepeat: [
>>>>>         d at: 48 ifAbsent: [] ] ] timeToRun. "0:00:00:00.041"
>>>>>
>>>>>
>>>>>
>>>>> | d |
>>>>> d := Dictionary new.
>>>>> d sizeInMemory. "16"
>>>>> [100000 timesRepeat: [
>>>>>         1 to: 100 do: [ :i | d at: i put: i ] ] ] timeToRun.
>>>>> "0:00:00:00.385"
>>>>> [100000 timesRepeat: [
>>>>>         d at: 48 ifAbsent: [] ] ] timeToRun.  "0:00:00:00.006"
>>>>>
>>>>>
>>>>> As you can see, SmallDictionary is 8 bytes larger per instance and
>>>>> significantly slower at both reading and writing (I know that this
>>>>> isn't a good benchmark, but it suffices to make my point).
>>>>>
>>>>>
>>>>> Is anyone aware of a reason for hanging on to SmallDictionary? I'm also
>>>>> curious to know how SmallDictionary came to be. There must have been
>>>>> some advantage over Dictionary at some point in the past.
>>>>>
>>>>>
>>>>> Cheers,
>>>>> Max
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Image version: Pharo 7.0
>>>>> Build information: Pharo-7.0+alpha.build.961.sha.a69e72a97136bc3f93831584b6efa2b1703deb84 (32 Bit)
>>>>>
>>>>> VM version: CoInterpreter VMMaker.oscog-nice.2281 uuid:
>>>>> 4beeaee7-567e-1a4b-b0fb-bd95ce302516 Nov 27 2017
>>>>> StackToRegisterMappingCogit VMMaker.oscog-nice.2283 uuid:
>>>>> 2d20324d-a2ab-48d6-b0f6-9fc3d66899da Nov 27 2017
>>>>> VM: 201711262336 https://github.com/OpenSmalltalk/opensmalltalk-vm.git $
>>>>> Date: Mon Nov 27 00:36:29 2017 +0100 $ Plugins: 201711262336
>>>>> https://github.com/OpenSmalltalk/opensmalltalk-vm.git $
>>>>>
>>>>> OS: macOS 10.13.5
>>>>> Machine: MacBook Pro (13-inch, 2016, Four Thunderbolt 3 Ports)
>>>>>
>>>>>
>>>>>
>>>>
>>>> --
>>>> Clément Béra
>>>> https://clementbera.github.io/
>>>> https://clementbera.wordpress.com/
>>>>
>>>
>>>
>>>
>>>
>>
>>
>> --
>> Clément Béra
>> https://clementbera.github.io/
>> https://clementbera.wordpress.com/
>>
>>
>>
>>
>
>
>
>