[squeak-dev] [Pharo-dev] Why do we have SmallDictionary?

Fri Jun 8 07:37:21 UTC 2018

On 8 Jun 2018, at 9:30, Max Leske wrote:

> (Sorry, hit send accidentally).
>
>
> On 8 Jun 2018, at 9:02, Marcus Denker wrote:
>
>>> On 8 Jun 2018, at 08:46, Max Leske <maxleske at gmail.com> wrote:
>>>
>>> Hi,
>>>
>>> I was messing around with SmallDictionary when I suddenly realised 
>>> that I can't find a single reason to use it over a normal 
>>> Dictionary. While its name and class comment imply that it is 
>>> somehow an optimised Dictionary, I don't see any measurement where 
>>> that actually holds up. The following was run in a Pharo 7 image on 
>>> a recent VM (see below):
>>>
>>> | d |
>>> d := SmallDictionary new.
>>> d sizeInMemory. "24"
>>> [100000 timesRepeat: [
>>> 	1 to: 100 do: [ :i | d at:i put: i] ] ] timeToRun. "0:00:00:05.226"
>>>
>>> [100000 timesRepeat: [
>>> 	d at: 48 ifAbsent: [] ] ] timeToRun. "0:00:00:00.041"
>>>
>>>
>>>
>>> | d |
>>> d := Dictionary new.
>>> d sizeInMemory. "16"
>>> [100000 timesRepeat: [
>>> 	1 to: 100 do: [ :i | d at:i put: i] ] ] timeToRun. "0:00:00:00.385"
>>> [100000 timesRepeat: [
>>> 	d at: 48 ifAbsent: [] ] ] timeToRun.  "0:00:00:00.006"
>>>
>>>
>> can you try with <10 elements?
>
> The results are in!
> Note that I had to increase the repetitions by a factor or 100.
>
>
> | d |
> d := SmallDictionary new.
> [10000000 timesRepeat: [
> 	1 to: 5 do: [ :i | d at:i put: i] ] ] timeToRun. "0:00:00:02.532"
>
> [10000000 timesRepeat: [
> 	d at: 4 ifAbsent: [] ] ] timeToRun.  "0:00:00:00.686"
>
>
>
> | d |
> d := Dictionary new.
> [10000000 timesRepeat: [
> 	1 to: 5 do: [ :i | d at:i put: i] ] ] timeToRun. "0:00:00:02.102"
>
> [10000000 timesRepeat: [
> 	d at: 4 ifAbsent: [] ] ] timeToRun. "0:00:00:00.567"
>
>
> So SmallDictionary is indeed very fast for only few elements. Once the 
> SmallDictionary has to grow its performance decreases rapidly, or at 
> least the grow operation takes a long time.
> Dictionary still outperforms SmallDictionary though, so I guess the 
> only reason left is space efficency.
>
>
>>>
>>> Is anyone aware of a reason for hanging on to SmallDictionary? I'm 
>>> also curious to know how SmallDictionary came to be. There must have 
>>> been some advantage over Dictionary at some point in the past.
>>>
>> It came from RB… the idea was that there are (in the Refactoring 
>> engine) a lot of very small dictionaries with <10 elements.
>> The idea is that for such dictionaries, the overhead of hashing was 
>> higher than just linear search.
>>
>> I am sure I did a benchmark back then, but of course this could have 
>> changed in the meantime.
>>
>> It would be nice to redo the benchmark.

That would be very nice, as I just realised, that the type of object 
used as key has an effect on the hashing time... There are a lot of 
variables here, unfortunately.

Max

>>
>> For size: does #sizeInMemory take the internal data into account? e.g 
>> for Dictionary there is an Array that contains Associations, you need 
>> to count those, too.
>
> No, I didn't check for that initially. When I did, it became apparent 
> that the larger the dictionary the more space you save with 
> SmallDictionary.
>
>>
>> In the end this is a hack anyway: if needed Dictionary could do 
>> things like this transparently… without the user having to decide. 
>> And the question is if this is not just a bad side effect of the 
>> particular implementation.
>
> Good point. Sounds like what Stefan Marr wrote in the paper where he 
> performed a field study of collection libraries ;)
>
> Max
>
>>
>> 	Marcus