[squeak-dev] [Pharo-dev] Why do we have SmallDictionary?
Max Leske
maxleske at gmail.com
Fri Jun 8 07:37:21 UTC 2018
On 8 Jun 2018, at 9:30, Max Leske wrote:
> (Sorry, hit send accidentally).
>
>
> On 8 Jun 2018, at 9:02, Marcus Denker wrote:
>
>>> On 8 Jun 2018, at 08:46, Max Leske <maxleske at gmail.com> wrote:
>>>
>>> Hi,
>>>
>>> I was messing around with SmallDictionary when I suddenly realised
>>> that I can't find a single reason to use it over a normal
>>> Dictionary. While its name and class comment imply that it is
>>> somehow an optimised Dictionary, I don't see any measurement where
>>> that actually holds up. The following was run in a Pharo 7 image on
>>> a recent VM (see below):
>>>
>>> | d |
>>> d := SmallDictionary new.
>>> d sizeInMemory. "24"
>>> [100000 timesRepeat: [
>>> 1 to: 100 do: [ :i | d at:i put: i] ] ] timeToRun. "0:00:00:05.226"
>>>
>>> [100000 timesRepeat: [
>>> d at: 48 ifAbsent: [] ] ] timeToRun. "0:00:00:00.041"
>>>
>>>
>>>
>>> | d |
>>> d := Dictionary new.
>>> d sizeInMemory. "16"
>>> [100000 timesRepeat: [
>>> 1 to: 100 do: [ :i | d at:i put: i] ] ] timeToRun. "0:00:00:00.385"
>>> [100000 timesRepeat: [
>>> d at: 48 ifAbsent: [] ] ] timeToRun. "0:00:00:00.006"
>>>
>>>
>> can you try with <10 elements?
>
> The results are in!
> Note that I had to increase the repetitions by a factor or 100.
>
>
> | d |
> d := SmallDictionary new.
> [10000000 timesRepeat: [
> 1 to: 5 do: [ :i | d at:i put: i] ] ] timeToRun. "0:00:00:02.532"
>
> [10000000 timesRepeat: [
> d at: 4 ifAbsent: [] ] ] timeToRun. "0:00:00:00.686"
>
>
>
> | d |
> d := Dictionary new.
> [10000000 timesRepeat: [
> 1 to: 5 do: [ :i | d at:i put: i] ] ] timeToRun. "0:00:00:02.102"
>
> [10000000 timesRepeat: [
> d at: 4 ifAbsent: [] ] ] timeToRun. "0:00:00:00.567"
>
>
> So SmallDictionary is indeed very fast for only few elements. Once the
> SmallDictionary has to grow its performance decreases rapidly, or at
> least the grow operation takes a long time.
> Dictionary still outperforms SmallDictionary though, so I guess the
> only reason left is space efficency.
>
>
>>>
>>> Is anyone aware of a reason for hanging on to SmallDictionary? I'm
>>> also curious to know how SmallDictionary came to be. There must have
>>> been some advantage over Dictionary at some point in the past.
>>>
>> It came from RB… the idea was that there are (in the Refactoring
>> engine) a lot of very small dictionaries with <10 elements.
>> The idea is that for such dictionaries, the overhead of hashing was
>> higher than just linear search.
>>
>> I am sure I did a benchmark back then, but of course this could have
>> changed in the meantime.
>>
>> It would be nice to redo the benchmark.
That would be very nice, as I just realised, that the type of object
used as key has an effect on the hashing time... There are a lot of
variables here, unfortunately.
Max
>>
>> For size: does #sizeInMemory take the internal data into account? e.g
>> for Dictionary there is an Array that contains Associations, you need
>> to count those, too.
>
> No, I didn't check for that initially. When I did, it became apparent
> that the larger the dictionary the more space you save with
> SmallDictionary.
>
>>
>> In the end this is a hack anyway: if needed Dictionary could do
>> things like this transparently… without the user having to decide.
>> And the question is if this is not just a bad side effect of the
>> particular implementation.
>
> Good point. Sounds like what Stefan Marr wrote in the paper where he
> performed a field study of collection libraries ;)
>
> Max
>
>>
>> Marcus
More information about the Squeak-dev
mailing list
|