[squeak-dev] [Pharo-dev] Why do we have SmallDictionary?

Max Leske maxleske at gmail.com
Fri Jun 8 07:30:45 UTC 2018


(Sorry, hit send accidentally).


On 8 Jun 2018, at 9:02, Marcus Denker wrote:

>> On 8 Jun 2018, at 08:46, Max Leske <maxleske at gmail.com> wrote:
>>
>> Hi,
>>
>> I was messing around with SmallDictionary when I suddenly realised 
>> that I can't find a single reason to use it over a normal Dictionary. 
>> While its name and class comment imply that it is somehow an 
>> optimised Dictionary, I don't see any measurement where that actually 
>> holds up. The following was run in a Pharo 7 image on a recent VM 
>> (see below):
>>
>> | d |
>> d := SmallDictionary new.
>> d sizeInMemory. "24"
>> [100000 timesRepeat: [
>> 	1 to: 100 do: [ :i | d at:i put: i] ] ] timeToRun. "0:00:00:05.226"
>>
>> [100000 timesRepeat: [
>> 	d at: 48 ifAbsent: [] ] ] timeToRun. "0:00:00:00.041"
>>
>>
>>
>> | d |
>> d := Dictionary new.
>> d sizeInMemory. "16"
>> [100000 timesRepeat: [
>> 	1 to: 100 do: [ :i | d at:i put: i] ] ] timeToRun. "0:00:00:00.385"
>> [100000 timesRepeat: [
>> 	d at: 48 ifAbsent: [] ] ] timeToRun.  "0:00:00:00.006"
>>
>>
> can you try with <10 elements?

The results are in!
Note that I had to increase the repetitions by a factor or 100.


| d |
d := SmallDictionary new.
[10000000 timesRepeat: [
	1 to: 5 do: [ :i | d at:i put: i] ] ] timeToRun. "0:00:00:02.532"

[10000000 timesRepeat: [
	d at: 4 ifAbsent: [] ] ] timeToRun.  "0:00:00:00.686"



| d |
d := Dictionary new.
[10000000 timesRepeat: [
	1 to: 5 do: [ :i | d at:i put: i] ] ] timeToRun. "0:00:00:02.102"

[10000000 timesRepeat: [
	d at: 4 ifAbsent: [] ] ] timeToRun. "0:00:00:00.567"


So SmallDictionary is indeed very fast for only few elements. Once the 
SmallDictionary has to grow its performance decreases rapidly, or at 
least the grow operation takes a long time.
Dictionary still outperforms SmallDictionary though, so I guess the only 
reason left is space efficency.


>>
>> Is anyone aware of a reason for hanging on to SmallDictionary? I'm 
>> also curious to know how SmallDictionary came to be. There must have 
>> been some advantage over Dictionary at some point in the past.
>>
> It came from RB… the idea was that there are (in the Refactoring 
> engine) a lot of very small dictionaries with <10 elements.
> The idea is that for such dictionaries, the overhead of hashing was 
> higher than just linear search.
>
> I am sure I did a benchmark back then, but of course this could have 
> changed in the meantime.
>
> It would be nice to redo the benchmark.
>
> For size: does #sizeInMemory take the internal data into account? e.g 
> for Dictionary there is an Array that contains Associations, you need 
> to count those, too.

No, I didn't check for that initially. When I did, it became apparent 
that the larger the dictionary the more space you save with 
SmallDictionary.

>
> In the end this is a hack anyway: if needed Dictionary could do things 
> like this transparently… without the user having to decide. And the 
> question is if this is not just a bad side effect of the particular 
> implementation.

Good point. Sounds like what Stefan Marr wrote in the paper where he 
performed a field study of collection libraries ;)

Max

>
> 	Marcus


More information about the Squeak-dev mailing list