OrderedDictionary (was: Re: [squeak-dev] Re: The Trunk: Collections-mt.593.mcz)

Levente Uzonyi leves at elte.hu
Fri Jan 16 16:55:23 UTC 2015


On Fri, 16 Jan 2015, Eliot Miranda wrote:

>
>
> On Jan 16, 2015, at 7:53 AM, Bert Freudenberg <bert at freudenbergs.de> wrote:
>
>>
>>> On 16.01.2015, at 16:36, Eliot Miranda <eliot.miranda at gmail.com> wrote:
>>>
>>> Hi Marcel,
>>>
>>> On Jan 16, 2015, at 5:54 AM, Marcel Taeumel <marcel.taeumel at student.hpi.uni-potsdam.de> wrote:
>>>
>>>> Ah, okay. I was curious about the performance of iterating over an array vs.
>>>> the linked list:
>>>>
>>>> a := Array new: 1000.
>>>> l := LinkedList new.
>>>>
>>>> 1000 timesRepeat: [l add: (StackLink with: nil)].
>>>>
>>>> [a do: [:ea | 1 + 2]] bench. "'94,600 per second.'"
>>>>
>>>> [l do: [:ea | 1 + 2]] bench. "'159,000 per second.'"
>>>>
>>>> I expected the opposite! :-O Why is that so?
>>>
>>> Look at the implementations of do:.
>>>
>>> In the Array case we iterate over the indices using SmallIntegers (fast arithmetic with no allocation) and dereference each element using the Object>>at: primitive, which (apart from type and bounds checking) is simple indexed memory access. So one read per element.
>>>
>>> In the LinkedList case each list element is fetched from the previous element and each value fetched from the element, so two reads per element.
>>
>> Making it all the more surprising that the linked list is twice as fast.
>
> Ugh.  I was blindsided.  I saw the numbers as times not rates.  Marcel, I'm sorry.  OK indeed the difference would be the cost of the bounds check in each array access.  Interesting to see at what point the poorer memory locality of the list makes it worse.  Difficult to measure given the locality is decided by history and the GC.

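For reference, the two do: implementations being compared look roughly like
this (paraphrased from memory; check the image for the exact code):

SequenceableCollection >> do: aBlock
	"Iterate by SmallInteger index; fetch each element with at:."
	1 to: self size do: [ :index |
		aBlock value: (self at: index) ]

LinkedList >> do: aBlock
	"Walk the chain; each step follows the previous link's nextLink."
	| aLink |
	aLink := firstLink.
	[ aLink == nil ] whileFalse: [
		aBlock value: aLink.
		aLink := aLink nextLink ]
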
In general the locality of the links is worse. In the example above they are
allocated almost next to each other.
A slightly better example would be this:

| buffer |
buffer := Array new: 1000. "This will hold the filler objects."
#(1024 2048 4096 8192 16384 32768 65536) collect: [ :fillerSize |
	| list |
	list := LinkedList new.
	1 to: 1000 do: [ :index |
		list add: (StackLink with: nil).
		"The filler allocated between links keeps consecutive links roughly fillerSize words apart."
		buffer at: index put: (Array new: fillerSize) ].
	Smalltalk garbageCollect. "Settle memory before measuring."
	fillerSize -> [ list do: [ :ea | 1 + 2 ] ] bench ]

{ "gap between nodes (x4 bytes)" -> "runs"
1024->'78,000 per second.'.
2048->'78,600 per second.'.
4096->'77,500 per second.'.
8192->'73,300 per second.'.
16384->'70,200 per second.'. 
32768->'66,400 per second.'.
65536->'56,400 per second.'}

(The result is '45,700 per second.' for Arrays)
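
For comparison, a hypothetical Array baseline with the same kind of setup
could look like this (just a sketch, not necessarily the doit that produced
the figure above):

| buffer array |
array := Array new: 1000.
buffer := Array new: 1000. "Filler objects, as in the list benchmark."
1 to: 1000 do: [ :index |
	array at: index put: (StackLink with: nil).
	buffer at: index put: (Array new: 1024) ].
Smalltalk garbageCollect.
[ array do: [ :ea | 1 + 2 ] ] bench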

The results depend on the CPU (and probably on the OS too), but the numbers 
are still pretty good. A longer list would probably cause more problems for 
the caches.
The same measurement with 10k elements in the list:

{
1024->'3,770 per second.'.
2048->'4,780 per second.'.
4096->'4,220 per second.'.
8192->'2,480 per second.'.
16384->'2,110 per second.'}

(The result is '4,380 per second.' for Arrays)

So the speed of a list depends on the cache locality of its elements. For 
practical cases the performance is still comparable with arrays, but it may 
be either faster or slower depending on that locality.
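
One hypothetical way to provoke poor locality without filler objects is to
chain the links in a different order than they were allocated (again just a
sketch):

| links list |
"Allocate all the links first (nearly contiguously), then chain them in shuffled order."
links := (1 to: 10000) collect: [ :i | StackLink with: nil ].
list := LinkedList new.
links shuffled do: [ :each | list add: each ].
Smalltalk garbageCollect.
[ list do: [ :ea | 1 + 2 ] ] bench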

Levente

>
>
>>
>> My guess would be that the two inst var reads are way more efficient than the at: primitive.
>>
>> - Bert -
>>
>>
>>
>
>

