OrderedDictionary (was: Re: [squeak-dev] Re: The Trunk: Collections-mt.593.mcz)

Eliot Miranda eliot.miranda at gmail.com
Fri Jan 16 21:39:40 UTC 2015


Hi Marcel,

On Fri, Jan 16, 2015 at 5:54 AM, Marcel Taeumel <
marcel.taeumel at student.hpi.uni-potsdam.de> wrote:

> Ah, okay. I was curious about the performance of iterating over an array
> vs.
> the linked list:
>
> a := Array new: 1000.
> l := LinkedList new.
>
> 1000 timesRepeat: [l add: (StackLink with: nil)].
>
> [a do: [:ea | 1 + 2]] bench. "'94,600 per second.'"
>
> [l do: [:ea | 1 + 2]] bench. "'159,000 per second.'"
>
> I expected the opposite! :-O Why is that so?
>
> Best,
> Marcel
>

OK, here's the profiling results, all on a .2GHz Core i7 MBP (and these
numbers are noisy; there's a lot else going on on my machine).  First,
using Cog and a slightly modified benchmark that uses the elements of the
collections so that the benchmark is "real", and so that each block does
the same ammount of work

| a l |
a := Array new: 1000 withAll: 1 -> 1.
l := LinkedList new.
1000 timesRepeat: [l add: (StackLink with: 1)].
{ [| t | t := 0. a do: [:ea | t := t + ea key]. t] bench.
  [| t | t := 0. l do: [:ea | t := t + ea element]. t] bench} #('51,900 per
second.' '70,400 per second.')

Now the same code on Spur:
#('57,100 per second.' '63,300 per second.')

at: is faster, but (alas) LinkedList>>do: is slower because in Spur #==
requires a read barrier to check for possible forwarders, so in "[aLink ==
nil] whileFalse:" Spur has to fetch a field from aLink to ensure it is not
forwarded.

At the VM level we can see that Object>>at: is expensive, but the block
overhead dominates.  First Cog

/Users/eliot/Cog/oscogvm/build.macos32x86/squeak.cog.v3/Fast.app/Contents/MacOS/Squeak
 1/16/2015
eden size: 2,097,152  stack pages: 160  code size: 1,048,576

| a |
a := Array new: 1000 withAll: 1 -> 1.
[| t | t := 0. a do: [:ea | t := t + ea key]. t] bench

gc prior.  clear prior.
5.004 seconds; sampling frequency 1471 hz
7340 samples in the VM (7363 samples in the entire program)  99.69% of total

7302 samples in generated vm code 99.48% of entire vm (99.17% of total)
38 samples in vanilla vm code   0.52% of entire vm (  0.52% of total)

% of generated vm code (% of total) (samples) (cumulative)
30.33%    (30.08%) BlockClosure>>value: (2215) (30.33%)
18.47%    (18.32%) Object>>at: (1349) (48.81%)
16.83%    (16.69%) UndefinedObject>>DoIt  (1229) (65.64%)
14.43%    (14.31%) SequenceableCollection>>do: (1054) (80.07%)
  9.57%    (  9.49%) LookupKey>>key (699) (89.65%)
  6.35%    (  6.30%) SmallInteger>>+ (464) (96.00%)
  3.75%    (  3.72%) PIC at: (274) (99.75%)
  0.07%    (  0.07%) BlockClosure>>value (5) (99.82%)
  0.05%    (  0.05%) BlockClosure>>bench (4) (99.88%)
  0.04%    (  0.04%) PIC size (3) (99.92%)
  0.04%    (  0.04%) ceClosureCopyTrampoline (3) (99.96%)
  0.03%    (  0.03%) Time class>>mi...condClockValue (2) (99.99%)
  0.01%    (  0.01%) ArrayedCollection>>size (1) (100.0%)


| l |
l := LinkedList new.
1000 timesRepeat: [l add: (StackLink with: 1)].
[| t | t := 0. l do: [:ea | t := t + ea element]. t] bench

gc prior.  clear prior.
5.001 seconds; sampling frequency 1452 hz
7240 samples in the VM (7260 samples in the entire program)  99.72% of total

7186 samples in generated vm code 99.25% of entire vm (98.98% of total)
54 samples in vanilla vm code   0.75% of entire vm (  0.74% of total)

% of generated vm code (% of total) (samples) (cumulative)
29.22%    (28.93%) BlockClosure>>value: (2100) (29.22%)
26.32%    (26.05%) UndefinedObject>>DoIt (1891) (55.54%)
22.92%    (22.69%) LinkedList>>do: (1647) (78.46%)
  8.14%    (  8.06%) SmallInteger>>+ (585) (86.60%)
  6.76%    (  6.69%) StackLink>>element (486) (93.36%)
  6.46%    (  6.39%) Link>>nextLink (464) (99.82%)
  0.08%    (  0.08%) Time class>>...ndClockValue (6) (99.90%)
  0.04%    (  0.04%) BlockClosure>>bench (3) (99.94%)
  0.03%    (  0.03%) BlockClosure>>value (2) (99.97%)
  0.01%    (  0.01%) ceClosureCopyTrampoline (1) (99.99%)
  0.01%    (  0.01%) ceCreateNew...Trampoline (1) (100.0%)


and then Spur:

/Users/eliot/Cog/oscogvm/build.macos32x86/squeak.cog.spur/Fast.app/Contents/MacOS/Squeak
 1/16/2015
eden size: 4,101,312  stack pages: 160  code size: 1,048,576

| a |
a := Array new: 1000 withAll: 1 -> 1.
[| t | t := 0. a do: [:ea | t := t + ea key]. t] bench

gc prior.  clear prior.
5.002 seconds; sampling frequency 1471 hz
7340 samples in the VM (7359 samples in the entire program)  99.74% of total

7330 samples in generated vm code 99.86% of entire vm (99.61% of total)
10 samples in vanilla vm code   0.14% of entire vm (  0.14% of total)

% of generated vm code (% of total) (samples) (cumulative)
33.06%    (32.93%) BlockClosure>>value: (2423) (33.06%)
18.66%    (18.59%) UndefinedObject>>DoIt (1368) (51.72%)
17.97%    (17.90%) SequenceableCollection>>do: (1317) (69.69%)
13.33%    (13.28%) Object>>at: (977) (83.02%)
  8.84%    (  8.81%) SmallInteger>>+ (648) (91.86%)
  4.90%    (  4.88%) PIC at: (359) (96.75%)
  3.08%    (  3.07%) LookupKey>>key (226) (99.84%)
  0.05%    (  0.05%) ceSmallBlockContext (4) (99.89%)
  0.04%    (  0.04%) BlockClosure>>value (3) (99.93%)
  0.03%    (  0.03%) Time class>>mi...condClockValue (2) (99.96%)
  0.01%    (  0.01%) ArrayedCollection>>size (1) (99.97%)
  0.01%    (  0.01%) BlockClosure>>bench (1) (99.99%)
  0.01%    (  0.01%) EventSensor>>eventTickler (1) (100.0%)

| l |
l := LinkedList new.
1000 timesRepeat: [l add: (StackLink with: 1)].
[| t | t := 0. l do: [:ea | t := t + ea element]. t] bench

gc prior.  clear prior.
5.002 seconds; sampling frequency 1464 hz
7315 samples in the VM (7321 samples in the entire program)  99.92% of total

7307 samples in generated vm code 99.89% of entire vm (99.81% of total)
8 samples in vanilla vm code   0.11% of entire vm (  0.11% of total)

% of generated vm code (% of total) (samples) (cumulative)
27.33%    (27.28%) UndefinedObject>>DoIt (1997) (27.33%)
27.11%    (27.06%) LinkedList>>do: (1981) (54.44%)
24.39%    (24.34%) BlockClosure>>value: (1782) (78.83%)
11.15%    (11.13%) SmallInteger>>+ (815) (89.98%)
  5.30%    (  5.29%) Link>>nextLink (387) (95.28%)
  4.50%    (  4.49%) StackLink>>element (329) (99.78%)
  0.12%    (  0.12%) BlockClosure>>bench (9) (99.90%)
  0.07%    (  0.07%) Time class>>...ndClockValue (5) (99.97%)
  0.01%    (  0.01%) ceSmallBlockContext (1) (99.99%)
  0.01%    (  0.01%) BlockClosure>>value (1) (100.0%)


So to try and make sense of this we can say that the cost of at: vs the
cost of block evaluation is, in Cog
1349 + 274 / 2215 ~= 0.73

and in Spur
977 + 359 / 2423 ~= 0.55

at: being much faster in Spur because the object representation makes it
much simpler to do the type analysis (since Object>>at: works on any
indexable class) and bounds check.


and approximately the difference between the Array and the LinkedList comes
down to the relative costs of at: & SequenceableCollection>>#do: vs
nextLink & LinkedList>>#do:, in Cog

1349 + 274 + 1054 / (1647 + 464) ~= 1.27

and in Spur

977 + 359 / (1981 + 387) ~= 0.56


Now Sista will make a big difference because it won't just eliminate the
bounds checking overhead in at:, it'll also inline the block, so it'll be
much more like

| a |
a := Array new: 1000 withAll: 1 -> 1.
[| t | t := 0. 1 to: 1000 do: [:i| t := t + (a at: i) key]. t] bench

Cog: 72,700 per second.
Spur: 87,300 per second.

but faster still because the bounds check in at: is completely eliminated.
-- 
best,
Eliot
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.squeakfoundation.org/pipermail/squeak-dev/attachments/20150116/dbdc1612/attachment.htm


More information about the Squeak-dev mailing list