OrderedDictionary (was: Re: [squeak-dev] Re: The Trunk:
Collections-mt.593.mcz)
Eliot Miranda
eliot.miranda at gmail.com
Fri Jan 16 21:39:40 UTC 2015
Hi Marcel,
On Fri, Jan 16, 2015 at 5:54 AM, Marcel Taeumel <
marcel.taeumel at student.hpi.uni-potsdam.de> wrote:
> Ah, okay. I was curious about the performance of iterating over an array
> vs.
> the linked list:
>
> a := Array new: 1000.
> l := LinkedList new.
>
> 1000 timesRepeat: [l add: (StackLink with: nil)].
>
> [a do: [:ea | 1 + 2]] bench. "'94,600 per second.'"
>
> [l do: [:ea | 1 + 2]] bench. "'159,000 per second.'"
>
> I expected the opposite! :-O Why is that so?
>
> Best,
> Marcel
>
OK, here's the profiling results, all on a .2GHz Core i7 MBP (and these
numbers are noisy; there's a lot else going on on my machine). First,
using Cog and a slightly modified benchmark that uses the elements of the
collections so that the benchmark is "real", and so that each block does
the same ammount of work
| a l |
a := Array new: 1000 withAll: 1 -> 1.
l := LinkedList new.
1000 timesRepeat: [l add: (StackLink with: 1)].
{ [| t | t := 0. a do: [:ea | t := t + ea key]. t] bench.
[| t | t := 0. l do: [:ea | t := t + ea element]. t] bench} #('51,900 per
second.' '70,400 per second.')
Now the same code on Spur:
#('57,100 per second.' '63,300 per second.')
at: is faster, but (alas) LinkedList>>do: is slower because in Spur #==
requires a read barrier to check for possible forwarders, so in "[aLink ==
nil] whileFalse:" Spur has to fetch a field from aLink to ensure it is not
forwarded.
At the VM level we can see that Object>>at: is expensive, but the block
overhead dominates. First Cog
/Users/eliot/Cog/oscogvm/build.macos32x86/squeak.cog.v3/Fast.app/Contents/MacOS/Squeak
1/16/2015
eden size: 2,097,152 stack pages: 160 code size: 1,048,576
| a |
a := Array new: 1000 withAll: 1 -> 1.
[| t | t := 0. a do: [:ea | t := t + ea key]. t] bench
gc prior. clear prior.
5.004 seconds; sampling frequency 1471 hz
7340 samples in the VM (7363 samples in the entire program) 99.69% of total
7302 samples in generated vm code 99.48% of entire vm (99.17% of total)
38 samples in vanilla vm code 0.52% of entire vm ( 0.52% of total)
% of generated vm code (% of total) (samples) (cumulative)
30.33% (30.08%) BlockClosure>>value: (2215) (30.33%)
18.47% (18.32%) Object>>at: (1349) (48.81%)
16.83% (16.69%) UndefinedObject>>DoIt (1229) (65.64%)
14.43% (14.31%) SequenceableCollection>>do: (1054) (80.07%)
9.57% ( 9.49%) LookupKey>>key (699) (89.65%)
6.35% ( 6.30%) SmallInteger>>+ (464) (96.00%)
3.75% ( 3.72%) PIC at: (274) (99.75%)
0.07% ( 0.07%) BlockClosure>>value (5) (99.82%)
0.05% ( 0.05%) BlockClosure>>bench (4) (99.88%)
0.04% ( 0.04%) PIC size (3) (99.92%)
0.04% ( 0.04%) ceClosureCopyTrampoline (3) (99.96%)
0.03% ( 0.03%) Time class>>mi...condClockValue (2) (99.99%)
0.01% ( 0.01%) ArrayedCollection>>size (1) (100.0%)
| l |
l := LinkedList new.
1000 timesRepeat: [l add: (StackLink with: 1)].
[| t | t := 0. l do: [:ea | t := t + ea element]. t] bench
gc prior. clear prior.
5.001 seconds; sampling frequency 1452 hz
7240 samples in the VM (7260 samples in the entire program) 99.72% of total
7186 samples in generated vm code 99.25% of entire vm (98.98% of total)
54 samples in vanilla vm code 0.75% of entire vm ( 0.74% of total)
% of generated vm code (% of total) (samples) (cumulative)
29.22% (28.93%) BlockClosure>>value: (2100) (29.22%)
26.32% (26.05%) UndefinedObject>>DoIt (1891) (55.54%)
22.92% (22.69%) LinkedList>>do: (1647) (78.46%)
8.14% ( 8.06%) SmallInteger>>+ (585) (86.60%)
6.76% ( 6.69%) StackLink>>element (486) (93.36%)
6.46% ( 6.39%) Link>>nextLink (464) (99.82%)
0.08% ( 0.08%) Time class>>...ndClockValue (6) (99.90%)
0.04% ( 0.04%) BlockClosure>>bench (3) (99.94%)
0.03% ( 0.03%) BlockClosure>>value (2) (99.97%)
0.01% ( 0.01%) ceClosureCopyTrampoline (1) (99.99%)
0.01% ( 0.01%) ceCreateNew...Trampoline (1) (100.0%)
and then Spur:
/Users/eliot/Cog/oscogvm/build.macos32x86/squeak.cog.spur/Fast.app/Contents/MacOS/Squeak
1/16/2015
eden size: 4,101,312 stack pages: 160 code size: 1,048,576
| a |
a := Array new: 1000 withAll: 1 -> 1.
[| t | t := 0. a do: [:ea | t := t + ea key]. t] bench
gc prior. clear prior.
5.002 seconds; sampling frequency 1471 hz
7340 samples in the VM (7359 samples in the entire program) 99.74% of total
7330 samples in generated vm code 99.86% of entire vm (99.61% of total)
10 samples in vanilla vm code 0.14% of entire vm ( 0.14% of total)
% of generated vm code (% of total) (samples) (cumulative)
33.06% (32.93%) BlockClosure>>value: (2423) (33.06%)
18.66% (18.59%) UndefinedObject>>DoIt (1368) (51.72%)
17.97% (17.90%) SequenceableCollection>>do: (1317) (69.69%)
13.33% (13.28%) Object>>at: (977) (83.02%)
8.84% ( 8.81%) SmallInteger>>+ (648) (91.86%)
4.90% ( 4.88%) PIC at: (359) (96.75%)
3.08% ( 3.07%) LookupKey>>key (226) (99.84%)
0.05% ( 0.05%) ceSmallBlockContext (4) (99.89%)
0.04% ( 0.04%) BlockClosure>>value (3) (99.93%)
0.03% ( 0.03%) Time class>>mi...condClockValue (2) (99.96%)
0.01% ( 0.01%) ArrayedCollection>>size (1) (99.97%)
0.01% ( 0.01%) BlockClosure>>bench (1) (99.99%)
0.01% ( 0.01%) EventSensor>>eventTickler (1) (100.0%)
| l |
l := LinkedList new.
1000 timesRepeat: [l add: (StackLink with: 1)].
[| t | t := 0. l do: [:ea | t := t + ea element]. t] bench
gc prior. clear prior.
5.002 seconds; sampling frequency 1464 hz
7315 samples in the VM (7321 samples in the entire program) 99.92% of total
7307 samples in generated vm code 99.89% of entire vm (99.81% of total)
8 samples in vanilla vm code 0.11% of entire vm ( 0.11% of total)
% of generated vm code (% of total) (samples) (cumulative)
27.33% (27.28%) UndefinedObject>>DoIt (1997) (27.33%)
27.11% (27.06%) LinkedList>>do: (1981) (54.44%)
24.39% (24.34%) BlockClosure>>value: (1782) (78.83%)
11.15% (11.13%) SmallInteger>>+ (815) (89.98%)
5.30% ( 5.29%) Link>>nextLink (387) (95.28%)
4.50% ( 4.49%) StackLink>>element (329) (99.78%)
0.12% ( 0.12%) BlockClosure>>bench (9) (99.90%)
0.07% ( 0.07%) Time class>>...ndClockValue (5) (99.97%)
0.01% ( 0.01%) ceSmallBlockContext (1) (99.99%)
0.01% ( 0.01%) BlockClosure>>value (1) (100.0%)
So to try and make sense of this we can say that the cost of at: vs the
cost of block evaluation is, in Cog
1349 + 274 / 2215 ~= 0.73
and in Spur
977 + 359 / 2423 ~= 0.55
at: being much faster in Spur because the object representation makes it
much simpler to do the type analysis (since Object>>at: works on any
indexable class) and bounds check.
and approximately the difference between the Array and the LinkedList comes
down to the relative costs of at: & SequenceableCollection>>#do: vs
nextLink & LinkedList>>#do:, in Cog
1349 + 274 + 1054 / (1647 + 464) ~= 1.27
and in Spur
977 + 359 / (1981 + 387) ~= 0.56
Now Sista will make a big difference because it won't just eliminate the
bounds checking overhead in at:, it'll also inline the block, so it'll be
much more like
| a |
a := Array new: 1000 withAll: 1 -> 1.
[| t | t := 0. 1 to: 1000 do: [:i| t := t + (a at: i) key]. t] bench
Cog: 72,700 per second.
Spur: 87,300 per second.
but faster still because the bounds check in at: is completely eliminated.
--
best,
Eliot
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.squeakfoundation.org/pipermail/squeak-dev/attachments/20150116/dbdc1612/attachment.htm
More information about the Squeak-dev
mailing list
|