<div dir="ltr">Hi Marcel,<div class="gmail_extra"><br><div class="gmail_quote">On Fri, Jan 16, 2015 at 5:54 AM, Marcel Taeumel <span dir="ltr"><<a href="mailto:marcel.taeumel@student.hpi.uni-potsdam.de" target="_blank">marcel.taeumel@student.hpi.uni-potsdam.de</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">Ah, okay. I was curious about the performance of iterating over an array vs.<br>
the linked list:<br>
<br>
a := Array new: 1000.<br>
l := LinkedList new.<br>
<br>
1000 timesRepeat: [l add: (StackLink with: nil)].<br>
<br>
[a do: [:ea | 1 + 2]] bench. "'94,600 per second.'"<br>
<br>
[l do: [:ea | 1 + 2]] bench. "'159,000 per second.'"<br>
<br>
I expected the opposite! :-O Why is that so?<br>
<br>
Best,<br>
Marcel<br></blockquote><div><br></div><div>OK, here's the profiling results, all on a .2GHz Core i7 MBP (and these numbers are noisy; there's a lot else going on on my machine). First, using Cog and a slightly modified benchmark that uses the elements of the collections so that the benchmark is "real", and so that each block does the same ammount of work</div><div><br></div><div><div>| a l |</div><div>a := Array new: 1000 withAll: 1 -> 1.</div><div>l := LinkedList new.</div><div>1000 timesRepeat: [l add: (StackLink with: 1)].</div><div>{ [| t | t := 0. a do: [:ea | t := t + ea key]. t] bench.</div><div> [| t | t := 0. l do: [:ea | t := t + ea element]. t] bench} #('51,900 per second.' '70,400 per second.')</div></div><div><br></div><div>Now the same code on Spur:</div><div>#('57,100 per second.' '63,300 per second.')<br></div><div><br></div><div>at: is faster, but (alas) LinkedList>>do: is slower because in Spur #== requires a read barrier to check for possible forwarders, so in "[aLink == nil] whileFalse:" Spur has to fetch a field from aLink to ensure it is not forwarded.</div><div><br></div><div>At the VM level we can see that Object>>at: is expensive, but the block overhead dominates. First Cog</div><div><br></div><div><div>/Users/eliot/Cog/oscogvm/build.macos32x86/squeak.cog.v3/Fast.app/Contents/MacOS/Squeak 1/16/2015 </div><div>eden size: 2,097,152 stack pages: 160 code size: 1,048,576</div><div><br></div><div>| a |</div><div>a := Array new: 1000 withAll: 1 -> 1.</div><div>[| t | t := 0. a do: [:ea | t := t + ea key]. t] bench</div><div><br></div><div>gc prior. clear prior. </div><div>5.004 seconds; sampling frequency 1471 hz</div><div>7340 samples in the VM<span class="" style="white-space:pre">        </span>(7363 samples in the entire program) 99.69% of total</div><div><br></div><div>7302 samples in generated vm code 99.48% of entire vm (99.17% of total)</div><div>38 samples in vanilla vm code 0.52% of entire vm ( 0.52% of total)</div><div><br></div><div>% of generated vm code (% of total)<span class="" style="white-space:pre">                                </span>(samples) (cumulative)</div><div>30.33% (30.08%)<span class="" style="white-space:pre">        </span>BlockClosure>>value:<span class="" style="white-space:pre">                        </span>(2215)<span class="" style="white-space:pre">        </span>(30.33%)</div><div>18.47% (18.32%)<span class="" style="white-space:pre">        </span>Object>>at:<span class="" style="white-space:pre">                                        </span>(1349)<span class="" style="white-space:pre">        </span>(48.81%)</div><div>16.83% (16.69%)<span class="" style="white-space:pre">        </span>UndefinedObject>>DoIt<span class="" style="white-space:pre">        </span><span style="white-space:pre">        </span><span class="" style="white-space:pre">        </span>(1229)<span class="" style="white-space:pre">        </span>(65.64%)</div><div>14.43% (14.31%)<span class="" style="white-space:pre">        </span>SequenceableCollection>>do:<span class="" style="white-space:pre">        </span>(1054)<span class="" style="white-space:pre">        </span>(80.07%)</div><div> 9.57% ( 9.49%)<span class="" style="white-space:pre">        </span>LookupKey>>key<span class="" style="white-space:pre">                                </span>(699)<span class="" style="white-space:pre">        </span>(89.65%)</div><div> 6.35% ( 6.30%)<span class="" style="white-space:pre">        </span>SmallInteger>>+<span class="" style="white-space:pre">                                </span>(464)<span class="" style="white-space:pre">        </span>(96.00%)</div><div> 3.75% ( 3.72%)<span class="" style="white-space:pre">        </span>PIC at:<span class="" style="white-space:pre">                                                </span>(274)<span class="" style="white-space:pre">        </span>(99.75%)</div><div> 0.07% ( 0.07%)<span class="" style="white-space:pre">        </span>BlockClosure>>value<span class="" style="white-space:pre">                        </span>(5)<span class="" style="white-space:pre">                </span>(99.82%)</div><div> 0.05% ( 0.05%)<span class="" style="white-space:pre">        </span>BlockClosure>>bench<span class="" style="white-space:pre">                        </span>(4)<span class="" style="white-space:pre">                </span>(99.88%)</div><div> 0.04% ( 0.04%)<span class="" style="white-space:pre">        </span>PIC size<span class="" style="white-space:pre">                                                </span>(3)<span class="" style="white-space:pre">                </span>(99.92%)</div><div> 0.04% ( 0.04%)<span class="" style="white-space:pre">        </span>ceClosureCopyTrampoline<span class="" style="white-space:pre">                </span>(3)<span class="" style="white-space:pre">                </span>(99.96%)</div><div> 0.03% ( 0.03%)<span class="" style="white-space:pre">        </span>Time class>>mi...condClockValue<span style="white-space:pre">        </span>(2)<span class="" style="white-space:pre">                </span>(99.99%)</div><div> 0.01% ( 0.01%)<span class="" style="white-space:pre">        </span>ArrayedCollection>>size<span class="" style="white-space:pre">                </span><span style="white-space:pre">        </span>(1)<span class="" style="white-space:pre">                </span>(100.0%)</div></div><div><br></div><div><br></div><div><div>| l |</div><div>l := LinkedList new.</div><div>1000 timesRepeat: [l add: (StackLink with: 1)].</div><div>[| t | t := 0. l do: [:ea | t := t + ea element]. t] bench</div><div><br></div><div><div>gc prior. clear prior. </div><div>5.001 seconds; sampling frequency 1452 hz</div><div>7240 samples in the VM<span class="" style="white-space:pre">        </span>(7260 samples in the entire program) 99.72% of total</div><div><br></div><div>7186 samples in generated vm code 99.25% of entire vm (98.98% of total)</div><div>54 samples in vanilla vm code 0.75% of entire vm ( 0.74% of total)</div><div><br></div><div>% of generated vm code (% of total)<span class="" style="white-space:pre">                        </span>(samples) (cumulative)</div><div>29.22% (28.93%)<span class="" style="white-space:pre">        </span>BlockClosure>>value:<span class="" style="white-space:pre">                </span>(2100)<span class="" style="white-space:pre">        </span>(29.22%)</div><div>26.32% (26.05%)<span class="" style="white-space:pre">        </span>UndefinedObject>>DoIt<span style="white-space:pre">        </span><span class="" style="white-space:pre">        </span>(1891)<span class="" style="white-space:pre">        </span>(55.54%)</div><div>22.92% (22.69%)<span class="" style="white-space:pre">        </span>LinkedList>>do:<span class="" style="white-space:pre">                        </span>(1647)<span class="" style="white-space:pre">        </span>(78.46%)</div><div> 8.14% ( 8.06%)<span class="" style="white-space:pre">        </span>SmallInteger>>+<span class="" style="white-space:pre">                        </span>(585)<span class="" style="white-space:pre">        </span>(86.60%)</div><div> 6.76% ( 6.69%)<span class="" style="white-space:pre">        </span>StackLink>>element<span class="" style="white-space:pre">                </span>(486)<span class="" style="white-space:pre">        </span>(93.36%)</div><div> 6.46% ( 6.39%)<span class="" style="white-space:pre">        </span>Link>>nextLink<span class="" style="white-space:pre">                        </span>(464)<span class="" style="white-space:pre">        </span>(99.82%)</div><div> 0.08% ( 0.08%)<span class="" style="white-space:pre">        </span>Time class>>...ndClockValue<span style="white-space:pre">        </span>(6)<span class="" style="white-space:pre">                </span>(99.90%)</div><div> 0.04% ( 0.04%)<span class="" style="white-space:pre">        </span>BlockClosure>>bench<span class="" style="white-space:pre">                </span>(3)<span class="" style="white-space:pre">                </span>(99.94%)</div><div> 0.03% ( 0.03%)<span class="" style="white-space:pre">        </span>BlockClosure>>value<span class="" style="white-space:pre">                </span>(2)<span class="" style="white-space:pre">                </span>(99.97%)</div><div> 0.01% ( 0.01%)<span class="" style="white-space:pre">        </span>ceClosureCopyTrampoline<span class="" style="white-space:pre">        </span>(1)<span class="" style="white-space:pre">                </span>(99.99%)</div><div> 0.01% ( 0.01%)<span class="" style="white-space:pre">        </span>ceCreateNew...Trampoline<span class="" style="white-space:pre">        </span>(1)<span class="" style="white-space:pre">                </span>(100.0%)</div></div></div><div><br></div><div><br></div><div>and then Spur:</div><div><br></div><div><div>/Users/eliot/Cog/oscogvm/build.macos32x86/squeak.cog.spur/Fast.app/Contents/MacOS/Squeak 1/16/2015 </div><div>eden size: 4,101,312 stack pages: 160 code size: 1,048,576</div><div><br></div><div><div>| a |</div><div>a := Array new: 1000 withAll: 1 -> 1.</div><div>[| t | t := 0. a do: [:ea | t := t + ea key]. t] bench</div></div><div><br></div><div>gc prior. clear prior. </div><div>5.002 seconds; sampling frequency 1471 hz</div><div>7340 samples in the VM<span class="" style="white-space:pre">        </span>(7359 samples in the entire program) 99.74% of total</div><div><br></div><div>7330 samples in generated vm code 99.86% of entire vm (99.61% of total)</div><div>10 samples in vanilla vm code 0.14% of entire vm ( 0.14% of total)</div><div><br></div><div>% of generated vm code (% of total)<span class="" style="white-space:pre">                                </span>(samples) (cumulative)</div><div>33.06% (32.93%)<span class="" style="white-space:pre">        </span>BlockClosure>>value:<span class="" style="white-space:pre">                        </span>(2423)<span class="" style="white-space:pre">        </span>(33.06%)</div><div>18.66% (18.59%)<span class="" style="white-space:pre">        </span>UndefinedObject>>DoIt<span class="" style="white-space:pre">                </span><span style="white-space:pre">        </span>(1368)<span class="" style="white-space:pre">        </span>(51.72%)</div><div>17.97% (17.90%)<span class="" style="white-space:pre">        </span>SequenceableCollection>>do:<span class="" style="white-space:pre">        </span>(1317)<span class="" style="white-space:pre">        </span>(69.69%)</div><div>13.33% (13.28%)<span class="" style="white-space:pre">        </span>Object>>at:<span class="" style="white-space:pre">                                        </span>(977)<span class="" style="white-space:pre">        </span>(83.02%)</div><div> 8.84% ( 8.81%)<span class="" style="white-space:pre">        </span>SmallInteger>>+<span class="" style="white-space:pre">                                </span>(648)<span class="" style="white-space:pre">        </span>(91.86%)</div><div> 4.90% ( 4.88%)<span class="" style="white-space:pre">        </span>PIC at:<span class="" style="white-space:pre">                                                </span>(359)<span class="" style="white-space:pre">        </span>(96.75%)</div><div> 3.08% ( 3.07%)<span class="" style="white-space:pre">        </span>LookupKey>>key<span class="" style="white-space:pre">                                </span>(226)<span class="" style="white-space:pre">        </span>(99.84%)</div><div> 0.05% ( 0.05%)<span class="" style="white-space:pre">        </span>ceSmallBlockContext<span class="" style="white-space:pre">                        </span>(4)<span class="" style="white-space:pre">                </span>(99.89%)</div><div> 0.04% ( 0.04%)<span class="" style="white-space:pre">        </span>BlockClosure>>value<span class="" style="white-space:pre">                        </span>(3)<span class="" style="white-space:pre">                </span>(99.93%)</div><div> 0.03% ( 0.03%)<span class="" style="white-space:pre">        </span>Time class>>mi...condClockValue<span style="white-space:pre">        </span>(2)<span class="" style="white-space:pre">                </span>(99.96%)</div><div> 0.01% ( 0.01%)<span class="" style="white-space:pre">        </span>ArrayedCollection>>size<span class="" style="white-space:pre">                </span><span style="white-space:pre">        </span>(1)<span class="" style="white-space:pre">                </span>(99.97%)</div><div> 0.01% ( 0.01%)<span class="" style="white-space:pre">        </span>BlockClosure>>bench<span class="" style="white-space:pre">                        </span>(1)<span class="" style="white-space:pre">                </span>(99.99%)</div><div> 0.01% ( 0.01%)<span class="" style="white-space:pre">        </span>EventSensor>>eventTickler<span style="white-space:pre">        </span><span class="" style="white-space:pre">        </span>(1)<span class="" style="white-space:pre">                </span>(100.0%)</div></div><div><br></div><div><div><div>| l |</div><div>l := LinkedList new.</div><div>1000 timesRepeat: [l add: (StackLink with: 1)].</div><div>[| t | t := 0. l do: [:ea | t := t + ea element]. t] bench</div></div><div><br></div><div>gc prior. clear prior. </div><div>5.002 seconds; sampling frequency 1464 hz</div><div>7315 samples in the VM<span class="" style="white-space:pre">        </span>(7321 samples in the entire program) 99.92% of total</div><div><br></div><div>7307 samples in generated vm code 99.89% of entire vm (99.81% of total)</div><div>8 samples in vanilla vm code 0.11% of entire vm ( 0.11% of total)</div><div><br></div><div>% of generated vm code (% of total)<span class="" style="white-space:pre">                        </span>(samples) (cumulative)</div><div>27.33% (27.28%)<span class="" style="white-space:pre">        </span>UndefinedObject>>DoIt<span style="white-space:pre">        </span><span class="" style="white-space:pre">        </span>(1997)<span class="" style="white-space:pre">        </span>(27.33%)</div><div>27.11% (27.06%)<span class="" style="white-space:pre">        </span>LinkedList>>do:<span class="" style="white-space:pre">                        </span>(1981)<span class="" style="white-space:pre">        </span>(54.44%)</div><div>24.39% (24.34%)<span class="" style="white-space:pre">        </span>BlockClosure>>value:<span class="" style="white-space:pre">                </span>(1782)<span class="" style="white-space:pre">        </span>(78.83%)</div><div>11.15% (11.13%)<span class="" style="white-space:pre">        </span>SmallInteger>>+<span class="" style="white-space:pre">                        </span>(815)<span class="" style="white-space:pre">        </span>(89.98%)</div><div> 5.30% ( 5.29%)<span class="" style="white-space:pre">        </span>Link>>nextLink<span class="" style="white-space:pre">                        </span>(387)<span class="" style="white-space:pre">        </span>(95.28%)</div><div> 4.50% ( 4.49%)<span class="" style="white-space:pre">        </span>StackLink>>element<span class="" style="white-space:pre">                </span>(329)<span class="" style="white-space:pre">        </span>(99.78%)</div><div> 0.12% ( 0.12%)<span class="" style="white-space:pre">        </span>BlockClosure>>bench<span class="" style="white-space:pre">                </span>(9)<span class="" style="white-space:pre">                </span>(99.90%)</div><div> 0.07% ( 0.07%)<span class="" style="white-space:pre">        </span>Time class>>...ndClockValue<span style="white-space:pre">        </span>(5)<span class="" style="white-space:pre">                </span>(99.97%)</div><div> 0.01% ( 0.01%)<span class="" style="white-space:pre">        </span>ceSmallBlockContext<span class="" style="white-space:pre">                </span>(1)<span class="" style="white-space:pre">                </span>(99.99%)</div><div> 0.01% ( 0.01%)<span class="" style="white-space:pre">        </span>BlockClosure>>value<span class="" style="white-space:pre">                </span>(1)<span class="" style="white-space:pre">                </span>(100.0%)</div></div><div><br></div><div><br></div><div>So to try and make sense of this we can say that the cost of at: vs the cost of block evaluation is, in Cog</div><div><div>1349 + 274 / 2215 ~= 0.73</div></div><div><br></div><div>and in Spur</div><div><div>977 <span style="white-space:pre">+ </span>359 / 2423 ~= 0.55</div></div><div><br></div><div>at: being much faster in Spur because the object representation makes it much simpler to do the type analysis (since Object>>at: works on any indexable class) and bounds check.</div><div><br></div><div><br></div><div>and approximately the difference between the Array and the LinkedList comes down to the relative costs of at: & SequenceableCollection>>#do: vs nextLink & LinkedList>>#do:, in Cog</div><div><br></div><div>1349 + 274 + 1054 / (1647 + 464) ~= 1.27</div><div><br></div><div>and in Spur</div><div><br></div><div>977 <span style="white-space:pre">+ </span>359 / (1981 + 387) ~= 0.56<br></div><div><br></div><div><br></div><div>Now Sista will make a big difference because it won't just eliminate the bounds checking overhead in at:, it'll also inline the block, so it'll be much more like</div><div><br></div><div><div>| a |</div><div>a := Array new: 1000 withAll: 1 -> 1.</div><div>[| t | t := 0. 1 to: 1000 do: [:i| t := t + (a at: i) key]. t] bench</div><div><br></div><div>Cog: 72,700 per second.</div><div>Spur: 87,300 per second.</div></div><div><br></div><div>but faster still because the bounds check in at: is completely eliminated.</div></div>-- <br><div class="gmail_signature">best,<div>Eliot</div></div>
</div></div>