<div dir="ltr">Hi Marcel,<div class="gmail_extra"><br><div class="gmail_quote">On Fri, Jan 16, 2015 at 5:54 AM, Marcel Taeumel <span dir="ltr">&lt;<a href="mailto:marcel.taeumel@student.hpi.uni-potsdam.de" target="_blank">marcel.taeumel@student.hpi.uni-potsdam.de</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">Ah, okay. I was curious about the performance of iterating over an array vs.<br>
the linked list:<br>
<br>
a := Array new: 1000.<br>
l := LinkedList new.<br>
<br>
1000 timesRepeat: [l add: (StackLink with: nil)].<br>
<br>
[a do: [:ea | 1 + 2]] bench. &quot;&#39;94,600 per second.&#39;&quot;<br>
<br>
[l do: [:ea | 1 + 2]] bench. &quot;&#39;159,000 per second.&#39;&quot;<br>
<br>
I expected the opposite! :-O Why is that so?<br>
<br>
Best,<br>
Marcel<br></blockquote><div><br></div><div>OK, here&#39;s the profiling results, all on a .2GHz Core i7 MBP (and these numbers are noisy; there&#39;s a lot else going on on my machine).  First, using Cog and a slightly modified benchmark that uses the elements of the collections so that the benchmark is &quot;real&quot;, and so that each block does the same ammount of work</div><div><br></div><div><div>| a l |</div><div>a := Array new: 1000 withAll: 1 -&gt; 1.</div><div>l := LinkedList new.</div><div>1000 timesRepeat: [l add: (StackLink with: 1)].</div><div>{ [| t | t := 0. a do: [:ea | t := t + ea key]. t] bench.</div><div>  [| t | t := 0. l do: [:ea | t := t + ea element]. t] bench} #(&#39;51,900 per second.&#39; &#39;70,400 per second.&#39;)</div></div><div><br></div><div>Now the same code on Spur:</div><div>#(&#39;57,100 per second.&#39; &#39;63,300 per second.&#39;)<br></div><div><br></div><div>at: is faster, but (alas) LinkedList&gt;&gt;do: is slower because in Spur #== requires a read barrier to check for possible forwarders, so in &quot;[aLink == nil] whileFalse:&quot; Spur has to fetch a field from aLink to ensure it is not forwarded.</div><div><br></div><div>At the VM level we can see that Object&gt;&gt;at: is expensive, but the block overhead dominates.  First Cog</div><div><br></div><div><div>/Users/eliot/Cog/oscogvm/build.macos32x86/squeak.cog.v3/Fast.app/Contents/MacOS/Squeak  1/16/2015 </div><div>eden size: 2,097,152  stack pages: 160  code size: 1,048,576</div><div><br></div><div>| a |</div><div>a := Array new: 1000 withAll: 1 -&gt; 1.</div><div>[| t | t := 0. a do: [:ea | t := t + ea key]. t] bench</div><div><br></div><div>gc prior.  clear prior.  </div><div>5.004 seconds; sampling frequency 1471 hz</div><div>7340 samples in the VM<span class="" style="white-space:pre">        </span>(7363 samples in the entire program)  99.69% of total</div><div><br></div><div>7302 samples in generated vm code 99.48% of entire vm (99.17% of total)</div><div>38 samples in vanilla vm code   0.52% of entire vm (  0.52% of total)</div><div><br></div><div>% of generated vm code (% of total)<span class="" style="white-space:pre">                                </span>(samples) (cumulative)</div><div>30.33%    (30.08%)<span class="" style="white-space:pre">        </span>BlockClosure&gt;&gt;value:<span class="" style="white-space:pre">                        </span>(2215)<span class="" style="white-space:pre">        </span>(30.33%)</div><div>18.47%    (18.32%)<span class="" style="white-space:pre">        </span>Object&gt;&gt;at:<span class="" style="white-space:pre">                                        </span>(1349)<span class="" style="white-space:pre">        </span>(48.81%)</div><div>16.83%    (16.69%)<span class="" style="white-space:pre">        </span>UndefinedObject&gt;&gt;DoIt<span class="" style="white-space:pre">        </span><span style="white-space:pre">        </span><span class="" style="white-space:pre">        </span>(1229)<span class="" style="white-space:pre">        </span>(65.64%)</div><div>14.43%    (14.31%)<span class="" style="white-space:pre">        </span>SequenceableCollection&gt;&gt;do:<span class="" style="white-space:pre">        </span>(1054)<span class="" style="white-space:pre">        </span>(80.07%)</div><div>  9.57%    (  9.49%)<span class="" style="white-space:pre">        </span>LookupKey&gt;&gt;key<span class="" style="white-space:pre">                                </span>(699)<span class="" style="white-space:pre">        </span>(89.65%)</div><div>  6.35%    (  6.30%)<span class="" style="white-space:pre">        </span>SmallInteger&gt;&gt;+<span class="" style="white-space:pre">                                </span>(464)<span class="" style="white-space:pre">        </span>(96.00%)</div><div>  3.75%    (  3.72%)<span class="" style="white-space:pre">        </span>PIC at:<span class="" style="white-space:pre">                                                </span>(274)<span class="" style="white-space:pre">        </span>(99.75%)</div><div>  0.07%    (  0.07%)<span class="" style="white-space:pre">        </span>BlockClosure&gt;&gt;value<span class="" style="white-space:pre">                        </span>(5)<span class="" style="white-space:pre">                </span>(99.82%)</div><div>  0.05%    (  0.05%)<span class="" style="white-space:pre">        </span>BlockClosure&gt;&gt;bench<span class="" style="white-space:pre">                        </span>(4)<span class="" style="white-space:pre">                </span>(99.88%)</div><div>  0.04%    (  0.04%)<span class="" style="white-space:pre">        </span>PIC size<span class="" style="white-space:pre">                                                </span>(3)<span class="" style="white-space:pre">                </span>(99.92%)</div><div>  0.04%    (  0.04%)<span class="" style="white-space:pre">        </span>ceClosureCopyTrampoline<span class="" style="white-space:pre">                </span>(3)<span class="" style="white-space:pre">                </span>(99.96%)</div><div>  0.03%    (  0.03%)<span class="" style="white-space:pre">        </span>Time class&gt;&gt;mi...condClockValue<span style="white-space:pre">        </span>(2)<span class="" style="white-space:pre">                </span>(99.99%)</div><div>  0.01%    (  0.01%)<span class="" style="white-space:pre">        </span>ArrayedCollection&gt;&gt;size<span class="" style="white-space:pre">                </span><span style="white-space:pre">        </span>(1)<span class="" style="white-space:pre">                </span>(100.0%)</div></div><div><br></div><div><br></div><div><div>| l |</div><div>l := LinkedList new.</div><div>1000 timesRepeat: [l add: (StackLink with: 1)].</div><div>[| t | t := 0. l do: [:ea | t := t + ea element]. t] bench</div><div><br></div><div><div>gc prior.  clear prior.  </div><div>5.001 seconds; sampling frequency 1452 hz</div><div>7240 samples in the VM<span class="" style="white-space:pre">        </span>(7260 samples in the entire program)  99.72% of total</div><div><br></div><div>7186 samples in generated vm code 99.25% of entire vm (98.98% of total)</div><div>54 samples in vanilla vm code   0.75% of entire vm (  0.74% of total)</div><div><br></div><div>% of generated vm code (% of total)<span class="" style="white-space:pre">                        </span>(samples) (cumulative)</div><div>29.22%    (28.93%)<span class="" style="white-space:pre">        </span>BlockClosure&gt;&gt;value:<span class="" style="white-space:pre">                </span>(2100)<span class="" style="white-space:pre">        </span>(29.22%)</div><div>26.32%    (26.05%)<span class="" style="white-space:pre">        </span>UndefinedObject&gt;&gt;DoIt<span style="white-space:pre">        </span><span class="" style="white-space:pre">        </span>(1891)<span class="" style="white-space:pre">        </span>(55.54%)</div><div>22.92%    (22.69%)<span class="" style="white-space:pre">        </span>LinkedList&gt;&gt;do:<span class="" style="white-space:pre">                        </span>(1647)<span class="" style="white-space:pre">        </span>(78.46%)</div><div>  8.14%    (  8.06%)<span class="" style="white-space:pre">        </span>SmallInteger&gt;&gt;+<span class="" style="white-space:pre">                        </span>(585)<span class="" style="white-space:pre">        </span>(86.60%)</div><div>  6.76%    (  6.69%)<span class="" style="white-space:pre">        </span>StackLink&gt;&gt;element<span class="" style="white-space:pre">                </span>(486)<span class="" style="white-space:pre">        </span>(93.36%)</div><div>  6.46%    (  6.39%)<span class="" style="white-space:pre">        </span>Link&gt;&gt;nextLink<span class="" style="white-space:pre">                        </span>(464)<span class="" style="white-space:pre">        </span>(99.82%)</div><div>  0.08%    (  0.08%)<span class="" style="white-space:pre">        </span>Time class&gt;&gt;...ndClockValue<span style="white-space:pre">        </span>(6)<span class="" style="white-space:pre">                </span>(99.90%)</div><div>  0.04%    (  0.04%)<span class="" style="white-space:pre">        </span>BlockClosure&gt;&gt;bench<span class="" style="white-space:pre">                </span>(3)<span class="" style="white-space:pre">                </span>(99.94%)</div><div>  0.03%    (  0.03%)<span class="" style="white-space:pre">        </span>BlockClosure&gt;&gt;value<span class="" style="white-space:pre">                </span>(2)<span class="" style="white-space:pre">                </span>(99.97%)</div><div>  0.01%    (  0.01%)<span class="" style="white-space:pre">        </span>ceClosureCopyTrampoline<span class="" style="white-space:pre">        </span>(1)<span class="" style="white-space:pre">                </span>(99.99%)</div><div>  0.01%    (  0.01%)<span class="" style="white-space:pre">        </span>ceCreateNew...Trampoline<span class="" style="white-space:pre">        </span>(1)<span class="" style="white-space:pre">                </span>(100.0%)</div></div></div><div><br></div><div><br></div><div>and then Spur:</div><div><br></div><div><div>/Users/eliot/Cog/oscogvm/build.macos32x86/squeak.cog.spur/Fast.app/Contents/MacOS/Squeak  1/16/2015 </div><div>eden size: 4,101,312  stack pages: 160  code size: 1,048,576</div><div><br></div><div><div>| a |</div><div>a := Array new: 1000 withAll: 1 -&gt; 1.</div><div>[| t | t := 0. a do: [:ea | t := t + ea key]. t] bench</div></div><div><br></div><div>gc prior.  clear prior.  </div><div>5.002 seconds; sampling frequency 1471 hz</div><div>7340 samples in the VM<span class="" style="white-space:pre">        </span>(7359 samples in the entire program)  99.74% of total</div><div><br></div><div>7330 samples in generated vm code 99.86% of entire vm (99.61% of total)</div><div>10 samples in vanilla vm code   0.14% of entire vm (  0.14% of total)</div><div><br></div><div>% of generated vm code (% of total)<span class="" style="white-space:pre">                                </span>(samples) (cumulative)</div><div>33.06%    (32.93%)<span class="" style="white-space:pre">        </span>BlockClosure&gt;&gt;value:<span class="" style="white-space:pre">                        </span>(2423)<span class="" style="white-space:pre">        </span>(33.06%)</div><div>18.66%    (18.59%)<span class="" style="white-space:pre">        </span>UndefinedObject&gt;&gt;DoIt<span class="" style="white-space:pre">                </span><span style="white-space:pre">        </span>(1368)<span class="" style="white-space:pre">        </span>(51.72%)</div><div>17.97%    (17.90%)<span class="" style="white-space:pre">        </span>SequenceableCollection&gt;&gt;do:<span class="" style="white-space:pre">        </span>(1317)<span class="" style="white-space:pre">        </span>(69.69%)</div><div>13.33%    (13.28%)<span class="" style="white-space:pre">        </span>Object&gt;&gt;at:<span class="" style="white-space:pre">                                        </span>(977)<span class="" style="white-space:pre">        </span>(83.02%)</div><div>  8.84%    (  8.81%)<span class="" style="white-space:pre">        </span>SmallInteger&gt;&gt;+<span class="" style="white-space:pre">                                </span>(648)<span class="" style="white-space:pre">        </span>(91.86%)</div><div>  4.90%    (  4.88%)<span class="" style="white-space:pre">        </span>PIC at:<span class="" style="white-space:pre">                                                </span>(359)<span class="" style="white-space:pre">        </span>(96.75%)</div><div>  3.08%    (  3.07%)<span class="" style="white-space:pre">        </span>LookupKey&gt;&gt;key<span class="" style="white-space:pre">                                </span>(226)<span class="" style="white-space:pre">        </span>(99.84%)</div><div>  0.05%    (  0.05%)<span class="" style="white-space:pre">        </span>ceSmallBlockContext<span class="" style="white-space:pre">                        </span>(4)<span class="" style="white-space:pre">                </span>(99.89%)</div><div>  0.04%    (  0.04%)<span class="" style="white-space:pre">        </span>BlockClosure&gt;&gt;value<span class="" style="white-space:pre">                        </span>(3)<span class="" style="white-space:pre">                </span>(99.93%)</div><div>  0.03%    (  0.03%)<span class="" style="white-space:pre">        </span>Time class&gt;&gt;mi...condClockValue<span style="white-space:pre">        </span>(2)<span class="" style="white-space:pre">                </span>(99.96%)</div><div>  0.01%    (  0.01%)<span class="" style="white-space:pre">        </span>ArrayedCollection&gt;&gt;size<span class="" style="white-space:pre">                </span><span style="white-space:pre">        </span>(1)<span class="" style="white-space:pre">                </span>(99.97%)</div><div>  0.01%    (  0.01%)<span class="" style="white-space:pre">        </span>BlockClosure&gt;&gt;bench<span class="" style="white-space:pre">                        </span>(1)<span class="" style="white-space:pre">                </span>(99.99%)</div><div>  0.01%    (  0.01%)<span class="" style="white-space:pre">        </span>EventSensor&gt;&gt;eventTickler<span style="white-space:pre">        </span><span class="" style="white-space:pre">        </span>(1)<span class="" style="white-space:pre">                </span>(100.0%)</div></div><div><br></div><div><div><div>| l |</div><div>l := LinkedList new.</div><div>1000 timesRepeat: [l add: (StackLink with: 1)].</div><div>[| t | t := 0. l do: [:ea | t := t + ea element]. t] bench</div></div><div><br></div><div>gc prior.  clear prior.  </div><div>5.002 seconds; sampling frequency 1464 hz</div><div>7315 samples in the VM<span class="" style="white-space:pre">        </span>(7321 samples in the entire program)  99.92% of total</div><div><br></div><div>7307 samples in generated vm code 99.89% of entire vm (99.81% of total)</div><div>8 samples in vanilla vm code   0.11% of entire vm (  0.11% of total)</div><div><br></div><div>% of generated vm code (% of total)<span class="" style="white-space:pre">                        </span>(samples) (cumulative)</div><div>27.33%    (27.28%)<span class="" style="white-space:pre">        </span>UndefinedObject&gt;&gt;DoIt<span style="white-space:pre">        </span><span class="" style="white-space:pre">        </span>(1997)<span class="" style="white-space:pre">        </span>(27.33%)</div><div>27.11%    (27.06%)<span class="" style="white-space:pre">        </span>LinkedList&gt;&gt;do:<span class="" style="white-space:pre">                        </span>(1981)<span class="" style="white-space:pre">        </span>(54.44%)</div><div>24.39%    (24.34%)<span class="" style="white-space:pre">        </span>BlockClosure&gt;&gt;value:<span class="" style="white-space:pre">                </span>(1782)<span class="" style="white-space:pre">        </span>(78.83%)</div><div>11.15%    (11.13%)<span class="" style="white-space:pre">        </span>SmallInteger&gt;&gt;+<span class="" style="white-space:pre">                        </span>(815)<span class="" style="white-space:pre">        </span>(89.98%)</div><div>  5.30%    (  5.29%)<span class="" style="white-space:pre">        </span>Link&gt;&gt;nextLink<span class="" style="white-space:pre">                        </span>(387)<span class="" style="white-space:pre">        </span>(95.28%)</div><div>  4.50%    (  4.49%)<span class="" style="white-space:pre">        </span>StackLink&gt;&gt;element<span class="" style="white-space:pre">                </span>(329)<span class="" style="white-space:pre">        </span>(99.78%)</div><div>  0.12%    (  0.12%)<span class="" style="white-space:pre">        </span>BlockClosure&gt;&gt;bench<span class="" style="white-space:pre">                </span>(9)<span class="" style="white-space:pre">                </span>(99.90%)</div><div>  0.07%    (  0.07%)<span class="" style="white-space:pre">        </span>Time class&gt;&gt;...ndClockValue<span style="white-space:pre">        </span>(5)<span class="" style="white-space:pre">                </span>(99.97%)</div><div>  0.01%    (  0.01%)<span class="" style="white-space:pre">        </span>ceSmallBlockContext<span class="" style="white-space:pre">                </span>(1)<span class="" style="white-space:pre">                </span>(99.99%)</div><div>  0.01%    (  0.01%)<span class="" style="white-space:pre">        </span>BlockClosure&gt;&gt;value<span class="" style="white-space:pre">                </span>(1)<span class="" style="white-space:pre">                </span>(100.0%)</div></div><div><br></div><div><br></div><div>So to try and make sense of this we can say that the cost of at: vs the cost of block evaluation is, in Cog</div><div><div>1349 + 274 / 2215 ~= 0.73</div></div><div><br></div><div>and in Spur</div><div><div>977 <span style="white-space:pre">+ </span>359 / 2423 ~= 0.55</div></div><div><br></div><div>at: being much faster in Spur because the object representation makes it much simpler to do the type analysis (since Object&gt;&gt;at: works on any indexable class) and bounds check.</div><div><br></div><div><br></div><div>and approximately the difference between the Array and the LinkedList comes down to the relative costs of at: &amp; SequenceableCollection&gt;&gt;#do: vs nextLink &amp; LinkedList&gt;&gt;#do:, in Cog</div><div><br></div><div>1349 + 274 + 1054 / (1647 + 464) ~= 1.27</div><div><br></div><div>and in Spur</div><div><br></div><div>977 <span style="white-space:pre">+ </span>359 / (1981 + 387) ~= 0.56<br></div><div><br></div><div><br></div><div>Now Sista will make a big difference because it won&#39;t just eliminate the bounds checking overhead in at:, it&#39;ll also inline the block, so it&#39;ll be much more like</div><div><br></div><div><div>| a |</div><div>a := Array new: 1000 withAll: 1 -&gt; 1.</div><div>[| t | t := 0. 1 to: 1000 do: [:i| t := t + (a at: i) key]. t] bench</div><div><br></div><div>Cog: 72,700 per second.</div><div>Spur: 87,300 per second.</div></div><div><br></div><div>but faster still because the bounds check in at: is completely eliminated.</div></div>-- <br><div class="gmail_signature">best,<div>Eliot</div></div>
</div></div>