<div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr">Hi Jan,<br><div><br></div><div>   Well, how about these?  Scroll down past the definitions to see the benchmarker.  The point about benchFib is that 1 is added for every activation, so the result is the number of calls required to evaluate it.  Hence, divide the result by the elapsed time and one gets activations per second.  Very convenient.  The variations are between a method using block recursion, a method on Integer where the value is accessed as self, a method using perform:, and two methods that access the value as an argument, one with a SmallInteger receiver and the other with nil as the receiver.</div><div><br></div><div><div><br></div><div>!BlockClosure methodsFor: 'benchmarks' stamp: 'eem 1/23/2020 22:55'!</div><div>benchFib: arg</div><div><span class="gmail-Apple-tab-span" style="white-space:pre"> </span>| benchFib |</div><div><span class="gmail-Apple-tab-span" style="white-space:pre">   </span>benchFib := [:n| n < 2</div><div><span class="gmail-Apple-tab-span" style="white-space:pre">                                      </span>ifTrue: [1] </div><div><span class="gmail-Apple-tab-span" style="white-space:pre">                                  </span>ifFalse: [(benchFib value: n - 1) + (benchFib value: n - 2) + 1]].</div><div><span class="gmail-Apple-tab-span" style="white-space:pre">     </span>^benchFib value: arg! !</div><div><br></div><div>!Integer methodsFor: 'benchmarks' stamp: 'eem 1/23/2020 23:10'!</div><div>benchFib: n</div><div><span class="gmail-Apple-tab-span" style="white-space:pre">       </span>^n < 2</div><div><span class="gmail-Apple-tab-span" style="white-space:pre">              </span>ifTrue: [1] </div><div><span class="gmail-Apple-tab-span" style="white-space:pre">          </span>ifFalse: [(self benchFib: n-1) + (self benchFib: n-2) + 1]! 
!</div><div><br></div><div>!Integer methodsFor: 'benchmarks' stamp: 'jm 11/20/1998 07:06'!</div><div>benchFib</div><div><span class="gmail-Apple-tab-span" style="white-space:pre">    </span>^ self < 2</div><div><span class="gmail-Apple-tab-span" style="white-space:pre">          </span>ifTrue: [1] </div><div><span class="gmail-Apple-tab-span" style="white-space:pre">          </span>ifFalse: [(self-1) benchFib + (self-2) benchFib + 1]! !</div><div><br></div><div>!Symbol methodsFor: 'benchmarks' stamp: 'eem 1/23/2020 22:57'!</div><div>benchFib: n</div><div><span class="gmail-Apple-tab-span" style="white-space:pre">        </span>^n < 2</div><div><span class="gmail-Apple-tab-span" style="white-space:pre">              </span>ifTrue: [1] </div><div><span class="gmail-Apple-tab-span" style="white-space:pre">          </span>ifFalse: [(self perform: #benchFib: with: n - 1) + (self perform: #benchFib: with: n - 2) + 1]! !</div><div><br></div><div>!UndefinedObject methodsFor: 'benchmarks' stamp: 'eem 1/23/2020 23:09'!</div><div>benchFib: n</div><div><span class="gmail-Apple-tab-span" style="white-space:pre">     </span>^n < 2</div><div><span class="gmail-Apple-tab-span" style="white-space:pre">              </span>ifTrue: [1] </div><div><span class="gmail-Apple-tab-span" style="white-space:pre">          </span>ifFalse: [(self benchFib: n-1)  + (self benchFib: n-2) + 1]! !</div></div><div><br></div><div><br></div><div><div>Collect result / seconds.  Bigger is faster (more calls per second).  Using Integer receivers involves a branch in the inline cache check, whereas all the others have no such jump.  This is a 64-bit Squeak 5.2 image on my 2.9 GHz Intel Core i9 15" 2018 MacBookPro (thanks Doru!).  
And I'm using the SistaV1 bytecode set with full blocks (no block dispatch to reach the code for a particular block; each block is its own method).</div><div><br></div><div>| n collector blocks times |<br></div></div><div><div>n := 42.</div><div>collector := [:block| | t r | t := [r := block value] timeToRun. { t. r. (r * 1000.0 / t) rounded }].</div><div>blocks := { [n benchFib]. [n benchFib: n]. [nil benchFib: n]. [#benchFib: benchFib: n]. [[] benchFib: n] }.</div><div>times := blocks collect: collector; collect: collector. "twice to ensure measuring hot code".</div><div>(1 to: blocks size) collect: [:i| { (blocks at: i) decompile }, (times at: i), {((times at: i) last / times first last * 100) rounded }]</div><div><br></div><div>{{{ [n benchFib]} . 3734 . 866988873 . 232187700 . 100 } .</div><div> {{ [n benchFib: n]} . 3675 . 866988873 . 235915340 . 102 } .</div><div> {{ [nil benchFib: n]} . 3450 . 866988873 . 251301123 . 108 } .</div><div> {{ [#benchFib: benchFib: n]} . 5573 . 866988873 . 155569509 . 67} .</div><div> {{ [[] benchFib: n]} . 4930 . 866988873 . 175859812 . 76 }}</div></div><div><br></div><div>So... the clock is very granular (you see this at low N).</div><div>Blocks are 76% as fast as straight integers.</div><div>perform: is 67% as fast as straight integers (not too shabby; but then integers are crawling).</div><div>Fastest is sending to a non-immediate receiver and accessing the value as an argument.<br></div><div>The remaining results indicate that frame building is really expensive and dominates the differences between accessing the value as the receiver or accessing it as an argument, whether there's a jump in the inline cache check, etc. This confirms what we found many years ago: if the ifTrue: [^1] branch can be done frameless, or if significant inlining can occur (as an adaptive optimizer can achieve), then things go a lot faster.  But on the Cog execution architecture blocks and perform: are p.d.q. 
relative to vanilla sends.</div></div></div></div></div></div></div></div></div></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Jan 23, 2020 at 2:35 AM Jan Vrany <<a href="mailto:jan.vrany@fit.cvut.cz">jan.vrany@fit.cvut.cz</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"> <br>

Eliot, <br>

<br>

> 2) the lack of inline caches for #perform: (again, I am just guessing in<br>

> > this case).<br>

> > <br>

> <br>

> Right.  There is only the first level method lookup cache so it has<br>

> interpreter-like performance.  The selector and class of receiver have to<br>

> be hashed and the first-level method lookup cache probed.  Way slower than<br>

> block activation.  I will claim though that Cog/Spur OpenSmalltalk's JIT<br>

> perform implementation is as good as or better than any other Smalltalk<br>

> VM's.  IIRC VW only machine coded/codes perform: and perform:with:<br>

<br>

Do you have a benchmark for perform: et al.? I'd be quite interested. <br>

Last time I was on this topic, I struggled to come up with a benchmark <br>

that would resemble anything like a real workload (and whose<br>

results I could interpret :-)<br>

<br>

Jan<br>

<br>

</blockquote></div><br clear="all"><div><br></div>-- <br><div dir="ltr" class="gmail_signature"><div dir="ltr"><div><span style="font-size:small;border-collapse:separate"><div>_,,,^..^,,,_<br></div><div>best, Eliot</div></span></div></div></div>