Hi John,<div><br></div><div>    good questions.<br><br><div class="gmail_quote">On Thu, Feb 17, 2011 at 6:21 AM, John B Thiel <span dir="ltr">&lt;<a href="mailto:jbthiel@gmail.com">jbthiel@gmail.com</a>&gt;</span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">Cog VM -- Thanks and Performance / Optimization Questions<br>

<br>

<br>

To Everyone, thanks for your great work on Pharo and Squeak,  and to<br>

Eliot Miranda, Ian Piumarta, and all VM/JIT gurus, especially thanks<br>

for the Squeak VM Cog and its precursors, which I was keenly<br>

anticipating for a decade or so, and is really going into stride with<br>

the latest builds.<br>

<br>

I like to code with awareness of performance issues.  Can you tell or<br>

point me to some performance and efficiency tips for Cog and the<br>

Squeak compiler -- detail on which methods are inlined, best among<br>

alternatives, etc.  For example, I understand #to:do: is inlined --<br>

what about #to:do:by: and #timesRepeat and #repeat  ?  Basically, I<br>

would like to read a full overview of which core methods are specially<br>

optimized (or planned).<br></blockquote><div><br></div><div>The bytecode compiler inlines a fixed set of selectors when the arguments are suitable (typically literal blocks).  The standard compiler&#39;s list can be obtained by evaluating MessageNode classPool at: #MacroSelectors, e.g.</div>

<div><br></div><div>#ifTrue: #ifFalse: #ifTrue:ifFalse: #ifFalse:ifTrue: #and: #or: #whileFalse: #whileTrue: #whileFalse #whileTrue #to:do: #to:by:do: #caseOf: #caseOf:otherwise: #ifNil: #ifNotNil: #ifNil:ifNotNil: #ifNotNil:ifNil:</div>
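<div><br></div><div>As a rough workspace sketch (the variable names are made up for illustration), the first two statements below are compiled inline because their arguments are literal blocks, whereas the last two remain ordinary message sends:</div>
<pre>
| sum aBlock |
sum := 0.

"Inlined: #to:do: with a literal block argument compiles to a jump-based loop."
1 to: 10 do: [:i | sum := sum + i].

"Inlined: #ifTrue: with a literal block argument compiles to a conditional jump."
sum &gt; 50 ifTrue: [sum := 50].

"Not inlined: the receiver is an Interval, so #do: is a real send
 and its argument is a real closure."
(1 to: 10) do: [:i | sum := sum + i].

"Not inlined: the argument is a variable, not a literal block."
aBlock := [sum := 0].
sum &gt; 50 ifTrue: aBlock.
</pre>
<div>Both forms compute the same result; the difference is only in whether a send and a closure are involved.</div>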

<div><br></div><div>Note that Nicolas Cellier has just added support for inlining repeat and timesRepeat in Squeak trunk. </div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">


<br>

I know about the list of NoLookup primitives, as per Object<br>

class&gt;&gt;howToModifyPrimitives,  supposing that is still valid?<br></blockquote><div><br></div><div>Not for Cog.  While #== and #class are inlined, all other primitives are looked up.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">

What do you think is a reasonable speed factor for number-crunching<br>

Squeak code vs C ?   I am seeing about 20x slower in the semi-large<br>

scale, which surprised me a bit because I got about 10x on smaller<br>

tests, and a simple fib: with beautiful Cog is now about 3x (wow!).<br>

That range, 3x tiny tight loop, to 20x for general multi-class<br>

computation, seems a bit wide -- is it about expected?<br></blockquote><div><br></div><div>Are you saying that you have a macro benchmark that is 20 times faster in C than in Cog?  Cog, while faster than the interpreter, is still a non-inlining, non-globally-optimizing system, so it should certainly be expected to perform worse than C.  But 20x sounds a little high, so your benchmark could be useful.  If you can post it, please do.</div>
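<div><br></div><div>For anyone wanting to post comparable numbers, a minimal timing sketch might look like the following.  It assumes the stock Integer&gt;&gt;benchFib method is present in your image; the argument 28 and the run count are arbitrary choices, and you would substitute your own workload for the inner block.</div>
<pre>
| runs times |
runs := 5.

"Repeat the measurement so that GC pauses and JIT warm-up
 do not dominate a single reading."
times := (1 to: runs) collect: [:each |
	Time millisecondsToRun: [28 benchFib]].

"Answer the individual timings in milliseconds."
times
</pre>
<div>Reporting every run, rather than a single figure, makes it easier to see how much of the time is noise.</div>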

<div><br></div><div>The current state of Cog is that the new code generator gives a significant speed-up, but the object model and garbage collector remain substantially the same as in the Squeak interpreter.  The GC is slow and badly needs replacing.  The object model is both slow, especially for class access (which slows down all sends a little), and over-complex, which means that several performance-critical primitives have yet to be implemented in machine code, notably at:put:, basicNew, basicNew:, and closure creation.  All of these currently require expensive calls into C instead of using inline machine code.  I would expect that fixing all of these could yield at least another 33% speed-up.  I&#39;m trying to find funding to work on these two issues (the GC and the object model) ASAP.</div>

<div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">

<br>

My profiling does not reveal any hotspots, as such -- it&#39;s basically<br>

2, 3, 5% scattered around, so I envision this is just the general<br>

vm/jit overhead as you scale up -- referencing distant objects, slots,<br>

dispatch lookups, more cache misses, etc.  But maybe I am generally<br>

using some backwater loop/control methods, techniques, etc. that could<br>

be tuned up.  e.g. I seem to recall a trace at some point showing<br>

#timesRepeat taking 10% of the time (?!).   Also, I recall reading<br>

about an anomaly with BlockClosures -- something like being rebuilt<br>

every time thru the loop - has that been fixed?  Any other gotchas to<br>

watch for currently?<br></blockquote><div><br></div><div>BlockClosures for non-inlined blocks are still created each time the block expression is evaluated.  So if you have a loop that contains a block creation, consider pulling the block out into a temp variable assigned before the loop.</div>
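<div><br></div><div>A small sketch of that transformation, using made-up data and variable names.  It only applies when the block does not refer to the loop variable:</div>
<pre>
| data threshold count isBig |
data := #(3 14 15 92 65 35).
threshold := 20.

"Before: #select: is a real send, so its block argument is a closure
 that is created afresh on every pass through the loop."
count := 0.
1 to: 1000 do: [:i |
	count := count + (data select: [:x | x &gt; threshold]) size].

"After: create the closure once, outside the loop, and reuse it."
isBig := [:x | x &gt; threshold].
count := 0.
1 to: 1000 do: [:i |
	count := count + (data select: isBig) size].
</pre>
<div>The outer #to:do: is inlined in both versions, so the only closure involved is the argument to #select:.</div>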

<div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">

<br>

(Also, any notoriously slow subsystems?  For example, Transcript<br>

writing is glacial.)<br></blockquote><div><br></div><div>Someone should replace the Transcript&#39;s reliance on (I think) some kind of FormMorph, which moves huge numbers of bits on each write.  But this is not a VM issue; it&#39;s a Smalltalk issue.  Whoever did this would instantly become a hero.</div>

<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">The Squeak bytecode compiler looks fairly straightforward and<br>

non-optimizing - just statement by statement translation.  So it<br>

misses e.g. chances to store and reuse, instead of pop, etc.  I see<br>

lots of redundant sequences emitted.  Are those kind of things now<br>

optimized out by Cog, or would tighter bytecode be another potential<br>

optimization path.  (Is that what the Opal project is targeting?)<br></blockquote><div><br></div><div>There is some limited constant folding in the StackToRegisterMappingCogit.  For example, the pushTrue jumpFalse sequences generated for inlined and: and or: expressions are eliminated.  Also, constant SmallInteger arithmetic is folded iff the receiver is a literal.  The JIT doesn&#39;t have any type information, so it can&#39;t fold var + 1 + 2, but it can and does fold 1 + 2 + var into 3 + var.</div>
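<div><br></div><div>The difference comes from how binary sends parse, left to right.  A sketch with a made-up variable:</div>
<pre>
| var a b |
var := 10.

"Parsed as (1 + 2) + var: the receiver of the first + is a literal,
 so the JIT can fold it and emit 3 + var."
a := 1 + 2 + var.

"Parsed as (var + 1) + 2: the receiver of the first + is a variable
 of unknown type, so both sends remain."
b := var + 1 + 2.
</pre>
<div>So where it matters, writing the literal operands first gives the JIT the chance to fold them.</div>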

<div><br></div><div>Marcus Denker and I are working as I write on the infrastructure for an adaptive-optimizer/speculative-inliner that will initially operate at the bytecode level, deriving type information from the JIT&#39;s inline caches and using it to direct bytecode-to-bytecode optimization that will inline blocks, inline methods, etc.  We hope eventually to target floating-point and other performance-critical code.  Marcus posted a presentation I did a while back, &quot;Bytecode-to-bytecode adaptive optimization for Smalltalk&quot; (on the ClubSmalltalk site), that covers the essential ideas.  This project could completely close the gap to C if augmented by a good-quality code generator.  At least, that&#39;s a goal.</div>

<div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">

<br>

-- jbthiel<br>

<br>

</blockquote></div><br></div>