[Vm-dev] Re: [Pharo-project] Cog VM -- Thanks and Performance / Optimization Questions

Thu Feb 17 17:52:21 UTC 2011

Hi John,

    good questions.

On Thu, Feb 17, 2011 at 6:21 AM, John B Thiel <jbthiel at gmail.com> wrote:

> Cog VM -- Thanks and Performance / Optimization Questions
>
>
> To Everyone, thanks for your great work on Pharo and Squeak,  and to
> Eliot Miranda, Ian Piumarta, and all VM/JIT gurus, especially thanks
> for the Squeak VM Cog and its precursors, which I was keenly
> anticipating for a decade or so, and is really going into stride with
> the latest builds.
>
> I like to code with awareness of performance issues.  Can you tell or
> point me to some performance and efficiency tips for Cog and the
> Squeak compiler -- detail on which methods are inlined, best among
> alternatives, etc.  For example, I understand #to:do: is inlined --
> what about #to:by:do: and #timesRepeat: and #repeat  ?  Basically, I
> would like to read a full overview of which core methods are specially
> optimized (or planned).
>

The bytecode compiler inlines a set of selectors if the arguments are
suitable (typically literal blocks).  The standard compiler's list is
MessageNode classPool at: #MacroSelectors, e.g.

#ifTrue: #ifFalse: #ifTrue:ifFalse: #ifFalse:ifTrue: #and: #or: #whileFalse:
#whileTrue: #whileFalse #whileTrue #to:do: #to:by:do: #caseOf:
#caseOf:otherwise: #ifNil: #ifNotNil: #ifNil:ifNotNil: #ifNotNil:ifNil:

Note that Nicolas Cellier has just added support for inlining repeat and
timesRepeat in Squeak trunk.
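
A minimal workspace sketch of what "suitable arguments" means in practice (the
variable names total and step are made up for illustration).  The same
selector is inlined in the first loop and sent as an ordinary message in the
second, because only the first has a literal block as its argument:

```smalltalk
| total step |
total := 0.

"Literal block argument: the compiler inlines #to:do: into a
 compare-and-jump loop.  No message send, no closure object."
1 to: 5 do: [:i | total := total + i].

"Block held in a variable: the argument is not a literal block, so the
 compiler emits a real #to:do: send and the block is a real closure."
step := [:i | total := total + i].
1 to: 5 do: step.

total   "30, i.e. 15 from each loop"
```

Both loops compute the same result; the difference is only in the bytecode
the compiler emits for them.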

> I know about the list of NoLookup primitives, as per Object
> class>>howToModifyPrimitives,  supposing that is still valid?
>

Not for Cog.  While #== and #class are inlined, all other primitives are
looked up.

> What do you think is a reasonable speed factor for number-crunching
> Squeak code vs C ?   I am seeing about 20x slower in the semi-large
> scale, which surprised me a bit because I got about 10x on smaller
> tests, and a simple fib: with beautiful Cog is now about 3x (wow!).
> That range, 3x tiny tight loop, to 20x for general multi-class
> computation, seems a bit wide -- is it about expected?
>

Are you saying that you have a macro benchmark that is 20 times faster in C
than in Cog?  Cog, while faster than the interpreter, is still a
non-inlining, non-globally-optimizing system and so performance is certainly
to be expected to be worse than C.  But 20x sounds a little high so your
benchmark could be useful.  If you can post this please do.

The current state of Cog is that the new code generator gives a significant
speed-up but that the object model and garbage collector remain
substantially the same as the Squeak interpreter.  The GC is slow and badly
needs replacing.  The object model is both slow, especially for class
access, which slows down all sends a little, and over-complex, which means
that several performance-critical primitives have yet to be implemented in
machine-code, especially at:put:, basicNew, basicNew:, and closure creation,
all of which currently require expensive calls into C instead of using
inline machine code.  I would expect that improving all of these could yield at
least another 33% speed-up.  I'm trying to find funding to work on these two
issues ASAP.

> My profiling does not reveal any hotspots, as such -- it's basically
> 2, 3, 5% scattered around, so I envision this is just the general
> vm/jit overhead as you scale up -- referencing distant objects, slots,
> dispatch lookups, more cache misses, etc.  But maybe I am generally
> using some backwater loop/control methods, techniques, etc. that could
> be tuned up.  e.g. I seem to recall a trace at some point showing
> #timesRepeat taking 10% of the time (?!).   Also, I recall reading
> about an anomaly with BlockClosures -- something like being rebuilt
> every time thru the loop - has that been fixed?  Any other gotchas to
> watch for currently?
>

BlockClosures for non-inlined blocks are still created when mentioned.  So
if you do have a loop which contains a block creation, consider pulling the
block out into a temp variable.
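
A sketch of that transformation, assuming a hypothetical inner block that does
not refer to the outer loop variable (if it does, it cannot be hoisted this
way).  The names data, sum and addToSum are invented for the example:

```smalltalk
| data sum addToSum |
data := #(3 1 4 1 5 9 2 6).

"Before: #do: is not inlined, so the inner block is a real closure and a
 fresh BlockClosure is created on every pass of the outer loop."
sum := 0.
1 to: 1000 do: [:i |
    data do: [:each | sum := sum + each]].

"After: create the closure once, outside the loop, and reuse it."
sum := 0.
addToSum := [:each | sum := sum + each].
1 to: 1000 do: [:i |
    data do: addToSum].

sum   "31000 in both versions"
```

The outer 1 to: 1000 do: [...] is inlined by the compiler in both versions, so
the only change is that the second version allocates one closure instead of a
thousand.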

> (Also, any notoriously slow subsystems?  For example, Transcript
> writing is glacial.)
>

Someone should replace the Transcript's reliance on (I think) some kind of
FormMorph which moves huge numbers of bits on each write.  But this is not a
VM issue.  It's a Smalltalk issue.  Whoever did this would instantly become
a hero.

> The Squeak bytecode compiler looks fairly straightforward and
> non-optimizing - just statement by statement translation.  So it
> misses e.g. chances to store and reuse, instead of pop, etc.  I see
> lots of redundant sequences emitted.  Are those kinds of things now
> optimized out by Cog, or would tighter bytecode be another potential
> optimization path?  (Is that what the Opal project is targeting?)
>

There is some limited constant folding in the StackToRegisterMappingCogit.
For example, the pushTrue jumpFalse sequences generated in inlined and: and
or: statements are eliminated.  Also, constant SmallInteger arithmetic is
folded iff the receiver is a literal.  The JIT doesn't have any type
information so it can't fold var + 1 + 2, but it can and does fold 1 + 2 +
var into 3 + var.
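
To make the receiver rule concrete, here is a small workspace sketch (the
names var, a and b are illustrative only).  Binary messages associate left to
right, which is why the position of the literals matters:

```smalltalk
| var a b |
var := 10.

"Parsed as (1 + 2) + var.  The first send has a literal receiver and a
 literal argument, so it can be folded, leaving 3 + var."
a := 1 + 2 + var.

"Parsed as (var + 1) + 2.  The first receiver is a variable of unknown
 type, so neither send can be folded."
b := var + 1 + 2.

a = b   "true; both are 13"
```

Until the JIT has type information, writing the constants first, or simply
writing var + 3 by hand, avoids the extra send.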

Marcus Denker and I are working as I write on the infrastructure for an
adaptive-optimizer/speculative-inliner that will initially operate at the
bytecode level, deriving type information from the JIT's inline caches and
using this to direct bytecode-to-bytecode optimization that will inline
blocks, inline methods, etc.  We hope eventually to target floating-point
and other performance-critical code.  Marcus posted a presentation I did a
while back ("Bytecode-to-bytecode adaptive optimization for Smalltalk", in the
Talks & Presentations section of the ClubSmalltalk site) that covers the
essential ideas.  This project could completely close the gap to C if
augmented by a good quality code generator.  At least, that's a goal.

> -- jbthiel
>
>