[Vm-dev] Re: [squeak-dev] Short-circuiting comparisons (booleanCheat) via special selectors [Was Using #= for integer comparison instead of #==]

Eliot Miranda eliot.miranda at gmail.com
Fri Nov 19 23:43:33 UTC 2010


Hi Igor,

On Fri, Nov 19, 2010 at 2:50 PM, Igor Stasenko <siguctua at gmail.com> wrote:

>
> On 19 November 2010 21:48, Eliot Miranda <eliot.miranda at gmail.com> wrote:
> > Hi All,
> >     so the performance benefit of the special selectors is interesting.
> > One thing the interpreter does on all relational special selectors
> > #= #~= #< #<= #> #>= is statically predict SmallIntegers and/or Floats and
> > look ahead for a following conditional branch, evaluating the branch
> > immediately, avoiding reifying the result of the relational as either true
> > or false and later testing for it, hence jumping directly on the condition
> > codes.
> > e.g.
> > bytecodePrimLessThan
> >     | rcvr arg aBool |
> >     rcvr := self internalStackValue: 1.
> >     arg := self internalStackValue: 0.
> >     (self areIntegers: rcvr and: arg) ifTrue:
> >         ["The C code can avoid detagging since tagged integers are still
> >           signed.  But this means the simulator must override to do detagging."
> >          ^self cCode: [self booleanCheat: rcvr < arg]
> >             inSmalltalk: [self booleanCheat: (objectMemory integerValueOf: rcvr)
> >                                                 < (objectMemory integerValueOf: arg)]].
> >     self initPrimCall.
> >     aBool := self primitiveFloatLess: rcvr thanArg: arg.
> >     self successful ifTrue: [^ self booleanCheat: aBool].
> >     messageSelector := self specialSelector: 2.
> >     argumentCount := 1.
> >     self normalSend
> >
> > booleanCheat: cond
> >     "cheat the interpreter out of the pleasure of handling the next bytecode
> >      IFF it is a jump-on-boolean.  Which it is, often enough when the current
> >      bytecode is something like bytecodePrimEqual"
> >     <inline: true>
> >     cond
> >         ifTrue: [self booleanCheatTrue]
> >         ifFalse: [self booleanCheatFalse]
> >
> > booleanCheatFalse
> >     "cheat the interpreter out of the pleasure of handling the next bytecode
> >      IFF it is a jump-on-boolean.  Which it is, often enough when the current
> >      bytecode is something like bytecodePrimEqual"
> >     | bytecode offset |
> >     <sharedCodeNamed: 'booleanCheatFalse' inCase: 179>
> >     bytecode := self fetchByte.  "assume next bytecode is jumpIfFalse (99%)"
> >     self internalPop: 2.
> >     (bytecode < 160 and: [bytecode > 151]) ifTrue:  "short jumpIfFalse"
> >         [^self jump: bytecode - 151].
> >     bytecode = 172 ifTrue:  "long jumpIfFalse"
> >         [offset := self fetchByte.
> >          ^self jump: offset].
> >     "not followed by a jumpIfFalse; undo instruction fetch and push boolean result"
> >     localIP := localIP - 1.
> >     self fetchNextBytecode.
> >     self internalPush: objectMemory falseObject
> >
> > booleanCheatTrue
> >     "cheat the interpreter out of the pleasure of handling the next bytecode
> >      IFF it is a jump-on-boolean.  Which it is, often enough when the current
> >      bytecode is something like bytecodePrimEqual"
> >     | bytecode |
> >     <sharedCodeNamed: 'booleanCheatTrue' inCase: 178>
> >     bytecode := self fetchByte.  "assume next bytecode is jumpIfFalse (99%)"
> >     self internalPop: 2.
> >     (bytecode < 160 and: [bytecode > 151]) ifTrue:  "short jumpIfFalse"
> >         [^self fetchNextBytecode].
> >     bytecode = 172 ifTrue:  "long jumpIfFalse"
> >         [self fetchByte.
> >          ^self fetchNextBytecode].
> >     "not followed by a jumpIfFalse; undo instruction fetch and push boolean result"
> >     localIP := localIP - 1.
> >     self fetchNextBytecode.
> >     self internalPush: objectMemory trueObject
> >
> > Until yesterday Cog didn't do this for jitted code.  The conversation
> > about using #= for testing got me motivated to implement it.  Note that
> > VisualWorks' HPS VM does something even more aggressive than
> > booleanCheat:.  So what are we talking about?  Here's my micro-benchmark:
> >
> > Time millisecondsToRun: [1 to: 100000000 do: [:i| ]]
> >
> > The bytecode compiler optimizes the inner loop to code equivalent to
> >
> > | i |
> > i := 1.
> > [i <= 100000000] whileTrue: [i := i + 1]
>
> I think that with aggressive optimization this loop can be turned into a
> no-op, or more precisely into an instruction which sets i to 100000001 :)
>


Are you missing the point, which is to generate efficient loop code, not to
optimize away empty loops?  I stripped the example down to the smallest
possible code (an empty loop) just to show how much better VW's code is and
how limited the effect of merely doing the booleanCheat: is.  But loops that
/can't/ be optimized away are very common in the system, and right now Cog
isn't doing a great job of making them go fast.
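
(For anyone who wants to see the pattern booleanCheat: is looking for, it is
enough to print the symbolic bytecodes of any comparison-then-branch method
in a stock Squeak image.  A sketch, assuming CompiledMethod>>symbolic and the
usual Magnitude>>min: definition, which is essentially
^self < aMagnitude ifTrue: [self] ifFalse: [aMagnitude]:

    "Print-it in a workspace: the output should show a special-selector
     #< send immediately followed by a jumpFalse: -- the pair of bytecodes
     the booleanCheat fast path fuses into one compare-and-branch."
    (Magnitude >> #min:) symbolic

The relational send and the jump that follows it are the two bytecodes the
interpreter, and now the jit, handle as one unit.)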


> But I wonder what kind of analysis should be applied to determine that the
> loop is bounded by 100000000, and that there are no side effects other
> than incrementing the counter.
> Btw, I think it is much easier to optimize the original
> 1 to: 100000 do:
> since you know the loop bounds beforehand, and only need to analyze whether
> the loop body has any side effects.
>

Imagine the loop had something meaningful in it such as Set>>do:.
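
(From memory, Set>>do: has roughly the following shape -- the details differ
between image versions, so treat this as a sketch rather than the actual
source:

    do: aBlock
        "Walk the internal hash array, skipping the empty (nil) slots.
         Each iteration performs the inlined to:do: bounds test plus an
         identity comparison against nil -- compare-and-branch pairs that
         no loop-bound analysis can remove without also reasoning about
         the block being evaluated."
        tally = 0 ifTrue: [^self].
        1 to: array size do: [:index |
            (array at: index) == nil ifFalse:
                [aBlock value: (array at: index)]]

The loop bound (array size) and the block's effects are only known at run
time, so the win has to come from making each compare-and-branch cheap.)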

> In that way, it is better to analyze the code at the AST level than at the
> bytecode level, since once you turn it into bytecode you lose the precious
> information that the loop has bounds, and have no choice but to strictly
> follow bytecode semantics.
> Of course, Cog doesn't have much choice here, since it can't operate on the
> AST, just the bytecode.


Yes, but you're missing the point.  One can still generate faster jitted
code than Cog is producing now.  My message was trying to show how much more
there is to be gained from a better jit code generator; it was not about
higher-level optimization.
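
(To put a number on that in your own image, a variant of the micro-benchmark
whose body cannot simply be thrown away is easy to write.  A sketch, nothing
Cog-specific assumed:

    "Same bound as the empty loop above, but with a data dependence in the
     body.  The per-iteration i <= 100000000 test still compiles to a
     relational special selector followed by a conditional jump -- the very
     pattern booleanCheat: and the fused jitted compare-and-branch speed up --
     and on a 32-bit image the count stays a SmallInteger throughout."
    | count |
    count := 0.
    Time millisecondsToRun: [1 to: 100000000 do: [:i | count := count + 1]]

Comparing that against the empty loop gives a feel for how much of the time
is the compare-and-branch itself versus the rest of the loop body.)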

best
Eliot


> [snip]
>
> > best,
> > Eliot
>
>
>
> --
> Best regards,
> Igor Stasenko AKA sig.
>