[Vm-dev] Re: [squeak-dev] Short-circuiting comparisons (booleanCheat) via special selectors [Was Using #= for integer comparison instead of #==]

Igor Stasenko siguctua at gmail.com
Fri Nov 19 22:50:08 UTC 2010


On 19 November 2010 21:48, Eliot Miranda <eliot.miranda at gmail.com> wrote:
> Hi All,
>     so the performance benefit of the special selectors is interesting.  One
> thing the interpreter does on all relational special selectors #= #~= #< #<=
> #> #>= is statically predict SmallIntegers and/or Floats and look ahead for
> a following conditional branch, evaluating the branch immediately, avoiding
> reifying the result of the relational as either true or false and later
> testing for it, hence jumping directly on the condition codes.
> e.g.
> bytecodePrimLessThan
>     | rcvr arg aBool |
>     rcvr := self internalStackValue: 1.
>     arg := self internalStackValue: 0.
>     (self areIntegers: rcvr and: arg) ifTrue:
>         ["The C code can avoid detagging since tagged integers are still signed.
>           But this means the simulator must override to do detagging."
>          ^self cCode: [self booleanCheat: rcvr < arg]
>             inSmalltalk: [self booleanCheat: (objectMemory integerValueOf: rcvr)
>                                                 < (objectMemory integerValueOf: arg)]].
>     self initPrimCall.
>     aBool := self primitiveFloatLess: rcvr thanArg: arg.
>     self successful ifTrue: [^ self booleanCheat: aBool].
>     messageSelector := self specialSelector: 2.
>     argumentCount := 1.
>     self normalSend
>
> booleanCheat: cond
>     "cheat the interpreter out of the pleasure of handling the next bytecode IFF
>      it is a jump-on-boolean. Which it is, often enough when the current bytecode
>      is something like bytecodePrimEqual"
>     <inline: true>
>     cond
>         ifTrue: [self booleanCheatTrue]
>         ifFalse: [self booleanCheatFalse]
>
> booleanCheatFalse
>     "cheat the interpreter out of the pleasure of handling the next bytecode IFF
>      it is a jump-on-boolean. Which it is, often enough when the current bytecode
>      is something like bytecodePrimEqual"
>     | bytecode offset |
>     <sharedCodeNamed: 'booleanCheatFalse' inCase: 179>
>     bytecode := self fetchByte.  "assume next bytecode is jumpIfFalse (99%)"
>     self internalPop: 2.
>     (bytecode < 160 and: [bytecode > 151]) ifTrue:  "short jumpIfFalse"
>         [^self jump: bytecode - 151].
>     bytecode = 172 ifTrue:  "long jumpIfFalse"
>         [offset := self fetchByte.
>          ^self jump: offset].
>     "not followed by a jumpIfFalse; undo instruction fetch and push boolean result"
>     localIP := localIP - 1.
>     self fetchNextBytecode.
>     self internalPush: objectMemory falseObject
>
> booleanCheatTrue
>     "cheat the interpreter out of the pleasure of handling the next bytecode IFF
>      it is a jump-on-boolean. Which it is, often enough when the current bytecode
>      is something like bytecodePrimEqual"
>     | bytecode |
>     <sharedCodeNamed: 'booleanCheatTrue' inCase: 178>
>     bytecode := self fetchByte.  "assume next bytecode is jumpIfFalse (99%)"
>     self internalPop: 2.
>     (bytecode < 160 and: [bytecode > 151]) ifTrue:  "short jumpIfFalse"
>         [^self fetchNextBytecode].
>     bytecode = 172 ifTrue:  "long jumpIfFalse"
>         [self fetchByte.
>          ^self fetchNextBytecode].
>     "not followed by a jumpIfFalse; undo instruction fetch and push boolean result"
>     localIP := localIP - 1.
>     self fetchNextBytecode.
>     self internalPush: objectMemory trueObject
>
> Until yesterday Cog didn't do this for jitted code.  The conversation about
> using #= for testing got me motivated to implement it.  Note that
> VisualWorks' HPS VM does something even more aggressive than
> booleanCheat:.  So what are we talking about?  Here's my micro-benchmark:
> Time millisecondsToRun: [1 to: 100000000 do: [:i| ]]

> The bytecode compiler optimizes the inner loop to code equivalent to
> | i |
> i := 1.
> [i <= 100000000] whileTrue: [i := i + 1]
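
(For anyone who wants to see that pattern in an image: assuming
CompiledMethod>>#symbolic is available, as in Squeak images of this era,
printing the symbolic bytecodes of a method with the same shape, e.g.
Integer>>#timesRepeat:, shows the #<= special-selector send immediately
followed by a jump-if-false bytecode, which is exactly the pair that
booleanCheat: short-circuits.)

    "Hedged sketch: Integer>>#timesRepeat: has the same
     [count <= limit] whileTrue: shape as the benchmark loop."
    Transcript show: (Integer >> #timesRepeat:) symbolic; cr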

I think that with aggressive optimization this loop can be turned into a no-op,
or more precisely into a single instruction that sets i to 100000001 :)
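
To make that concrete, here is the expanded loop together with the value i
ends up holding (a plain illustration evaluated as written, not an optimizer):

    | i |
    i := 1.
    [i <= 100000000] whileTrue: [i := i + 1].
    i  "printIt => 100000001, i.e. the bound + 1; absent side effects the
        whole loop is equivalent to i := 100000001"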

But I wonder what kind of analysis would be needed to determine that the
loop is bounded by 100000000 and that it has no side effects other than
incrementing the counter.
Btw, I think it is much easier to optimize the original
1 to: 100000000 do:
since you know the loop bounds beforehand and only need to analyze whether
the loop body has any side effects.

In that sense it is better to analyze the code at the AST level rather than
at the bytecode level: once you turn it into bytecode you lose the precious
information that the loop is bounded, and you have no choice but to strictly
follow the bytecode semantics. Of course Cog has little choice here, since it
cannot operate on an AST, only on bytecode.
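
You can see that loss of information directly in the image: for example,
SequenceableCollection>>#do: is written with 1 to: self size do:, yet its
compiled form (again assuming CompiledMethod>>#symbolic) shows only a counter,
a #<= compare and a backward jump; nothing at the bytecode level says
"bounded loop" any more.

    "Hedged sketch: the to:do: in the source is inlined by the bytecode
     compiler into push / compare / jumpFalse / ... / jump-back, so a
     bytecode-level optimizer would have to rediscover the bound instead
     of reading it off the AST."
    Transcript show: (SequenceableCollection >> #do:) symbolic; cr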

[snip]

> best,
> Eliot



-- 
Best regards,
Igor Stasenko AKA sig.

