Re: [Vm-dev] Interpreter>>isContextHeader: optimization

22 Feb 2009


      Hi Igor,
On Sat, Feb 21, 2009 at 7:10 PM, Igor Stasenko siguctua@gmail.com wrote:
...
2009/2/22 Eliot Miranda eliot.miranda@gmail.com:
...
On Sat, Feb 21, 2009 at 3:36 PM, Igor Stasenko siguctua@gmail.com
wrote:
...
...
2009/2/21 Eliot Miranda eliot.miranda@gmail.com:
...
Hi Igor,
On Fri, Feb 20, 2009 at 11:37 PM, Igor Stasenko siguctua@gmail.com
wrote:
...
...
...
...
Here the method:
isContextHeader: aHeader
       self inline: true.
       ^ ((aHeader >> 12) bitAnd: 16r1F) = 13          "MethodContext"
           or: [((aHeader >> 12) bitAnd: 16r1F) = 14   "BlockContext"
           or: [((aHeader >> 12) bitAnd: 16r1F) = 4]]  "PseudoContext"
...
...
...
...
i think it wouldn't hurt to rewrite it as:
isContextHeader: aHeader
       | hdr |
       self inline: true.
       hdr := aHeader bitAnd: (16r1F << 12).
       ^ hdr = (13 << 12)                      "MethodContext"
               or: [ hdr = (14 << 12)          "BlockContext"
               or: [ hdr = (4 << 12)]]         "PseudoContext"
which will allow GCC to optimize it more easily.
I'm not sure if it can optimize it in its current state.
This may give a small speedup of copy operations and of any other
operations which need to determine the number of pointer fields in an
object (users of #lastPointerOf:)
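For readers who want to compare the two formulations in C, here is a minimal sketch of both side by side. The function names are hypothetical and not taken from the VM sources; the only assumption carried over from the methods above is that the compact class index is a 5-bit field starting at bit 12 of the header.

```c
#include <assert.h>

/* Hypothetical names; the field layout (5 bits starting at bit 12)
   is taken from the Smalltalk methods quoted above. */
enum { CLASS_SHIFT = 12, CLASS_MASK = 0x1F };

/* Original form: shift the field down, compare with small constants. */
static int is_context_shifted(unsigned long header) {
    unsigned long idx = (header >> CLASS_SHIFT) & CLASS_MASK;
    return idx == 13 || idx == 14 || idx == 4;
}

/* Proposed form: mask in place, compare with pre-shifted constants. */
static int is_context_masked(unsigned long header) {
    unsigned long field = header & ((unsigned long)CLASS_MASK << CLASS_SHIFT);
    return field == (13UL << CLASS_SHIFT)
        || field == (14UL << CLASS_SHIFT)
        || field == (4UL << CLASS_SHIFT);
}

/* Check that both forms agree for every class index, with unrelated
   bits set below and above the field. */
static void check_forms_agree(void) {
    for (unsigned long idx = 0; idx <= CLASS_MASK; idx++) {
        unsigned long header = (idx << CLASS_SHIFT) | 0xABC | (0x5UL << 17);
        assert(is_context_shifted(header) == is_context_masked(header));
    }
}
```

The two functions compute the same predicate, so any difference between them is purely in what the compiler emits, not in behaviour.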
First you should look at the assembly that gcc generates to be sure
anything is needed.  e.g.
...
...
...
cat >t.c <<END
long isContext(long aHeader) {
    return ((aHeader >> 12) & 0x1F) == 13
        || ((aHeader >> 12) & 0x1F) == 14
        || ((aHeader >> 12) & 0x1F) == 4;
}
END
gcc -O3 -S -fomit-frame-pointer t.c; cat t.s
    .text
.globl _isContext
_isContext:
    movl    4(%esp), %edx
    sarl    $12, %edx
    andl    $31, %edx
    leal    -13(%edx), %eax
    cmpl    $1, %eax
    jbe L2
    cmpl    $4, %edx
    je  L2
    xorl    %eax, %eax
    ret
L2:
    movl    $1, %eax
    ret
    .subsections_via_symbols
So you don't need to do anything; it has done everything for you.
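The `leal -13` / `cmpl $1` / `jbe` sequence in the listing is the usual unsigned range-check trick: subtract the lower bound and do one unsigned comparison, which covers both 13 and 14 with a single branch. A rough sketch of the same idea written out by hand in C (the function names here are hypothetical, chosen only for illustration):

```c
#include <assert.h>

/* Hypothetical helper mirroring what the generated assembly computes. */
static int is_context_range_check(long header) {
    unsigned long idx = ((unsigned long)header >> 12) & 0x1F;
    /* idx - 13 wraps to a huge value when idx < 13, so a single
       unsigned "<= 1" accepts exactly 13 and 14. */
    if (idx - 13UL <= 1UL)
        return 1;
    return idx == 4;
}

/* Reference version: three explicit comparisons, as in t.c. */
static int is_context_reference(long header) {
    long idx = (header >> 12) & 0x1F;
    return idx == 13 || idx == 14 || idx == 4;
}

/* Compare the two versions for every possible 5-bit class index. */
static void check_range_trick(void) {
    for (long idx = 0; idx < 32; idx++)
        assert(is_context_range_check(idx << 12) == is_context_reference(idx << 12));
}
```

Writing the trick by hand is not needed here, since the compiler already derives it from the plain three-way comparison.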
However, one point is important.  Using 16r1F << 12 et al as your
masks and constants to compare against is much worse on many systems, most
importantly x86, than shifting down by 12 and comparing against small
constants, because the instruction set can encode small constants far more
compactly, and that means better code density in the icache which is
significant for performance.  e.g. on x86 a constant in the range -128 to
127 typically takes a byte whereas anything else will take 4.
...
...
...
But what I really think is that this is too low a level to worry
about.  Much more important to focus on
...
...
...

context to stack mapping
in-line caching via a JIT
exploiting multicore via Hydra

and beyond (e.g. speculative inlining)
than worrying about tiny micro-optimizations like this :)
Thanks Eliot.
In fact, this method drew my attention because of its number of
checks. Typically, all code dealing with object formats contains
many branches. And at the places where this method is called, there are
additional checks surrounding it.
So, the real problem is the overwhelming number of checks needed to do
anything, and I think this has its own impact on performance.
I hope that the new object format with a 64-bit header, which you plan to
use, will allow us to avoid so many branches in code that is used very
frequently.
In fact the StackVM makes a big improvement to this very method because
in the StackVM there are only MethodContexts and so the method reads
...
isContextHeader: aHeader
        <inline: true>
        "c.f. {BlockContext. MethodContext. PseudoContext} collect: [:class| class -> class indexIfCompact]"
        ^(self compactClassIndexOfHeader: aHeader) == ClassMethodContextCompactIndex
...
which is of course equivalent to
isContextHeader: aHeader
^((aHeader >> 12) bitAnd: 16r1F) = 13
:)
yeah much more concise & understandable.
I'm currently wondering whether there are simple ways to decompose the huge
ObjectMemory/Interpreter into multiple smaller classes.
To illustrate, applied to #isContextHeader: we could write it as
follows:
isContextHeader: aHeader
<inline: true>
<var: #aHeader class: #OopBaseHeader>
^ aHeader compactClassIndex == ClassMethodContextCompactIndex
and, of course, then we really don't need #isContextHeader: at all
because we can simply write a direct message in methods where we need
such checks:
<var: #header class: #OopBaseHeader>
<var: #oop class: #Oop>
header := oop basicHeader.
header isContextHeader ifTrue: [ ... ]
The idea is to use type information in the code generator to determine
where it should look for the code when translating a message send to C
code.
With a few more heuristics, we don't even need to declare types so often:
Oop>>basicHeader
 <returnType: #OopBaseHeader>
 ^ self longAt: 0
or even:
Oop>>basicHeader
 ^ (self longAt: 0) as: OopBaseHeader
so, then you could simply write:
oop basicHeader isContextHeader ifTrue: [... ]
Looks like plain Smalltalk code, doesn't it? :)
I'm using a similar technique for static inlining in Moebius/CorruptVM.
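To make the idea concrete, here is a hedged sketch of the kind of C a type-directed translator might emit for `oop basicHeader isContextHeader`. This is not output from any actual code generator: the typedefs, the `Type_selector` naming scheme and the helper functions are all assumptions made for illustration, and only the bit layout and the index 13 come from the thread.

```c
#include <assert.h>
#include <stdint.h>

typedef uintptr_t Oop;           /* hypothetical: address of an object */
typedef uintptr_t OopBaseHeader; /* hypothetical: first header word */

enum { ClassMethodContextCompactIndex = 13 }; /* the 13 from the thread */

/* Oop>>basicHeader, i.e. "self longAt: 0" */
static inline OopBaseHeader Oop_basicHeader(Oop oop) {
    return *(const uintptr_t *)oop;
}

/* OopBaseHeader>>compactClassIndex: 5-bit field starting at bit 12 */
static inline unsigned OopBaseHeader_compactClassIndex(OopBaseHeader h) {
    return (unsigned)((h >> 12) & 0x1F);
}

/* OopBaseHeader>>isContextHeader (StackVM variant: MethodContext only) */
static inline int OopBaseHeader_isContextHeader(OopBaseHeader h) {
    return OopBaseHeader_compactClassIndex(h) == ClassMethodContextCompactIndex;
}

/* "oop basicHeader isContextHeader" once each send has been resolved
   statically from the declared or inferred receiver type. */
static int oop_is_context(Oop oop) {
    return OopBaseHeader_isContextHeader(Oop_basicHeader(oop));
}

/* Build two one-word fake objects and check the predicate on both. */
static void check_typed_dispatch(void) {
    uintptr_t context_object[1] = { (uintptr_t)13 << 12 };
    uintptr_t other_object[1]   = { (uintptr_t)5 << 12 };
    assert(oop_is_context((Oop)context_object) == 1);
    assert(oop_is_context((Oop)other_object) == 0);
}
```

Because every send resolves to a small static inline function, the C compiler can flatten the whole chain into one load, shift, mask and compare, which is the point of doing the dispatch at translation time.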
I'm also using something like this in Cog, but only for simple struct types,
a machine code method CogMethod, a stack page, various structs in the
compiler such as an instruction, a block start, etc.  e.g.
generateInstructionsAt: eventualAbsoluteAddress
        "Size pc-dependent instructions and assign eventual addresses to all instructions.
         Answer the size of the code.
         Compute forward branches based on virtual address (abstract code starts at 0),
         assuming that any branches branched over are long.
         Compute backward branches based on actual address.
         Reuse the fixups array to record the pc-dependent instructions that need to have
         their code generation postponed until after the others."
        | absoluteAddress pcDependentIndex abstractInstruction fixup |
        <var: #abstractInstruction type: #'AbstractInstruction *'>
        <var: #fixup type: #'BytecodeFixup *'>
        absoluteAddress := eventualAbsoluteAddress.
        pcDependentIndex := 0.
        0 to: opcodeIndex - 1 do:
                 [:i|
                 breakPC = absoluteAddress ifTrue:
                          [self halt: 'breakPC reached in generateInstructionsAt:'].
                 abstractInstruction := self abstractInstructionAt: i.
                 abstractInstruction isPCDependent
                          ifTrue:
                                   [abstractInstruction sizePCDependentInstructionAt: absoluteAddress.
                                    fixup := self fixupAt: pcDependentIndex.
                                    pcDependentIndex := pcDependentIndex + 1.
                                    fixup instructionIndex: i.
                                    absoluteAddress := absoluteAddress + abstractInstruction machineCodeSize]
                          ifFalse:
                                   [abstractInstruction address: absoluteAddress.
                                    absoluteAddress := abstractInstruction concretizeAt: absoluteAddress]].
        0 to: pcDependentIndex - 1 do:
                 [:i|
                 fixup := self fixupAt: i.
                 abstractInstruction := self abstractInstructionAt: fixup instructionIndex.
                 breakPC = absoluteAddress ifTrue:
                          [self halt: 'breakPC reached in generateInstructionsAt:'].
                 abstractInstruction concretizeAt: abstractInstruction address].
        ^absoluteAddress - eventualAbsoluteAddress
You'll notice the lack of type inference means I have to assign typed
results to typed local variables for the code generator to be able to find
the right code.
My problem with doing it for oops has been not wanting to add methods to
Integer.  Do you use a special type for oop or do you add methods to
Integer?
...
...
...
--
Best regards,
Igor Stasenko AKA sig.
