Re: [Vm-dev] Interpreter>>isContextHeader: optimization

21 Feb 2009


Hi Igor,
On Fri, Feb 20, 2009 at 11:37 PM, Igor Stasenko siguctua@gmail.com wrote:
...
Here is the method:
isContextHeader: aHeader
       self inline: true.
       ^ ((aHeader >> 12) bitAnd: 16r1F) = 13                "MethodContext"
               or: [((aHeader >> 12) bitAnd: 16r1F) = 14     "BlockContext"
               or: [((aHeader >> 12) bitAnd: 16r1F) = 4]]    "PseudoContext"
I think it wouldn't hurt to rewrite it as:
isContextHeader: aHeader
       | hdr |
       self inline: true.
       hdr := aHeader bitAnd: (16r1F << 12).
       ^ hdr = (13 << 12)                      "MethodContext"
               or: [ hdr = (14 << 12)          "BlockContext"
               or: [ hdr = (4 << 12)]]         "PseudoContext"
which will allow GCC to optimize it more easily.
I'm not sure if it can optimize it in its current state.
This may yield a small speedup in copy operations and in any other
operations which need to determine the number of pointer fields in
an object (users of #lastPointerOf:).
First you should look at the assembly that gcc generates to see whether any
change is needed, e.g.
cat >t.c <<END
long isContext(long aHeader) {
    return ((aHeader >> 12) & 0x1F) == 13
        || ((aHeader >> 12) & 0x1F) == 14
        || ((aHeader >> 12) & 0x1F) == 4;
}
END
gcc -O3 -S -fomit-frame-pointer t.c; cat t.s
    .text
.globl _isContext
_isContext:
    movl    4(%esp), %edx
    sarl    $12, %edx
    andl    $31, %edx
    leal    -13(%edx), %eax
    cmpl    $1, %eax
    jbe L2
    cmpl    $4, %edx
    je  L2
    xorl    %eax, %eax
    ret
L2:
    movl    $1, %eax
    ret
    .subsections_via_symbols
So you don't need to do anything; it has done everything for you.
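Reading the listing above: the tests against 13 and 14 have been folded into a single unsigned range check (leal -13 / cmpl $1 / jbe), leaving only the separate compare against 4. A rough hand-written C equivalent of that generated code might look like the sketch below; the name isContextFolded is hypothetical and is not part of the VM source.

```c
/* Hand-written sketch of what the listing computes; the function name
   is hypothetical. */
long isContextFolded(long aHeader) {
    /* Extract the 5-bit field that sits at bits 12..16 of the header. */
    unsigned long fmt = ((unsigned long)aHeader >> 12) & 0x1F;

    /* fmt - 13 wraps around to a huge value when fmt < 13, so a single
       unsigned compare is true only for fmt == 13 or fmt == 14. */
    if (fmt - 13 <= 1)
        return 1;

    /* The remaining case, 4, needs its own compare. */
    return fmt == 4;
}
```

There is no need to write the source this way; the point is only that gcc -O3 arrives at this form from the straightforward version.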
However, one point is important.  Using 16r1F << 12 et al as your masks and
constants to compare against is much worse on many systems, most importantly
x86, than shifting down by 12 and comparing against small constants, because
the instruction set can encode small constants far more compactly, and that
means better code density in the icache which is significant for
performance.  e.g. on x86 a constant in the range -128 to 127 typically
takes a byte whereas anything else will take 4.
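To compare the two formulations directly, a sketch like the following can be put in a file and run through the same gcc -O3 -S command as above. The function names are hypothetical, and unsigned long is used here (an assumption, the t.c above uses long) so that the right shift is well defined for every input. formsAgree only checks that both versions return the same answer; which immediates the compiler actually emits has to be read off the generated assembly.

```c
/* Shift-down form: compares against the small constants 13, 14 and 4. */
unsigned long isContextShifted(unsigned long aHeader) {
    unsigned long fmt = (aHeader >> 12) & 0x1F;
    return fmt == 13 || fmt == 14 || fmt == 4;
}

/* Mask-in-place form: compares against constants shifted up by 12,
   none of which fit in the -128..127 range. */
unsigned long isContextMasked(unsigned long aHeader) {
    unsigned long hdr = aHeader & (0x1FUL << 12);
    return hdr == (13UL << 12) || hdr == (14UL << 12) || hdr == (4UL << 12);
}

/* Returns 1 if both forms agree for every value of the 5-bit field,
   with the surrounding header bits both clear and set. */
int formsAgree(void) {
    unsigned long field;
    for (field = 0; field < 32; field++) {
        unsigned long clean = field << 12;
        /* Set bits 0..11 and 17..31 around the field. */
        unsigned long noisy = clean | 0xFFFUL | (0x7FFFUL << 17);
        if (isContextShifted(clean) != isContextMasked(clean))
            return 0;
        if (isContextShifted(noisy) != isContextMasked(noisy))
            return 0;
    }
    return 1;
}
```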
But what I really think is that this is too low a level to worry about.
It is much more important to focus on
- context to stack mapping
- inline caching via a JIT
- exploiting multicore via Hydra
and beyond (e.g. speculative inlining)
than to worry about tiny micro-optimizations like this :)
Best
Eliot
...
--
Best regards,
Igor Stasenko AKA sig.
