[Vm-dev] Slang inliner effectiveness

David T. Lewis lewis at mail.msen.com
Sun Jul 31 15:25:31 UTC 2011


On Sun, Jul 31, 2011 at 04:38:39PM +0200, Stefan Marr wrote:
> 
> Hi Dave:
> 
> On 31/07/11 15:35, David T. Lewis wrote:
> >
> >On Sun, Jul 31, 2011 at 12:59:21PM +0200, Stefan Marr wrote:
> >>Just out of curiosity, in which kinds of use cases is the inlining
> >>behavior of the C compiler not sufficient to rely on, so that such C
> >>code has to be inlined manually?
> >>
> >>Especially since people like Mike Pall of LuaJIT2 claim that GCC
> >>with -O3 inlines too aggressively anyway, which leads to code bloat that
> >>does not fit into typical CPU instruction caches and thus slows things
> >>down.
> >>But since that is just 3rd-hand knowledge, I would like to hear about
> >>real experiences.
> >The slang inliner is amazingly effective. The most obvious use case
> >is of course the interpreter itself. Try turning off the slang inlining,
> >apply all the GCC optimization you want, and you will end up with a
> >painfully slow VM.
> >
> >As a second use case, which to me was even more convincing, consider
> >the memory access macros in sqMemoryAccess.h. These are written to be
> >as fast as possible. Then look at the slang code in
> >the MemoryAccess package on SqueakSource/VMMaker. This is a slang
> >replacement for the memory access macros. When this package is used,
> >the macros are not used at all, and the memory access code is all
> >Smalltalk down to the lowest possible level.
> >
> >I found that using the slang memory access methods, which are fully
> >inlined by the slang inliner, results in a VM with performance
> >identical to that of the VM with C macros (to the best of my ability
> >to measure it with #tinyBenchmarks). I was extremely surprised by this
> >result, and it tells me that the slang inliner is really very effective
> >indeed.
> Interesting. Just for completeness:
> When the slang inlining isn't done, do the generated C functions carry
> compiler hints like `inline` and `__attribute__ ((always_inline))`, and
> are they generated into the same compilation unit? I guess the last one
> is true since interp.c is probably the most relevant thing here.

No, there would have been no extra compiler hints generated in either
case. Yes, for the interpreter proper (and hence the main interpreter
loop) there is only a single compilation unit (interp.c).
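
For reference, the kind of hint being asked about would look roughly
like the sketch below. This is only an illustration under stated
assumptions: the names LONG_AT_MACRO, long_at_inline, sum_with_macro and
sum_with_function are made up for this example and are not taken from
sqMemoryAccess.h or from generated VM code, and
`__attribute__((always_inline))` is a GCC/Clang extension rather than
standard C. It shows a hand-written access macro next to an equivalent
function carrying explicit inlining hints, both in one compilation unit.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical macro in the style of a hand-written memory access
   macro (illustrative only, not copied from sqMemoryAccess.h). */
#define LONG_AT_MACRO(base, index) (((const int32_t *)(base))[(index)])

/* The same operation as a function with explicit inlining hints.
   The attribute is a GCC/Clang extension; as noted above, the VM code
   generator did not emit such hints. */
static inline __attribute__((always_inline))
int32_t long_at_inline(const int32_t *base, int index)
{
    return base[index];
}

/* Sum `count` words using the macro. */
static int32_t sum_with_macro(const int32_t *words, int count)
{
    int32_t total = 0;
    for (int i = 0; i < count; i++) {
        total += LONG_AT_MACRO(words, i);
    }
    return total;
}

/* Sum `count` words using the hinted function. */
static int32_t sum_with_function(const int32_t *words, int count)
{
    int32_t total = 0;
    for (int i = 0; i < count; i++) {
        total += long_at_inline(words, i);
    }
    return total;
}
```

Both loops compute the same value. Whether the compiler produces the
same machine code for them depends on the compiler and its flags, so
comparing the assembler output (for example with gcc -S) would be the
way to check on a given setup.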

> 
> Any guess what the reason could be why the C compiler fails to do proper
> inlining?

I cannot really say, and I am not much of an expert on C compilers.
For the most part I was just concerned with the slang inliner itself
when I was doing this (it needed some tweaks and fixes before I could
get MemoryAccess to work properly). I was very impressed with how well
the slang inliner actually worked in practice, though I cannot say too
much about what it might take to get a C compiler to achieve similar
results. To be honest I am not too concerned about it, given how well
the slang inlining already works, and given that it generates C
code that will work well on almost any compiler. I also like the fact
that it is 100% Smalltalk, and does not rely on any hidden magic in
the external compiler.

Dave


