Hi Dave:
On 31/07/11 15:35, David T. Lewis wrote:
On Sun, Jul 31, 2011 at 12:59:21PM +0200, Stefan Marr wrote:
Just out of curiosity, in which kinds of use cases is the inlining behavior of the C compiler not sufficient to rely on, instead of manually inlining such C code?
Especially since people like Mike Pall of LuaJIT2 claim that GCC with -O3 inlines too aggressively anyway, which leads to code bloat that does not fit into typical CPU instruction caches and thus slows things down. But since that is just third-hand knowledge, I would like to hear about real experiences.
The slang inliner is amazingly effective. The most obvious use case is of course the interpreter itself. Try turning off the slang inlining, apply all the GCC optimization you want, and you will end up with a painfully slow VM.
As a second use case, which to me was even more convincing, consider the memory access macros in sqMemoryAccess.h. These are written to be as efficient as possible for speed. Then look at the slang code in the MemoryAccess package on SqueakSource/VMMaker. This is a slang replacement for the memory access macros. When this package is used, the macros are not used at all, and the memory access code is all Smalltalk down to the lowest possible level.
I found that using the slang memory access methods, which are fully inlined by the slang inliner, results in a VM with performance identical to that of the VM with C macros (to the best of my ability to measure it with #tinyBenchmarks). I was extremely surprised by this result, and it tells me that the slang inliner is really very effective indeed.
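For readers who have not looked at either version, here is a minimal C sketch of the two styles being compared: a macro accessor and a function accessor that only matches the macro's cost if it gets inlined. The names `LONG_AT_MACRO` and `long_at_fn` are made up for illustration and are not taken from sqMemoryAccess.h or from the MemoryAccess package.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical macro-style accessor: expands textually at every use,
   so there is never any call overhead. */
#define LONG_AT_MACRO(base, offset) \
    (*(const int32_t *)((const char *)(base) + (offset)))

/* Hypothetical function-style accessor: same result, but it is only as
   cheap as the macro when the call is inlined, either by the C compiler
   or by a source-level inliner before the C compiler ever sees it. */
static inline int32_t long_at_fn(const void *base, size_t offset)
{
    int32_t value;
    /* memcpy keeps the read well-defined regardless of alignment. */
    memcpy(&value, (const char *)base + offset, sizeof value);
    return value;
}
```

Both forms take a byte offset, so with an `int32_t` array the second element is at offset 4.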
Interesting. Just for completeness: when the inlining isn't done, do the generated C functions have compiler hints like `inline` and `__attribute__ ((always_inline))`, and are they generated into the same compilation unit? I guess the latter is true, since interp.c is probably the most relevant thing here.
Any guess as to why the C compiler fails to do proper inlining?
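To make the question concrete, this is roughly the kind of hint I have in mind, as a small sketch that assumes GCC or a compatible compiler. `FORCE_INLINE`, `add_small_ints`, and `sum_to` are hypothetical names of mine, not anything the slang translator emits.

```c
/* Assumption: GCC-style attributes are available; other compilers
   fall back to a plain `static inline` hint. */
#if defined(__GNUC__)
#define FORCE_INLINE static inline __attribute__((always_inline))
#else
#define FORCE_INLINE static inline
#endif

/* Small helper defined in the same compilation unit as its caller,
   so the compiler can see its body at the call site. */
FORCE_INLINE int add_small_ints(int a, int b)
{
    return a + b;
}

/* Caller: with the attribute, GCC inlines the helper here even when
   optimization is turned off. */
int sum_to(int n)
{
    int total = 0;
    for (int i = 1; i <= n; i++) {
        total = add_small_ints(total, i);
    }
    return total;
}
```

A plain `inline` is only a hint that the compiler may ignore, which is why I am asking about `always_inline` specifically.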
Thanks Stefan
Dave