David,
If you replace the cpp macros with high-level Smalltalk (slang) and rely on the inliner in CCodeGenerator to unroll the slang into inlined C code, then the performance of the resulting interpreter is essentially the same as that of the interpreter implemented with cpp macros.
This is important for two reasons:
- By getting rid of the cpp macros, you open up the possibility of profiling and debugging low-level functions directly in the generated C code, which is no longer obscured by cpp macros.
I agree that is better, but in terms of measurement, inlined functions and macros are equivalent. But:
- If you want to look at time spent in individual functions that normally
would be inlined (either because they are cpp macros, or because the slang inliner unrolled them in similar ways), then you can do so by disabling the inlining during C code generation. This produces a very slow VM, but one that can be easily profiled to see time spent in the individual functions.
Such results are way better than nothing, but I think they will be distorted: when a function is inlined, the compiler can do additional optimizations, so the resulting code is not simply the original function minus the call/return overhead.
I have only ever measured this in the interpreter VM (see package MemoryAccess in the VMMaker repository), but it would be reasonable to expect similar results with oscog, at least with respect to the base interpreter and primitive functions.
Like I said, I am interested both in the interpreter and in compiled code. Thanks for the tip!
-- Jecel