[ENH][VM] Improved code generation (hopefully ;)

Andreas Raab andreas.raab at gmx.de
Sun Jul 6 23:32:34 UTC 2003


Hi Guys,

I was always suspicious about the way CCodeGenerator handled #interpret with
respect to temps (e.g., inlining all temps into interpret and randomly
renaming them t1 ... tN), as it completely spoils life-time analysis for the
C compiler: the compiler has to assume that temps may be read in other code
branches, and may even "optimize" them by tying up registers across code
branches where they aren't needed.

So I spent the day fixing this to see what the effect actually is. As I
suspected, the change is significant. For tinyBenchmarks it gets us:

	'112974404 bytecodes/sec; 3236713 sends/sec' "before"
	'123195380 bytecodes/sec; 3433110 sends/sec' "after"

which makes for +8% in bytecode and +6% in send speed without changing any
of the actual interpreter logic. Even the macroBenchmarks agree on 4-6%
improvements in real-life code, which is pretty significant considering that
much of that code runs in primitives:

	#(9476 62198 21160 10493 0 6928 4417) "before"
	#(9035 59880 19894 10000 0 6657 4261) "after"

I'm almost certain that the changes benefit all platforms (as the improved
life-time analysis should help all but the most stupid C compilers ;-), but
I'd like to double check with anyone who's compiling their own VMs.

The changes are split into two parts. The first part is the
CodeGenEnhancements-ar CS, which should (to the best of my knowledge) be
applicable to any Squeak between 3.4 and 3.6. The second part fixes a few
issues caused by the stricter constraints on "shared code sections" in the
interpreter loop. Take these with a grain of salt if you're on 3.4
or 3.6 - I've written the changes against 3.5, and while they are pretty
simple (see preambles) they need to be validated against the concrete system
you're using (I'm just too lazy to check it).

In any case, I'd really appreciate it if some of you could recompile a VM and
let me know how things are before and after applying the changes. If there
are no ill effects I'd really like to get these changes into VMMaker before
too long - they are incredibly useful if you want to hack the VM in C in
order to experiment with various local optimizations. In fact, this was my
original reason for doing it - I have some really ugly suspicions about a
few extremely heavily used portions of the code where I think that even
slightly reordering the operations could make an incredible difference.

Here are the CS preambles:
"Change Set:		CodeGenEnhancements-ar
Date:			7 July 2003
Author:			Andreas Raab

This change set modifies the code generator to inline case branches in the
main interpreter loop without sharing all of the temps between the cases.
This greatly improves life-time analysis for the C compiler, which no longer
has to assume that every temp within interpret may be used by any case,
therefore allowing for local optimizations which were impossible to achieve
otherwise.

With the changes provided here, there are some additional rules for using
'shared code sections' (provided by #shareCodeNamed:inCase:). Methods
containing these sections have two requirements:
a) They must not take arguments but communicate all state via (potentially
localized) variables.
b) They must only be called as the LAST statement of any caller so that
nested shared sections can be inlined appropriately.
While the first constraint is validated by CCodeGenerator, the second
currently isn't, and violating it will lead to all hell breaking loose.
"

"Change Set:		InterpreterFixes-ar
Date:			7 July 2003
Author:			Andreas Raab

This change set fixes Interpreter to adhere to the rules for sharing code
sections as defined by the CCodeGenEnhancements-ar change set. Specifically
the CS does the following:
* Introduces 'localReturnValue' and 'localReturnContext' for passing the
arguments to the (shared) common return section.
* Replaces #returnValue:to: by #commonReturn in accordance with the above.
* Removes the common code section from #internalFindNewMethod, as it did not
adhere to the end-recursion rule.
* Instead, provides a #commonSend method which is shared and does adhere to
the end-recursion rule.
* Changes both #normalSend and #superclassSend to use the shared code section
in #commonSend (therefore making it equivalent to the former behavior).
"

Cheers,
  - Andreas
-------------- next part --------------
A non-text attachment was scrubbed...
Name: CodeGenEnhancements-ar.1.cs
Type: application/octet-stream
Size: 18539 bytes
Desc: not available
Url : http://lists.squeakfoundation.org/pipermail/squeak-dev/attachments/20030707/62c300d9/CodeGenEnhancements-ar.1.obj
-------------- next part --------------
A non-text attachment was scrubbed...
Name: InterpreterFixes-ar.1.cs
Type: application/octet-stream
Size: 9413 bytes
Desc: not available
Url : http://lists.squeakfoundation.org/pipermail/squeak-dev/attachments/20030707/62c300d9/InterpreterFixes-ar.1.obj


More information about the Squeak-dev mailing list