Hi Chris,
First, there are a few bytecodes that don't compile yet. The major missing language feature is cascades; bytecode 136, duplicate top of stack, is used for cascades. There's also only a handful of primitives implemented. If you're doing something that the interpreter optimises more than Exupery currently does, then compiling will slow down execution. That said, 70% of the time is spent inside interpret(), the big interpreter function produced by inlining the interpreter's main loop. For now I'm targeting that 70%.
The easiest way to try to optimise something is to use the following sequence:
ExuperyProfiler optimise: [your code].
#optimise: runs the code in the block and profiles it. Based on that profile it will try to compile methods that will benefit.
your code.
Execute your code again to populate the polymorphic inline caches. Exupery uses them to dynamically inline primitives.
Exupery dynamicallyInline.
#dynamicallyInline runs over all the natively compiled methods in the system and dynamically inlines any primitives.
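Putting the three steps together, a full session might look like this (a sketch only; #runBenchmark stands in for whatever code you want to speed up, and the repeat count is just to give the profiler something to see):

    "1. Profile, then compile the methods that should benefit."
    ExuperyProfiler optimise: [100 timesRepeat: [self runBenchmark]].
    "2. Run again to populate the polymorphic inline caches."
    100 timesRepeat: [self runBenchmark].
    "3. Dynamically inline primitives in the compiled methods."
    Exupery dynamicallyInline.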
Exupery's send optimisations only provide a speed improvement if both sides are compiled. Performance seems identical to the interpreter when calling interpreted code.
The interpreter's main loop includes implementations of a handful of primitives including #at: and #at:put: that have their own bytecodes. Exupery optimises these by using dynamic primitive inlining however that requires a second compile or explicit inlining instructions. Also I haven't yet re-implemented all of the primitives that the interpreter optimises. SmallInteger operations are automatically inlined.
Exupery also needs to compile a method once for each receiver. I do this so that I can specialise the method for its receiver. At the moment only #at: and #at:put: are specialised. The advantage is that the executed code is customised to the receiver's shape. I may allow some methods to be shared by multiple receivers in the future, but for now compiling everything the same way is simpler.
In your example:
ByteArray>>#maUint: bits at: position put: anInteger
	position + 1 to: position + (bits // 8) do:
		[:pos | self at: pos put: (anInteger digitAt: pos - position)].
	^anInteger
First I'd try compiling it using the profiler as above. If I were compiling it manually, I would also compile SmallInteger>>digitAt:. ByteArray>>#at:put: cannot be compiled yet, but the interpreter optimises it into the #at:put: bytecode. When optimising a method, try to compile all the methods it will call, measuring the benefit at each step.
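For your method, the sequence could look something like this (myBytes, the loop count, and the arguments are made up for illustration; 32 bits at position 1 writes bytes 2 through 5, so an 8-byte array is big enough):

    | myBytes |
    myBytes := ByteArray new: 8.
    "Profile and compile the hot methods."
    ExuperyProfiler optimise:
        [1000 timesRepeat: [myBytes maUint: 32 at: 1 put: 16r01020304]].
    "Run again to populate the polymorphic inline caches."
    1000 timesRepeat: [myBytes maUint: 32 at: 1 put: 16r01020304].
    "Dynamically inline the primitives."
    Exupery dynamicallyInline.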
I haven't yet tried to optimise code that uses LargeIntegers heavily, so I don't know how such code will perform. There are several options available to optimise them, including compiling calls to primitives into compiled code. Compiling a call to a primitive would let it benefit from Exupery's faster sends between compiled code.
How heavily are LargeIntegers used in Magma?
Bryce