Hello there,
so, we've entered a new era, where native code generated on the image side can be run directly by the JIT. This feature was one of the first things I wanted to try once Eliot released Cog :)
The way it works: when the VM decides to JIT a specific method, we copy the native code (from the method's trailer) directly into the generated method's code. All you need to do is use the special primitive for that, 220 (#primitiveVoltage).
So, the first question we wanted answered is how much faster it is to run native code via the JIT, compared to running it via the NativeBoost primitive, #primitiveNativeCall.
Here are some methods which just answer 42:
This one uses #primitiveNativeCall:
nbFoo2
	<primitive: #primitiveNativeCall module: #NativeBoostPlugin error: errorCode>
	^ NBNativeCodeGen methodAssembly: [:gen :proxy :asm |
		asm noStackFrame.
		asm mov: (42 << 1) + 1 to: asm EAX; ret ]
And this one uses JIT:
nbFoo
	<primitive: 220 error: errorCode>
	[ errorCode = ErrRunningViaInterpreter ] whileTrue: [ ^ self nbFoo ].
	^ NBNativeCodeGen jitMethodAssembly: [:gen :proxy :asm |
		asm noStackFrame.
		asm mov: (42 << 1) + 1 to: asm EDX; ret: 4 asUImm ]
And this one is plain Smalltalk code which the JIT compiles itself:

nbFoo42
	^ 42
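As an aside, the (42 << 1) + 1 in both assembly snippets is the SmallInteger tagging of Cog's 32-bit object representation: the value is shifted left one bit and the low (tag) bit is set. A quick sketch of the encoding, in Python since the arithmetic is language-independent:

```python
def tag_small_int(n):
    # Encode an integer as a 32-bit Cog-style SmallInteger oop:
    # value shifted left one bit, low tag bit set to 1.
    return (n << 1) | 1

def untag_small_int(oop):
    # Arithmetic shift right drops the tag bit and recovers the value.
    return oop >> 1

print(tag_small_int(42))                   # 85, the immediate loaded into EAX/EDX
print(untag_small_int(tag_small_int(42)))  # 42
```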
So, here are the numbers:
Time to run via #primitiveNativeCall:
[100000000 timesRepeat: [ MyClass nbFoo2 ] ] timeToRun 6995
Time to run via JIT:
[100000000 timesRepeat: [ MyClass nbFoo ] ] timeToRun 897
Time to run JITed method:
[100000000 timesRepeat: [ MyClass nbFoo42 ] ] timeToRun 899
So, as you can see, the JITed method and our custom generated code are on par (which is logical ;).
Time to run an empty loop:
[100000000 timesRepeat: [ ] ] timeToRun 679
So, here is the result: if we subtract the loop overhead, we can see the difference between calling our native code via the JIT vs. via #primitiveNativeCall:
(6995 - 679) / (897 - 679) asFloat    28.972477064220183
28 times faster!!!!
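The speedup arithmetic is easy to double-check; a minimal sketch using the timings quoted above:

```python
# Timings in milliseconds for 100,000,000 sends each, from the benchmark above.
native_call = 6995   # via #primitiveNativeCall
jit_call    = 897    # via JITed native code (primitive 220)
empty_loop  = 679    # loop overhead alone

# Subtract the loop overhead from both timings, then compare per-send cost.
speedup = (native_call - empty_loop) / (jit_call - empty_loop)
print(round(speedup, 2))   # ~28.97
```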
So, with this new feature we can now make our generated code run at full speed, without the overhead of #primitiveNativeCall. This is especially useful for implementing primitives that involve heavy numeric crunching.
I would release this code to the public, but there's one little issue I need to deal with first
(one little problem, which I hope Eliot can help to solve):
it looks like primitivePerform: never enters JIT mode, but always executes the method via the interpreter.
This is why you see this code: [ errorCode = ErrRunningViaInterpreter ] whileTrue: [ ^ self nbFoo ].
because if I do the same check inside NBNativeCodeGen>>jitMethodAssembly: (which checks for the same error and retries the send using the perform primitive), it never enters JIT mode, resulting in an endless loop :(
This is despite the fact that the method is JITed, because we force JITing of the method during error handling:
lastError = ErrRunningViaInterpreter ifTrue: [
	"the method contains native code, but was executed by the interpreter"
	method forceJIT ifFalse: [
		self error: 'Failed to JIT the compiled method. Try reducing its size' ].
	^ self retrySend: aContext ]
#forceJIT is a primitive which I implemented as follows:
primitiveForceJIT
	<export: true>
	| val result |
	val := self stackTop.
	(self isIntegerObject: val) ifTrue: [ ^ self primitiveFail ].
	(self isCompiledMethod: val) ifFalse: [ ^ self primitiveFail ].
	(self methodHasCogMethod: val) ifFalse: [
		cogit cog: val selector: objectMemory nilObject ].
	result := (self methodHasCogMethod: val)
		ifTrue: [ objectMemory trueObject ]
		ifFalse: [ objectMemory falseObject ].
	^ self pop: 1 thenPush: result
As you can see from its usage, if the VM for some reason fails to JIT the method, the primitive answers false and we stop with an error.. which apparently never happens. Still, #primitivePerform seems to ignore that the method contains machine code and always runs it interpreted :(
I don't like the idea that users would be forced to manually put such loops into every method they write.. any ideas/suggestions on how to overcome that?
On Fri, Sep 21, 2012 at 04:59:36PM +0200, Igor Stasenko wrote:
I must confess that I do not know much about jitting and code generation, but ....
28 times faster!!!!
This is very impressive work :)
Dave
On 21 September 2012 17:15, David T. Lewis lewis@mail.msen.com wrote:
I must confess that I do not know much about jitting and code generation, but ....
well, with this feature one can simply provide one's own code for the JIT and experiment/explore much faster compared to (re)compiling the VM each time.. This is actually the main goal/mission of the NativeBoost project. The FFI stuff is SECONDARY.
I can't say that I know much about the Cog JIT.
One of the nice features of machine code is its deterministic behavior: if it isn't blowing up and works correctly, it will keep working correctly ad infinitum.
So, bit by bit, I build machine code generation snippets which never fail, and then I can use them for higher abstractions like FFI/whatever :)
Hello
2012/9/21 Igor Stasenko siguctua@gmail.com
Why is the assembly code different for the jitted and non-jitted versions? What should I change in my NativeBoost assembly to be able to jit it?
Anyway, thanks for such impressive work.
Best regards, Denis
On 21 September 2012 17:53, Denis Kudriashov dionisiydk@gmail.com wrote:
Why is the assembly code different for the jitted and non-jitted versions?
In short: because Cog JIT code uses different convention(s). One of them is that the return value should be in EDX, unlike the cdecl convention, which uses EAX.
What should I change in my NativeBoost assembly to be able to jit it?
I think most of it is the entry and exit code, and accessing the method's argument(s). The rest, like calling interpreterProxy functions, can be left unchanged. It must also be noted that JIT code uses its own stack, and making calls to C functions can be very dangerous (stack overflow) unless you know the C function won't consume much stack space (otherwise you need to temporarily switch stacks when making such calls).
I don't know many details right now, but Eliot knows better because he wrote it :)
On Fri, Sep 21, 2012 at 9:25 AM, Igor Stasenko siguctua@gmail.com wrote:
Exactly. The Smalltalk stack is paged, about 1K bytes per page, all part of the context-to-stack mapping scheme. One can't run general C code on that stack; only code which is known not to consume stack space can be used safely. Instead, as NB does, the JIT generates machine code for certain performance-critical primitives and runs them directly on the Smalltalk stack.
On Fri, Sep 21, 2012 at 8:53 AM, Denis Kudriashov dionisiydk@gmail.com wrote:
Why is the assembly code different for the jitted and non-jitted versions?
The Cog JIT uses its own register-based calling convention and runs on a segmented (paged) stack, so one can't run or call C code directly from Smalltalk. Instead, calling C involves switching to the C stack and invoking the C function there, a lot like a system call.
Hi Igor,
great news!
On Fri, Sep 21, 2012 at 7:59 AM, Igor Stasenko siguctua@gmail.com wrote:
it looks like primitivePerform: never enters JIT mode, but always executes the method via the interpreter.
I'll take a look. This is all very detailed so I'll need a little time.
I don't like the idea that users would be forced to manually put such loops into every method they write.. any ideas/suggestions on how to overcome that?
Yes. The JIT should be told that methods that have NB code should be jitted. But right now I don't understand enough about how NB code is generated, how methods are marked as having NB code, etc., to know exactly how to do this. I need to play around a bit.
On 21 September 2012 19:50, Eliot Miranda eliot.miranda@gmail.com wrote:
Heh.. it took me a while (more than a year) before I was able to understand how I could hook in.. sure, I did not spend the whole year working on that ;), but anyway, I am not expecting an immediate answer from you :)
Yes. The JIT should be told that methods that have NB code should be jitted. But right now I don't understand enough about how NB code is generated, how methods are marked as having NB code, etc., to know exactly how to do this. I need to play around a bit.
Let me explain some internal bits, to make it clear. It does not really matter how the code is generated; from the VM's point of view it is simple: it takes bytes from the compiled method's trailer and copies them into the JITed method during code generation.
The hook for that is the 220-voltage ;) primitive, which I put into #initializePrimitiveTableForSqueakV3. That way, when Cog jits the method, it calls the 'code generator' for that primitive, #genPrimitiveNBNativeCall, which does nothing but directly copy the bytes from the method's trailer into the generated code, or fails if there are none:
-------------------
genPrimitiveNBNativeCall
	| len trailer codeOffset instr |
	len := objectMemory lengthOf: methodObj.
	trailer := coInterpreter byteAt: methodObj + BaseHeaderSize + len - 1.
	(trailer bitAnd: 2r11111100) = 40 "Native code trailer id"
		ifFalse: [ ^ -1 "... fail somehow" ].
	"the next two bytes should be an offset to the native code start"
	codeOffset := (self byteAt: methodObj + BaseHeaderSize + len - 4)
		+ ((self byteAt: methodObj + BaseHeaderSize + len - 5) << 8).
	"entry point address is method oop + header + len - codeOffset"
	instr := (self cCoerce: (objectMemory firstFixedField: methodObj) to: 'sqInt')
		+ len - codeOffset.
	"copy generated code"
	[ instr < (methodObj + len - 5) ] whileTrue: [
		self Fill32: (objectMemory longAt: instr).
		instr := instr + 4 ].
	^ 0
-------------------
That way, the produced JITed method contains the native code in place of its primitive code. The bytecode of the method is still generated as usual, because the native code might want to fail the primitive (and then it should enter the method's body). But as I said before, on a failure in the native code I'd rather switch back to the interpreter and run the method's body interpreted, because the body can often contain a lot of assembler-generating code (since you provide the implementation in assembler), and jitting that code makes no sense at all: it runs just once and, if jitted, would simply waste space.
Initially the primitive itself (220) was not implemented at all (so if you executed the method via the interpreter, it would simply fail and enter the method's body), but then I added an implementation which also always fails, but reports different error codes depending on whether the executed method has native code in its trailer or not.
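That interpreter-side behavior can be sketched as a tiny decision function. Note the second error-code name and both numeric values are hypothetical; the thread only names ErrRunningViaInterpreter:

```python
# Sketch of how primitive 220 behaves when hit by the interpreter:
# it always fails, but the error code tells the image what to do next.
ERR_RUNNING_VIA_INTERPRETER = 1   # native code exists; force a JIT and retry
ERR_NO_NATIVE_CODE = 2            # hypothetical name: generate the code first

def primitive_220_error(method_has_native_trailer):
    # The primitive never succeeds; it only reports why it failed.
    if method_has_native_trailer:
        return ERR_RUNNING_VIA_INTERPRETER
    return ERR_NO_NATIVE_CODE
```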
In the future, I could make a simple change in #methodShouldBeCogged: to check whether the method contains primitive 220 and already has native code in its trailer, and if so flag that method to be cogged, regardless of anything else.
But as I said, for #primitivePerform it seems not to matter whether the method is cogged or not; it always executes it interpreted..
I also found I am unable to force running a jited method from doits, i.e. if I do:
MyClass foo
despite the fact that the #foo method is already jited (guaranteed), it is always run interpreted.
But if I do:
(1 to: 10) collect: [:i | ([ MyClass foo ] on: NBNativeCodeError do: [] ) ]
it yields the following result: #(nil 42 42 42 42 42 42 42 42 42)
which shows that it starts using the jited version of the method only after the outer method (the doit itself) is jited.
Another thing I suspect: since I use 'thisContext sender' to get the method and its arguments in order to retry the very same message send, this might cause deoptimization on the stack, which in turn makes that piece of code impossible to run by the JIT.
Since the NB code generation is performed once per method, and after installing the method's native code it never enters the method's body (unless the native code fails the primitive), I don't really care how fast/slow the code generation is, or whether it runs deoptimized or not. What I do care about is that it can retry the same message send after it is done generating code, so it works seamlessly and users don't need to write any additional code to handle it.
On Fri, Sep 21, 2012 at 1:49 PM, Igor Stasenko siguctua@gmail.com wrote:
In the future, I could make a simple change in #methodShouldBeCogged: to check whether the method contains primitive 220 and already has native code in its trailer, and if so flag that method to be cogged, regardless of anything else.
This is the right thing to do. methodShouldBeCogged: should force jitting for primitive 220 in an NB VM.
But as I said, for #primitivePerform it seems not to matter whether the method is cogged or not; it always executes it interpreted..
That's a bug. I'll check, but I'm pretty sure it doesn't do that. No, it doesn't do that. I've attached the xray primitives with which you can test. ContextPart>>xray answers a bit pattern that tells you its state:
xray "Lift the veil from a context and answer an integer describing its interior state. Used for e.g. VM tests so they can verify they're testing what they think they're testing. 0 implies a vanilla heap context. Bit 0 = is or was married to a frame Bit 1 = is still married to a frame Bit 2 = frame is executing machine code Bit 3 = has machine code pc (as opposed to nil or a bytecode pc) Bit 4 = method is currently compiled to machine code" <primitive: 213> ^0 "Can only fail if unimplemented; therefore simply answer 0"
So I defined
Object>>foo
	^thisContext xray

"(1 to: 4) collect: [:ign| self perform: #foo]"
"(1 to: 4) collect: [:ign| self foo. self perform: #foo]"
and both doits return 31 for all of 1 through 4. So foo is always compiled to machine code; something else must be going on. You see, primitivePerform calls executeNewMethod, and in the CoInterpreter executeNewMethod always compiles to machine code to make doits fast (since executeNewMethod is also used by primitiveExecuteMethod), so performs are eagerly compiled to machine code too.
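For reference, the xray bit pattern can be decoded mechanically (a Python sketch of the flag layout quoted in the method comment above, not part of the VM; the names paraphrase that comment). 31 is all five flags set; the 11 seen later in this thread has only bits 0, 1 and 3:

```python
# Decode the integer answered by ContextPart>>xray, per the method
# comment above. Illustration only; flag names paraphrase the comment.
XRAY_BITS = [
    (0, "is or was married to a frame"),
    (1, "is still married to a frame"),
    (2, "frame is executing machine code"),
    (3, "has machine code pc"),
    (4, "method is currently compiled to machine code"),
]

def decode_xray(value):
    """Answer the list of flag names set in an xray result."""
    return [name for bit, name in XRAY_BITS if (value >> bit) & 1]

print(decode_xray(31))  # all five flags set
print(decode_xray(11))  # bits 0, 1 and 3 only
```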
I also found that I am unable to force a JITted method to run from doits, i.e. if I do:
MyClass foo
despite the fact that #foo is already JITted (guaranteed), it always runs interpreted.
But if I do:
(1 to: 10) collect: [:i | ([ MyClass foo ] on: NBNativeCodeError do: [] ) ]
it yields the following result: #(nil 42 42 42 42 42 42 42 42 42)
which shows that it starts using the JITted version of the method only after the outer method (the doit itself) is JITted.
Another thing I suspect: since I use 'thisContext sender' to get the method and its arguments in order to retry the very same message send, this may cause deoptimization on the stack, which in turn makes that piece of code impossible for the JIT to run.
Since NB code generation is performed once per method, and after the method's native code is installed the body is never entered again (unless the native code fails the primitive), I don't really care how fast or slow the code generation is, or whether it runs deoptimized. What I do care about is that it can retry the same message send after it is done generating code, so that everything works seamlessly and users don't need to write any additional code to handle it.
-- Best regards, Igor Stasenko.
On 22 September 2012 02:04, Eliot Miranda eliot.miranda@gmail.com wrote:
On Fri, Sep 21, 2012 at 1:49 PM, Igor Stasenko siguctua@gmail.com wrote:
On 21 September 2012 19:50, Eliot Miranda eliot.miranda@gmail.com wrote:
Hi Igor,
great news!
On Fri, Sep 21, 2012 at 7:59 AM, Igor Stasenko siguctua@gmail.com wrote:
I would release this code to the public, but there's one little discrepancy I need to deal with first (one little problem, which I hope Eliot can help solve):
it looks like primitivePerform: never enters JIT mode, but always executes the method via the interpreter.
I'll take a look. This is all very detailed so I'll need a little time.
Heh.. it took me a while (more than a year) before I was able to understand how I could hook in. Sure, I did not spend the whole year working on that ;), but anyway, I am not expecting an immediate answer from you :)
This is why you see this code:

[ errorCode = ErrRunningViaInterpreter ] whileTrue: [ ^ self nbFoo ].

If I do the retry inside NBNativeCodeGen>>jitMethodAssembly: instead, which checks for the same error and retries the send using the perform primitive, it never enters JIT mode, resulting in an endless loop :(
This is despite the fact that the method is JITted, because we force JITting of the method during error handling:

lastError = ErrRunningViaInterpreter ifTrue: [
	"the method contains native code, but was executed by the interpreter"
	method forceJIT ifFalse: [
		self error: 'Failed to JIT the compiled method. Try reducing its size' ].
	^ self retrySend: aContext ].
#forceJIT is the primitive, which I implemented as follows:

primitiveForceJIT
	<export: true>
	| val result |
	val := self stackTop.
	(self isIntegerObject: val) ifTrue: [ ^ self primitiveFail ].
	(self isCompiledMethod: val) ifFalse: [ ^ self primitiveFail ].
	(self methodHasCogMethod: val) ifFalse: [
		cogit cog: val selector: objectMemory nilObject ].
	result := (self methodHasCogMethod: val)
		ifTrue: [ objectMemory trueObject ]
		ifFalse: [ objectMemory falseObject ].
	^ self pop: 1 thenPush: result
As you can see from its usage, if the VM for some reason fails to JIT the method, the primitive answers false and we stop with an error, which apparently never happens. Still, #primitivePerform seems to ignore the fact that the method contains machine code and always runs it interpreted :(
I do not like the idea that users would be forced to manually put such loops into every method they write.. any ideas/suggestions on how to overcome that?
Yes. The JIT should be told that methods that have NB code should be JITted. But right now I don't understand enough of how NB code is generated, how methods are marked as having NB code, etc. to know exactly how to do this. I need to play around a bit.
Let me explain some internal bits to make it clear. It doesn't really matter how the code is generated. From the VM's point of view it is simple: it takes bytes from the compiled method's trailer and copies them into the JITted method during code generation.
The hook for that is the 220-voltage ;) primitive, which I put into #initializePrimitiveTableForSqueakV3. That way, when Cog JITs the method, it calls the 'code generator' for that primitive, #genPrimitiveNBNativeCall, which does nothing but copy the bytes from the method's trailer directly into the generated code, or fails if there are none:
genPrimitiveNBNativeCall
	| len trailer codeOffset instr |
	len := objectMemory lengthOf: methodObj.
	trailer := coInterpreter byteAt: methodObj + BaseHeaderSize + len - 1.
	(trailer bitAnd: 2r11111100) = 40 "native code trailer id"
		ifFalse: [ ^ -1 "... fail somehow" ].
	"the next two bytes hold the offset of the native code start"
	codeOffset := (self byteAt: methodObj + BaseHeaderSize + len - 4)
		+ ((self byteAt: methodObj + BaseHeaderSize + len - 5) << 8).
	"the entry point address is method oop + header + len - codeOffset"
	instr := (self cCoerce: (objectMemory firstFixedField: methodObj) to: 'sqInt')
		+ len - codeOffset.
	"copy the generated code"
	[ instr < (methodObj + len - 5) ] whileTrue: [
		self Fill32: (objectMemory longAt: instr).
		instr := instr + 4 ].
	^ 0
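To make the trailer arithmetic concrete, here is a minimal Python sketch of the layout the Slang code above implies: the last byte identifies the trailer kind, the bytes at len-5 and len-4 form a big-endian offset back to the code start, and the copy loop stops 5 bytes before the end. The bytes at len-3 and len-2 are not read by the posted code, so the sketch leaves them zeroed; this is an illustration of the arithmetic, not NativeBoost's actual trailer format.

```python
NATIVE_CODE_TRAILER_ID = 40  # trailer kind, per the Slang code above

def extract_native_code(method_bytes):
    """Mirror genPrimitiveNBNativeCall's trailer decoding.
    Answers the embedded native code, or None if there is no NB trailer."""
    n = len(method_bytes)
    if (method_bytes[n - 1] & 0b11111100) != NATIVE_CODE_TRAILER_ID:
        return None
    # offset from the method's end back to the start of the native code
    code_offset = method_bytes[n - 4] + (method_bytes[n - 5] << 8)
    # the copy loop runs from (n - code_offset) up to (n - 5)
    return method_bytes[n - code_offset : n - 5]

# build a toy method: 2 bytes of 'bytecode', 3 bytes of 'native code',
# then the 5-byte trailer (offset hi, offset lo, two unused bytes, id)
code = bytes([0x90, 0x90, 0xC3])
offset = len(code) + 5
method = bytes([0x00, 0x01]) + code + bytes([offset >> 8, offset & 0xFF, 0, 0, 40])
print(extract_native_code(method))  # the 3 native-code bytes
```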
Like that, the produced JITed method will contain the native code in place of its primitive code. The bytecode of the method is still generated as usual.. because native code might want to fail the prim (and then it should enter the method's body). But as i told before, on failure in native code i'd rather switch back to interpreter and run method's body interpreted , because method can often contain a lot of assembler code (since you providing its implementation in assembler), but jiting that code makes no sense at all, because it is run just once and if jited, will simply waste space.
Initially, the primitive itself (220) was not even implemented at all (so if you execute the method by interpreter, it will simply fail and enter the method's body), but then i added implementation, which also always fails, but reports different error codes, depending if executed method has native code in its trailer or not.
In future, i could make a simple change in #methodShouldBeCogged: to check if that method contains primitive 220 + already have native code in trailer, and so it will flag that method to be cogged, regardless of anything.
This is the right thing to do. methodShouldBeCogged: should force jitting for primitive 220 in an NB VM.
But as i said, for #primitivePerform it looks like it doesn't matters whether method is cogged or not, it always executing it interpreted..
That's a bug. I'll check, but I'm pretty sure it doesn't do that. No it doesn't do that. I've attached the xray primitives with which you can test. CotextPart>xray answers a bit pattern that tells you its state:
xray "Lift the veil from a context and answer an integer describing its interior state. Used for e.g. VM tests so they can verify they're testing what they think they're testing. 0 implies a vanilla heap context. Bit 0 = is or was married to a frame Bit 1 = is still married to a frame Bit 2 = frame is executing machine code Bit 3 = has machine code pc (as opposed to nil or a bytecode pc) Bit 4 = method is currently compiled to machine code" <primitive: 213> ^0 "Can only fail if unimplemented; therefore simply answer 0"
So I defined
Object>foo ^thisContext xray
"(1 to: 4) collect: [:ign| self perform: #foo]" "(1 to: 4) collect: [:ign| self foo. self perform: #foo]"
and both the doits return 31 for all 1 through 4. So foo is always compiled to machine code. Something else must be going on. You see, primitivePerform calls executeNewMethod, and in the CoInterpreter executeNewMethod always compiles to machine code to make doits fast (since executeNewMethod is also used by primitiveExecuteMethod) so performs are eagerly compiled to machine code too.
Yes, indeed, something fishy is going on there. I tried the following:

foo
	<primitive: 220 error: errorCode>
	^ thisContext xray

And here is how I install native code into that method externally:

MyClass class>>generateFooCode
	| asm |
	asm := AJx86Assembler new noStackFrame.
	asm mov: (99 << 1) + 1 to: asm EDX; ret: 4 asUImm.
	NBNativeCodeGen installNativeCode: asm bytes into: (self class>>#foo).
	(self class>>#foo) forceJIT
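The `(99 << 1) + 1` in the mov above (and the `(42 << 1) + 1` earlier in the thread) is the 32-bit SmallInteger tagging the native code must produce to answer a valid oop: shift the value left one bit and set the low tag bit. A quick sketch of that encoding (an illustration, not VM code):

```python
def tag_small_integer(n):
    """Encode n as a 32-bit Cog-style SmallInteger oop: value << 1, tag bit set."""
    return (n << 1) | 1

def untag_small_integer(oop):
    """Recover the value; the low bit must be the SmallInteger tag."""
    assert oop & 1 == 1, "not a tagged SmallInteger"
    return oop >> 1

print(tag_small_integer(42))  # 85, what nbFoo's mov loads into the register
print(tag_small_integer(99))  # 199, what generateFooCode's mov loads
```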
-----------
So, before installing the code:

(1 to: 4) collect: [:i | MyClass foo ]   #(11 11 11 11)

No matter how many times I run this doit, it always yields 11.
But after I run MyClass generateFooCode, I get the following:

(1 to: 4) collect: [:i | MyClass foo ]   #(31 99 99 99)

The 99 here is the indication that my native code actually ran, instead of the xray thingy :)
The above result also doesn't change no matter how many times I repeat it; it always produces #(31 99 99 99).
If I omit the #forceJIT, however, the first doit answers:

(1 to: 4) collect: [:i | MyClass foo ]   #(11 31 99 99)

and the 11 here is quite correct and expected, but the 31 is of course wrong; there should be no 31 at all!
It looks like when the interpreter executes a JITted method, it does so without taking the method's primitive into account, and instead jumps directly over it, to the machine code that corresponds to the first bytecode of the compiled method. This does not happen when a JITted method calls another JITted method.
On 22 September 2012 08:34, Igor Stasenko siguctua@gmail.com wrote:
Okay, I know what happens!
Changing the method to:

foo
	<primitive: 220 error: errorCode>
	^ { errorCode. thisContext xray }

doing:

MyClass generateFooCode

and then:

(1 to: 4) collect: [:i | MyClass foo ]

gives me:

#(#(505 11) #(505 31) 99 99)

The 505 error code answered by the primitive means: the method is JITted, but since the primitive ran, the VM decided to execute this method via the interpreter.
So, according to that, the 505-11 pair is fine: the method is not yet JITted, so of course it runs in the interpreter.
But 505-31 means that the interpreter runs the primitive first, and then, since it fails (primitive 220 always fails), it goes on to activate the method, and only then decides to run its machine code, but using the entry point that corresponds to the first bytecode of the method.
Is there a reason for doing that, instead of jumping directly to the machine code (which would invoke its own primitive and do the rest)?
On 22 September 2012 08:52, Igor Stasenko siguctua@gmail.com wrote:
On 22 September 2012 08:34, Igor Stasenko siguctua@gmail.com wrote:
On 22 September 2012 02:04, Eliot Miranda eliot.miranda@gmail.com wrote:
On Fri, Sep 21, 2012 at 1:49 PM, Igor Stasenko siguctua@gmail.com wrote:
On 21 September 2012 19:50, Eliot Miranda eliot.miranda@gmail.com wrote:
Hi Igor,
great news!
On Fri, Sep 21, 2012 at 7:59 AM, Igor Stasenko siguctua@gmail.com wrote:
Hello there,
so, we're entered a new area, where native code, generated from image side can be run directly by JIT. This feature was one of the first things which i wanted to try, once Eliot released Cog :)
The way how we do that, is when VM decides to JIT a specific method, we copying the native code (from method trailer) directly into the method's code. All you need to do is to use special primitive for that 220 ( #primitiveVoltage)
So, a first question, which we wanted to be answered is how faster to run native code by JIT, comparing to running native code via NativeBoost primitive , which is #primitiveNativeCall..
For here are methods, which just answer 42:
This one using #primitiveNativeCall
nbFoo2 <primitive: #primitiveNativeCall module: #NativeBoostPlugin error: errorCode>
^ NBNativeCodeGen methodAssembly: [:gen :proxy :asm | asm noStackFrame. asm mov: (42 << 1) + 1 to: asm EAX; ret. ]
And this one uses JIT:
nbFoo <primitive: 220 error: errorCode>
[ errorCode = ErrRunningViaInterpreter ] whileTrue: [ ^ self nbFoo ]. ^ NBNativeCodeGen jitMethodAssembly: [:gen :proxy :asm | asm noStackFrame. asm mov: (42 << 1) + 1 to: asm EDX; ret: 4 asUImm. ]
And this one is code which JIT can do:
nbFoo42 ^ 42
So, here the numbers:
Time to run via #primitiveNativeCall :
[100000000 timesRepeat: [ MyClass nbFoo2 ] ] timeToRun 6995
Time to run via JIT:
[100000000 timesRepeat: [ MyClass nbFoo ] ] timeToRun 897
Time to run JITed method:
[100000000 timesRepeat: [ MyClass nbFoo42 ] ] timeToRun 899
so, as you can see, the JITed method and our custom generated code is on par (which is logical ;).
Time to run an empty loop:
[100000000 timesRepeat: [ ] ] timeToRun 679
So, here the result, if we extract the loop overhead, we can see the difference in calling our native code when it uses JIT vs using #primitiveNativeCall :
(6995 - 679 ) / (897- 679) asFloat 28.972477064220183
28 times faster!!!!
So, with this new feature, we now can make our generated code to run with unmatched speed, without overhead related to #primitiveNativeCall. This is especially useful for implementing primives which involving heavy numeric crunching.
I would release this code to public, but there's one little discrepancy i need to deal with first:
(one little problem, which i hope Eliot can help to solve)
it looks like primitivePerform: never enters the JIT mode, but always executing the method via interpreter.
I'll take a look. This is all very detailed so I'll need a little time.
Heh.. it took me a while (more than a year) before i was able to understand how i can hook in.. sure i did not spent whole year working on that ;) , but anyways , i am not expecting immediate answer from you :)
This is why you see this code: [ errorCode = ErrRunningViaInterpreter ] whileTrue: [ ^ self nbFoo ].
because if i do it inside of NBNativeCodeGen>>jitMethodAssembly:, which checks for same error and retries the send using perform primitive, it never enters the JIT mode, resulting in endless loop :(
This is despite the fact that method is JITed, because we enforce the JITing of that method during error handling:
lastError = ErrRunningViaInterpreter ifTrue: [ "a method contains native code, but executed by interpreter " method forceJIT ifFalse: [ self error: 'Failed to JIT the compiled
method. Try reducing it''s size ' ]. ^ self retrySend: aContext ].
The #forceJit is the primitive which i implemented like following:
primitiveForceJIT
<export: true > | val result | val := self stackTop. (self isIntegerObject: val) ifTrue: [ ^ self primitiveFail ]. (self isCompiledMethod: val) ifFalse: [ ^ self primitiveFail ]. (self methodHasCogMethod: val) ifFalse: [ cogit cog: val selector: objectMemory nilObject ]. result := (self methodHasCogMethod: val ) ifTrue: [ objectMemory
trueObject ] ifFalse: [ objectMemory falseObject ].
^ self pop: 1 thenPush: result.
As you can see from its usage, if VM, for some reason will fail to jit the method, the primitive will answer false, and we will stop with an error.. Which apparently never happens. Still, a #primitivePerform seems like ignoring that the method contains machine code an always runs it interpreted :(
I do not like the idea, that users will be forced to manually put such loops in every method they will write.. any ideas/suggestions how to overcome that?
Yes. The JIT should be told that methods that have NB code should be jitted. But right now I don't understand enough of how NB code is generated and methods marked that they have NB code etc to know exactly how to do this. I need to play around a bit.
Let me explain some internal bits, to make it clear: It is not really matters how code is generated.. From VM's side of view it is simple: it takes bytes from Compiled method's trailer, and copies it to JIT method during code generation.
The hook for that is the 220-voltage ;) primitive , which i put it into #initializePrimitiveTableForSqueakV3, like that, when cog jits the method, it calls the 'code generator' for that primitive - #genPrimitiveNBNativeCall, which does nothing but directly copies the bytes from method's trailer into generated code, or fails if there's none:
genPrimitiveNBNativeCall | len trailer codeOffset instr | len := (objectMemory lengthOf: methodObj).
trailer := (coInterpreter byteAt: methodObj + BaseHeaderSize + len-1 ). (trailer bitAnd: 2r11111100) = 40 " Native code trailer id " ifFalse: [ ^ -1"... fail somehow " ]. "the next two bytes should be an offset for a native code start" codeOffset := (self byteAt: methodObj + BaseHeaderSize + len-4 ) +
((self byteAt: methodObj + BaseHeaderSize + len-5 ) << 8).
"entry point address is method oop + header + len - codeOffset" instr := (self cCoerce: (objectMemory firstFixedField: methodObj) to:
'sqInt') + len - codeOffset.
"copy generated code" [ instr < (methodObj + len - 5) ] whileTrue: [ self Fill32: (objectMemory longAt: instr ). instr := instr + 4. ]. ^ 0
Like that, the produced JITed method will contain the native code in place of its primitive code. The bytecode of the method is still generated as usual.. because native code might want to fail the prim (and then it should enter the method's body). But as i told before, on failure in native code i'd rather switch back to interpreter and run method's body interpreted , because method can often contain a lot of assembler code (since you providing its implementation in assembler), but jiting that code makes no sense at all, because it is run just once and if jited, will simply waste space.
Initially, the primitive itself (220) was not even implemented at all (so if you execute the method by interpreter, it will simply fail and enter the method's body), but then i added implementation, which also always fails, but reports different error codes, depending if executed method has native code in its trailer or not.
In future, i could make a simple change in #methodShouldBeCogged: to check if that method contains primitive 220 + already have native code in trailer, and so it will flag that method to be cogged, regardless of anything.
This is the right thing to do. methodShouldBeCogged: should force jitting for primitive 220 in an NB VM.
But as i said, for #primitivePerform it looks like it doesn't matters whether method is cogged or not, it always executing it interpreted..
That's a bug. I'll check, but I'm pretty sure it doesn't do that. No it doesn't do that. I've attached the xray primitives with which you can test. CotextPart>xray answers a bit pattern that tells you its state:
xray "Lift the veil from a context and answer an integer describing its interior state. Used for e.g. VM tests so they can verify they're testing what they think they're testing. 0 implies a vanilla heap context. Bit 0 = is or was married to a frame Bit 1 = is still married to a frame Bit 2 = frame is executing machine code Bit 3 = has machine code pc (as opposed to nil or a bytecode pc) Bit 4 = method is currently compiled to machine code" <primitive: 213> ^0 "Can only fail if unimplemented; therefore simply answer 0"
So I defined
Object>foo ^thisContext xray
"(1 to: 4) collect: [:ign| self perform: #foo]" "(1 to: 4) collect: [:ign| self foo. self perform: #foo]"
and both the doits return 31 for all 1 through 4. So foo is always compiled to machine code. Something else must be going on. You see, primitivePerform calls executeNewMethod, and in the CoInterpreter executeNewMethod always compiles to machine code to make doits fast (since executeNewMethod is also used by primitiveExecuteMethod) so performs are eagerly compiled to machine code too.
Yes, indeed, something fishy is going on there. I tried the following:
foo
	<primitive: 220 error: errorCode>
	^thisContext xray
And here is how I install the native code into that method externally:
MyClass class>>generateFooCode
	| asm |
	asm := AJx86Assembler new noStackFrame.
	asm mov: (99 << 1) + 1 to: asm EDX; ret: 4 asUImm.
	NBNativeCodeGen installNativeCode: asm bytes into: (self class>>#foo).
	(self class>>#foo) forceJIT
So, before installing the code:

	(1 to: 4) collect: [:i | MyClass foo]   "=> #(11 11 11 11)"

No matter how many times I run this doit, it always yields 11.
But after I run MyClass generateFooCode, I get the following:

	(1 to: 4) collect: [:i | MyClass foo]   "=> #(31 99 99 99)"
99 here is an indication that my native code is actually run instead of the xray thingy :)
The above result also doesn't change no matter how many times I repeat it; it always produces #(31 99 99 99).
If I omit the #forceJIT send, however, the first doit answers:

	(1 to: 4) collect: [:i | MyClass foo]   "=> #(11 31 99 99)"

and 11 here is quite correct and expected, but 31 is of course wrong; there should be no 31 at all!
It looks like when the interpreter executes a JITted method, it does so without taking the method's primitive into account: instead it jumps directly over it, to the machine code that corresponds to the first bytecode of the compiled method. This does not happen when a JITted method calls another JITted method.
Okay, I know what happens!
Changing the method to:

foo
	<primitive: 220 error: errorCode>
	^{errorCode. thisContext xray}
doing:
MyClass generateFooCode
and then:
(1 to: 4) collect: [: i | MyClass foo ]
gives me:
#(#(505 11) #(505 31) 99 99)
The 505 error code answered by the primitive means that the method is JITted, but since the primitive is run, the VM decided to execute this method via the interpreter.
Sorry, a small correction: the method is not necessarily JITted. The primitive just checks that the method contains native code in its trailer, and is therefore ready to be JITted; if the method does not contain native code in its trailer, it reports a different error code.
So, according to that, the 505-11 pair is fine: the method is not yet JITted, and sure enough it runs via the interpreter.
But 505-31 means that the interpreter runs the primitive first and then, since it fails (primitive 220 always fails), goes on to activate the method, and only then decides to run its machine code, but using the entry point that corresponds to the first bytecode of the method.
Is there a reason for doing that, instead of jumping directly to the machine code (which would invoke its own primitive and do the rest)?
-- Best regards, Igor Stasenko.
I checked the code, and indeed, #executeNewMethod first eagerly checks for a primitive and runs it if it is there; otherwise it goes on to activate the new method, and only at that point decides to run the JITted version of it.
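In other words, the control flow is roughly this (a simplified paraphrase of what #executeNewMethod does, not the actual CoInterpreter source):

```smalltalk
executeNewMethod
	"Simplified sketch of the current behaviour, not the real code."
	primitiveFunctionPointer ~= 0 ifTrue:
		[self slowPrimitiveResponse.	"run the primitive first..."
		 self successful ifTrue: [^self]].
	"...and only on primitive failure activate the method; the decision
	 to run the JITted version is taken inside the activation"
	self activateNewMethod
```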
So, mainly the question: can #executeNewMethod be changed to use executeCoggedNewMethod: (something yet to be implemented) instead of activateCoggedNewMethod:, or are there reasons for not doing that?
Hello there. So, with a little change (see attached), I made it work.
I simply put a check for whether the method is cogged in #executeNewMethod _before_ the call to a potential primitive; if it is cogged, we simply jump to the method's entry point, so the method can then run its primitive and activate itself on its own.
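Sketched, the change looks like this (again only a sketch; the cog-method test and jump selectors are stand-ins for what the attached code actually uses):

```smalltalk
executeNewMethod
	"Sketch of the NBCoInterpreter variant; selector names are stand-ins."
	(self methodHasCogMethod: newMethod) ifTrue:
		[^self executeCogMethod: newMethod].	"jump to the method's entry point;
		 the machine code runs its own primitive and activates itself"
	primitiveFunctionPointer ~= 0 ifTrue:
		[self slowPrimitiveResponse.
		 self successful ifTrue: [^self]].
	self activateNewMethod
```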
If you have followed me so far: with this change, the method

foo
	<primitive: 220 error: errorCode>
	^{errorCode. thisContext xray}
--- if it doesn't contain native code in its trailer, then:
(1 to: 4) collect: [: i | MyClass foo ]
always gives me:
#(#(502 11) #(502 11) #(502 11) #(502 11))
where 502 is the NB-specific error ErrNoNativeCodeInMethod ("failed to find a native code for primitive method"), and 11 means the method runs interpreted. That is again correct, because when Cog tries to JIT the method it will always fail (the primitive 220 generator fails code generation if there's no native code in the method's trailer), so the VM is forced to run this method interpreted.
Now, if I do this:
	| asm |
	asm := AJx86Assembler new noStackFrame.
	asm mov: (99 << 1) + 1 to: asm EDX; ret: 4 asUImm.
	NBNativeCodeGen installNativeCode: asm bytes into: (MyClass class>>#foo).
	"(MyClass class>>#foo) forceJIT"
then:

	(1 to: 4) collect: [:i | MyClass foo]

gives me:

	#(#(505 11) 99 99 99)
And finally, if I uncomment the #forceJIT send, it gives me:

	#(99 99 99 99)

and no #(505 31) combination anymore! :)
And even more: if I now simply do

	MyClass foo

it also answers 99!
Eliot, do you think it is worth putting this change into the common codebase? (Right now I keep it in an NBCoInterpreter subclass.)
I do not expect big changes in performance, and I cannot even tell whether it will make things slower or faster, but you might know better.
vm-dev@lists.squeakfoundation.org