Hello there,
so, we've entered a new era, where native code generated on the image side can be run directly by the JIT. This feature was one of the first things I wanted to try once Eliot released Cog :)
The way it works: when the VM decides to JIT a specific method, we copy the native code (from the method's trailer) directly into the generated method's code. All you need to do is use the special primitive for that, 220 (#primitiveVoltage).
So, the first question we wanted answered is how much faster it is to run native code via the JIT, compared to running it via the NativeBoost primitive, #primitiveNativeCall.
Here are some methods which just answer 42:
This one uses #primitiveNativeCall:
nbFoo2
	<primitive: #primitiveNativeCall module: #NativeBoostPlugin error: errorCode>
	^ NBNativeCodeGen methodAssembly: [:gen :proxy :asm |
		asm noStackFrame.
		asm mov: (42 << 1) + 1 to: asm EAX; ret ]
And this one uses JIT:
nbFoo
	<primitive: 220 error: errorCode>
	[ errorCode = ErrRunningViaInterpreter ] whileTrue: [ ^ self nbFoo ].
	^ NBNativeCodeGen jitMethodAssembly: [:gen :proxy :asm |
		asm noStackFrame.
		asm mov: (42 << 1) + 1 to: asm EDX; ret: 4 asUImm ]
And this one is plain Smalltalk code which the JIT compiles itself:

nbFoo42
	^ 42
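As an aside, the (42 << 1) + 1 in both assembly snippets is the SmallInteger tagging of Cog's 32-bit object representation: the value is shifted left one bit and the low (tag) bit is set. A quick sketch of the encoding, in Python since the arithmetic is language-independent:

```python
def tag_small_int(n):
    # Encode an integer as a 32-bit Cog-style SmallInteger oop:
    # value shifted left one bit, low tag bit set to 1.
    return (n << 1) | 1

def untag_small_int(oop):
    # Arithmetic shift right drops the tag bit and recovers the value.
    return oop >> 1

print(tag_small_int(42))                   # 85, the immediate loaded into EAX/EDX
print(untag_small_int(tag_small_int(42)))  # 42
```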
So, here are the numbers:
Time to run via #primitiveNativeCall:
[100000000 timesRepeat: [ MyClass nbFoo2 ] ] timeToRun 6995
Time to run via JIT:
[100000000 timesRepeat: [ MyClass nbFoo ] ] timeToRun 897
Time to run JITed method:
[100000000 timesRepeat: [ MyClass nbFoo42 ] ] timeToRun 899
So, as you can see, the JITed method and our custom generated code are on par (which is logical ;).
Time to run an empty loop:
[100000000 timesRepeat: [ ] ] timeToRun 679
So, here is the result: if we subtract the loop overhead, we can see the difference between calling our native code via the JIT vs. via #primitiveNativeCall:
(6995 - 679) / (897 - 679) asFloat    28.972477064220183
28 times faster!!!!
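The speedup arithmetic is easy to double-check; a minimal sketch using the timings quoted above:

```python
# Timings in milliseconds for 100,000,000 sends each, from the benchmark above.
native_call = 6995   # via #primitiveNativeCall
jit_call    = 897    # via JITed native code (primitive 220)
empty_loop  = 679    # loop overhead alone

# Subtract the loop overhead from both timings, then compare per-send cost.
speedup = (native_call - empty_loop) / (jit_call - empty_loop)
print(round(speedup, 2))   # ~28.97
```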
So, with this new feature we can now make our generated code run at full speed, without the overhead of #primitiveNativeCall. This is especially useful for implementing primitives that involve heavy numeric crunching.
I would release this code to the public, but there's one little issue I need to deal with first
(one little problem, which I hope Eliot can help to solve):
it looks like primitivePerform: never enters JIT mode, but always executes the method via the interpreter.
This is why you see this code: [ errorCode = ErrRunningViaInterpreter ] whileTrue: [ ^ self nbFoo ].
because if I do the same check inside NBNativeCodeGen>>jitMethodAssembly: (which checks for the same error and retries the send using the perform primitive), it never enters JIT mode, resulting in an endless loop :(
This is despite the fact that the method is JITed, because we force JITing of the method during error handling:
lastError = ErrRunningViaInterpreter ifTrue: [
	"the method contains native code, but was executed by the interpreter"
	method forceJIT ifFalse: [
		self error: 'Failed to JIT the compiled method. Try reducing its size' ].
	^ self retrySend: aContext ]
#forceJIT is a primitive which I implemented as follows:
primitiveForceJIT
	<export: true>
	| val result |
	val := self stackTop.
	(self isIntegerObject: val) ifTrue: [ ^ self primitiveFail ].
	(self isCompiledMethod: val) ifFalse: [ ^ self primitiveFail ].
	(self methodHasCogMethod: val) ifFalse: [
		cogit cog: val selector: objectMemory nilObject ].
	result := (self methodHasCogMethod: val)
		ifTrue: [ objectMemory trueObject ]
		ifFalse: [ objectMemory falseObject ].
	^ self pop: 1 thenPush: result
As you can see from its usage, if the VM for some reason fails to JIT the method, the primitive answers false and we stop with an error.. which apparently never happens. Still, #primitivePerform seems to ignore that the method contains machine code and always runs it interpreted :(
I don't like the idea that users would be forced to manually put such loops into every method they write.. any ideas/suggestions on how to overcome that?
On Fri, Sep 21, 2012 at 04:59:36PM +0200, Igor Stasenko wrote:
I must confess that I do not know much about jitting and code generation, but ....
28 times faster!!!!
This is very impressive work :)
Dave
On 21 September 2012 17:15, David T. Lewis lewis@mail.msen.com wrote:
I must confess that I do not know much about jitting and code generation, but ....
well, with this feature one can simply provide one's own code for the JIT and experiment/explore much faster compared to (re)compiling the VM each time.. This is actually the main goal/mission of the NativeBoost project. The FFI stuff is SECONDARY.
I can't say that I know much about the Cog JIT.
One of the nice features of machine code is its deterministic behavior: if it isn't blowing up and works correctly, it will keep working correctly ad infinitum.
So, bit by bit, I build machine code generation snippets which never fail, and then I can use them for higher abstractions like FFI/whatever :)
Hello
2012/9/21 Igor Stasenko siguctua@gmail.com
Why is the assembly code different for the jitted and non-jitted versions? What should I change in my NativeBoost assembly to be able to jit it?
Anyway, thanks for such impressive work.
Best regards, Denis
On 21 September 2012 17:53, Denis Kudriashov dionisiydk@gmail.com wrote:
Why is the assembly code different for the jitted and non-jitted versions?
In short: because Cog JIT code uses different convention(s). One of them is that the return value should be in EDX, unlike the cdecl convention, which uses EAX.
What should I change in my NativeBoost assembly to be able to jit it?
I think most of it is the entry and exit code, and accessing the method's argument(s). The rest, like calling interpreterProxy functions, can be left unchanged. It must also be noted that JIT code uses its own stack, and making calls to C functions can be very dangerous (stack overflow) unless you know the C function won't consume much stack space (otherwise you need to temporarily switch stacks when making such calls).
I don't know many details right now, but Eliot knows better because he wrote it :)
On Fri, Sep 21, 2012 at 9:25 AM, Igor Stasenko siguctua@gmail.com wrote:
Exactly. The Smalltalk stack is paged, about 1K bytes per page, all part of the context-to-stack mapping scheme. One can't run general C code on that stack; only code which is known not to consume stack space can be used safely. Instead, as NB does, the JIT generates machine code for certain performance-critical primitives and runs them directly on the Smalltalk stack.
On Fri, Sep 21, 2012 at 8:53 AM, Denis Kudriashov dionisiydk@gmail.com wrote:
Why is the assembly code different for the jitted and non-jitted versions?
The Cog JIT uses its own register-based calling convention and runs on a segmented (paged) stack, so one can't run or call C code directly from Smalltalk. Instead, calling C involves switching to the C stack and invoking the C function there, a lot like a system call.
Hi Igor,
great news!
On Fri, Sep 21, 2012 at 7:59 AM, Igor Stasenko siguctua@gmail.com wrote:
it looks like primitivePerform: never enters JIT mode, but always executes the method via the interpreter.
I'll take a look. This is all very detailed so I'll need a little time.
I don't like the idea that users would be forced to manually put such loops into every method they write.. any ideas/suggestions on how to overcome that?
Yes. The JIT should be told that methods that have NB code should be jitted. But right now I don't understand enough about how NB code is generated, how methods are marked as having NB code, etc., to know exactly how to do this. I need to play around a bit.
On 21 September 2012 19:50, Eliot Miranda eliot.miranda@gmail.com wrote:
Heh.. it took me a while (more than a year) before I was able to understand how I could hook in.. sure, I did not spend the whole year working on that ;), but anyway, I am not expecting an immediate answer from you :)
Yes. The JIT should be told that methods that have NB code should be jitted. But right now I don't understand enough about how NB code is generated, how methods are marked as having NB code, etc., to know exactly how to do this. I need to play around a bit.
Let me explain some internal bits, to make it clear. It does not really matter how the code is generated; from the VM's point of view it is simple: it takes bytes from the compiled method's trailer and copies them into the JITed method during code generation.
The hook for that is the 220-voltage ;) primitive, which I put into #initializePrimitiveTableForSqueakV3. That way, when Cog jits the method, it calls the 'code generator' for that primitive, #genPrimitiveNBNativeCall, which does nothing but directly copy the bytes from the method's trailer into the generated code, or fails if there are none:
-------------------
genPrimitiveNBNativeCall
	| len trailer codeOffset instr |
	len := objectMemory lengthOf: methodObj.
	trailer := coInterpreter byteAt: methodObj + BaseHeaderSize + len - 1.
	(trailer bitAnd: 2r11111100) = 40 "Native code trailer id"
		ifFalse: [ ^ -1 "... fail somehow" ].
	"the next two bytes should be an offset to the native code start"
	codeOffset := (self byteAt: methodObj + BaseHeaderSize + len - 4)
		+ ((self byteAt: methodObj + BaseHeaderSize + len - 5) << 8).
	"entry point address is method oop + header + len - codeOffset"
	instr := (self cCoerce: (objectMemory firstFixedField: methodObj) to: 'sqInt')
		+ len - codeOffset.
	"copy generated code"
	[ instr < (methodObj + len - 5) ] whileTrue: [
		self Fill32: (objectMemory longAt: instr).
		instr := instr + 4 ].
	^ 0
-------------------
That way, the produced JITed method contains the native code in place of its primitive code. The bytecode of the method is still generated as usual, because the native code might want to fail the primitive (and then it should enter the method's body). But as I said before, on a failure in the native code I'd rather switch back to the interpreter and run the method's body interpreted, because the body can often contain a lot of assembler-generating code (since you provide the implementation in assembler), and jitting that code makes no sense at all: it runs just once and, if jitted, would simply waste space.
Initially the primitive itself (220) was not implemented at all (so if you executed the method via the interpreter, it would simply fail and enter the method's body), but then I added an implementation which also always fails, but reports different error codes depending on whether the executed method has native code in its trailer or not.
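That interpreter-side behavior can be sketched as a tiny decision function. Note the second error-code name and both numeric values are hypothetical; the thread only names ErrRunningViaInterpreter:

```python
# Sketch of how primitive 220 behaves when hit by the interpreter:
# it always fails, but the error code tells the image what to do next.
ERR_RUNNING_VIA_INTERPRETER = 1   # native code exists; force a JIT and retry
ERR_NO_NATIVE_CODE = 2            # hypothetical name: generate the code first

def primitive_220_error(method_has_native_trailer):
    # The primitive never succeeds; it only reports why it failed.
    if method_has_native_trailer:
        return ERR_RUNNING_VIA_INTERPRETER
    return ERR_NO_NATIVE_CODE
```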
In the future, I could make a simple change in #methodShouldBeCogged: to check whether the method contains primitive 220 and already has native code in its trailer, and if so flag that method to be cogged, regardless of anything else.
But as I said, for #primitivePerform it seems not to matter whether the method is cogged or not; it always executes it interpreted..
I also found I am unable to force running a jited method from doits, i.e. if I do:
MyClass foo
despite the fact that the #foo method is already jited (guaranteed), it is always run interpreted.
But if I do:
(1 to: 10) collect: [:i | ([ MyClass foo ] on: NBNativeCodeError do: [] ) ]
it yields the following result: #(nil 42 42 42 42 42 42 42 42 42)
which shows that it starts using the jited version of the method only after the outer method (the doit itself) is jited.
Another thing I suspect: since I use 'thisContext sender' to get the method and its arguments in order to retry the very same message send, this might cause deoptimization on the stack, which in turn makes that piece of code impossible to run by the JIT.
Since the NB code generation is performed once per method, and after installing the method's native code it never enters the method's body (unless the native code fails the primitive), I don't really care how fast/slow the code generation is, or whether it runs deoptimized or not. What I do care about is that it can retry the same message send after it is done generating code, so it works seamlessly and users don't need to write any additional code to handle it.
On Fri, Sep 21, 2012 at 1:49 PM, Igor Stasenko siguctua@gmail.com wrote:
In the future, I could make a simple change in #methodShouldBeCogged: to check whether the method contains primitive 220 and already has native code in its trailer, and if so flag that method to be cogged, regardless of anything else.
This is the right thing to do. methodShouldBeCogged: should force jitting for primitive 220 in an NB VM.
But as I said, for #primitivePerform it seems not to matter whether the method is cogged or not; it always executes it interpreted..
That's a bug. I'll check, but I'm pretty sure it doesn't do that. No, it doesn't do that. I've attached the xray primitives with which you can test. ContextPart>>xray answers a bit pattern that tells you its state:
xray "Lift the veil from a context and answer an integer describing its interior state. Used for e.g. VM tests so they can verify they're testing what they think they're testing. 0 implies a vanilla heap context. Bit 0 = is or was married to a frame Bit 1 = is still married to a frame Bit 2 = frame is executing machine code Bit 3 = has machine code pc (as opposed to nil or a bytecode pc) Bit 4 = method is currently compiled to machine code" <primitive: 213> ^0 "Can only fail if unimplemented; therefore simply answer 0"
So I defined
Object>>foo
	^thisContext xray

"(1 to: 4) collect: [:ign| self perform: #foo]"
"(1 to: 4) collect: [:ign| self foo. self perform: #foo]"
and both doits return 31 for all of 1 through 4. So foo is always compiled to machine code; something else must be going on. You see, primitivePerform calls executeNewMethod, and in the CoInterpreter executeNewMethod always compiles to machine code to make doits fast (since executeNewMethod is also used by primitiveExecuteMethod), so performs are eagerly compiled to machine code too.
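For reference, the xray bit pattern can be decoded mechanically (a Python sketch of the flag layout quoted in the method comment above, not part of the VM; the names paraphrase that comment). 31 is all five flags set; the 11 seen later in this thread has only bits 0, 1 and 3:

```python
# Decode the integer answered by ContextPart>>xray, per the method
# comment above. Illustration only; flag names paraphrase the comment.
XRAY_BITS = [
    (0, "is or was married to a frame"),
    (1, "is still married to a frame"),
    (2, "frame is executing machine code"),
    (3, "has machine code pc"),
    (4, "method is currently compiled to machine code"),
]

def decode_xray(value):
    """Answer the list of flag names set in an xray result."""
    return [name for bit, name in XRAY_BITS if (value >> bit) & 1]

print(decode_xray(31))  # all five flags set
print(decode_xray(11))  # bits 0, 1 and 3 only
```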
I also found that I am unable to force a JITted method to run from doits, i.e. if I do:
MyClass foo
despite the fact that #foo is already JITted (guaranteed), it always runs interpreted.
But if I do:
(1 to: 10) collect: [:i | ([ MyClass foo ] on: NBNativeCodeError do: [] ) ]
it yields the following result: #(nil 42 42 42 42 42 42 42 42 42)
which shows that it starts using the JITted version of the method only after the outer method (the doit itself) is JITted.
Another thing I suspect: since I use 'thisContext sender' to get the method and its arguments in order to retry the very same message send, this may cause deoptimization on the stack, which in turn makes that piece of code impossible for the JIT to run.
Since NB code generation is performed once per method, and after the method's native code is installed the body is never entered again (unless the native code fails the primitive), I don't really care how fast or slow the code generation is, or whether it runs deoptimized. What I do care about is that it can retry the same message send after it is done generating code, so that everything works seamlessly and users don't need to write any additional code to handle it.
-- Best regards, Igor Stasenko.
On 22 September 2012 02:04, Eliot Miranda eliot.miranda@gmail.com wrote:
On Fri, Sep 21, 2012 at 1:49 PM, Igor Stasenko siguctua@gmail.com wrote:
On 21 September 2012 19:50, Eliot Miranda eliot.miranda@gmail.com wrote:
Hi Igor,
great news!
On Fri, Sep 21, 2012 at 7:59 AM, Igor Stasenko siguctua@gmail.com wrote:
I would release this code to the public, but there's one little discrepancy I need to deal with first (one little problem, which I hope Eliot can help solve):
it looks like primitivePerform: never enters JIT mode, but always executes the method via the interpreter.
I'll take a look. This is all very detailed so I'll need a little time.
Heh.. it took me a while (more than a year) before I was able to understand how I could hook in. Sure, I did not spend the whole year working on that ;), but anyway, I am not expecting an immediate answer from you :)
This is why you see this code:

[ errorCode = ErrRunningViaInterpreter ] whileTrue: [ ^ self nbFoo ].

If I do the retry inside NBNativeCodeGen>>jitMethodAssembly: instead, which checks for the same error and retries the send using the perform primitive, it never enters JIT mode, resulting in an endless loop :(
This is despite the fact that the method is JITted, because we force JITting of the method during error handling:

lastError = ErrRunningViaInterpreter ifTrue: [
	"the method contains native code, but was executed by the interpreter"
	method forceJIT ifFalse: [
		self error: 'Failed to JIT the compiled method. Try reducing its size' ].
	^ self retrySend: aContext ].
#forceJIT is the primitive, which I implemented as follows:

primitiveForceJIT
	<export: true>
	| val result |
	val := self stackTop.
	(self isIntegerObject: val) ifTrue: [ ^ self primitiveFail ].
	(self isCompiledMethod: val) ifFalse: [ ^ self primitiveFail ].
	(self methodHasCogMethod: val) ifFalse: [
		cogit cog: val selector: objectMemory nilObject ].
	result := (self methodHasCogMethod: val)
		ifTrue: [ objectMemory trueObject ]
		ifFalse: [ objectMemory falseObject ].
	^ self pop: 1 thenPush: result
As you can see from its usage, if the VM for some reason fails to JIT the method, the primitive answers false and we stop with an error, which apparently never happens. Still, #primitivePerform seems to ignore the fact that the method contains machine code and always runs it interpreted :(
I do not like the idea that users would be forced to manually put such loops into every method they write.. any ideas/suggestions on how to overcome that?
Yes. The JIT should be told that methods that have NB code should be JITted. But right now I don't understand enough of how NB code is generated, how methods are marked as having NB code, etc. to know exactly how to do this. I need to play around a bit.
Let me explain some internal bits to make it clear. It doesn't really matter how the code is generated. From the VM's point of view it is simple: it takes bytes from the compiled method's trailer and copies them into the JITted method during code generation.
The hook for that is the 220-voltage ;) primitive, which I put into #initializePrimitiveTableForSqueakV3. That way, when Cog JITs the method, it calls the 'code generator' for that primitive, #genPrimitiveNBNativeCall, which does nothing but copy the bytes from the method's trailer directly into the generated code, or fails if there are none:
genPrimitiveNBNativeCall
	| len trailer codeOffset instr |
	len := objectMemory lengthOf: methodObj.
	trailer := coInterpreter byteAt: methodObj + BaseHeaderSize + len - 1.
	(trailer bitAnd: 2r11111100) = 40 "native code trailer id"
		ifFalse: [ ^ -1 "... fail somehow" ].
	"the next two bytes hold the offset of the native code start"
	codeOffset := (self byteAt: methodObj + BaseHeaderSize + len - 4)
		+ ((self byteAt: methodObj + BaseHeaderSize + len - 5) << 8).
	"the entry point address is method oop + header + len - codeOffset"
	instr := (self cCoerce: (objectMemory firstFixedField: methodObj) to: 'sqInt')
		+ len - codeOffset.
	"copy the generated code"
	[ instr < (methodObj + len - 5) ] whileTrue: [
		self Fill32: (objectMemory longAt: instr).
		instr := instr + 4 ].
	^ 0
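To make the trailer arithmetic concrete, here is a minimal Python sketch of the layout the Slang code above implies: the last byte identifies the trailer kind, the bytes at len-5 and len-4 form a big-endian offset back to the code start, and the copy loop stops 5 bytes before the end. The bytes at len-3 and len-2 are not read by the posted code, so the sketch leaves them zeroed; this is an illustration of the arithmetic, not NativeBoost's actual trailer format.

```python
NATIVE_CODE_TRAILER_ID = 40  # trailer kind, per the Slang code above

def extract_native_code(method_bytes):
    """Mirror genPrimitiveNBNativeCall's trailer decoding.
    Answers the embedded native code, or None if there is no NB trailer."""
    n = len(method_bytes)
    if (method_bytes[n - 1] & 0b11111100) != NATIVE_CODE_TRAILER_ID:
        return None
    # offset from the method's end back to the start of the native code
    code_offset = method_bytes[n - 4] + (method_bytes[n - 5] << 8)
    # the copy loop runs from (n - code_offset) up to (n - 5)
    return method_bytes[n - code_offset : n - 5]

# build a toy method: 2 bytes of 'bytecode', 3 bytes of 'native code',
# then the 5-byte trailer (offset hi, offset lo, two unused bytes, id)
code = bytes([0x90, 0x90, 0xC3])
offset = len(code) + 5
method = bytes([0x00, 0x01]) + code + bytes([offset >> 8, offset & 0xFF, 0, 0, 40])
print(extract_native_code(method))  # the 3 native-code bytes
```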
Like that, the produced JITed method will contain the native code in place of its primitive code. The bytecode of the method is still generated as usual.. because native code might want to fail the prim (and then it should enter the method's body). But as i told before, on failure in native code i'd rather switch back to interpreter and run method's body interpreted , because method can often contain a lot of assembler code (since you providing its implementation in assembler), but jiting that code makes no sense at all, because it is run just once and if jited, will simply waste space.
Initially, the primitive itself (220) was not even implemented at all (so if you execute the method by interpreter, it will simply fail and enter the method's body), but then i added implementation, which also always fails, but reports different error codes, depending if executed method has native code in its trailer or not.
In future, i could make a simple change in #methodShouldBeCogged: to check if that method contains primitive 220 + already have native code in trailer, and so it will flag that method to be cogged, regardless of anything.
This is the right thing to do. methodShouldBeCogged: should force jitting for primitive 220 in an NB VM.
But as i said, for #primitivePerform it looks like it doesn't matters whether method is cogged or not, it always executing it interpreted..
That's a bug. I'll check, but I'm pretty sure it doesn't do that. No it doesn't do that. I've attached the xray primitives with which you can test. CotextPart>xray answers a bit pattern that tells you its state:
xray "Lift the veil from a context and answer an integer describing its interior state. Used for e.g. VM tests so they can verify they're testing what they think they're testing. 0 implies a vanilla heap context. Bit 0 = is or was married to a frame Bit 1 = is still married to a frame Bit 2 = frame is executing machine code Bit 3 = has machine code pc (as opposed to nil or a bytecode pc) Bit 4 = method is currently compiled to machine code" <primitive: 213> ^0 "Can only fail if unimplemented; therefore simply answer 0"
So I defined
Object>foo ^thisContext xray
"(1 to: 4) collect: [:ign| self perform: #foo]" "(1 to: 4) collect: [:ign| self foo. self perform: #foo]"
and both the doits return 31 for all 1 through 4. So foo is always compiled to machine code. Something else must be going on. You see, primitivePerform calls executeNewMethod, and in the CoInterpreter executeNewMethod always compiles to machine code to make doits fast (since executeNewMethod is also used by primitiveExecuteMethod) so performs are eagerly compiled to machine code too.
Yes, indeed, something fishy is going on there. I tried the following:

foo
	<primitive: 220 error: errorCode>
	^ thisContext xray

And here is how I install native code into that method externally:

MyClass class>>generateFooCode
	| asm |
	asm := AJx86Assembler new noStackFrame.
	asm mov: (99 << 1) + 1 to: asm EDX; ret: 4 asUImm.
	NBNativeCodeGen installNativeCode: asm bytes into: (self class>>#foo).
	(self class>>#foo) forceJIT
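The `(99 << 1) + 1` in the mov above (and the `(42 << 1) + 1` earlier in the thread) is the 32-bit SmallInteger tagging the native code must produce to answer a valid oop: shift the value left one bit and set the low tag bit. A quick sketch of that encoding (an illustration, not VM code):

```python
def tag_small_integer(n):
    """Encode n as a 32-bit Cog-style SmallInteger oop: value << 1, tag bit set."""
    return (n << 1) | 1

def untag_small_integer(oop):
    """Recover the value; the low bit must be the SmallInteger tag."""
    assert oop & 1 == 1, "not a tagged SmallInteger"
    return oop >> 1

print(tag_small_integer(42))  # 85, what nbFoo's mov loads into the register
print(tag_small_integer(99))  # 199, what generateFooCode's mov loads
```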
-----------
So, before installing the code:

(1 to: 4) collect: [:i | MyClass foo ]   #(11 11 11 11)

No matter how many times I run this doit, it always yields 11.
But after I run MyClass generateFooCode, I get the following:

(1 to: 4) collect: [:i | MyClass foo ]   #(31 99 99 99)

The 99 here is the indication that my native code actually ran, instead of the xray thingy :)
The above result also doesn't change no matter how many times I repeat it; it always produces #(31 99 99 99).
If I omit the #forceJIT, however, the first doit answers:

(1 to: 4) collect: [:i | MyClass foo ]   #(11 31 99 99)

and the 11 here is quite correct and expected, but the 31 is of course wrong; there should be no 31 at all!
It looks like when the interpreter executes a JITted method, it does so without taking the method's primitive into account, and instead jumps directly over it, to the machine code that corresponds to the first bytecode of the compiled method. This does not happen when a JITted method calls another JITted method.
On 22 September 2012 08:34, Igor Stasenko siguctua@gmail.com wrote:
Okay, I know what happens!
Changing the method to:

foo
	<primitive: 220 error: errorCode>
	^ { errorCode. thisContext xray }

doing:

MyClass generateFooCode

and then:

(1 to: 4) collect: [:i | MyClass foo ]

gives me:

#(#(505 11) #(505 31) 99 99)

The 505 error code answered by the primitive means: the method is JITted, but since the primitive ran, the VM decided to execute this method via the interpreter.
So, according to that, the 505-11 pair is fine: the method is not yet JITted, so of course it runs in the interpreter.
But 505-31 means that the interpreter runs the primitive first, and then, since it fails (primitive 220 always fails), it goes on to activate the method, and only then decides to run its machine code, but using the entry point that corresponds to the first bytecode of the method.
Is there a reason for doing that, instead of jumping directly to the machine code (which would invoke its own primitive and do the rest)?
On 22 September 2012 08:52, Igor Stasenko siguctua@gmail.com wrote:
On 22 September 2012 08:34, Igor Stasenko siguctua@gmail.com wrote:
On 22 September 2012 02:04, Eliot Miranda eliot.miranda@gmail.com wrote:
On Fri, Sep 21, 2012 at 1:49 PM, Igor Stasenko siguctua@gmail.com wrote:
On 21 September 2012 19:50, Eliot Miranda eliot.miranda@gmail.com wrote:
Hi Igor,
great news!
On Fri, Sep 21, 2012 at 7:59 AM, Igor Stasenko siguctua@gmail.com wrote:
Hello there,
so, we're entered a new area, where native code, generated from image side can be run directly by JIT. This feature was one of the first things which i wanted to try, once Eliot released Cog :)
The way how we do that, is when VM decides to JIT a specific method, we copying the native code (from method trailer) directly into the method's code. All you need to do is to use special primitive for that 220 ( #primitiveVoltage)
So, a first question, which we wanted to be answered is how faster to run native code by JIT, comparing to running native code via NativeBoost primitive , which is #primitiveNativeCall..
For here are methods, which just answer 42:
This one using #primitiveNativeCall
nbFoo2 <primitive: #primitiveNativeCall module: #NativeBoostPlugin error: errorCode>
^ NBNativeCodeGen methodAssembly: [:gen :proxy :asm | asm noStackFrame. asm mov: (42 << 1) + 1 to: asm EAX; ret. ]
And this one uses JIT:
nbFoo <primitive: 220 error: errorCode>
[ errorCode = ErrRunningViaInterpreter ] whileTrue: [ ^ self nbFoo ]. ^ NBNativeCodeGen jitMethodAssembly: [:gen :proxy :asm | asm noStackFrame. asm mov: (42 << 1) + 1 to: asm EDX; ret: 4 asUImm. ]
And this one is code which JIT can do:
nbFoo42 ^ 42
So, here the numbers:
Time to run via #primitiveNativeCall :
[100000000 timesRepeat: [ MyClass nbFoo2 ] ] timeToRun 6995
Time to run via JIT:
[100000000 timesRepeat: [ MyClass nbFoo ] ] timeToRun 897
Time to run JITed method:
[100000000 timesRepeat: [ MyClass nbFoo42 ] ] timeToRun 899
so, as you can see, the JITed method and our custom generated code is on par (which is logical ;).
Time to run an empty loop:
[100000000 timesRepeat: [ ] ] timeToRun 679
So, here the result, if we extract the loop overhead, we can see the difference in calling our native code when it uses JIT vs using #primitiveNativeCall :
(6995 - 679 ) / (897- 679) asFloat 28.972477064220183
28 times faster!!!!
So, with this new feature, we now can make our generated code to run with unmatched speed, without overhead related to #primitiveNativeCall. This is especially useful for implementing primives which involving heavy numeric crunching.
I would release this code to public, but there's one little discrepancy i need to deal with first:
(one little problem, which i hope Eliot can help to solve)
it looks like primitivePerform: never enters the JIT mode, but always executing the method via interpreter.
I'll take a look. This is all very detailed so I'll need a little time.
Heh.. it took me a while (more than a year) before i was able to understand how i can hook in.. sure i did not spent whole year working on that ;) , but anyways , i am not expecting immediate answer from you :)
This is why you see this code: [ errorCode = ErrRunningViaInterpreter ] whileTrue: [ ^ self nbFoo ].
because if i do it inside of NBNativeCodeGen>>jitMethodAssembly:, which checks for same error and retries the send using perform primitive, it never enters the JIT mode, resulting in endless loop :(
This is despite the fact that method is JITed, because we enforce the JITing of that method during error handling:
lastError = ErrRunningViaInterpreter ifTrue: [ "a method contains native code, but executed by interpreter " method forceJIT ifFalse: [ self error: 'Failed to JIT the compiled
method. Try reducing it''s size ' ]. ^ self retrySend: aContext ].
The #forceJit is the primitive which i implemented like following:
primitiveForceJIT
<export: true > | val result | val := self stackTop. (self isIntegerObject: val) ifTrue: [ ^ self primitiveFail ]. (self isCompiledMethod: val) ifFalse: [ ^ self primitiveFail ]. (self methodHasCogMethod: val) ifFalse: [ cogit cog: val selector: objectMemory nilObject ]. result := (self methodHasCogMethod: val ) ifTrue: [ objectMemory
trueObject ] ifFalse: [ objectMemory falseObject ].
^ self pop: 1 thenPush: result.
As you can see from its usage, if VM, for some reason will fail to jit the method, the primitive will answer false, and we will stop with an error.. Which apparently never happens. Still, a #primitivePerform seems like ignoring that the method contains machine code an always runs it interpreted :(
I do not like the idea, that users will be forced to manually put such loops in every method they will write.. any ideas/suggestions how to overcome that?
Yes. The JIT should be told that methods that have NB code should be jitted. But right now I don't understand enough of how NB code is generated and methods marked that they have NB code etc to know exactly how to do this. I need to play around a bit.
Let me explain some internal bits, to make it clear: It is not really matters how code is generated.. From VM's side of view it is simple: it takes bytes from Compiled method's trailer, and copies it to JIT method during code generation.
The hook for that is the 220-voltage ;) primitive , which i put it into #initializePrimitiveTableForSqueakV3, like that, when cog jits the method, it calls the 'code generator' for that primitive - #genPrimitiveNBNativeCall, which does nothing but directly copies the bytes from method's trailer into generated code, or fails if there's none:
genPrimitiveNBNativeCall | len trailer codeOffset instr | len := (objectMemory lengthOf: methodObj).
trailer := (coInterpreter byteAt: methodObj + BaseHeaderSize + len-1 ). (trailer bitAnd: 2r11111100) = 40 " Native code trailer id " ifFalse: [ ^ -1"... fail somehow " ]. "the next two bytes should be an offset for a native code start" codeOffset := (self byteAt: methodObj + BaseHeaderSize + len-4 ) +
((self byteAt: methodObj + BaseHeaderSize + len-5 ) << 8).
"entry point address is method oop + header + len - codeOffset" instr := (self cCoerce: (objectMemory firstFixedField: methodObj) to:
'sqInt') + len - codeOffset.
"copy generated code" [ instr < (methodObj + len - 5) ] whileTrue: [ self Fill32: (objectMemory longAt: instr ). instr := instr + 4. ]. ^ 0
Like that, the produced JITed method will contain the native code in place of its primitive code. The bytecode of the method is still generated as usual.. because native code might want to fail the prim (and then it should enter the method's body). But as i told before, on failure in native code i'd rather switch back to interpreter and run method's body interpreted , because method can often contain a lot of assembler code (since you providing its implementation in assembler), but jiting that code makes no sense at all, because it is run just once and if jited, will simply waste space.
Initially, the primitive itself (220) was not even implemented at all (so if you execute the method by interpreter, it will simply fail and enter the method's body), but then i added implementation, which also always fails, but reports different error codes, depending if executed method has native code in its trailer or not.
In future, i could make a simple change in #methodShouldBeCogged: to check if that method contains primitive 220 + already have native code in trailer, and so it will flag that method to be cogged, regardless of anything.
This is the right thing to do. methodShouldBeCogged: should force jitting for primitive 220 in an NB VM.
But as i said, for #primitivePerform it looks like it doesn't matters whether method is cogged or not, it always executing it interpreted..
That's a bug. I'll check, but I'm pretty sure it doesn't do that. No it doesn't do that. I've attached the xray primitives with which you can test. CotextPart>xray answers a bit pattern that tells you its state:
xray "Lift the veil from a context and answer an integer describing its interior state. Used for e.g. VM tests so they can verify they're testing what they think they're testing. 0 implies a vanilla heap context. Bit 0 = is or was married to a frame Bit 1 = is still married to a frame Bit 2 = frame is executing machine code Bit 3 = has machine code pc (as opposed to nil or a bytecode pc) Bit 4 = method is currently compiled to machine code" <primitive: 213> ^0 "Can only fail if unimplemented; therefore simply answer 0"
So I defined
Object>foo ^thisContext xray
"(1 to: 4) collect: [:ign| self perform: #foo]" "(1 to: 4) collect: [:ign| self foo. self perform: #foo]"
and both the doits return 31 for all 1 through 4. So foo is always compiled to machine code. Something else must be going on. You see, primitivePerform calls executeNewMethod, and in the CoInterpreter executeNewMethod always compiles to machine code to make doits fast (since executeNewMethod is also used by primitiveExecuteMethod) so performs are eagerly compiled to machine code too.
Yes, indeed, something fishy is going on there. I tried the following:
foo
	<primitive: 220 error: errorCode>
	^thisContext xray
And here is how I install the native code into that method externally:
MyClass class>>generateFooCode
	| asm |
	asm := AJx86Assembler new noStackFrame.
	asm mov: (99 << 1) + 1 to: asm EDX; ret: 4 asUImm.
	NBNativeCodeGen installNativeCode: asm bytes into: (self class>>#foo).
	(self class>>#foo) forceJIT
So, before installing the code:

	(1 to: 4) collect: [:i | MyClass foo]   "=> #(11 11 11 11)"

No matter how many times I run this doit, it always yields 11.
But after I run MyClass generateFooCode, I get the following:

	(1 to: 4) collect: [:i | MyClass foo]   "=> #(31 99 99 99)"
99 here is an indication that my native code is actually run instead of the xray thingy :)
The above result also doesn't change no matter how many times I repeat it; it always produces #(31 99 99 99).
If I omit the #forceJIT send, however, the first doit answers:

	(1 to: 4) collect: [:i | MyClass foo]   "=> #(11 31 99 99)"

and 11 here is quite correct and expected, but 31 is of course wrong; there should be no 31 at all!
It looks like when the interpreter executes a JITted method, it does so without taking the method's primitive into account: instead it jumps directly over it, to the machine code that corresponds to the first bytecode of the compiled method. This does not happen when a JITted method calls another JITted method.
Okay, I know what happens!
Changing the method to:

foo
	<primitive: 220 error: errorCode>
	^{errorCode. thisContext xray}
doing:
MyClass generateFooCode
and then:
(1 to: 4) collect: [: i | MyClass foo ]
gives me:
#(#(505 11) #(505 31) 99 99)
The 505 error code answered by the primitive means that the method is JITted, but since the primitive is run, the VM decided to execute this method via the interpreter.
Sorry, a small correction: the method is not necessarily JITted. The primitive just checks that the method contains native code in its trailer, and is therefore ready to be JITted; if the method does not contain native code in its trailer, it reports a different error code.
So, according to that, the 505-11 pair is fine: the method is not yet JITted, and sure enough it runs via the interpreter.
But 505-31 means that the interpreter runs the primitive first and then, since it fails (primitive 220 always fails), goes on to activate the method, and only then decides to run its machine code, but using the entry point that corresponds to the first bytecode of the method.
Is there a reason for doing that, instead of jumping directly to the machine code (which would invoke its own primitive and do the rest)?
-- Best regards, Igor Stasenko.
I checked the code, and indeed, #executeNewMethod first eagerly checks for a primitive and runs it if it is there; otherwise it goes on to activate the new method, and only at that point decides to run the JITted version of it.
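In other words, the control flow is roughly this (a simplified paraphrase of what #executeNewMethod does, not the actual CoInterpreter source):

```smalltalk
executeNewMethod
	"Simplified sketch of the current behaviour, not the real code."
	primitiveFunctionPointer ~= 0 ifTrue:
		[self slowPrimitiveResponse.	"run the primitive first..."
		 self successful ifTrue: [^self]].
	"...and only on primitive failure activate the method; the decision
	 to run the JITted version is taken inside the activation"
	self activateNewMethod
```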
So, mainly the question: can #executeNewMethod be changed to use executeCoggedNewMethod: (something yet to be implemented) instead of activateCoggedNewMethod:, or are there reasons for not doing that?
Hello there. So, with a little change (see attached), I made it work.
I simply put a check for whether the method is cogged in #executeNewMethod _before_ the call to a potential primitive; if it is cogged, we simply jump to the method's entry point, so the method can then run its primitive and activate itself on its own.
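Sketched, the change looks like this (again only a sketch; the cog-method test and jump selectors are stand-ins for what the attached code actually uses):

```smalltalk
executeNewMethod
	"Sketch of the NBCoInterpreter variant; selector names are stand-ins."
	(self methodHasCogMethod: newMethod) ifTrue:
		[^self executeCogMethod: newMethod].	"jump to the method's entry point;
		 the machine code runs its own primitive and activates itself"
	primitiveFunctionPointer ~= 0 ifTrue:
		[self slowPrimitiveResponse.
		 self successful ifTrue: [^self]].
	self activateNewMethod
```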
If you have followed me so far: with this change, the method

foo
	<primitive: 220 error: errorCode>
	^{errorCode. thisContext xray}
--- if it doesn't contain native code in its trailer, then:
(1 to: 4) collect: [: i | MyClass foo ]
always gives me:
#(#(502 11) #(502 11) #(502 11) #(502 11))
where 502 is the NB-specific error ErrNoNativeCodeInMethod ("failed to find a native code for primitive method"), and 11 means the method runs interpreted. That is again correct, because when Cog tries to JIT the method it will always fail (the primitive 220 generator fails code generation if there's no native code in the method's trailer), so the VM is forced to run this method interpreted.
Now, if I do this:
	| asm |
	asm := AJx86Assembler new noStackFrame.
	asm mov: (99 << 1) + 1 to: asm EDX; ret: 4 asUImm.
	NBNativeCodeGen installNativeCode: asm bytes into: (MyClass class>>#foo).
	"(MyClass class>>#foo) forceJIT"
then:

	(1 to: 4) collect: [:i | MyClass foo]

gives me:

	#(#(505 11) 99 99 99)
And finally, if I uncomment the #forceJIT send, it gives me:

	#(99 99 99 99)

and no #(505 31) combination anymore! :)
And even more: if I now simply do

	MyClass foo

it also answers 99!
Eliot, do you think it is worth putting this change into the common codebase? (Right now I keep it in an NBCoInterpreter subclass.)
I do not expect big changes in performance, and I cannot even tell whether it will make things slower or faster, but you might know better.
vm-dev@lists.squeakfoundation.org