[Vm-dev] [NB] NativeBoost meets JIT

Igor Stasenko siguctua at gmail.com
Sat Sep 22 06:52:15 UTC 2012


On 22 September 2012 08:34, Igor Stasenko <siguctua at gmail.com> wrote:
> On 22 September 2012 02:04, Eliot Miranda <eliot.miranda at gmail.com> wrote:
>>
>>
>>
>> On Fri, Sep 21, 2012 at 1:49 PM, Igor Stasenko <siguctua at gmail.com> wrote:
>>>
>>>
>>> On 21 September 2012 19:50, Eliot Miranda <eliot.miranda at gmail.com> wrote:
>>> >
>>> > Hi Igor,
>>> >
>>> >     great news!
>>> >
>>> > On Fri, Sep 21, 2012 at 7:59 AM, Igor Stasenko <siguctua at gmail.com> wrote:
>>> >>
>>> >>
>>> >> Hello there,
>>> >>
>>> >> so, we've entered a new area, where native code generated from the image
>>> >> side can be run directly by the JIT.
>>> >> This feature was one of the first things which i wanted to try, once
>>> >> Eliot released Cog :)
>>> >>
>>> >> The way we do that: when the VM decides to JIT a specific method,
>>> >> we copy the native code (from the method trailer)
>>> >> directly into the JITed method's code.
>>> >> All you need to do is use a special primitive for that, 220
>>> >> (#primitiveVoltage).
>>> >>
>>> >> So, the first question we wanted answered is how much faster it is to
>>> >> run native code via the JIT,
>>> >> compared to running native code via the NativeBoost primitive, which is
>>> >> #primitiveNativeCall.
>>> >>
>>> >> Here are methods which just answer 42:
>>> >>
>>> >> This one uses #primitiveNativeCall:
>>> >>
>>> >> nbFoo2
>>> >>         <primitive: #primitiveNativeCall module: #NativeBoostPlugin error: errorCode>
>>> >>
>>> >>         ^ NBNativeCodeGen methodAssembly: [:gen :proxy :asm |
>>> >>                 asm noStackFrame.
>>> >>                 asm
>>> >>                         mov: (42 << 1) + 1 to: asm EAX;
>>> >>                         ret.
>>> >>         ]
>>> >>
>>> >> And this one uses JIT:
>>> >>
>>> >> nbFoo
>>> >>         <primitive: 220 error: errorCode>
>>> >>
>>> >>         [ errorCode = ErrRunningViaInterpreter  ] whileTrue: [ ^ self nbFoo ].
>>> >>
>>> >>         ^ NBNativeCodeGen jitMethodAssembly: [:gen :proxy :asm |
>>> >>                 asm noStackFrame.
>>> >>                 asm
>>> >>                         mov: (42 << 1) + 1 to: asm EDX;
>>> >>                         ret: 4 asUImm.
>>> >>         ]
>>> >>
>>> >> And this one is plain code which the JIT compiles by itself:
>>> >>
>>> >> nbFoo42
>>> >>         ^ 42
>>> >>
>>> >> So, here are the numbers (timeToRun answers milliseconds):
>>> >>
>>> >> Time to run via #primitiveNativeCall :
>>> >>
>>> >> [100000000 timesRepeat: [ MyClass nbFoo2  ] ] timeToRun
>>> >>  6995
>>> >>
>>> >> Time to run via JIT:
>>> >>
>>> >> [100000000 timesRepeat: [ MyClass nbFoo  ] ] timeToRun
>>> >> 897
>>> >>
>>> >> Time to run JITed method:
>>> >>
>>> >> [100000000 timesRepeat: [ MyClass nbFoo42  ] ] timeToRun
>>> >> 899
>>> >>
>>> >> so, as you can see, the JITed method and our custom generated code are
>>> >> on par (which is logical ;).
>>> >>
>>> >> Time to run an empty loop:
>>> >>
>>> >> [100000000 timesRepeat: [  ] ] timeToRun 679
>>> >>
>>> >>
>>> >> So, here is the result: if we subtract the loop overhead, we can see the
>>> >> difference between
>>> >> calling our native code via the JIT and via #primitiveNativeCall:
>>> >>
>>> >> (6995 - 679 ) / (897- 679) asFloat 28.972477064220183
>>> >>
>>> >> Almost 29 times faster!!!!
>>> >>
>>> >> So, with this new feature, we can now make our generated code run
>>> >> on par with JITed code,
>>> >> without the overhead related to #primitiveNativeCall.
>>> >> This is especially useful for implementing primitives which involve
>>> >> heavy numeric crunching.
>>> >>
>>> >> I would release this code to the public, but there's one little
>>> >> discrepancy i need to deal with first
>>> >> (one little problem, which i hope Eliot can help to solve):
>>> >>
>>> >>  it looks like primitivePerform: never enters JIT mode, but always
>>> >> executes the method via the interpreter.
>>> >
>>> >
>>> > I'll take a look.  This is all very detailed so I'll need a little time.
>>> >
>>>
>>> Heh.. it took me a while (more than a year) before i was able to
>>> understand how i can hook in.. sure, i did not spend the whole year working
>>> on that ;) , but anyway,
>>> i am not expecting an immediate answer from you :)
>>>
>>> >> This is why you see this code:
>>> >>         [ errorCode = ErrRunningViaInterpreter  ] whileTrue: [ ^ self nbFoo ].
>>> >>
>>> >> because if i do it inside of NBNativeCodeGen>>jitMethodAssembly:,
>>> >> which checks for the same error and retries the send using the perform
>>> >> primitive, it never enters JIT mode,
>>> >> resulting in an endless loop :(
>>> >>
>>> >> This is despite the fact that the method is JITed, because we enforce
>>> >> JITing of that method during error handling:
>>> >>
>>> >>         lastError = ErrRunningViaInterpreter ifTrue: [
>>> >>                 "a method contains native code, but executed by interpreter "
>>> >>                 method forceJIT ifFalse: [ self error: 'Failed to JIT the compiled
>>> >> method. Try reducing its size ' ].
>>> >>                 ^ self retrySend: aContext
>>> >>                 ].
>>> >>
>>> >> #forceJIT is the primitive which i implemented as follows:
>>> >>
>>> >> primitiveForceJIT
>>> >>
>>> >>         <export: true >
>>> >>
>>> >>         | val result |
>>> >>
>>> >>         val := self stackTop.
>>> >>
>>> >>         (self isIntegerObject: val) ifTrue: [ ^ self primitiveFail ].
>>> >>         (self isCompiledMethod: val) ifFalse: [ ^ self primitiveFail ].
>>> >>
>>> >>         (self methodHasCogMethod: val) ifFalse: [
>>> >>                 cogit cog: val selector: objectMemory nilObject ].
>>> >>
>>> >>         result := (self methodHasCogMethod: val ) ifTrue: [ objectMemory
>>> >> trueObject ] ifFalse: [ objectMemory falseObject ].
>>> >>
>>> >>         ^ self pop: 1 thenPush: result.
>>> >>
>>> >> As you can see from its usage, if the VM for some reason fails to JIT
>>> >> the method, the primitive will answer false,
>>> >> and we will stop with an error.. which apparently never happens.
>>> >> Still, #primitivePerform seems to ignore that the method
>>> >> contains machine code and always runs it interpreted :(
>>> >>
>>> >> I do not like the idea that users will be forced to manually put such
>>> >> loops in every method they write..
>>> >> any ideas/suggestions how to overcome that?
>>> >
>>> >
>>> > Yes.  The JIT should be told that methods that have NB code should be jitted.  But right now I don't understand enough of how NB code is generated and methods marked that they have NB code etc to know exactly how to do this.  I need to play around a bit.
>>> >
>>>
>>> Let me explain some internal bits, to make it clear:
>>> It does not really matter how the code is generated.. From the VM's point of
>>> view it is simple:
>>> it takes bytes from the compiled method's trailer, and copies them into the JIT
>>> method during code generation.
>>>
>>> The hook for that is the 220-voltage ;) primitive, which i put
>>> into #initializePrimitiveTableForSqueakV3,
>>> so that when cog jits the method, it calls the 'code generator' for
>>> that primitive - #genPrimitiveNBNativeCall,
>>> which does nothing but directly copy the bytes from the method's trailer
>>> into the generated code,
>>> or fails if there are none:
>>>
>>> -------------------
>>> genPrimitiveNBNativeCall
>>>         | len trailer codeOffset instr |
>>>         len := (objectMemory lengthOf: methodObj).
>>>
>>>         trailer := (coInterpreter byteAt: methodObj + BaseHeaderSize + len-1 ).
>>>         (trailer bitAnd: 2r11111100) = 40 " Native code trailer id "
>>>                 ifFalse: [ ^ -1"... fail somehow " ].
>>>
>>>         "the next two bytes should be an offset for a native code start"
>>>         codeOffset := (self byteAt: methodObj + BaseHeaderSize + len-4 ) +
>>> ((self byteAt: methodObj + BaseHeaderSize + len-5 ) << 8).
>>>
>>>         "entry point address is method oop + header + len - codeOffset"
>>>
>>>         instr := (self cCoerce: (objectMemory firstFixedField: methodObj) to:
>>> 'sqInt') + len - codeOffset.
>>>
>>>         "copy generated code"
>>>         [ instr < (methodObj + len - 5) ] whileTrue: [
>>>                 self Fill32: (objectMemory longAt: instr ).
>>>                 instr := instr + 4.
>>>         ].
>>>
>>>         ^ 0
>>> ---------------
>>>
>>> This way, the produced JITed method will contain the native code in
>>> place of its primitive code.
>>> The bytecode of the method is still generated as usual,
>>> because native code might want to fail the prim (and then it should
>>> enter the method's body).
>>> But as i said before, on failure in native code i'd rather switch back
>>> to the interpreter and run the method's body interpreted, because the method can
>>> often contain a lot of assembler code (since you are providing its
>>> implementation in assembler), and jiting that code makes no sense at
>>> all,
>>> because it is run just once and, if jited, will simply waste space.
>>>
>>> Initially, the primitive itself (220) was not even implemented at all
>>> (so if you executed the method by the interpreter, it would simply fail and
>>> enter the method's body), but then i added an implementation
>>> which also always fails, but reports different error codes, depending on
>>> whether the executed method has native code in its trailer or not.
>>>
>>> In future, i could make a simple change in #methodShouldBeCogged: to
>>> check if the method
>>> uses primitive 220 and already has native code in its trailer, so that
>>> it will flag that method to be cogged,
>>> regardless of anything.
>>
>>
>> This is the right thing to do.  methodShouldBeCogged: should force jitting for primitive 220 in an NB VM.
>>
>>>
>>>
>>> But as i said, for #primitivePerform it looks like it doesn't matter
>>> whether the method is cogged or not,
>>> it always executes it interpreted..
>>
>>
>> That's a bug.  I'll check, but I'm pretty sure it doesn't do that.  No it doesn't do that.  I've attached the xray primitives with which you can test.  ContextPart>xray answers a bit pattern that tells you its state:
>>
>> xray
>> "Lift the veil from a context and answer an integer describing its interior state.
>> Used for e.g. VM tests so they can verify they're testing what they think they're testing.
>> 0 implies a vanilla heap context.
>> Bit 0 = is or was married to a frame
>> Bit 1 = is still married to a frame
>> Bit 2 = frame is executing machine code
>> Bit 3 = has machine code pc (as opposed to nil or a bytecode pc)
>> Bit 4 = method is currently compiled to machine code"
>> <primitive: 213>
>> ^0 "Can only fail if unimplemented; therefore simply answer 0"
>>
>> So I defined
>>
>> Object>foo
>> ^thisContext xray
>>
>>
>> "(1 to: 4) collect: [:ign| self perform: #foo]"
>> "(1 to: 4) collect: [:ign| self foo. self perform: #foo]"
>>
>> and both the doits return 31 for all 1 through 4.  So foo is always compiled to machine code.  Something else must be going on.  You see, primitivePerform calls executeNewMethod, and in the CoInterpreter executeNewMethod always compiles to machine code to make doits fast (since executeNewMethod is also used by primitiveExecuteMethod) so performs are eagerly compiled to machine code too.
>>
>
> Yes, indeed, something is fishy there.
> I tried the following:
>
> foo
>         <primitive: 220 error: errorCode>
>
>         ^ thisContext xray
>
> And here is how i install native code into that method externally:
>
> MyClass class>>generateFooCode
>
>         | asm |
>         asm := AJx86Assembler new noStackFrame.
>
>         asm
>                 mov: (99 << 1) + 1 to: asm EDX;
>                 ret: 4 asUImm.
>
>         NBNativeCodeGen installNativeCode:  asm bytes into: (self class>>#foo).
>
>         (self class>>#foo) forceJIT
>
> -----------
>
> So, before installing code:
> (1 to: 4) collect: [: i | MyClass foo ] #(11 11 11 11)
> no matter how many times i run this doit, it always yields 11
>
> But after i run
>
> MyClass generateFooCode,
> i get the following:
>
> (1 to: 4) collect: [: i | MyClass foo ] #(31 99 99 99)
>
> 99 here is an indication that my native code is actually run, instead of
> the xray thingy :)
>
> The above result also doesn't change no matter how many times i repeat it,
> it always produces #(31 99 99 99) .
>
> If i omit #forceJIT, however, the first doit answers:
> (1 to: 4) collect: [: i | MyClass foo ] #(11 31 99 99)
>
> and 11 here is quite correct and expected, but 31 is of course
> wrong, there should be no 31 at all!
>
> It looks like when the interpreter executes a JITed method, it does
> so without taking into account
> the method's primitive and instead jumps directly over it, to the machine
> code which corresponds to the
> first bytecode of the compiled method.
> This does not happen when a jited method calls another jited method.
>
Okay, i know what happens!

Changing the method to:

foo
	<primitive: 220 error: errorCode>
	
	^ {errorCode. thisContext xray }
	
doing:

MyClass  generateFooCode

and then:

(1 to: 4) collect: [: i | MyClass foo ]

gives me:

#(#(505 11) #(505 31) 99 99)

The 505 error code answered by the primitive means that:
 - the method is JITed, but since the primitive was run, the VM
decided to execute this method via the interpreter.

So, according to that,
the 505-11 pair is fine: the method is not yet JITed, and sure thing it runs
using the interpreter.

But 505-31 means that the interpreter runs the primitive first,
and then, since it fails (primitive 220 always fails),
it goes on to activate the method and only then decides to run its machine code,
but using the entry point which corresponds to the first bytecode of the method.
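
For reference, the two xray values can be decoded against the bit list in
the xray comment quoted above. This is only a minimal sketch, written in
Python so it can be run outside the image; the names XRAY_BITS and
decode_xray are made up for illustration, and it assumes the bit layout is
exactly as that comment describes:

```python
# Bit meanings copied from the xray method comment quoted earlier in this
# thread (bit 0 first). XRAY_BITS and decode_xray are illustrative names,
# not part of the VM or the image.
XRAY_BITS = [
    "is or was married to a frame",
    "is still married to a frame",
    "frame is executing machine code",
    "has machine code pc",
    "method is currently compiled to machine code",
]


def decode_xray(value):
    """Return the descriptions of all bits set in an xray result."""
    return [name for bit, name in enumerate(XRAY_BITS) if value & (1 << bit)]


if __name__ == "__main__":
    for value in (11, 31):
        print(value, format(value, "05b"), decode_xray(value))
```

Under that layout, 11 has bits 0, 1 and 3 set, while 31 has all five set,
including bit 2 (frame is executing machine code) and bit 4 (method is
currently compiled to machine code).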

Is there a reason for doing that, instead of jumping directly to the machine
code (which will invoke its own primitive
and do the rest)?

-- 
Best regards,
Igor Stasenko.

