[Vm-dev] Exploring the simulator (was Re: REPL image for simulation)

Clément Bera bera.clement at gmail.com
Sun Jun 12 18:16:05 UTC 2016


On Sun, Jun 12, 2016 at 7:36 PM, Ben Coman <btc at openinworld.com> wrote:

>
> On Sun, Jun 12, 2016 at 10:59 PM, Clément Bera <bera.clement at gmail.com>
> wrote:
> >
> > Hi again,
> >
> > On Sun, Jun 12, 2016 at 10:44 AM, Clément Bera <bera.clement at gmail.com>
> wrote:
> >>
> >> Hi Ben,
> >>
> >> I'm glad you're now looking into the JIT. If you have some blog or
> >> something, please write an experience report about your exploration of
> >> the simulator. It's helpful for us to have some noise around the VM.
>
> Cool. I'll have a go.
>
> >>
> >> On Sun, Jun 12, 2016 at 8:35 AM, Ben Coman <btc at openinworld.com> wrote:
> >>>
> >>>
> >>> I am stepping for the first time through the CogVM, having [set break
> >>> selector...] forkAt:
> >>> After stepping in a few times I get to #activateCoggedNewMethod.
> >>>   CogVMSimulatorLSB(CoInterpreter)>>dispatchOn:in:
> >>>   CogVMSimulatorLSB(CoInterpreter)>>sendLiteralSelector1ArgBytecode
> >>>   CogVMSimulatorLSB(CoInterpreter)>>commonSendOrdinary
> >>>   CogVMSimulatorLSB(CoInterpreter)>>internalExecuteNewMethod
> >>>   CogVMSimulatorLSB(CoInterpreter)>>activateCoggedNewMethod
> >>>
> >>> Here is the code from near the top.
> >>>     methodHeader := self rawHeaderOf: newMethod.
> >>>     self assert: (self isCogMethodReference: methodHeader).
> >>>     cogMethod := self cCoerceSimple: methodHeader to: #'CogMethod *'.
> >>>     methodHeader := cogMethod methodHeader.
> >>>
> >>> I guess methodHeader's double assignment above is related to the
> >>> machine code frame having two addresses as Clement described...
> >>
> >>
> >> Errr... I wouldn't put it quite that way, but I think yes, that's it.
> >>
> >> A method can have 2 addresses, the address of the bytecoded version in
> the heap and the address of its jitted version in the machine code zone. In
> the machine code frame printing, the simulator displays the 2 addresses.
> But the frame has a single pointer to the method.
> >>
> >> So what you're looking at is the dispatch logic from the bytecoded
> >> method to the jitted method. When the JIT compiles a bytecoded method to
> >> machine code, it replaces the bytecoded method's compiled method header
> >> (its first literal) with a pointer to the jitted version. The machine code
> >> version of the method keeps the compiled method header, so accessing the
> >> header is different for methods compiled to machine code and for methods
> >> not compiled to machine code.
> >>
> >> #rawHeaderOf: answers the first literal of the bytecoded method, which
> >> is a pointer to the jitted version of the method if the method has a
> >> jitted version, and otherwise is the compiled method header. In the code
> >> you showed, the VM checks with the assertion that the method has a jitted
> >> version, hence the compiled method header is fetched from the jitted
> >> version.
>
> I think I've got it. So upon JITing, CompiledMethod and its literals
> and bytecodes don't move.
> Only its bytecodeHeader is manipulated and re-purposed.
>
> Before JIT...
> compiledMethod := { bytecodeHeader, literals, bytecodes }.
> byteCodeHeader := compiledMethod at: 1
>
> After JIT something like...
> cogMethod := { cogMethodHeader, bytecodeHeader, machineCode }
> compiledMethod := { pointerTo_cogMethod, literals, bytecodes }.
> rawHeader := compiledMethod at: 1
> cogMethodHeader := dereferenced(rawHeader) at: 1.
>
>
> I guess we could say that.

A compiled method is laid out more like this. Before JIT:
- object header (64 bits)
- compiled method header (1 word)
- literals (several words)
- bytecodes (several bytes)
- compiled method trailer (several bytes)

After jitting, the compiled method header is replaced by a pointer to the
cog method.

The cog method has (assuming it has no blocks):
- header
- native instructions
- map

and the cog method header includes the compiled method header.
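
For illustration, the header fetch then boils down to something like this (a
sketch in the style of the Slang code quoted above; the real accessor in the
interpreter may be named and factored differently):

    methodHeaderOf: methodOop
        "Answer the compiled method header, whether or not the method is jitted."
        | rawHeader |
        rawHeader := self rawHeaderOf: methodOop.
        ^(self isCogMethodReference: rawHeader)
            ifTrue: ["jitted: rawHeader points to the CogMethod in the machine code zone"
                (self cCoerceSimple: rawHeader to: #'CogMethod *') methodHeader]
            ifFalse: ["not jitted: rawHeader is the compiled method header itself"
                rawHeader]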



> >>
> >>>
> >>> >> On Mon, May 30, 2016 at 4:12 PM, Clément Bera <
> bera.clement at gmail.com> wrote:
> >>> >>> Now that you've printed the frame, you can see the method addresses
> in this line:
> >>> >>> 16r103144:      method:    16r51578  16r102BDD0 16r102BDD0: a(n)
> CompiledMethod.
> >>> >>> This is a machine code frame, so the method has two addresses:
> >>> >>> 16r51578 => in generated method, so you need to use
> [disassembleMethod/trampoline...] and write down the hex to see the
> disassembly.
> >>> >>> 16r102BDD0 => in the heap. This is the bytecode version of the
> method. You can print it using [print oop...]
> >>>
> >>> This time...
> >>> [print ext head frame] ==>
> >>>   16r101214 M BlockClosure>forkAt: 16r2FC420: a(n) BlockClosure
> >>>   16r101210: method:     16rBBF0  16rC4E948 16rC4E948: a(n)
> CompiledMethod
> >>>
> >>> self rawHeaderOf: newMethod ==> 16rBBF0
> >>> So the "raw header" is the cogged method.
> >>>
> >>> Looking at the output below, the space ship operator <-> seems to link
> >>> between cogged method headers like a call stack, except   #forkAt:
> >>> calls  #newProcess  which calls  #asContext
> >>>
> >>> [print cog method for...] 16rBBF0 ==>
> >>>   16rBBF0 <-> 16rBC80: method: 16rC4E948 selector: 16r6CC798 forkAt:
> >>>
> >>> [print cog method for...] 16rBC80 ==>
> >>>    16rBC80 <-> 16rBEA8: method: 16rC51970 prim 19 selector: 16r6D1620
> newProcess
> >>>
> >>> [print cog method for...] 16rBEA8 ==>
> >>>    16rBEA8 <->     16rBF28: method:   16rC518C0 selector:   16r76A600
> asContext
> >>>
> >>> However the links don't seem to go back up the call stack but forward,
> >>> to statements to be executed in the future.   So I am confused?
> >>
> >>
> >> Yeah, it's the address of the jitted method's header, then <->, then the
> >> jitted method's entry point address, the bytecode version's address, and
> >> the selector's address.
> >>
> >> The cogMethod header is used to store the bytecoded method's compiled
> >> method header (because in the bytecoded method it was replaced with a
> >> pointer to the cogMethod) and various flags.
> >>
> >>>
> >>>
> >>> -------------
> >>>
> >>> Considering further [print cog method for...] 16rBBF0 ==>
> >>>   16rBBF0 <-> 16rBC80: method: 16rC4E948 selector: 16r6CC798 forkAt:
> >>>
> >>> [print oop...] 16r6CC798 ==>
> >>>    a(n) ByteSymbol nbytes 7  forkAt:
> >>>
> >>> Clément advised earlier that the bytecode version of the method is this...
> >>> [print oop...] 16rC4E948 ==>
> >>>   16rC4E948: a(n) CompiledMethod nbytes 37
> >>>      16rBBF0  is in generated methods
> >>>    16r6D1620 #newProcess   16r6CC650 #priority:   16r6CC690 #resume
> >>>    16r6CC798 #forkAt:   16rAE5490 a ClassBinding #BlockClosure ->
> 16r0088D618
> >>>   16rC4E968:  70/112 D0/208 88/136 10/16 E1/225 87/135 D2/210 7C/124
> >>>   16rC4E970:  28/40 AF/175 BA/186 F3/243 20/32
> >>>
> >>> Now I've been a bit slow on the uptake and only just realised, but to
> confirm...
> >>> the line 16r6CC798 is the one specifying the method as
> BlockClosure>>forkAt:
> >>
> >> 16r6CC798 is the address of the selector #forkAt:
>
> Sorry I wasn't clear.  I wasn't referring to the address itself of the
> selector - that was just a line reference.  My insight I wanted to
> confirm was that the last oop before the bytecode was...
>      a ClassBinding #BlockClosure -> 16r0088D618
> and the next last before that was...
>     #forkAt:   16rAE5490
> indicating the output of [print oop...] was method BlockClosure>>forkAt: ,
> while above that line are the methods called by #forkAt: and below it
> is the bytecode.
>
> Ahhh, actually I just saw this relevant comment in CompiledMethod...
> "The last literal in a CompiledMethod must be its
> methodClassAssociation, a binding whose value is the class the method
> is installed in.  The methodClassAssociation is used to implement
> super sends.  If a method contains no super send then its
> methodClassAssociation may be nil (as would be the case for example of
> methods providing a pool of inst var accessors). By convention the
> penultimate literal of a method is either its selector or an instance
> of AdditionalMethodState. "
>
> So it seems it won't always show the Class>>method, but often will.
>

Well...

By convention the bytecode compiler always puts the class binding as the
last literal, except if there is not enough room in the literal frame, which
in practice happens for roughly 1 method out of 100,000 in my experience,
and which will never happen once we've switched to the new bytecode set...

But if the method has no super sends, it does not need the class as the
last literal, and one could compile an image like that to save some memory.
If one removes both the selector (last but one literal) and the class
binding, one could save 150kb in the base Pharo image out of ~47Mb.
Currently there is no setting to do that, but one could do it.
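
If you want to see the convention from the image side, a quick check in a
workspace does it (standard CompiledMethod protocol, nothing VM specific):

    | m |
    m := BlockClosure >> #forkAt:.
    m literalAt: m numLiterals.      "last literal: the methodClassAssociation, a binding to BlockClosure"
    m literalAt: m numLiterals - 1.  "penultimate literal: the selector or an AdditionalMethodState"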


>
> >>
> >>>
> >>> For the last two lines, I notice the numbers before the slash (70, 88,
> >>> 10...) are the method bytecode, but what are the numbers after the
> >>> slash?
> >>
> >>
> >> The bytecode in decimal instead of hexadecimal, I think.
>
> I checked. You are right.  Obvious in hindsight.
>
> >>> ----------------
> >>>
> >>> In #activateCoggedNewMethod the second assignment to methodHeader
> >>>   ==> 16r208000B
> >>>
> >>> which matches the mthhdr field of the raw header
> >>> [print cog method header for...] 16rBBF0 ==>
> >>>     BBF0
> >>>     objhdr: 8000000A000035
> >>>     nArgs: 1 type: 2
> >>>     blksiz: 90
> >>>     method: C4E948
> >>>     mthhdr: 208000B
> >>>     selctr: 6CC798=#forkAt:
> >>>     blkentry: 0
> >>>     stackCheckOffset: 5E/BC4E
> >>>     cmRefersToYoung: no cmIsFullBlock: no
> >>>
> >>> What is "type: 2" ?
> >>
> >>
> >> Haha.
> >>
> >> Well, when you iterate over the machine code zone you need to know what
> >> the current element you are iterating on is. In the machine code zone
> >> there can be:
> >> - cog methods
> >> - closed PICs
> >> - open PICs
> >> - free space
> >> And now we're adding cog full block methods, but they share the type index
> >> with cog methods and have a separate flag :-)
> >>
> >> The type tells you what it is. Look at the literal variables CMFree,
> >> CMClosedPIC, CMOpenPIC, etc.
> >>
> >> 2 is CMMethod, which is a constant. You can improve the printing there
> >> and commit the changes if you feel like it.
> >
> >
> > What did I write here? I don't understand it myself. I mean CMMethod = 2,
> > so type = 2 means that the struct you're looking at in the machine code
> > zone is a method and not free space or a PIC.
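
As an aside, since CMFree, CMClosedPIC, CMOpenPIC and CMMethod are just small
integer constants, improving that printout could be as simple as mapping them
to names with a helper along these lines (only a sketch, the selector name is
made up):

    cmTypeNameFor: cmType
        "Answer a readable name for a machine code zone entry type."
        cmType = CMMethod ifTrue: [^'method'].
        cmType = CMClosedPIC ifTrue: [^'closed PIC'].
        cmType = CMOpenPIC ifTrue: [^'open PIC'].
        cmType = CMFree ifTrue: [^'free'].
        ^'unknown (', cmType printString, ')'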
> >>
> >>
> >> Ok I have to go I will look at the rest of your mail later.
> >
> >
> > Let's do this...
> >>
> >>
> >>>
> >>>
> >>> --------------------------
> >>>
> >>> Stepping through to  Cogit>>ceEnterCogCodePopReceiverReg
> >>> I notice its protocol is "simulation only"
> >>> and it calls  "simulateEnilopmart:numArgs:
> ceEnterCogCodePopReceiverReg"
> >>> but I don't see any other implementors of
> #ceEnterCogCodePopReceiverReg.
> >>> Also there is a pragma <doNotGenerate>.
> >>>
> >>> Obviously the real non-simulated VM works differently, but I can't
> >>> determine how.
> >>>
> >>> btw, I have noticed that  ceEnterCogCodePopReceiverReg
> >>>    ==> 16r10B8
> >>> and [print cog method for...] 16r10B8
> >>>    ==> trampoline ceEnterCogCodePopReceiverReg
> >>>
> >>> Is ceEnterCogCodePopReceiverReg provided by the platform C code?
> >
> >
> > Well it's in cogitIA32.c. I don't remember where it comes from.
>
> Cool. I had a peek.
>
> >
> > Basically in Cog you have specific machine code routines, called
> > trampolines, that switch from machine code to C code. When "trampoline" is
> > written backwards (enilopmart), it means that the routine is meant to
> > switch from C code to machine code.
> >
> > In the real VM, ceEnterCogCodePopReceiverReg calls a machine code routine
> > that does the right thing (registers remapped, maybe fp and sp saved, etc.)
> > to switch from the C runtime produced by the C compiler to the machine code
> > runtime executing code generated by the JIT.
>
> I see it's a function pointer...
>    void (*ceEnterCogCodePopReceiverReg)(void)
>
> set by...
>    ceEnterCogCodePopReceiverReg =
> genEnilopmartForandandforCallcalled(ReceiverResultReg, NoReg, NoReg,
> 0, "ceEnterCogCodePopReceiverReg");
>
> which is beyond my current need-to-know level.  Still, it's useful for
> filling in the background architecture.  This comment comparing the
> trampoline/enilopmart to a system-call-like transition was
> enlightening...
>
> /*      An enilopmart (the reverse of a trampoline) is a piece of code that
>         makes the system-call-like transition from the C runtime into
>         generated machine code. The desired arguments and entry-point are
>         pushed on a stackPage's stack. The enilopmart pops off the values to
>         be loaded into registers and then executes a return instruction to
>         pop off the entry-point and jump to it.
>
>         BEFORE                          AFTER              (stacks grow down)
>         whatever                        stackPointer -> whatever
>         target address =>               reg1 = reg1val, etc
>         reg1val                         pc = target address
>         reg2val
>         stackPointer -> reg3val */
>
>         /* Cogit>>#genEnilopmartFor:and:and:forCall:called: */
>
> >
> > In simulation, the C code is simulated by executing Slang as Smalltalk
> > code and the machine code is simulated using the processor simulator (Bochs
> > for IA32). So it has to be done differently, as there is no C stack with
> > register state and so on. Both trampolines and enilopmarts are simulated
> > with specific code.
>
> >
> >>>
> >>>
> >>> ---------------------------
> >>> Stepping through to simulateCogCodeAt:
> >>> it called processor singleStepIn:minimumAddress:readOnlyBelow:
> >>> which called
> BochsIA32Alien>>primitiveSingleStepInMemory:minimumAddress:readOnlyBelow:
> >>>      <primitive: 'primitiveSingleStepInMemoryMinimumAddressReadWrite'
> >>>        module: 'BochsIA32Plugin'
> >>>        error: ec>
> >>>      ^ec == #'inappropriate operation'
> >>>          ifTrue: [self handleExecutionPrimitiveFailureIn: memoryArray
> >>>                 minimumAddress: minimumAddress]
> >>>          ifFalse: [self reportPrimitiveFailure]
> >>>
> >>> and the debugger cursor was inside the ifTrue: statement.  I found I
> >>> didn't have bochs installed, but after installing bochs-2.6-2, I got
> >>> the same result. So could I get some background on this?
> >>>
> >>> Also I'm curious how the simulator seemed to be running a CogVM before
> >>> bochs was installed. Perhaps since I was not debugging through it, the
> >>> machine code ran for real rather than being simulated.
> >>>
> >
> > No, the machine code is always simulated. Bochs was definitely working if
> > you successfully simulated the image on top of the cog simulator up to the
> > point where the display was shown.
> >
> > If you have a VM from one of Eliot's builds (from the Cog blog), the
> > processor simulators are present as plugins by default. On Mac you can do
> > [show package contents...] and then look at the files inside to check that
> > the Bochs plugin is there. That's not the case for the Pharo VMs, so don't
> > use them for CogVM simulation. You don't need to install anything.
>
> Ahhh... I see them now.
> ./lib/squeak/5.0-3692/BochsX64Plugin
> ./lib/squeak/5.0-3692/BochsIA32Plugin
>
> That clears up my misconception - a lack of understanding of the purpose of
> the primitive failure, and a red herring when I saw that the Bochs system
> package wasn't installed.
>
> >
> > In normal simulation the simulator often goes into the branch you've just
> > shown. It means it reached a simulation trap. Just as enilopmarts can't be
> > properly simulated, trampolines can't be simulated either. So to simulate a
> > trampoline, the processor simulator fails a call and the trampoline's work
> > is done in the simulation code. Look at #handleCallOrJumpSimulationTrap:
> > for example.
>
> Ah, so it's an 'inappropriate operation' from Bochs' perspective, but
> from the Simulator's perspective the primitiveFail is a useful
> condition like the #ensure: "Primitive 198 always fails.  The VM uses
> prim 198 in a context's method as the mark for an ensure:/ifCurtailed:
> activation."  ?
>

Err... I think it's a bit different.

The processor simulator keeps running machine code until it traps, at which
point the simulation figures out why it trapped; most likely it trapped
because it needed to switch from machine code to C code, hence to the
Smalltalk runtime in the simulation. The normal behavior is that most of the
time the processor simulator primitives succeed and sometimes they fail.
Primitives 198 and 199 always fail.
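
Very roughly, the shape of the machine code simulation is something like the
following (just a sketch from memory, not the actual source; the real loop
lives around Cogit>>#simulateCogCodeAt: and the trap handlers, and the handler
selector here is generic):

    [[processor singleStepIn: memoryArray
            minimumAddress: minimumAddress
            readOnlyBelow: readOnlyLimit]
        on: ProcessorSimulationTrap
        do: [:trap | "dispatch on the trap, e.g. to #handleCallOrJumpSimulationTrap:"
            self handleSimulationTrap: trap]] repeat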

If you want to try, you can alternatively use the MIPS back-end to simulate
machine code, which is done fully in Smalltalk instead of through Bochs. The
back-ends for x86, x64 and ARM are simulated using external processor
simulator frameworks, while the MIPS simulator is written entirely in
Smalltalk. The setting to use MIPS is (ISA MIPSEL). Don't hesitate to try the
other back-ends too, it's fun; the available ISA settings are IA32, X64,
ARMv5 and MIPSEL.
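
For reference, the back-end is chosen through the options passed when creating
the simulator, e.g. in a workspace like the snippet below (the image path and
the exact option list are only illustrative):

    | cos |
    cos := CogVMSimulator newWithOptions: #(Cogit StackToRegisterMappingCogit
                                            ISA MIPSEL).
    cos openOn: 'path/to/your.image'.
    cos openAsMorph; run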


> cheers -ben
>
> btw, I bumped into a bit of history...
> http://www.mirandabanda.org/cogblog/2008/12/12/simulate-out-of-the-bochs/


Yeah this is a good post.