[Vm-dev] CogVM Execution Flow

Tue Jun 14 17:48:06 UTC 2016

Hi Ben,

On Tuesday, June 14, 2016, Ben Coman <btc at openinworld.com> wrote:

>
> On Tue, Jun 14, 2016 at 2:41 AM, Eliot Miranda <eliot.miranda at gmail.com>
> wrote:
> >
> > Hi Ben,
> >
> >     the diagram below shows the trees, but the wood is arguably more
> important.  The diagram below is focussing on the transitions, but doesn't
> clearly show what is being transitioned between.  I imagine a diagram which
> shows the structures and has what you have in the yellow boxes as
> transitions.
>
> By transitions do you mean graph edges?

Yes.  For example ceSend: is a transition from machine code to the run-time.

>  That actually turns out a
> little difficult because I can't attach multiple edges together like I
> can attach multiple edges to a shape - but I'll keep trying.
>
> btw, Are externalizeIPandSP and internalizeIPandSP big hints as to the
> transitions between those main structures?

No :-(.  This is actually a milli-optimisation to the interpreter.  If one
wants an interpreter to go fast one needs as many of the key
interpreter variables (stack pointer, frame pointer, instruction pointer)
in registers.  Compilers such as gcc allow global register variables (in
part due to my work in BrouHaHa where I achieved this by nefarious means
and then requested the facility of Richard Stallman). But that's
non-portable.  The approach taken in the Squeak VM is to inline much of the
interpreter into one function and have localSP, localFP & localIP as local
variables, and rely on the C compiler's optimiser to put these in
registers.  That means they have to be written to stackPointer,
framePointer and instructionPointer before calling a primitive
(externalize) and read back afterwards (internalize).  Good idea, adds
complexity, doesn't add much to the Cog VM, essential to the Stack &
Interpreter VMs.
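
To make the externalize/internalize idea concrete, here is a minimal sketch in
Python. It is not the VM's code: the Interpreter class, its two toy bytecodes
and primitive_add are made up for illustration, and Python locals only stand
in for what a C compiler would keep in registers.

```python
class Interpreter:
    """Toy model: global VM state plus a loop that caches it in locals."""

    def __init__(self, bytecodes):
        self.bytecodes = bytecodes
        self.stack = [0] * 16
        # "Global" interpreter variables, the only ones primitives can see.
        self.stack_pointer = -1
        self.instruction_pointer = 0

    def primitive_add(self):
        # A primitive works on the global stack_pointer, never on the locals.
        sp = self.stack_pointer
        self.stack[sp - 1] = self.stack[sp - 1] + self.stack[sp]
        self.stack_pointer = sp - 1

    def interpret(self):
        # internalize: copy the globals into locals (register candidates in C).
        local_sp = self.stack_pointer
        local_ip = self.instruction_pointer
        while local_ip < len(self.bytecodes):
            op, arg = self.bytecodes[local_ip]
            local_ip += 1
            if op == "push":
                local_sp += 1
                self.stack[local_sp] = arg
            elif op == "prim_add":
                # externalize: publish the locals before calling the primitive
                self.stack_pointer = local_sp
                self.instruction_pointer = local_ip
                self.primitive_add()
                # internalize: the primitive may have changed the globals
                local_sp = self.stack_pointer
                local_ip = self.instruction_pointer
        # externalize once more so the globals are valid on exit
        self.stack_pointer = local_sp
        self.instruction_pointer = local_ip
        return self.stack[local_sp]
```

Forgetting either copy is the classic bug here: without the externalize the
primitive sees a stale stack pointer, and without the internalize the loop
carries on with a stale one.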

>
>   If so, could you spell out
> which is which.  Also do stackPointer, framePointer,
> instructionPointer, localSP, localFP, etc belong to certain of those
> structures?

Yes.  You can see now that they belong to the interpreter and hence to the
C runtime. They point to the current frame in the stack zone for the
interpreter.  In machine code we use the native sp, fp & pc and so every
trampoline does an externalize, a call, and an internalize (which may not
be reached if the call doesn't return), and every enilopmart does an
internalize, but of the native sp, fp & pc rather than localSP, localFP,
etc.

Sorry about the font sizes.  Using the gmail app for the first time and it
mucks things up when copy/pasting.

cheers -ben

> > So...
> > The essential structures are six-fold, three execution state structures,
> and three bodies of code, and in fact there is overlap of one of each.
> >
> > These are the execution state structures:
> >
> > 1. the C stack.
> > 2. the Smalltalk stack zone.
> > 3. the Smalltalk heap (which includes contexts that overflow the
> Smalltalk stack zone).
> >
> > These are the bodies of code:
> > 4. the run-time, the code comprising the VM interpreter, JIT, garbage
> collector, and primitives
> > 5. the jitted code living in the machine code zone, comprising methods,
> polymorphic inline caches, and the glue routines (trampolines and
> enilopmarts) between that machine code and the run-time
> > 6. Smalltalk "source" code, the classes and methods in the Smalltalk
> heap that constitute the "program" under execution
> >
> > So 3. and 6. overlap; code is data, and 2. overflows into 3., the stack
> zone is a "cache", keeping the most recent activations in the most
> efficient form for execution.
> > Further, 4. (the run-time) executes solely on 1. (the C stack), and 5.
> (the jitted code) runs only on 2. (the stack zone), and also, code in 6.
> executed (interpreted) by the interpreter and primitives in 4. runs on 2.
> (the stack zone)
> >
> > Your diagram names some of the surface transitions, but not the deeper
> when and why.  Here they are:
> >
> > a) execution begins on the C stack as the program is launched.  Once the
> heap is loaded, swizzling pointers as required, the interpreter is
> entered.  On first entry it
> >   a1) allocates space for the stack zone on the C stack
> >   a2) "marries" the context in the image that invoked the snapshot
> primitive (a stack frame in the stack zone is built for the context, and
> the context is changed to become a proxy for that stack frame).
> >   a3) captures the top of the C stack (CStackPointer & CFramePointer) as
> interpret is about to be invoked, including creating a "landing pad" for
> jumping back into the interpreter
> >     a3 vm) the landing pad is a jmpbuf created via setjmp, and jumped to
> via longjmp
> >     a3 sim) the landing pad is an exception handler for the
> ReenterInterpreter notification
> >   a4) calls interpret to start interpreting the method that executed the
> snapshot primitive
> >
> >
> > Invoking the run-time:
> > Machine code calls into the run-time for several facilities: adding an
> object to the remembered table if a store check indicates this must happen,
> running a primitive in the run-time, entering the run-time to lookup and
> bind a machine code send, or a linked send that has missed.  To invoke the
> run-time, the machine code saves the native stack and frame pointers (those
> of the current Smalltalk stack frame) in stackPointer and framePointer,
> sets the native stack and frame pointers to CStackPointer and
> CFramePointer, passes parameters (pushing them on x86, loading registers on
> ARM and x64)
> and calls the run-time routine.  Simple routines (adding element to the
> remembered set) simply perform the operation and return. The code returned
> to then switches back to the Smalltalk stack pointers and continues.
> Routines that change the Smalltalk frame (send-linking routines, complex
> primitives such as perform:) reenter via an enilopmart.
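
The stack switch described above can be sketched roughly as follows, in Python
rather than machine code. Cpu, Vm, trampoline and the numeric addresses are
all hypothetical names and values chosen for the sketch; only the simple
"perform the operation and return" case is modelled, not the routines that
reenter via an enilopmart.

```python
class Cpu:
    """Stands in for the native stack and frame pointer registers."""

    def __init__(self, sp, fp):
        self.sp = sp
        self.fp = fp


class Vm:
    def __init__(self):
        # Captured once, as interpret is about to be invoked (made-up values).
        self.c_stack_pointer = 9000
        self.c_frame_pointer = 9008
        # Where machine code saves the Smalltalk stack and frame pointers.
        self.stack_pointer = None
        self.frame_pointer = None
        self.remembered = []

    def remember(self, cpu, oop):
        # A "simple routine": record the object and the stack we ran on.
        self.remembered.append((oop, cpu.sp))


def trampoline(cpu, vm, routine, arg):
    # Save the Smalltalk stack and frame pointers for the run-time to see.
    vm.stack_pointer, vm.frame_pointer = cpu.sp, cpu.fp
    # Switch to the C stack captured at start-up, then call the run-time.
    cpu.sp, cpu.fp = vm.c_stack_pointer, vm.c_frame_pointer
    routine(cpu, arg)
    # The routine returned, so switch back to the Smalltalk stack.
    cpu.sp, cpu.fp = vm.stack_pointer, vm.frame_pointer
```

Because every call starts from the same captured C stack pointer, the run-time
routine always runs at the same C stack depth no matter how often machine code
calls it.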
> >
> >
> > Transition to the interpreter:
> > So any time the machine code wants to transition to the interpreter (not
> simply call a routine in the run-time, but to interpret an
> as-yet-unjitted/unjittable method, either via send or return), the machine
> code switches the frame and stack pointers to those captured in a3) and
> longjmps (raises the ReenterInterpreter exception).  It does this by
> calling a run-time routine (as in "Invoking the run-time") that actually
> performs the longjmp.  Any intervening state on the C stack will be
> discarded, and execution will be in the same state as when the interpret
> routine was entered immediately after initialising the stack zone.
> >
> >
> > N.B. If the interpreter merely called the machine-code, and
> the machine-code merely called the run-time, instead of substituting the
> stack and frame pointers with CStackPointer and CFramePointer set up on
> initial invocation of interpret, then the C stack would grow on each
> transition between machine code execution and interpreter/run-time
> execution and the C stack would soon overflow.
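
A small Python model may help show why the landing pad matters. It uses an
exception where the real VM uses setjmp/longjmp, much as the simulator uses the
ReenterInterpreter notification. The run function, its depth counter and the
use_landing_pad switch are invented for the sketch; depth is just an explicit
count of host stack frames.

```python
class ReenterInterpreter(Exception):
    """Stands in for the longjmp back to the landing pad."""


def run(transitions, use_landing_pad):
    depth_log = []  # host stack depth at each entry to interpret
    remaining = transitions

    def interpret(depth):
        nonlocal remaining
        depth_log.append(depth)
        if remaining > 0:
            remaining -= 1
            machine_code(depth + 1)

    def machine_code(depth):
        # Machine code meets an unjitted method and asks for the interpreter.
        enter_interpreter(depth + 1)

    def enter_interpreter(depth):
        if use_landing_pad:
            # Discard everything above the landing pad.
            raise ReenterInterpreter()
        # Naive alternative: just call the interpreter, nesting deeper.
        interpret(depth + 1)

    while True:
        try:
            interpret(0)  # the landing pad: always reentered at depth 0
            return depth_log
        except ReenterInterpreter:
            pass
```

With three transitions, run(3, True) enters interpret at depths 0, 0, 0, 0,
while run(3, False) enters it at depths 0, 3, 6, 9, which is the unbounded
growth the N.B. warns about.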
> >
> >
> > Call-backs:
> >
> > The C stack /can/ grow however.  If a call-out calls back then the
> call-back executes lower down the C stack.  A call out will have been made
> from some primitive invoked either from the interpreter or machine-code,
> and that primitive will run on the C stack.  On calling back, the VM saves
> the current CStackPointer, CFramePointer and "landing-pad" jmpbuf in state
> associated with the call-back, and then reenters the interpreter, saving
> new values for the CStackPointer, CFramePointer and "landing-pad" jmpbuf.
> Execution now continues in this new part of the C stack below the
> original.  On the call-back returning (again via a primitive), the
> CStackPointer, CFramePointer and "landing-pad" jmpbuf are restored before
> returning to the C code that invoked the call-back.  Once this C code
> returns, the stack is unwound back to the state before the call-out was
> invoked.
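
The save and restore around a call-back can be pictured as a stack of saved
states. This is only a sketch under assumptions: CallbackVm, the string used
for the jmpbuf and the addresses are made up, CFramePointer is left out for
brevity, and the real VM keeps the saved values in state associated with each
call-back rather than in a Python list.

```python
class CallbackVm:
    def __init__(self):
        # Hypothetical addresses; the C stack grows towards lower addresses.
        self.c_stack_pointer = 1000
        self.landing_pad = "pad@1000"
        self.saved = []  # one entry per call-back currently in progress

    def enter_callback(self, current_c_sp):
        # Save the outer interpreter's state with the call-back ...
        self.saved.append((self.c_stack_pointer, self.landing_pad))
        # ... then reenter the interpreter lower down the C stack.
        self.c_stack_pointer = current_c_sp
        self.landing_pad = f"pad@{current_c_sp}"

    def return_from_callback(self):
        # Restore the outer state before returning to the calling C code.
        self.c_stack_pointer, self.landing_pad = self.saved.pop()
```

Nested call-backs simply push further entries, and each return restores the
state of the interpreter invocation that made the corresponding call-out.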
> >
> >
> > Transition to machine-code:
> > The interpreter uses the simple policy of jitting a method if it is
> found in the first-level method lookup cache, effectively hitting methods
> that are used more than once.  If the jitter method contains a primitive,
> that primitive routine will be invoked just as if it were an interpreted
> method.  If the method doesn't have a primitive, the interpreter will jump
> into machine code immediately.  It jumps into machine code by pushing any
> parameters (the state of the machine code registers, such as
> ReceiverResultReg, and the machine code address to begin execution) onto
> the top of the Smalltalk stack, and calling an enilopmart that switches
> from the C to the Smalltalk stack, loads the registers and jumps to the
> machine code address via a return instruction that pops the entry point off
> the Smalltalk stack.
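
As a rough picture of that hand-off, here is a Python sketch in which a list
plays the Smalltalk stack. The function names and the push order (entry point
first, register value on top) are assumptions made for the sketch; the text
above only says that both are pushed and that the return instruction pops the
entry point.

```python
def enter_machine_code(smalltalk_stack, entry_point, receiver):
    # Interpreter side: push the entry point, then the register state,
    # onto the top of the Smalltalk stack (order assumed, see above).
    smalltalk_stack.append(entry_point)
    smalltalk_stack.append(receiver)
    return enilopmart(smalltalk_stack)


def enilopmart(smalltalk_stack):
    # Glue side: having switched to the Smalltalk stack, load the registers
    # by popping them ...
    registers = {"ReceiverResultReg": smalltalk_stack.pop()}
    # ... then "return", which pops the entry point and jumps to it.
    program_counter = smalltalk_stack.pop()
    return program_counter, registers
```

After the enilopmart runs, the Smalltalk stack is back to how it was before
the interpreter pushed the parameters, so the machine code starts with a clean
frame.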
> >
> >
> > Simulating these transitions in the Simulator:
> > In the Simulator, the C run-time (4.) is made up of Smalltalk objects,
> and 1., 2.,
> 3., 5., & 6. live in the memory inst var of the object memory, a large
> ByteArray.  The machine code lives in the bottom of this memory byte array
> (MBA), and has no direct access to the Smalltalk objects.  In the real VM,
> the correlates of these objects all exist at specific addresses and may be
> accessed directly from machine code.  In the simulator this is not
> possible.  Instead, these objects are all assigned out-of-bounds addresses,
> and a dictionary maps from the specific out-of-bounds address to the
> specific object being accessed, e.g. stackPointer, an inst var of
> InterpreterPrimitives, the superclass of StackInterpreter, has an address
> in simulatedAddresses that maps to a block that does a perform to access
> stackPointer's value.  See CoInterpreter>>stackPointerAddress.
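
The address-to-object mapping can be sketched in Python like this. ToySimulator
and ToyInterpreter are hypothetical, the 8-byte spacing between addresses is
arbitrary, and a pair of closures stands in for the Smalltalk block that does
the perform.

```python
class ToyInterpreter:
    def __init__(self):
        self.stackPointer = 0
        self.framePointer = 0


class ToySimulator:
    MEMORY_SIZE = 1024

    def __init__(self):
        self.memory = bytearray(self.MEMORY_SIZE)
        self.interpreter = ToyInterpreter()
        self.simulated_addresses = {}
        # The first address past the end of memory is the first one handed out.
        self._next_address = self.MEMORY_SIZE

    def address_for_variable(self, name):
        """Assign an out-of-bounds address to an interpreter variable."""
        address = self._next_address
        self._next_address += 8
        target = self.interpreter
        self.simulated_addresses[address] = (
            lambda: getattr(target, name),               # read accessor
            lambda value: setattr(target, name, value),  # write accessor
        )
        return address

    def read(self, address):
        getter, _ = self.simulated_addresses[address]
        return getter()

    def write(self, address, value):
        _, setter = self.simulated_addresses[address]
        setter(value)
```

Machine code can then be generated against the returned address as if it were
a real location, even though nothing in the memory byte array backs it.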
> >
> > Machine code is executed by one of the processor aliens via the
> primitiveRunInMemory:minimumAddress:readOnlyBelow: or
> primitiveSingleStepInMemory:minimumAddress:readOnlyBelow: primitives. These
> primitives will fail when they encounter an illegal instruction, including
> an instruction that tries to fetch or store or jump to an out-of-bounds
> address.  The primitive failure code analyses the instruction that failed
> and, when appropriate (the instruction may actually be illegal, the result
> of some bug in the system, but is typically an intended access of some
> run-time object), creates an instance of the ProcessorSimulationTrap
> exception and raises it.  The handler then handles the exception to either
> fetch, store or invoke Smalltalk objects in the simulation, and once
> handled execution can continue.
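
Here is a minimal Python sketch of that trap-and-continue loop. The toy
processor in step, its load/store instruction tuples, the simulate function
and the address constant are all invented for illustration; the real work is
done by the processor aliens and their primitives. Only the exception's name
is taken from the text above.

```python
class ProcessorSimulationTrap(Exception):
    """Raised by the toy processor for an out-of-bounds access."""

    def __init__(self, kind, reg, address):
        super().__init__(f"{kind} at {address:#x}")
        self.kind, self.reg, self.address = kind, reg, address


MEMORY_SIZE = 64
STACK_POINTER_ADDRESS = MEMORY_SIZE  # hypothetical out-of-bounds address


def step(instruction, regs, memory):
    """Run one (kind, register, address) instruction or raise a trap."""
    kind, reg, address = instruction
    if address >= len(memory):
        raise ProcessorSimulationTrap(kind, reg, address)
    if kind == "load":
        regs[reg] = memory[address]
    else:  # "store"
        memory[address] = regs[reg]


def simulate(program, regs, runtime):
    memory = bytearray(MEMORY_SIZE)
    simulated = {STACK_POINTER_ADDRESS: "stackPointer"}
    for instruction in program:
        try:
            step(instruction, regs, memory)
        except ProcessorSimulationTrap as trap:
            # A KeyError here would mean a genuinely illegal access.
            name = simulated[trap.address]
            if trap.kind == "load":
                regs[trap.reg] = runtime[name]
            else:
                runtime[name] = regs[trap.reg]
        # Either way, execution continues with the next instruction.
    return regs
```

The handler does the access on the run-time object's behalf and then lets the
loop carry on, so from the machine code's point of view the out-of-bounds
address behaved like ordinary memory.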
> >
> > Hence in the simulator
> primitiveRunInMemory:minimumAddress:readOnlyBelow: and
> primitiveSingleStepInMemory:minimumAddress:readOnlyBelow: (actually their
> wrappers singleStepIn:minimumAddress:readOnlyBelow: &
> runInMemory:minimumAddress:readOnlyBelow:) are always invoked in the context
> of Cogit>>simulateCogCodeAt:, which provides the handler, and tests for
> machine code break-points using the breakBlock.
> >
> > In the same way that the VM must avoid C stack growth when transitioning
> between machine code and the interpreter/run-time above, so the simulator
> must avoid uncontrolled stack growth when the simulated machine code
> invokes Smalltalk code which again invokes simulated machine code.  So the
> code that invokes the run-time from simulateCogCodeAt:
> (Cogit>>#handleCallOrJumpSimulationTrap:) includes a handler for the
> ReenterMachineCode notification.  Whenever the Smalltalk run-time wants to
> reenter machine code via an enilopmart it sends
> Cogit>>#simulateEnilopmart:numArgs: which raises the notification before
> sending Cogit>>simulateCogCodeAt:.  So the first entry into machine code
> via an enilopmart starts Cogit>>simulateCogCodeAt:, but subsequent ones end
> up returning to that first Cogit>>simulateCogCodeAt: to continue execution.
> >
> >
> > Ben, given the above, can you now see how your yellow boxes name
> specific transitions amongst the structures explained above?  I hope I've
> encouraged you, not discouraged you, to revise and bifurcate your diagram
> into two state transition diagrams for the real and simulated VM.  It would
> be great to have really good diagrammatic representations of the above.
> >
> > And once we have that, we can build the relatively simple extension that
> allows the interpreter and machine code to interleave interpreted and
> machine code frames on the Smalltalk stack (2.) that allow the VM to freely
> switch between interpreted and jitted code, and to fall back on the
> interpreter whenever convenient.
> >
> > On Mon, Jun 13, 2016 at 6:24 AM, Ben Coman <btc at openinworld.com>
> wrote:
> >>
> >>
> >> In trying to understand the flow of execution (and in particular the
> >> jumps in the jitted VM), I made a first rough pass to map it in the
> >> attached chart.
> >>
> >> I am trying to colourize it to distinguish between paths that can
> >> return to the interpreter, those that circulate in jitted code, and
> >> the transitions.  I'm sure I've missed the mark a bit but it's a start.
> >> Of course corrections welcome, even scanned pen sketches.
> >>
> >> cheers -ben
> >>
> >
> >
> >
> > --
> > _,,,^..^,,,_
> > best, Eliot
> >
>

-- 
_,,,^..^,,,_
best, Eliot