[Vm-dev] InterpreterSimulator

Eliot Miranda eliot.miranda at gmail.com
Mon Mar 14 19:12:53 UTC 2016


Hi Chris,

On Sun, Mar 13, 2016 at 11:42 AM, Chris Cunnington <brasspen at gmail.com>
wrote:

>
>
> On Mar 13, 2016, at 2:14 PM, tim Rowledge <tim at rowledge.org> wrote:
>
> The clever bit is that the first part of the machine code we built for
> BoringClass>theRubberPullets does a check that the class of the receiver is
> the same as the boringClassTag we loaded. If it is, no problem, carry on.
> If it is not then we abort to a routine that builds a PIC - a Polymorphic
> Inline Cache, see the 2nd URL above - moves things around a bit (not quite
> randomly) and once more rewrites the call so that now it jumps to the
> beginning of the PIC. And then we carry on again (isn’t that a neat title
> for a movie?) with  our processing.
>
>
> Right. This is the interesting part. But here’s the question: what’s
> different in an image from 5-7 years ago to an image now? Who is carrying
> this information? The receiver or the CompiledMethod?
>

Nothing is different in the image.  The state is all in the VM.  At
start-up an image contains only unjitted bytecoded methods, hence it can
run on either an Interpreter or a JIT VM; the initial state is the same;
there is no machine code, no PICs, no inline caches, nothing.

If you're running the image on an interpreter then the VM maintains its
first-level method lookup cache (FLMLC), which records class x selector ->
{method, primitiveFunctionOrNil} quads and avoids ~97% of all class
hierarchy lookups.  This state could be exposed through a primitive.  It
would reveal something about the current working set of active receiver
classes and selectors.  Its information would not be completely reliable.
The FLMLC is three-way set associative but small (1k entries in Cog VMs,
512 entries in the Interpreter (!!)) so there are frequent conflicts.
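
In rough C terms the cache and its probe look something like the sketch
below; the field names, hash and probe scheme here are purely illustrative,
not the VM's actual code.

    #include <stdint.h>
    #include <stddef.h>

    #define CACHE_ENTRIES 1024                 /* ~1k entries in the Cog VMs */

    typedef struct {                           /* one "quad" in the FLMLC */
        void *selector;
        void *classTag;
        void *method;
        void *primitiveFunctionOrNil;
    } CacheEntry;

    static CacheEntry methodCache[CACHE_ENTRIES];

    /* Probe at three hash-derived indices; that is what makes the cache
       effectively three-way set associative. */
    static CacheEntry *
    lookupInCache(void *selector, void *classTag)
    {
        uintptr_t hash = (uintptr_t)selector ^ (uintptr_t)classTag;
        int probe;
        for (probe = 0; probe < 3; probe++) {
            CacheEntry *e = &methodCache[(hash >> probe) % CACHE_ENTRIES];
            if (e->selector == selector && e->classTag == classTag)
                return e;                      /* hit: cached method + primitive */
        }
        return NULL;                           /* miss: full class-hierarchy lookup */
    }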

If you're running the image on a Cog JIT VM then the VM creates a
machine-code twin for a bytecoded method the second time it is used.
Actually, the criterion is that if a method is in the FLMLC and it has 60
literals or fewer then it is jitted into machine code.  Invisibly to the
image, the header of the bytecoded method is changed to point to the
machine-code method, and the bytecoded method's original header, a reference
back to the bytecoded method, and its selector are stored in the machine-code
method's header.  Let's call these methods BCM and MCM.
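
Sketched in C, with purely illustrative field names rather than the VM's
actual layout, the twin relationship is something like:

    typedef struct CogMethod CogMethod;

    typedef struct {                    /* the bytecoded method, BCM */
        CogMethod *header;              /* once jitted, this slot points at the MCM */
        /* ... literals and bytecodes follow ... */
    } BytecodedMethod;

    struct CogMethod {                  /* the machine-code method, MCM */
        long  originalHeader;           /* the BCM's original header word */
        void *methodObject;             /* reference back to the BCM */
        void *selector;                 /* the BCM's selector */
        /* ... generated machine code follows ... */
    };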

Each send bytecode in the BCM has a corresponding machine code send
sequence in the MCM.  Ignoring the inlining of code like #+, the machine
code send sequence in the MCM looks like

    classReg := #selector.
    call lookupRoutineForSendWith[0,1,2]Args

or, if there are 3 or more arguments

    sendNumArgsReg := N.
    classReg := #selector.
    call lookupRoutineForSendWithNArgs

This state is called an unlinked send.

When the lookup routine is run it looks up the method in the FLMLC, and if
it is there, it JITs it. [ If not, it does a normal lookup, enters the method
in the FLMLC and interprets it.  This way we don't waste time and space
jitting methods we only run once, e.g. at start-up.  It's much faster to
interpret once; jitting is like making a very slow single pass through the
method. ]  Once the new jitted method is available, the send site is
rewritten to call it directly, and the selector load is replaced with a load
of the class of the receiver of the send:

    classReg := #ReceiversClassIndex.       (or class, or compact class index in the pre-Spur VMs)
    call jittedMethodForClassAndSelector.entryPoint

and then the lookup routine jumps into the newly jitted method to run it.
The next time the send is executed the VM will call
jittedMethodForClassAndSelector.entryPoint directly.  The entry-point gets
the class of the _current_ receiver and compares it with that in the
classReg.  If they're the same the method executes.

This state is called a monomorphic inline cache.  There is one class in the
cache.
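
In rough C terms (made-up names, just to illustrate the check) the entry
point does no more than:

    typedef struct { void *classTag; } ObjectHeader;

    enum SendOutcome { RunMethod, CallMissHandler };

    /* cachedClassTag is what the linked send site loaded into classReg. */
    static enum SendOutcome
    checkedEntryPoint(ObjectHeader *receiver, void *cachedClassTag)
    {
        if (receiver->classTag == cachedClassTag)
            return RunMethod;           /* hit: fall through into the method body */
        return CallMissHandler;         /* miss: relink the send site, e.g. to a PIC */
    }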


If, when a monomorphic send is executed, the class of the receiver is
different, a lookup occurs and the send site is updated to a PIC with two
entries, one for the first class and one for the new class.

A PIC grabs the class of the receiver and then compares it against
constants for the various classes.  It looks like


    classReg := #ReceiversClassIndex.       (or class, or compact class index in the pre-Spur VMs)
    call PICForSelector.entryPoint

PICForSelector.entryPoint
    receiverClass := receiver class.
    receiverClass == #Class1 ifTrue: [jump jittedMethodForClass1AndSelector.uncheckedEntryPoint].
    receiverClass == #Class2 ifTrue: [jump jittedMethodForClass2AndSelector.uncheckedEntryPoint].
    call extendClosedPIC

Note that jittedMethodForClass1AndSelector
and jittedMethodForClass2AndSelector might actually be the same method.

This is a polymorphic send.

I call these PICs "closed PICs" because they have a finite number of
cases.  In Cog, the max size is 6.  Subsequent sends to receivers with
different classes will extend the PIC up to 6 cases.  On encountering the
7th class the VM creates an "open" PIC.  This is machine code that does a
FLMLC probe for the selector, class pair (and the selector is now a constant
because this machine code is just for this selector) and either jumps to
the machine code or invokes the interpreter for an unjitted method.

This is a megamorphic send; there are more than 6 classes at the send site,
but the VM no longer records what they are, other than the unreliable
information in the FLMLC.
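
Reusing the CacheEntry and lookupInCache names from the earlier sketch
(again purely illustrative; the real thing is generated machine code), an
open PIC amounts to something like:

    /* One open PIC is generated per selector, so the selector is a constant
       baked into the PIC; only the receiver's class varies between sends. */
    static void *thisPICsSelector;      /* fixed when the open PIC is generated */

    static void *
    openPICEntryPoint(void *receiverClassTag)
    {
        CacheEntry *e = lookupInCache(thisPICsSelector, receiverClassTag);
        if (e != 0)
            return e->method;           /* jump to its machine code, or interpret it */
        return 0;                       /* miss: full lookup, then refill the cache */
    }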

As an optimization the VM keeps the set of open PICs on a linked list, and
if a monomorphic send misses and the open PIC list contains an open PIC
with the monomorphic send's selector, the VM binds directly to the open PIC,
rather than creating a closed PIC, because in most cases a polymorphic send
of a megamorphic selector will soon become megamorphic.
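
A minimal sketch of that shortcut, with an invented list layout:

    typedef struct OpenPIC {
        void *selector;
        struct OpenPIC *next;
    } OpenPIC;

    static OpenPIC *openPICList;        /* head of the VM's list of open PICs */

    /* On a monomorphic send miss, prefer an existing open PIC for the same
       selector over creating a two-case closed PIC. */
    static OpenPIC *
    existingOpenPICFor(void *selector)
    {
        OpenPIC *pic;
        for (pic = openPICList; pic != 0; pic = pic->next)
            if (pic->selector == selector)
                return pic;
        return 0;
    }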

Now, all the above optimization ends up recording class information in the
inline caches at send sites in MCMs, and each send in an MCM corresponds to
a send bytecode in a BCM.  So what we do in Sista is add a primitive that
answers the state of the inline caches in a BCM, in a form that hides the
MCM reality.  Let's look at an example:


Here's a little doit:

1 to: 4 do: [:i| (#(1 1.0 #one 'one') at: i) species].

It sends #<= and #+ hidden in the inlining of the to:do:.  It sends #at: to
the array, and it sends #species to a SmallInteger, a BoxedFloat64, a
ByteSymbol and a ByteString (I use #species, not #class because the VM
inlines #class so there is no send if we use it).  It contains a
conditional branch that tests the result of #<=.  Here's the bytecode:

57 <76> pushConstant: 1
58 <68> popIntoTemp: 0
59 <10> pushTemp: 0
60 <22> pushConstant: 4
61 <B4> send: <=
62 <AC 0B> jumpFalse: 75
64 <21> pushConstant: #(1 1.0 #one 'one')
65 <10> pushTemp: 0
66 <C0> send: at:
67 <D0> send: species
68 <87> pop
69 <10> pushTemp: 0
70 <76> pushConstant: 1
71 <B0> send: +
72 <68> popIntoTemp: 0
73 <A3 F0> jumpTo: 59
75 <78> returnSelf

Here's the actual object the send and branch data primitive answers for an
execution of the above:

{#(62 5 4) .
   {66 . Array . (Object>>#at: "a CompiledMethod(1789611)")} .
   {67 . SmallInteger . (Object>>#species "a CompiledMethod(3005395)") .
         BoxedFloat64 . (Object>>#species "a CompiledMethod(3005395)") .
         ByteSymbol . (ByteSymbol>>#species "a CompiledMethod(3876413)") .
         ByteString . (Object>>#species "a CompiledMethod(3005395)")} }

So what does this say?  The entry for bytecode 62 <AC 0B> jumpFalse: 75 has
been executed 5 times and taken 4 times.  We count conditional branches so
that frequently executed code can call back into Smalltalk to let Sista run,
do its analysis, optimize, and continue in optimized code.

The entry for bytecode 66 <C0> send: at: has one class, Array, and for that
class the method invoked is Object>>#at:.
The entry for bytecode 67 <D0> send: species has four classes, SmallInteger,
BoxedFloat64, ByteSymbol and ByteString, and all of them happen to invoke
Object>>#species.

So the realities of machine code are completely hidden from the image.  It
sees only bytecodes, and indeed it optimizes to bytecodes.  Only the JIT
converts those bytecodes to machine code, and it hides the details, mapping
all the information back into the portable, machine-independent bytecode
representation.

> The answer (as I understand it) is that the CompiledMethod is carrying a
> cue for the JIT. Something is going on there. The receiver doesn’t know
> anything, I don’t think. (I appreciate the process is going to ping it. But
> the process is not starting there.)
>

Right.  The information is in send sites.  And it is reified through a
primitive that maps the information back to the bytecoded representation of
methods.


> There’s been lots of talk about byte codes and new byte code set, but
> that’s not it. If I explore a CompiledMethod in an old image or a new one,
> it’s not going to show me when a PIC is activating. The header gets
> incinerated and replaced with machine code. And then there’s a check to see
> if the header is now machine code ash called #isCogMethodReference:. That
> is the bifurcation between the old way and the new. If Yes, then go to JIT
> related code. If No, then it looks like it always has.
>

Yes.


> Thank you for the links. I’ll check them out in a bit. I think I’m
> reaching saturation for now.
>
> Chris
>

_,,,^..^,,,_
best, Eliot

