[Vm-dev] CogVM Execution Flow

Tue Jun 14 02:21:22 UTC 2016

Hi Ben,

> On Jun 13, 2016, at 6:49 PM, Ben Coman <btc at openinworld.com> wrote:
> 
>> On Tue, Jun 14, 2016 at 2:41 AM, Eliot Miranda <eliot.miranda at gmail.com> wrote:
>> 
>> Hi Ben,
>> 
>>    the diagram below shows the trees, but the wood is arguably more important.  The diagram below is focussing on the transitions, but doesn't clearly show what is being transitioned between.  I imagine a diagram which shows the structures and has what you have in the yellow boxes as transitions.  So...
>> 
>> The essential structures are six-fold, three execution state structures, and three bodies of code, and in fact there is overlap of one of each.
>> 
>> These are the execution state structures:
>> 
>> 1. the C stack.
>> 2. the Smalltalk stack zone.
>> 3. the Smalltalk heap (which includes contexts that overflow the Smalltalk stack zone).
>> 
>> These are the bodies of code:
>> 4. the run-time, the code comprising the VM interpreter, JIT, garbage collector, and primitives
>> 5. the jitted code living in the machine code zone, comprising methods, polymorphic in line caches, and the glue routines (trampolines and enilopmarts) between that machine code and the run-time
>> 6. Smalltalk "source" code, the classes and methods in the Smalltalk heap that constitute the "program" under execution
>> 
>> So 3. and 6. overlap; code is data, and 2. overflows into 3., the stack zone is a "cache", keeping the most recent activations in the most efficient form for execution.
>> Further, 4. (the run-time) executes solely on 1. (the C stack), and 5. (the jitted code) runs only on 2. (the stack zone), and also, code in 6. executed (interpreted) by the interpreter and primitives in 4. runs on 2. (the stack zone)
> 
> I don't have the charting tool where I am atm, so I knocked up the
> above in Excel with a few embellishments that need checking. I am a
> bit confused by "primitives in 4. runs on 2. "  when "4. (the
> run-time) executes solely on 1."  and primitives are part of 4.

All primitives are implemented either in the interpreter or in plugins, and some of the core primitives (arithmetic, comparison, object access and instantiation, block evaluation and perform) are also implemented by the JIT in machine code versions.  Taking the former first, all primitives in the interpreter and plugins are C functions that get called either from the interpreter (slowPrimitiveResponse) or from a cogged (machine code) method containing a primitive.  While they run, these primitives execute on the C stack, even though they take their parameters from the Smalltalk stack.  So they are 4. (part of the run-time) running on 1. (the C stack).
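A toy sketch of that arrangement, in Python rather than the VM's C.  Python's own call stack stands in for the C stack, and a plain list stands in for the Smalltalk stack.  The name primitive_add, the list layout, and the fail-by-returning-False convention are assumptions made for illustration, not the VM's actual API:

```python
def primitive_add(smalltalk_stack):
    """Hypothetical run-time primitive: receiver + argument.

    The function's own frame lives on the host (here: Python) call stack,
    playing the role of the C stack.  Its operands are read from the
    separate Smalltalk stack passed in.
    """
    receiver, argument = smalltalk_stack[-2], smalltalk_stack[-1]
    if not (isinstance(receiver, int) and isinstance(argument, int)):
        # Assumed failure convention: leave the Smalltalk stack untouched
        # so the caller can fall back to the method's Smalltalk code.
        return False
    # Success: pop receiver and argument, push the result in their place.
    del smalltalk_stack[-2:]
    smalltalk_stack.append(receiver + argument)
    return True


stack = [10, 3, 4]            # 3 is the receiver, 4 the argument
ok = primitive_add(stack)     # stack is now [10, 7]
```

The point is only that where the primitive's code runs and where its parameters live are two different stacks.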

The machine code versions of primitives are compiled into the start of cogged methods that include one of the core primitives the JIT is able to generate machine code for.  This code gets executed directly when a cogged method is invoked.  So they are 5. (part of the jitted code) running on 2. (the stack zone).  [Tangent: since 0, 1 and 2 argument sends use a register-based calling convention, most machine code primitives (the only exception being perform:with:with:) take their arguments from registers and answer their result in a register.  So they're much, much faster: direct access to arguments instead of indirecting through stackPointer, and no stack-switching call/return from the Smalltalk stack to the C stack and back.  But they have to be written in the JIT's assembler language.]

> Also I'm not clear on what a "linked-send" is?

This is another tangent, but key to how the JIT speeds up normal Smalltalk sends.  A linked send is the implementation of an inline, per-send-site send cache.  In machine code, sends get compiled as a register load (of a selector) followed by a call (of a trampoline that calls ceSend:super:numArgs:).  When first executed, the send of the selector to the current receiver gets looked up and the instruction sequence gets rewritten into a register load (of the class index of the current receiver) followed by a call (to the method that ceSend:super:numArgs: looked up and jitted).  This latter form is a linked send because it is linked to the entry point of some target method.  That target method's entry code checks that the class index of the current receiver matches that in the register load, and either continues executing the method if they match or calls code to rebind the send to a PIC if they don't.

So once the send is linked, subsequent executions call the target method directly and perform a relatively cheap class check that succeeds in most cases.  If ever it misses, the send site will get bound to a "closed" PIC, a little jump table created specifically for that send site, that can hold up to 6 pairs of class index comparison and jump (it can dispatch up to 6 classes of receiver).  If there are more than 6, the send site will get rewritten to call an "open" PIC specific to the selector, which probes the first-level method lookup cache.  So in practice all sends, except the roughly 1% that are megamorphic (such as the sends of basicNew and initialize in Behavior>>#new), settle down into calls, with no relinking occurring until either the program changes flow (introducing polymorphism) or the code zone fills up, methods are discarded, and sends are unlinked and later reexecuted, which doesn't happen very often.
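The life cycle of a send site described above can be sketched as a small state machine.  This is a toy model in Python, not the JIT's machine code: the names SendSite, full_lookup and method_cache are hypothetical, Python classes stand in for class indices, and a dict stands in for the first-level method lookup cache.  Only the limit of 6 cases comes from the description above:

```python
MAX_CLOSED_PIC_CASES = 6  # a closed PIC dispatches up to 6 classes of receiver

method_cache = {}  # stand-in for the first-level method lookup cache


def full_lookup(cls, selector):
    """Slow path: walk the class hierarchy, then remember the result."""
    for klass in cls.__mro__:
        if selector in vars(klass):
            method = vars(klass)[selector]
            method_cache[(cls, selector)] = method
            return method
    raise AttributeError(selector)


class SendSite:
    """Toy model of one send site: unlinked -> linked -> closed PIC -> open PIC."""

    def __init__(self, selector):
        self.selector = selector
        self.state = "unlinked"
        self.cases = []   # (class, method) pairs; one pair == a linked send
        self.misses = 0   # how often the site had to be (re)bound

    def send(self, receiver, *args):
        cls = type(receiver)
        if self.state == "open_pic":
            # Open PIC: no per-site cases, just probe the shared cache.
            method = method_cache.get((cls, self.selector))
            if method is None:
                method = full_lookup(cls, self.selector)
            return method(receiver, *args)
        # Linked send or closed PIC: cheap class checks, then a direct call.
        for cached_cls, method in self.cases:
            if cached_cls is cls:
                return method(receiver, *args)
        # Miss: look up, then link, extend the closed PIC, or go open.
        self.misses += 1
        method = full_lookup(cls, self.selector)
        if len(self.cases) < MAX_CLOSED_PIC_CASES:
            self.cases.append((cls, method))
            self.state = "linked" if len(self.cases) == 1 else "closed_pic"
        else:
            self.cases = []
            self.state = "open_pic"
        return method(receiver, *args)
```

Sending the same selector to instances of one class leaves the site "linked" with a single miss; a second class moves it to "closed_pic"; a seventh distinct class moves it to "open_pic".  In the real VM the class check of a linked send happens in the target method's entry code rather than at the send site, which this model glosses over.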

> cheers -ben
> <Cog-structure%code.png>

_,,,^..^,,,_ (phone)