[Vm-dev] context change versus primitive failure return value (was: FFI exception failure support on Win64 (Win32 also?))

Thu Aug 30 16:22:00 UTC 2018

On Wed, 29 Aug 2018 at 11:08, Eliot Miranda <eliot.miranda at gmail.com> wrote:

>
> All other primitives are handled by slowPrimitiveResponse.  This requires
> clearing primErrorCode and calling a function implementing the primitive
> and then testing primErrorCode before either continuing or building an
> activation for the failing primitive. There is another important task of
> internalExecuteNewMethod, which is to store the "internal" frame and stack
> pointers (localFP & localSP) into the global interpreter frame and stack
> pointers (framePointer and stackPointer) before invoking and then restoring
> localFP & localSP from framePointer and stackPointer after having invoked
> slowPrimitiveResponse.  In the Back-to-the-Future (BTTF) VMs (the original
> Squeak VM and the Stack and Cog VMs) primitives access their receiver and
> arguments through framePointer and stackPointer.  But these, being global,
> are slow and without compiler-specific hacks cannot be placed in
> registers.
>

> The Slang translator and the interpreter code collaborate to inline much
> of the interpreter, including every method beginning with internal, into
> the interpret routine in which localFP, localSP and localIP are declared,
> hence allowing a C compiler to assign these variables to registers.
>

Ahhh. Maybe I finally get what "internal" means. IIUC, its code gets
generated internal to the C interpret() function,
as I can see searching for "internalExecuteNewMethod"
in...
https://raw.githubusercontent.com/OpenSmalltalk/opensmalltalk-vm/Cog/spurstacksrc/vm/interp.c

One thing I'm curious about is why...
searching on "externalNewMethod" shows it is inlined several times,
but StackInterpreter>>executeNewMethod doesn't have the inline pragma ??

Now a naive question, since as a 13,000 line function it's a bit hard to
absorb interpret()...
why inline by direct code generation rather than using the "inline"
directive on individually generated functions ??
http://www.drdobbs.com/the-new-c-inline-functions/184401540
(I quite like how that shows the function being folded inline and then
function parameters optimized away)

> So another reason slowPrimitiveResponse is slow is that it writes and reads
> localFP, localSP & localIP to/from framePointer, stackPointer &
> instructionPointer.
>

Also, along the same naive track, why not inline the primitive C functions
so that
you don't need to manually {externalize,internalize}IPandSP  and make those
primitives faster ??
I guess it wouldn't work for instructionPointer, with definitions like...

    _iss char * stackPointer; // global
    _iss char * framePointer; // global
    _iss usqInt instructionPointer;  // global
    sqInt
    interpret(void)
    {   char * localFP;
        char * localIP;
        char * localSP;

   StackInterpreter >> externalizeIPandSP
        instructionPointer := self oopForPointer: localIP.
        stackPointer := localSP.
        framePointer := localFP
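
To make the externalize/internalize dance concrete, here is a minimal C sketch of the pattern as I understand it. It is not the generated VM code: demoPrimitivePopOne and runWithLocals are hypothetical names, the globals are plain stand-ins for the real ones, and the instruction pointer is left out entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the VM's global interpreter state. */
static char *stackPointer;
static char *framePointer;

/* A hypothetical "slow" primitive: it can only see the globals.
   Here it pops one slot by bumping stackPointer (stack grows down). */
static void demoPrimitivePopOne(void)
{
    stackPointer += sizeof(long);
}

/* Work on locals, externalize before the call, internalize afterwards.
   Returns the local stack pointer as it stands after the primitive ran. */
char *runWithLocals(char *initialSP, char *initialFP)
{
    char *localSP = initialSP;   /* locals: candidates for registers */
    char *localFP = initialFP;

    /* externalize: publish the locals so the primitive can see them */
    stackPointer = localSP;
    framePointer = localFP;

    demoPrimitivePopOne();

    /* internalize: pick up whatever the primitive changed */
    localSP = stackPointer;
    localFP = framePointer;

    assert(localFP == initialFP);  /* this demo primitive leaves the frame alone */
    return localSP;
}
```

The two copies on each side of the call are the extra cost in question. A primitive that switched to another stack would simply leave different values in the globals for the internalize step to pick up.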

Now on page 594 of the Bluebook I read "The fetchByte routine fetches the
byte indicated by the active context's instruction pointer and increments
the instruction pointer".
That sounds like each Context has its own instructionPointer, but I didn't
think that was so ??

> But because it does so, primitives that change the execution context
> (process switch primitives that switch to another "stack" of contexts, or
> eval primitives such as perform:with:* and value:value* which build another
> frame) can be written and change the execution context at a send point
> (primitive invocation is always at a send).
>

> Note that all the internal methods have non-internal duals that do the
> same thing but use framePointer, stackPointer & instructionPointer, not
> localFP, localSP & localIP.  These are used to implement the eval
> primitives perform:* and withArgs:executeMethod: since these may also
> invoke primitives.  And hence you might be able to get your head around my
> favorite Smalltalk construction:
> | array |
> array := { #perform:withArguments:. nil }.
> array at: 2 put: array.
> array perform: array first withArguments: array
> ;-)
>

That's rather recursively evil.

> Given the primitives are always invoked at a send we can see how elegant
> Dan's invention of primitive failure is.  Primitives are essentially
> transactional and atomic.  They validate their arguments and if validation
> succeeds they carry out their action and answer a result as if from some
> normal send.  But if validation fails, or if they are unimplemented, the
> method containing the primitive reference (<primitive: 61>, <primitive:
> 'primitiveSocketAccept' module: 'SocketPlugin'>) is simply activated as if
> the primitive didn't exist, or as if the method was a normal method.  Hence
> primitives are optional, in that if the method body does what the primitive
> does then no one can tell if the primitive is doing the work or Smalltalk
> code, except by measuring performance.  Hence for example large integer
> arithmetic and string display primitives are optional and serve to
> accelerate the system.
>
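
To check my understanding of that contract, here is a minimal C sketch. All names are hypothetical (primErrorCode here is just a local stand-in, not the VM's variable) and the "method body" is an ordinary C function standing in for the Smalltalk code that follows the <primitive:> reference.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-in for the VM's primitive error flag. */
static int primErrorCode;

typedef long long (*PrimitiveFn)(int rcvr, int arg);

/* The primitive validates first and has no side effects when it fails. */
static long long primAdd(int rcvr, int arg)
{
    if ((arg > 0 && rcvr > INT_MAX - arg) ||
        (arg < 0 && rcvr < INT_MIN - arg)) {
        primErrorCode = 1;          /* fail: the result would not fit */
        return 0;
    }
    return rcvr + arg;
}

/* Stand-in for the method body: slower, but handles every case. */
static long long methodBodyAdd(int rcvr, int arg)
{
    return (long long)rcvr + (long long)arg;
}

/* The send: clear the flag, try the primitive, test the flag, and
   otherwise run the method body as if the primitive did not exist.
   A NULL prim models an unimplemented primitive. */
long long sendAdd(PrimitiveFn prim, int rcvr, int arg)
{
    if (prim != NULL) {
        primErrorCode = 0;
        long long result = prim(rcvr, arg);
        if (primErrorCode == 0)
            return result;
    }
    return methodBodyAdd(rcvr, arg);
}
```

The caller of sendAdd gets the same answer whether the primitive ran, failed, or was missing, which is the "optional" property described above.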
> There is one other route by which primitives are executed, also at the
> point of send, and this is via the special selector bytecodes.  Another of
> Dan's excellent optimizations, the special selectors both save space and
> make the interpreter faster.  They save space by encoding the 32 most
> frequently occurring sends as one byte bytecodes, hence saving the 2
> (16-bit Smalltalk-80), 4 (32-bit Squeak) or 8 (64-bit Squeak) bytes to
> store the selector in a method's literal frame.  But some of them also
> speed up the interpreter by statically predicting the receiver type.  i.e.
> #+, #-, #/, #*, #<, #>, #<= et al are most often sent to integers, and hence
> these bytecodes, as specified in the Blue Book, check for the top two
> elements on the stack being SmallIntegers, and if so replace the two top
> elements by the result of the operation, avoiding a send and primitive
> dispatch.  Note that in the BttF interpreter this checking is extended both
> to apply to Float,
>

On Bluebook p619 I see the simple bytecode dispatch...
    currentBytecode = 176 ifTrue: [ ^self primitiveAdd].
and...
Bluebook Interpreter >> primitiveAdd
    | integerReceiver integerArgument integerResult |
    integerArgument := self popInteger.
    integerReceiver := self popInteger.
    self success
        ifTrue: [
            integerResult := integerReceiver + integerArgument.
            self success: (memory isIntegerValue: integerResult)].
    self success
        ifTrue:  [self pushInteger: integerResult]
        ifFalse: [self unPop: 2]

and notice that StackInterpreter is doing a lot more within the bytecode
before a primitive is needed...
spurstacksrc/vm/interp.c has...
     case 176: /* bytecodePrimAdd */
which is...
StackInterpreter >> bytecodePrimAdd
    | rcvr arg result |
    rcvr := self internalStackValue: 1.
    arg := self internalStackValue: 0.
    (objectMemory areIntegers: rcvr and: arg)
        ifTrue: [result := (objectMemory integerValueOf: rcvr) + (objectMemory integerValueOf: arg).
            (objectMemory isIntegerValue: result) ifTrue:
                [self internalPop: 2 thenPush: (objectMemory integerObjectOf: result).
                ^ self fetchNextBytecode "success"]]
        ifFalse: [self initPrimCall.
            self externalizeIPandSP.
            self primitiveFloatAdd: rcvr toArg: arg.
            self internalizeIPandSP.
            self successful ifTrue: [^ self fetchNextBytecode "success"]].

    messageSelector := self specialSelector: 0.
    argumentCount := 1.
    self normalSend

Now I'm curious about the different handling above of integers (ifTrue:
path) and floats (ifFalse: path).
I guess the integer code is due to being immediate values where the object
doesn't need to be looked up,
while the non-immediate floats need a primitive call.
Now I'm wondering, since 64-bit has immediate floats, is bytecodePrimAdd
due an update to make them faster?

btw, my impression is that  #areImmediateIntegers:and:  seems a more
explicit name than  #areIntegers:and:
since I guess the ifTrue: path doesn't apply to integers that are
LargePositiveIntegers.
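
For my own benefit, a minimal C sketch of what I take the integer fast path to be doing. The encoding is an assumption chosen for illustration (one low tag bit, 31-bit value range), not the actual Spur layout, and every name is hypothetical. The float path is omitted: anything the fast path declines is reported back so the caller can fall through to a normal send.

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t oop;

/* Assumed encoding: low bit set means immediate integer, value = (oop - 1) / 2. */
enum { kMinSmall = -(1 << 30), kMaxSmall = (1 << 30) - 1 };

static int isSmallInt(oop o)          { return (o & 1) != 0; }
static int64_t smallIntValue(oop o)   { return (o - 1) / 2; }
static oop smallIntObject(int64_t v)  { return (oop)(v * 2 + 1); }
static int fitsSmallInt(int64_t v)    { return v >= kMinSmall && v <= kMaxSmall; }

/* Returns 1 and stores the tagged sum when both operands are immediate
   integers and the sum still fits; returns 0 to request a normal send. */
int fastAdd(oop rcvr, oop arg, oop *result)
{
    if (isSmallInt(rcvr) && isSmallInt(arg)) {
        int64_t sum = smallIntValue(rcvr) + smallIntValue(arg);
        if (fitsSmallInt(sum)) {
            *result = smallIntObject(sum);
            return 1;
        }
    }
    return 0;
}
```

Both checks work on the oop itself without following a pointer, which is why this path needs no primitive call. An overflowing sum or a non-immediate operand just returns 0.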

> and to check for a following branch after the conditionals #<, #<= et
> al, so that the result doesn't have to be reified into a boolean that is
> tested immediately; effectively the following branch gets folded into the
> relational special selector bytecode.
>

Very interesting to know.
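
If I read that right, the folding could be sketched in C roughly as follows. The opcodes, their numbering and the one-byte jump offset are all hypothetical, made up for the sketch and not taken from the Squeak bytecode set.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcodes for illustration only. */
enum { OP_LESS = 1, OP_JUMP_FALSE = 2 };

/* Executes the OP_LESS at code[ip] for operands a and b.
   If the next bytecode is OP_JUMP_FALSE (followed by a one-byte offset),
   the branch is taken or skipped right here and no boolean is produced.
   Otherwise the boolean is stored in *boolOut for the next bytecode.
   Returns the index of the next bytecode to execute. */
size_t execLess(const unsigned char *code, size_t ip, int a, int b,
                int *boolOut, int *reified)
{
    int cond = a < b;
    assert(code[ip] == OP_LESS);

    if (code[ip + 1] == OP_JUMP_FALSE) {
        unsigned char offset = code[ip + 2];
        *reified = 0;                         /* no boolean was materialized */
        return cond ? ip + 3 : ip + 3 + offset;
    }

    *boolOut = cond;                          /* the slower, general case */
    *reified = 1;
    return ip + 1;
}
```

The saving is that the comparison result never has to become a true/false object that the jump bytecode then immediately tests.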

> The JIT uses this same technique, but is able to do a much better job
> because, for example, it can know if a relational special selector send is
> followed by a jump bytecode or not at JIT time.
>
>
>>
>>
>> So I understand that checkForEventsMayContextSwitch: called at the end
>> of internalActivateNewMethod
>> occurs after bytecode execution has completed, so that context switches
>> made there are done "between" bytecodes.
>> However my perspective is that internalExecuteNewMethod is only half way
>> through a bytecode execution when
>> the primitives effect context changes.  So internalActivateNewMethod ends
>> up working on a different Process than internalExecuteNewMethod started
>> with.  The bottom three red lines in the chart are what I considered to be
>> changing threads "half way" through a bytecode.
>>
>
> More accurately, checkForEventsMayContextSwitch: is called on activating a
> method, after the send has occurred, but before the first bytecode has
> executed.  Hence primitive sends are not suspension points unless the
> primitive is a process switch or eval primitive.  Instead, just as a
> non-primitive (or failing primitive) method is being activated
> checkForEventsMayContextSwitch: is invoked.
>

On Bluebook page 594 I see checkProcessSwitch is called at the start of the
cycle rather than at the end, as we have.
Probably it's buried in history, but do you know of any particular reason
for the change?

>> Ahh, I'm slowly coming to grips with this.  It was extremely confusing at
>> the time why my failure code from the primitive was turning up in a
>> different Process, though I then learnt a lot digging to discover why.   In
>> summary, if the primitive succeeds it simply returns
>> from internalExecuteNewMethod and internalActivateNewMethod never sees the
>> new Process B.  My problem violating the primitive-failure side-effect rule
>> was that internalActivateNewMethod trying to run Process A
>> in-Image-primitive-failure code
>> instead ran Process B in-Image-primitive-failure code.
>>
>
> Right.  So that design requirement is key to the VM architecture.  That
> was something I had to explain to Alistair when he did the first cut of the
> FileAttributesPlugin, which used to return failure codes on error instead
> of failing, leaving error recovery to clients.  And so it underscores the
> importance of reading the blue book specification.  [We really should do an
> up-to-date version that describes a simplified 32-bit implementation].
>

That would be great.  First step would be to seek permission to reuse that
chapter.
It would be best to reuse most of its structure and content and update
details.

cheers -ben