[Vm-dev] Re: Changing CogSimulator abi

Mon Jan 28 21:28:10 UTC 2013

Hi Lars,

On Sat, Jan 26, 2013 at 1:16 PM, Lars <lars.wassermann at googlemail.com> wrote:

> Hello Eliot, hello vm-dev,
>
> @vm-dev: I'm still sometimes working on cog ARM, but due to my studies I
> have little time. The problem I'm working on is that IA32 has a different
> function call ABI than ARM. While on IA32, you need to push the return
> address, on ARM, you load it into the LR-register.
>
> A design decision to accommodate this difference in the ARM JIT was to use
> IA32 ABI within all cog code, even when running on ARM. Only when calling
> the (compiled) interpreter, we use ARM ABI. The hope was, that this way we
> need to change little of the existing code.
>
> @all: In the last days of working (spread across several months), I
> implemented the Call opcode (which is used by cogit whenever a function is
> called) by pushing the return address before branching to the target (IA32
> ABI).
> Also, I changed the trampoline generation to ask the compiler for the
> appropriate call opcode for the ABI (so far not committed), which is either
> Call in case of IA32 or BL in case of ARM. I'm not happy with that location
> for this behavior, but I don't know whether there exists a better place.
> Also, #hasLinkRegister is implemented on the compiler.
>
> Now, that calling the interpreter has changed, I run into the problem,
> that the simulator is expecting the stack pointer to point to the return
> address. The simulator is assuming IA32 ABI.
>
> How best to account for the changed ABI in the simulator?
>     Subclass the simulator? On which level, VMSimulator or VMSimulatorLSB?
> That change would be orthogonal to the LSB subclass (if there ever will be
> a MSB subclass).
>     Or introduce two classes which do know the ABI and are responsible for
> all places where ABI is used? Also the eventual changes to trampoline and
> enilopmart generation? Which problems might arise from this design decision
> with respect to the C-translation?
>

I would take the same approach that Peter Deutsch took in HPS, the
VisualWorks VM.  The idea is to keep the Interpreter side of things
unchanged and change the glue code and/or the generated method prologue
code to keep the stack the same from the Interpreter's point of view.  So
when an ARM machine code method calls another ARM machine code method the
link register is in use, and the frame building code in a frame-building
non-leaf method pushes the link register as part of building the frame (as
one would expect), and a frameless method may be able to return through the
link register if it contains no runtime calls, but would have to push it if it does
(*).  But if a machine-code method calls the run-time through glue it would
push the link register at some point before the glue call, leaving the
stack in the same state as it would be in the IA32 version at the same
point in execution.
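
To make that invariant concrete, here is a toy model in Python rather than
in the VM's own code.  Everything in it (the function names, the stack
contents, the address 0x1234) is made up for illustration; it only shows
that pushing the link register before the glue call leaves the stack
looking as it would on IA32.

```python
# Toy model only: names and addresses are hypothetical, not Cog code.

def send_ia32(stack, return_address):
    """IA32 CALL pushes the return address onto the stack."""
    return stack + [return_address]

def send_arm(stack, return_address):
    """ARM BL leaves the stack alone and puts the return address in LR."""
    return list(stack), return_address

def push_link_register(stack, link_register):
    """What the ARM callee does at some point before a glue call."""
    return stack + [link_register]

args = ["receiver", "arg1"]
send_return = 0x1234  # made-up return address of the send

ia32_stack = send_ia32(args, send_return)
arm_stack, lr = send_arm(args, send_return)
arm_stack_at_glue_call = push_link_register(arm_stack, lr)

# The run-time sees the same stack on both architectures.
assert ia32_stack == arm_stack_at_glue_call
```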

For example, here's the prolog for a normal method, expressed in the VM's
assembler:

LstackOverflow:
MoveCq: 0 R: ReceiverResultReg
LsendMiss:
Call: ceMethodAbortTrampoline
AlignmentNops: (BytesPerWord max: 8)
Lentry:
objectRepresentation getInlineCacheClassTagFrom: ReceiverResultReg into: TempReg
CmpR: ClassReg R: TempReg
JumpNonZero: LsendMiss
LnoCheckEntry:
... frame building code ...
MoveAw: coInterpreter stackLimitAddress R: TempReg
CmpR: TempReg R: SPReg
JumpBelow: LstackOverflow

The ceMethodAbortTrampoline handles both the send miss when the inline cache fails,
and stack overflow at the end of a stack page or to check for events.  The
link register definitely needs to be pushed for the send miss.  It doesn't
need to be pushed for the stack overflow (since frame build code has
already saved it in the return pc slot in the frame), but pushing it
unnecessarily can be undone by the glue for ceMethodAbortTrampoline.

So the abort code would become

LstackOverflow:
MoveCq: 0 R: ReceiverResultReg
LsendMiss:
Push: LinkReg
Call: ceMethodAbortTrampoline
AlignmentNops: (BytesPerWord max: 8)
...

and in ceMethodAbortTrampoline there would be a test on ReceiverResultReg
so that if ReceiverResultReg is 0 (the stack overflow case) the link
register is written to the same stack slot as it was pushed to, so that the
top of stack is the return address for the ceMethodAbortTrampoline call,
and if ReceiverResultReg is non-zero (the send miss case), the link
register is pushed, so that the inner return address on top of stack is the
return address for the ceMethodAbortTrampoline call and the outer return
address is that for the send call that missed.  The return addresses are
used to identify the method (whose selector is the selector of the send)
and the call site at which the send missed.
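
The two cases can be checked with a small Python model.  This is only a
sketch under assumptions: the stack is a list, the addresses and frame
contents are invented, and the function names are hypothetical.  It models
the push before the call and then the overwrite-or-push inside the
trampoline.

```python
# Toy model only: names, addresses and frame layout are hypothetical.

def abort_ia32(stack, abort_return):
    """IA32: the call to the abort trampoline pushes its own return address."""
    return stack + [abort_return]

def abort_arm(stack, lr_on_entry, receiver_result_reg, abort_return):
    """ARM: push LR, call the trampoline, then fix up the stack inside it."""
    stack = stack + [lr_on_entry]   # Push: LinkReg
    lr = abort_return               # Call: ceMethodAbortTrampoline sets LR
    if receiver_result_reg == 0:
        stack[-1] = lr              # stack overflow: overwrite the pushed slot
    else:
        stack.append(lr)            # send miss: push the inner return address
    return stack

SEND_RET, ABORT_RET, STALE_LR = 0x1000, 0x2000, 0xDEAD

# Send miss: on IA32 the send already pushed SEND_RET; on ARM it is in LR.
base = ["receiver", "arg1"]
miss_ia32 = abort_ia32(base + [SEND_RET], ABORT_RET)
miss_arm = abort_arm(base, SEND_RET, "receiver", ABORT_RET)

# Stack overflow: the frame already holds SEND_RET on both; LR is stale.
framed = base + [SEND_RET, "saved fp"]
overflow_ia32 = abort_ia32(framed, ABORT_RET)
overflow_arm = abort_arm(framed, STALE_LR, 0, ABORT_RET)

assert miss_ia32 == miss_arm
assert overflow_ia32 == overflow_arm
```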

So with a little modification in the right places the Interpreter sees
exactly the same stack with ARM machine code as it does on IA32.  In fact
we can construct tests to ensure this is the case by running two VMs side
by side, running some test image that exercises the send machinery etc.
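
As a rough idea of what such a side-by-side check could look like, here is
a Python sketch.  It assumes each VM can be made to record a list of stack
snapshots, which is hypothetical; the helper name and the normalizer are
also made up.  Since machine-code addresses will differ between the two
back ends, slots are compared after passing through a caller-supplied
normalizer.

```python
# Sketch only: the trace format and helper names are assumptions.
from itertools import zip_longest

def first_divergence(trace_a, trace_b, normalize=lambda slot: slot):
    """Return the first step at which the snapshots differ, or None."""
    for step, (a, b) in enumerate(zip_longest(trace_a, trace_b)):
        if a is None or b is None:
            return step  # one trace ended before the other
        if [normalize(s) for s in a] != [normalize(s) for s in b]:
            return step
    return None

def as_shape(slot):
    """Treat any integer slot as a return address, whatever its value."""
    return "pc" if isinstance(slot, int) else slot

ia32_trace = [["rcvr", 0x1000], ["rcvr", 0x1000, 0x1040]]
arm_trace = [["rcvr", 0x2000], ["rcvr", 0x2000, 0x2080]]
bad_arm_trace = [["rcvr", 0x2000], ["rcvr", 0x2080]]  # a slot went missing

same = first_divergence(ia32_trace, arm_trace, as_shape)
broken_at = first_divergence(ia32_trace, bad_arm_trace, as_shape)
```

Here `same` is None and `broken_at` is the index of the first snapshot in
which the ARM stack is missing a slot.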

As far as the code goes, it might look something like:

Cogit methods for compile abstract instructions
compileAbort
    "The start of a CogMethod has a call to a run-time abort routine that either
     handles an in-line cache failure or a stack overflow.  The routine selects the
     path depending on ReceiverResultReg; if zero it takes the stack overflow
     path; if nonzero the in-line cache miss path.  Neither of these paths returns.
     The abort routine must be called;  In the callee the method is located by
     adding the relevant offset to the return address of the call."
    stackOverflowCall := self MoveCq: 0 R: ReceiverResultReg.
    backEnd hasLinkRegister ifTrue:
        [self PushR: LinkReg].
    sendMissCall := self Call: (self methodAbortTrampolineFor: methodOrBlockNumArgs)

StackToRegisterMappingCogit methods for initialization
genMethodAbortTrampolineFor: numArgs
    "Generate the abort for a method.  This abort performs either a call of ceSICMiss:
     to handle a single-in-line cache miss or a call of ceStackOverflow: to handle a
     stack overflow.  It distinguishes the two by testing ReceiverResultReg.  If the
     register is zero then this is a stack-overflow because a) the receiver has already
     been pushed and so can be set to zero before calling the abort, and b) the
     receiver must always contain an object (and hence be non-zero) on SIC miss."
    | jumpSICMiss |
    <var: #jumpSICMiss type: #'AbstractInstruction *'>
    opcodeIndex := 0.
    self CmpCq: 0 R: ReceiverResultReg.
    jumpSICMiss := self JumpNonZero: 0.
    backEnd hasLinkRegister ifTrue:
        [self MoveR: LinkReg Mw: 0 r: SPReg]. "overwrite send ret address with ceMethodAbortTrampoline call ret address"
    self compileTrampolineFor: #ceStackOverflow:
        callJumpBar: true
        numArgs: 1
        arg: SendNumArgsReg
        arg: nil
        arg: nil
        arg: nil
        saveRegs: false
        resultReg: nil.
    jumpSICMiss jmpTarget: self Label.
    backEnd hasLinkRegister ifTrue:
        [self PushR: LinkReg]. "push ret address for ceMethodAbortTrampoline call"
    ...

The same goes for the aborts in closed and open PICs.  Does this make sense?

(*) I'm not sure without looking at the code carefully whether any
frameless methods can make calls on the runtime.  If not, then this issue
is moot.  If so, then one solution is to not compile the method frameless
if it makes use of the run-time.  Another approach would be to build a
simple frame (just push the link register).
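
The two options can be written down as a small decision function.  This is
a Python sketch, not Cogit code: the function, its parameters and the
result strings are all hypothetical names chosen for illustration.

```python
# Sketch only: function, parameters and result names are hypothetical.

def frame_kind(needs_frame, calls_runtime, has_link_register,
               allow_simple_frame=False):
    """Pick how a method is compiled under the two options above."""
    if needs_frame:
        return "full"
    if has_link_register and calls_runtime:
        # Option 2: just push the link register.  Option 1: do not
        # compile the method frameless at all.
        return "simple" if allow_simple_frame else "full"
    return "frameless"

leaf = frame_kind(needs_frame=False, calls_runtime=False,
                  has_link_register=True)
runtime_user = frame_kind(needs_frame=False, calls_runtime=True,
                          has_link_register=True)
runtime_user_simple = frame_kind(needs_frame=False, calls_runtime=True,
                                 has_link_register=True,
                                 allow_simple_frame=True)
```

On a back end without a link register the question never arises, so the
function leaves such methods frameless.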

> All the best,
> Lars
>
>
>

-- 
best,
Eliot