[Vm-dev] VM Maker: VMMaker.oscog-cb.1236.mcz

Eliot Miranda eliot.miranda at gmail.com
Thu Apr 23 16:08:50 UTC 2015


On Thu, Apr 23, 2015 at 3:46 AM, Clément Bera <bera.clement at gmail.com>
wrote:

>
>
>
> 2015-04-23 1:20 GMT+02:00 Eliot Miranda <eliot.miranda at gmail.com>:
>
>>
>> Hi Clément,
>>
>>
>> On Wed, Apr 22, 2015 at 2:07 AM, Clément Bera <bera.clement at gmail.com>
>> wrote:
>>
>>>
>>> Eliot here's a good example to stress the register allocation:
>>>
>>> Integer>>#regStress
>>> | t t2 |
>>> t := self yourself.
>>> t2 := self + 1.
>>> ^ { t == t2 .   t == t2 .  t == t2 .  t == t2 .  t == t2 .  t == t2 .  t
>>> == t2 }
>>>
>>> I think the resulting machine code method is beautiful, it needs to
>>> spill only when it runs out of registers :-). Of course it makes sense
>>> mainly when you use inlined bytecodes instead of only #==.
>>>
>>> Some extra register moves are done because the JIT does not remember
>>> whether a temporary value is currently in a register (each time, it moves
>>> the temp t to the same register even though the register value has not
>>> changed). Maybe we should add a feature that remembers if temporaries are
>>> currently in a register, and if so, when doing push temp, only push the
>>> register directly in the simStack, and in case of temporary store or stack
>>> flush the register associated with a temp is not valid anymore somehow...
>>>
>>
>> Here's a sketch of something that looks to me like it would work.
>>
>> A CogSimStackEntry for a temp var is of type SSBaseOffset.  Its register
>> field is used to hold the frame pointer.  We want to mark it as having its
>> value in a register, so it needs a new field, let's call it
>> allocatedRegOrNil.  At start of compilation all SSBaseOffset entries
>> have allocatedRegOrNil nil.
>>
>> Whenever popToReg: finds it is popping an SSBaseOffset entry it sets that
>> entry's allocatedRegOrNil to the register.
>> Whenever a register is spilled (in ssFlushTo:) the sim stack is also
>> scanned looking for all SSBaseOffset entries whose allocatedRegOrNil equals
>> the register, and simply sets allocatedRegOrNil back to nil.
>>
>> On merge, with the current representation we merely set
>> all allocatedRegOrNil fields back to nil.  But with the more sophisticated
>> stack copy merge we can preserve allocation for entries whose registers
>> match.
>>
>> There are perhaps tricky details in merge (the same register used for
>> different temporaries in different branches, etc) but otherwise it is very
>> simple, no?
>>
>
> Yeah, something like that would be nice. There are details, such as
> liveRegisters needing to include those registers. Maybe this logic could
> somehow be merged with that of ReceiverResultReg, which currently has its
> own live status and may not need it.
>

Yes that's a good idea.  One would just do popToRequiredReg:
ReceiverResultReg on the self stack entry and if ReceiverResultReg
currently held the receiver it would be a no-op.  Nice.
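
To make the bookkeeping concrete, here is a minimal toy model in Python of
the allocatedRegOrNil idea from the quoted sketch above, including the no-op
pop.  It is not Cogit code: TempEntry, ToySimStack, pop_to_reg, invalidate
and the "load" pseudo-instruction are all hypothetical names invented for
this illustration, and the merge shown is only the conservative "reset
everything" variant.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TempEntry:
    """Toy stand-in for an SSBaseOffset sim stack entry (a temp or self)."""
    name: str
    allocated_reg_or_nil: Optional[str] = None  # None plays the role of nil


class ToySimStack:
    def __init__(self, entries: List[TempEntry]):
        self.entries = entries
        self.emitted: List[str] = []  # pseudo-instructions generated so far

    def pop_to_reg(self, entry: TempEntry, reg: str) -> None:
        """Load entry into reg unless it is already known to live there."""
        if entry.allocated_reg_or_nil == reg:
            return  # value already in reg: no move needed
        # reg is about to be overwritten, so forget whoever lived there
        self.invalidate(reg)
        self.emitted.append(f"load {entry.name} -> {reg}")
        entry.allocated_reg_or_nil = reg

    def invalidate(self, reg: str) -> None:
        """On spill or clobber of reg, reset matching entries back to nil."""
        for e in self.entries:
            if e.allocated_reg_or_nil == reg:
                e.allocated_reg_or_nil = None

    def merge(self) -> None:
        """Conservative merge: drop all cached residency at a join point."""
        for e in self.entries:
            e.allocated_reg_or_nil = None


receiver = TempEntry("self")
stack = ToySimStack([receiver])
stack.pop_to_reg(receiver, "ReceiverResultReg")
stack.pop_to_reg(receiver, "ReceiverResultReg")  # second pop emits nothing
print(stack.emitted)  # ['load self -> ReceiverResultReg']
```

The point of the model is only that the second pop costs nothing once the
entry remembers its register; the real work would be in keeping that memory
honest across spills and merges.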

> Btw I changed #genStorePop: popBoolean LiteralVariable: litVarIndex so it
> uses register allocation instead of ReceiverResultReg and ClassReg. There
> was a flag saying that it could be used in frameless methods if register
> allocation was used. However I wonder, can you have a ceStoreCheck in a
> frameless method? I could work around the need for ReceiverResultReg in
> the trampoline by adding extra register moves but I am not sure a
> trampoline can work fine in a frameless method.
>

Yes, trampolines can work in any state.  They're orthogonal to
framelessness.  Trampolines call into the C run-time.  To do so they must
switch stack, saving the hardware stack pointers into stackPointer and
framePointer, loading the hardware stack pointers with the C stack, making
the call, and on return, switching back to the Smalltalk stack.  But it
doesn't matter what state Smalltalk is in; the stack pointers referring to
the caller method or the callee method makes no difference to the
trampoline, only to the code the trampoline calls.  With something like the
store check, the stack is unexamined; all that happens is that the argument
gets added to the remembered table.
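
The stack switch just described can be modeled in a few lines of Python.
This is a toy sketch, not the generated trampoline code: ToyVM, its sp/fp
fields, the c_sp/c_fp values and store_check are assumptions made up for the
illustration; only stackPointer and framePointer are names from the text.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ToyVM:
    # Toy "hardware" stack registers, modeled as plain integers.
    sp: int = 0
    fp: int = 0
    # Variables that hold the Smalltalk stack pointers while C code runs.
    stackPointer: int = 0
    framePointer: int = 0
    # Where the C stack lives (hypothetical values for this sketch).
    c_sp: int = 0xC000
    c_fp: int = 0xC100
    remembered: List[str] = field(default_factory=list)

    def trampoline(self, c_function: Callable[["ToyVM", str], None],
                   arg: str) -> None:
        # 1. Save the hardware stack pointers.
        self.stackPointer, self.framePointer = self.sp, self.fp
        # 2. Switch to the C stack.
        self.sp, self.fp = self.c_sp, self.c_fp
        # 3. Make the call into the C run-time.
        c_function(self, arg)
        # 4. On return, switch back to the Smalltalk stack.
        self.sp, self.fp = self.stackPointer, self.framePointer


def store_check(vm: ToyVM, obj: str) -> None:
    """Toy store check: remembers the object, never examines the stack."""
    vm.remembered.append(obj)


vm = ToyVM(sp=0x1000, fp=0x1010)
vm.trampoline(store_check, "anObject")
```

Nothing in trampoline depends on whether sp and fp describe a full frame or
the caller's frame of a frameless method, which is the sense in which
trampolines are orthogonal to framelessness.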

Framelessness occurs in three different forms:

- One is a method that doesn't contain any sends other than the special
selectors #== and #class, and doesn't access temporaries beyond the
arguments.  An example is Interval>>setFrom:to:by:, which contains three
inst var assignments and three store checks.

- Another is a frameless block.  This is the same as a frameless method
except that block activation implies that the receiver and register
arguments will have been pushed, so the stack layout is a little different.

- The final case is a method with a primitive.  The code up until a
primitive fails is frameless.  If a primitive is implemented in machine
code then the stack and register arguments are undisturbed; the primitive
will access its operands from the registers if that's where they are.  If a
primitive is implemented in C (an interpreter primitive), or the machine
code primitive calls the interpreter primitive when it fails (which allows
the machine code to handle the simple common cases, falling back on slower
more comprehensive C code), then, like a frameless block, receiver and
arguments are pushed on the stack, because interpreter primitives can only
get their operands from the stack via stackPointer.

The above implies that the ceStoreCheck trampoline may be called from any
of the above frameless forms, as well as from a method with a full frame.
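
As a rough illustration of the first form only, the criterion as stated
above can be written as a predicate.  This is a toy sketch under the
assumption that a method is summarized by the selectors it sends and the
indices of the temporaries it accesses (arguments first, counting from 0);
needs_frame and FRAMELESS_SENDS are hypothetical names, and the real
decision in the JIT may take more into account than this.

```python
from typing import Iterable, Set

# Special selectors that do not force a frame, per the description above.
FRAMELESS_SENDS: Set[str] = {"==", "class"}


def needs_frame(sends: Iterable[str],
                temps_accessed: Iterable[int],
                num_args: int) -> bool:
    """Return True if the toy criterion says the method needs a frame."""
    # Any send other than #== or #class requires a frame.
    if any(selector not in FRAMELESS_SENDS for selector in sends):
        return True
    # Indices 0..num_args-1 are the arguments; anything beyond is a temp.
    return any(index >= num_args for index in temps_accessed)


# A setFrom:to:by:-like method: no sends, only its three arguments accessed.
print(needs_frame([], [0, 1, 2], 3))  # False
```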

So if you've broken frameless inst var store pop, please fix it ;-).  But I
doubt you have because the code is explicit:

genStorePop: popBoolean ReceiverVariable: slotIndex
    <inline: false>
    | topReg valueReg constVal |
    needsFrame ifFalse:
        [^self genFramelessStorePop: popBoolean ReceiverVariable: slotIndex].
    self ssFlushUpThroughReceiverVariable: slotIndex.
    "Avoid store check for immediate values"
    ...

so the body of genStorePop:ReceiverVariable: does not have to deal with
framelessness.  Remember that in a frameless method (and IIRC in a
frameless block) ReceiverResultReg /always/ contains self.

The limitation of the trampolines is that they expect specific register
arguments; each one expects its own particular sequence of register
arguments that the code generator must use to call them.  But that doesn't
mean you couldn't for example implement ceStoreCheckReceiverRegTrampoline,
ceStoreCheckClassRegTrampoline etc, etc, provided you can handle the
complexity.
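
The trade-off can be sketched with a toy model in Python: either the code
generator moves the value into the register the trampoline expects, or it
picks a variant that already expects the register in use.  The two variant
names are the hypothetical ones suggested above; gen_store_check_call, the
"move"/"call" pseudo-instructions and the choice of TempReg as an example
register are assumptions for this illustration only.

```python
from typing import Dict, List

# Hypothetical per-register variants of the store check trampoline.
STORE_CHECK_VARIANTS: Dict[str, str] = {
    "ReceiverResultReg": "ceStoreCheckReceiverRegTrampoline",
    "ClassReg": "ceStoreCheckClassRegTrampoline",
}

# Register used when no variant matches (an assumption of this sketch).
FALLBACK_REG = "ReceiverResultReg"


def gen_store_check_call(obj_reg: str) -> List[str]:
    """Return pseudo-instructions for a store check on the value in obj_reg."""
    variant = STORE_CHECK_VARIANTS.get(obj_reg)
    if variant is not None:
        # A variant expects exactly this register: no extra move needed.
        return [f"call {variant}"]
    # Otherwise pay for a register move into the expected register.
    return [
        f"move {obj_reg} -> {FALLBACK_REG}",
        f"call {STORE_CHECK_VARIANTS[FALLBACK_REG]}",
    ]


print(gen_store_check_call("ClassReg"))
print(gen_store_check_call("TempReg"))
```

Each extra variant removes a move at the call site but adds one more
trampoline to generate and maintain, which is the complexity in question.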



>
>>
>> 2015-04-22 10:08 GMT+02:00 <commits at source.squeak.org>:
>>>
>>>>
>>>> Eliot Miranda uploaded a new version of VMMaker to project VM Maker:
>>>> http://source.squeak.org/VMMaker/VMMaker.oscog-cb.1236.mcz
>>>>
>>>> ==================== Summary ====================
>>>>
>>>> Name: VMMaker.oscog-cb.1236
>>>> Author: cb
>>>> Time: 22 April 2015, 10:07:41.028 am
>>>> UUID: cf4af270-9dfa-4f2a-b84c-0cdb3c5f4913
>>>> Ancestors: VMMaker.oscog-tpr.1235
>>>>
>>>> changed the names of register allocation methods for more explicit
>>>> names, for instance, allocateOneReg -> allocateRegForStackTop
>>>>
>>>> Fixed a bug where not the best register was allocated in #== (was
>>>> allocating a reg based on stackTop value instead of ssValue: 1)
>>>>
>>>> =============== Diff against VMMaker.oscog-tpr.1235 ===============
>>>>
>>>> Item was changed:
>>>>
>>>
>>>
deletia
-- 
best,
Eliot