[Vm-dev] VM Maker: VMMaker.oscog-cb.1236.mcz

Clément Bera bera.clement at gmail.com
Sat Apr 25 16:52:32 UTC 2015


Hey Eliot,

I was looking at other code, such as literalVariable access, and I realized
it has the same issue as temporaries: a register is used to temporarily
store the association, and at the end the association is still in the
register, but if we have another access to the literal variable it will not
reuse it and will move the association to a register again.

I was thinking that we could have a more generic solution.

The JIT, while compiling, could have a list (of a size which is the number
of registers) holding a value -> register mapping. At the end of each
method that accesses temps, literal variables and so on, we could mark in
the list several values (such as a temp, an association or the receiver) as
being in a given register. When fetching a register for an association or a
temp with allocateAnyReg or whatever, we could first look in the list to see
if the value is already in a register, and if so, reuse the register
directly, else allocate a new one and put it on the list. When the register
allocation needs to find a new register, it would first fetch a register
that does not hold a value; if there are none, free a random one which is
used in the list but not in the simulated stack; and lastly, if there is
still none, spill a register. When allocating a register for a purpose
other than reusing the value it holds, we remove the value -> register
entry from the list. Lastly, for interrupt points like sends, we free the
list completely as we don't know which values will remain.
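
The list described above can be sketched as follows. This is only a minimal
model in Python, not Cogit code: the class RegisterCache, its method names,
the register names "R0"/"R1" and the tuple keys used for values are all
hypothetical. Eviction takes the first candidate rather than a random one so
that the example is deterministic, and a spill is only recorded, not emitted.

```python
class RegisterCache:
    """Hypothetical value -> register list kept by the JIT while compiling."""

    def __init__(self, registers):
        self.registers = list(registers)
        self.cached = {}   # register name -> value key currently held there
        self.spills = []   # registers we had to spill, recorded for inspection

    def _pick(self, live_on_sim_stack):
        live = set(live_on_sim_stack)
        # 1. Prefer a register that holds no cached value and is not live.
        for reg in self.registers:
            if reg not in self.cached and reg not in live:
                return reg
        # 2. Otherwise evict a cached value whose register is not live on
        #    the simulated stack (first candidate, for determinism).
        for reg in self.registers:
            if reg in self.cached and reg not in live:
                del self.cached[reg]
                return reg
        # 3. Every register is live on the simulated stack: spill one.
        reg = self.registers[0]
        self.spills.append(reg)
        self.cached.pop(reg, None)
        return reg

    def register_for(self, value, live_on_sim_stack=()):
        """Return (register, hit); hit is True when no new move is needed."""
        for reg, held in self.cached.items():
            if held == value:
                return reg, True
        reg = self._pick(live_on_sim_stack)
        self.cached[reg] = value
        return reg, False

    def allocate_scratch(self, live_on_sim_stack=()):
        """Allocate for another purpose: the register holds no cached value."""
        reg = self._pick(live_on_sim_stack)
        self.cached.pop(reg, None)
        return reg

    def flush(self):
        """Interrupt point such as a send: forget every mapping."""
        self.cached.clear()


cache = RegisterCache(["R0", "R1"])
first = cache.register_for(("litvar", "Foo"))   # miss, association moved to R0
again = cache.register_for(("litvar", "Foo"))   # hit, R0 reused
other = cache.register_for(("temp", 0))         # miss, goes to R1
third = cache.register_for(("temp", 1), live_on_sim_stack=["R1"])  # evicts R0
cache.flush()                                   # e.g. a send was compiled
after_send = cache.register_for(("temp", 1))    # miss again after the flush
```

The second access to the same literal variable is the case that matters: it
returns the register already holding the association instead of asking for a
new move.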

I believe the list could have different values to remember: temporaries,
associations, receiver, tempVectors. Maybe more like constants that are
used several times by moving them to registers.

I am not sure I understand what you want to do with the "has been read"
flag. Maybe I should discuss it with you directly so you can explain it to me.

I think in the short term we should also add mappings for extra registers
(unused regs on ARM and x86_64) and change the register allocation so it
tries to allocate those first.

I think I found the bug in the SistaCogit. It was crashing randomly in
different branches (not only #==). The counters now use a random register
(based on register allocation) instead of SendNumArgsReg, and the
mustBeBoolean trampoline, also used for counter tripping, does not save the
counterReg even though it is reused after the trampoline. So in the case
where the counter reg is callerSaved it crashes (and I believe that
SendNumArgsReg was calleeSaved, so it worked). I think I should change
register allocation to try to allocate a calleeSaved register first so we
don't have to do anything across trampolines if possible...
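
A minimal sketch of that allocation preference, again in Python and not
Cogit code. The function names and the register names are hypothetical, and
which registers are actually calleeSaved depends on the platform ABI, so the
set is passed in as an assumption.

```python
def allocate_preferring_callee_saved(free_regs, callee_saved):
    """Return a free register, trying calleeSaved ones first.

    Returns None when no register is free.
    """
    for reg in free_regs:
        if reg in callee_saved:
            return reg
    return free_regs[0] if free_regs else None


def regs_needing_save(live_regs, callee_saved):
    """Live registers a trampoline call may clobber (the callerSaved ones)."""
    return [reg for reg in live_regs if reg not in callee_saved]
```

With a counter register chosen this way, regs_needing_save would return an
empty list for it whenever a calleeSaved register was free, which is the
"nothing to do across trampolines" case.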


2015-04-25 18:09 GMT+02:00 Eliot Miranda <eliot.miranda at gmail.com>:

>
> Hi Clément, Hi All,
>
> On Apr 23, 2015, at 9:08 AM, Eliot Miranda <eliot.miranda at gmail.com>
> wrote:
>
>
>
> On Thu, Apr 23, 2015 at 3:46 AM, Clément Bera <bera.clement at gmail.com>
> wrote:
>
>>
>>
>>
>> 2015-04-23 1:20 GMT+02:00 Eliot Miranda <eliot.miranda at gmail.com>:
>>
>>>
>>> Hi Clément,
>>>
>>>
>>> On Wed, Apr 22, 2015 at 2:07 AM, Clément Bera <bera.clement at gmail.com>
>>> wrote:
>>>
>>>>
>>>> Eliot here's a good example to stress the register allocation:
>>>>
>>>> Integer>>#regStress
>>>> | t t2 |
>>>> t := self yourself.
>>>> t2 := self + 1.
>>>> ^ { t == t2 .   t == t2 .  t == t2 .  t == t2 .  t == t2 .  t == t2 .
>>>>  t == t2 }
>>>>
>>>> I think the resulting machine code method is beautiful, it needs to
>>>> spill only when it runs out of registers :-). Of course it makes sense
>>>> mainly when you use inlined bytecodes instead of only #==.
>>>>
>>>> Some extra register moves are done because the JIT does not remember if a
>>>> temporary value is currently in a register (it moves the temp t to the
>>>> same register each time even though the register value has not changed). Maybe we
>>>> should add a feature that remembers if temporaries are currently in a
>>>> register, and if so, when doing push temp, only push the register directly
>>>> in the simStack, and in case of temporary store or stack flush the register
>>>> associated with a temp is not valid anymore somehow...
>>>>
>>>
>>> Here's a sketch of something that looks to me like it would work.
>>>
>>> A CogSimStackEntry for a temp var is of type SSBaseOffset.  Its register
>>> field is used to hold the frame pointer.  We want to mark it as having its
>>> value in a register, so it needs a new field, let's call it
>>> allocatedRegOrNil.  At start of compilation all SSBaseOffset entries
>>> have allocatedRegOrNil nil.
>>>
>>> Whenever popToReg: finds it is popping an SSBaseOffset entry it sets
>>> that entry's allocatedRegOrNil to the register.
>>> Whenever a register is spilled (in ssFlushTo:) the sim stack is also
>>> scanned looking for all SSBaseOffset entries whose allocatedRegOrNil equals
>>> the register, and simply sets allocatedRegOrNil back to nil.
>>>
>>> On merge, with the current representation we merely set
>>> all allocatedRegOrNil fields back to nil.  But with the more sophisticated
>>> stack copy merge we can preserve allocation for entries whose registers
>>> match.
>>>
>>> There are perhaps tricky details in merge (the same register used for
>>> different temporaries in different branches, etc) but otherwise it is very
>>> simple, no?
>>>
>>
>> Yeah something like that would be nice. There are details such as
>> liveRegisters should include those registers. Maybe this logic could be
>> somehow merged with the one of ReceiverResultReg which has currently its
>> own live status and may not need so.
>>
>
> Yes that's a good idea.  One would just do popToRequiredReg:
> ReceiverResultReg on the self stack entry and if ReceiverResultReg
> currently held the receiver it would be a no-op.  Nice.
>
>
> In thinking about this some more a key issue is that the linear scan
> register allocator we have is greedy.  That is, it allocates registers when
> parameters are first mentioned, an outside-in order, instead of when
> parameters are used, an inside-out order.  I was thinking about the case of
> nested loops where one wants to allocate registers to the innermost
> variables not the outermost, since the innermost change more frequently and
> hence one ends up with fewer memory writes if the innermost variables are
> allocated in registers.
>
> So if the allocator as described above was to allocate temps in registers
> we might end up with a situation where we allocated a register for an
> "outer" temp (a less frequently changing temp) and later reallocated it
> for an "inner" temp.  This is not a problem unless we unnecessarily load
> the register and then write it back to the temp.
>
> I think this is easy to avoid by adding a "has been read" flag to the
> temp.  Unless a stack entry for the temp-with-allocated-register has
> actually been popped into a register (& hence potentially had its value
> changed) there's no need to write it back if we reallocate the register.  If
> we reallocate the register and its temp hasn't been read merely scan the
> simstack and delete the allocated register from all occurrences of the
> temp, and when it is accessed it'll be accessed as a normal temp.
>
> So a "has been read" flag would allow the allocator to reallocate for
> innermost temps without having to pay for mistaken greedy allocations
> earlier-in.
>
> At least that's what I thought yesterday in the car while driving.  It
> might be completely half-baked but I wanted to write down the idea just in
> case it held water.
>
>
>> Btw I changed #genStorePop: popBoolean LiteralVariable: litVarIndex so it
>> uses register allocation instead of ReceiverResultReg and ClassReg. There
>> was a flag saying that it could be used in frameless methods if register
>> allocation was used. However I wonder, can you have a ceStoreCheck in a
>> frameless method ? I could work around the need for ReceiverResultReg in
>> the trampoline by adding extra register moves but I am not sure a
>> trampoline can work fine in a frameless method.
>>
>
> Yes, trampolines can work in any state.  They're orthogonal to
> framelessness.  Trampolines call into the C run-time.  To do so they must
> switch stacks, saving the hardware stack pointers into stackPointer and
> framePointer, loading the hardware stack pointers with the C stack, making
> the call, and on return, switching back to the Smalltalk stack.  But it
> doesn't matter what state Smalltalk is in; the stack pointers referring to
> the caller method or the callee method makes no difference to the
> trampoline, only to the code the trampoline calls.  With something like the
> store check, the stack is unexamined; all that happens is that the argument
> gets added to the remembered table.
>
> Framelessness occurs in three different forms:
>
> - One is a method that doesn't contain any sends other than the special
> selectors #== and #class, and doesn't access temporaries beyond the
> arguments.  An example is Interval>>setFrom:to:by:, which contains three
> inst var assignments and three store checks.
>
> - Another is a frameless block.  This is the same as a frameless method
> except that block activation implies that the receiver and register
> arguments will have been pushed, so the stack layout is a little different.
>
> - The final case is a method with a primitive.  The code up until a
> primitive fails is frameless.  If a primitive is implemented in machine
> code then the stack and register arguments are undisturbed; the primitive
> will access its operands from the registers if that's where they are.  If a
> primitive is implemented in C (an interpreter primitive), or the machine
> code primitive calls the interpreter primitive when it fails (which allows
> the machine code to handle the simple common cases, falling back on slower
> more comprehensive C code), then, like a frameless block, receiver and
> arguments are pushed on the stack, because interpreter primitives can only
> get their operands from the stack via stackPointer.
>
> The above implies that the ceStoreCheck trampoline may be called from any
> of the above frameless forms, as well as from a method with a full frame.
>
> So if you've broken frameless inst var store pop, please fix it ;-).  But
> I doubt you have because the code is explicit:
>
> genStorePop: popBoolean ReceiverVariable: slotIndex
> <inline: false>
> | topReg valueReg constVal |
> needsFrame ifFalse:
> [^self genFramelessStorePop: popBoolean ReceiverVariable: slotIndex].
> self ssFlushUpThroughReceiverVariable: slotIndex.
> "Avoid store check for immediate values"
> ...
>
> so the body of genStorePop:ReceiverVariable: does not have to deal with
> framelessness.  Remember that in a frameless method (and IIRC in a
> frameless block) ReceiverResultReg /always/ contains self.
>
> The limitation of the trampolines is that they expect specific register
> arguments; each one expects its own particular sequence of register
> arguments that the code generator must use to call them.  But that doesn't
> mean you couldn't for example implement ceStoreCheckReceiverRegTrampoline,
> ceStoreCheckClassRegTrampoline etc, etc, provided you can handle the
> complexity.
>
>
>
>>
>>>
>>> 2015-04-22 10:08 GMT+02:00 <commits at source.squeak.org>:
>>>>
>>>>>
>>>>> Eliot Miranda uploaded a new version of VMMaker to project VM Maker:
>>>>> http://source.squeak.org/VMMaker/VMMaker.oscog-cb.1236.mcz
>>>>>
>>>>> ==================== Summary ====================
>>>>>
>>>>> Name: VMMaker.oscog-cb.1236
>>>>> Author: cb
>>>>> Time: 22 April 2015, 10:07:41.028 am
>>>>> UUID: cf4af270-9dfa-4f2a-b84c-0cdb3c5f4913
>>>>> Ancestors: VMMaker.oscog-tpr.1235
>>>>>
>>>>> changed the names of register allocation methods for more explicit
>>>>> names, for instance, allocateOneReg -> allocateRegForStackTop
>>>>>
>>>>> Fixed a bug where not the best register was allocated in #== (was
>>>>> allocating a reg based on stackTop value instead of ssValue: 1)
>>>>>
>>>>> =============== Diff against VMMaker.oscog-tpr.1235 ===============
>>>>>
>>>>> Item was changed:
>>>>>
>>>>
>>>>
> deletia
> --
> best,
> Eliot
>
>
>