[Vm-dev] VM Maker: VMMaker.oscog-cb.1236.mcz

Eliot Miranda eliot.miranda at gmail.com
Sat Apr 25 16:09:32 UTC 2015


Hi Clément, Hi All,

On Apr 23, 2015, at 9:08 AM, Eliot Miranda <eliot.miranda at gmail.com> wrote:

> 
> 
> On Thu, Apr 23, 2015 at 3:46 AM, Clément Bera <bera.clement at gmail.com> wrote:
>>  
>> 
>> 
>> 2015-04-23 1:20 GMT+02:00 Eliot Miranda <eliot.miranda at gmail.com>:
>>>  
>>> Hi Clément,
>>> 
>>> 
>>> On Wed, Apr 22, 2015 at 2:07 AM, Clément Bera <bera.clement at gmail.com> wrote:
>>>>  
>>>> Eliot here's a good example to stress the register allocation:
>>>> 
>>>> Integer>>#regStress
>>>> 	| t t2 |
>>>> 	t := self yourself.
>>>> 	t2 := self + 1.
>>>> 	^ { t == t2 .   t == t2 .  t == t2 .  t == t2 .  t == t2 .  t == t2 .  t == t2 }
>>>> 
>>>> I think the resulting machine code method is beautiful, it needs to spill only when it runs out of registers :-). Of course it makes sense mainly when you use inlined bytecodes instead of only #==.
>>>> 
>>>> Some extra register moves are done because the JIT does not remember whether a temporary value is currently in a register (each time, it moves the temp t to the same register even though the register's value has not changed). Maybe we should add a feature that remembers whether temporaries are currently in a register; if so, a push temp would only push the register directly on the simStack, and a temporary store or stack flush would somehow invalidate the register associated with the temp...
>>> 
>>> Here's a sketch of something that looks to me like it would work.
>>> 
>>> A CogSimStackEntry for a temp var is of type SSBaseOffset.  Its register field is used to hold the frame pointer.  We want to mark it as having its value in a register, so it needs a new field, let's call it allocatedRegOrNil.  At start of compilation all SSBaseOffset entries have allocatedRegOrNil nil.
>>> 
>>> Whenever popToReg: finds it is popping an SSBaseOffset entry it sets that entry's allocatedRegOrNil to the register.
>>> Whenever a register is spilled (in ssFlushTo:) the sim stack is also scanned for all SSBaseOffset entries whose allocatedRegOrNil equals the register, and their allocatedRegOrNil is simply set back to nil.
>>> 
>>> On merge, with the current representation we merely set all allocatedRegOrNil fields back to nil.  But with the more sophisticated stack copy merge we can preserve allocation for entries whose registers match.
>>> 
>>> There are perhaps tricky details in merge (the same register used for different temporaries in different branches, etc) but otherwise it is very simple, no?
>> 
>> Yeah, something like that would be nice. There are details, such as that liveRegisters should include those registers. Maybe this logic could somehow be merged with that of ReceiverResultReg, which currently has its own live status and may not need it.
> 
> Yes that's a good idea.  One would just do popToRequiredReg: ReceiverResultReg on the self stack entry and if ReceiverResultReg currently held the receiver it would be a no-op.  Nice.

In thinking about this some more, a key issue is that the linear scan register allocator we have is greedy.  That is, it allocates registers when parameters are first mentioned, an outside-in order, instead of when parameters are used, an inside-out order.  I was thinking about the case of nested loops, where one wants to allocate registers to the innermost variables, not the outermost, since the innermost change more frequently and hence one ends up with fewer memory writes if the innermost variables are allocated in registers.
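
To put rough numbers on the nested-loop argument, here is a toy cost model in Python.  It is not VM code: the function name and the assumptions (one update per iteration of a temp's own loop, a single register available, register updates costing no memory write) are illustrative only.

```python
# Toy cost model (not VM code): count memory writes to loop temps
# when only one register is available for them.
def temp_memory_writes(outer_iters, inner_iters, temp_in_register):
    """Return memory writes to the temps of a two-level loop nest.

    Assumes each temp is updated once per iteration of its own loop, and
    that updating a register-allocated temp costs no memory write.
    """
    updates = {
        "outer": outer_iters,                # updated once per outer iteration
        "inner": outer_iters * inner_iters,  # updated once per inner iteration
    }
    # Only temps that live in memory pay a write for each update.
    return sum(n for temp, n in updates.items() if temp != temp_in_register)


print(temp_memory_writes(10, 100, "outer"))  # greedy, outside-in: 1000
print(temp_memory_writes(10, 100, "inner"))  # inside-out: 10
```

Under these assumptions, giving the register to the inner temp leaves only the outer temp's updates as memory writes.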

So if the allocator as described above were to allocate temps in registers we might end up with a situation where we allocated a register for an "outer" temp (a less frequently changing temp) and later reallocated it for an "inner" temp.  This is not a problem unless we needlessly load the register and then needlessly write it back to the temp.

I think this is easy to avoid by adding a "has been read" flag to the temp.  Unless a stack entry for the temp-with-allocated-register has actually been popped into a register (& hence potentially had its value changed) there's no need to write it back if we reallocate the register.  If we reallocate the register and its temp hasn't been read, we merely scan the simstack and delete the allocated register from all occurrences of the temp; when the temp is next accessed it'll be accessed as a normal temp.

So a "has been read" flag would allow the allocator to reallocate for innermost temps without having to pay for mistaken greedy allocations earlier-in.
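
A minimal sketch of the flag in Python, as a toy model rather than Cogit code.  TempEntry, ToyAllocator and all of their methods are hypothetical names; allocated_reg plays the role of allocatedRegOrNil, and the "emitted" strings stand in for generated instructions.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TempEntry:
    """Hypothetical stand-in for a sim-stack entry describing a temp."""
    name: str
    allocated_reg: Optional[str] = None  # plays the role of allocatedRegOrNil
    has_been_read: bool = False          # the proposed "has been read" flag


@dataclass
class ToyAllocator:
    sim_stack: List[TempEntry] = field(default_factory=list)
    emitted: List[str] = field(default_factory=list)  # pseudo-instructions

    def push_temp(self, name, reg):
        """Greedily reserve reg at first mention; no code is emitted yet."""
        entry = TempEntry(name, allocated_reg=reg)
        self.sim_stack.append(entry)
        return entry

    def pop_to_reg(self, entry):
        """Actually use the temp: load it once and mark it as read."""
        if entry.allocated_reg is not None and not entry.has_been_read:
            self.emitted.append(f"load {entry.name} -> {entry.allocated_reg}")
            entry.has_been_read = True

    def reallocate(self, reg, new_name):
        """Take reg from whichever temp holds it and give it to new_name."""
        stored = set()
        for entry in self.sim_stack:
            if entry.allocated_reg != reg:
                continue
            if entry.has_been_read and entry.name not in stored:
                # The value may have changed in the register: write it back.
                self.emitted.append(f"store {reg} -> {entry.name}")
                stored.add(entry.name)
            # Either way, the temp reverts to being an ordinary frame temp.
            entry.allocated_reg = None
            entry.has_been_read = False
        return self.push_temp(new_name, reg)


alloc = ToyAllocator()
outer = alloc.push_temp("outer", "R1")   # greedy allocation, never read
inner = alloc.reallocate("R1", "inner")  # no write-back needed
alloc.pop_to_reg(inner)
print(alloc.emitted)  # ['load inner -> R1']
```

In this model the mistaken early allocation for "outer" costs nothing: no load was emitted for it, so taking the register away emits no store either.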

At least that's what I thought yesterday in the car while driving.  It might be completely half-baked but I wanted to write down the idea just in case it held water.


>> Btw I changed #genStorePop: popBoolean LiteralVariable: litVarIndex so it uses register allocation instead of ReceiverResultReg and ClassReg. There was a flag saying that it could be used in frameless methods if register allocation was used. However I wonder, can you have a ceStoreCheck in a frameless method? I could work around the need for ReceiverResultReg in the trampoline by adding extra register moves, but I am not sure a trampoline can work fine in a frameless method.
> 
> Yes, trampolines can work in any state.  They're orthogonal to framelessness.  Trampolines call into the C run-time.  To do so they must switch stack, saving the hardware stack pointers into stackPointer and framePointer, loading the hardware stack pointers with the C stack, making the call, and on return, switching back to the Smalltalk stack.  But it doesn't matter what state Smalltalk is in; the stack pointers referring to the caller method or the callee method makes no difference to the trampoline, only to the code the trampoline calls.  With something like the store check, the stack is unexamined; all that happens is that the argument gets added to the remembered table.
> 
> Framelessness occurs in three different forms:
> 
> - One is a method that doesn't contain any sends other than the special selectors #== and #class, and doesn't access temporaries beyond the arguments.  An example is Interval>>setFrom:to:by:, which contains three inst var assignments and three store checks.
> 
> - Another is a frameless block.  This is the same as a frameless method except that block activation implies that the receiver and register arguments will have been pushed, so the stack layout is a little different.
> 
> - The final case is a method with a primitive.  The code up until a primitive fails is frameless.  If a primitive is implemented in machine code then the stack and register arguments are undisturbed; the primitive will access its operands from the registers if that's where they are.  If a primitive is implemented in C (an interpreter primitive), or the machine code primitive calls the interpreter primitive when it fails (which allows the machine code to handle the simple common cases, falling back on slower more comprehensive C code), then, like a frameless block, receiver and arguments are pushed on the stack, because interpreter primitives can only get their operands from the stack via stackPointer.
> 
> The above implies that the ceStoreCheck trampoline may be called from any of the above frameless forms, as well as from a method with a full frame.
> 
> So if you've broken frameless inst var store pop, please fix it ;-).  But I doubt you have because the code is explicit:
> 
> genStorePop: popBoolean ReceiverVariable: slotIndex
> 	<inline: false>
> 	| topReg valueReg constVal |
> 	needsFrame ifFalse:
> 		[^self genFramelessStorePop: popBoolean ReceiverVariable: slotIndex].
> 	self ssFlushUpThroughReceiverVariable: slotIndex.
> 	"Avoid store check for immediate values"
> ...
> 
> so the body of genStorePop:ReceiverVariable: does not have to deal with framelessness.  Remember that in a frameless method (and IIRC in a frameless block) ReceiverResultReg /always/ contains self.
> 
> The limitation of the trampolines is that they expect specific register arguments; each one expects its own particular sequence of register arguments that the code generator must use to call them.  But that doesn't mean you couldn't for example implement ceStoreCheckReceiverRegTrampoline, ceStoreCheckClassRegTrampoline etc, etc, provided you can handle the complexity.
> 
> 
>> 
>>> 
>>> 
>>>> 2015-04-22 10:08 GMT+02:00 <commits at source.squeak.org>:
>>>>> 
>>>>> Eliot Miranda uploaded a new version of VMMaker to project VM Maker:
>>>>> http://source.squeak.org/VMMaker/VMMaker.oscog-cb.1236.mcz
>>>>> 
>>>>> ==================== Summary ====================
>>>>> 
>>>>> Name: VMMaker.oscog-cb.1236
>>>>> Author: cb
>>>>> Time: 22 April 2015, 10:07:41.028 am
>>>>> UUID: cf4af270-9dfa-4f2a-b84c-0cdb3c5f4913
>>>>> Ancestors: VMMaker.oscog-tpr.1235
>>>>> 
>>>>> Changed the names of register allocation methods to more explicit names, for instance, allocateOneReg -> allocateRegForStackTop
>>>>> 
>>>>> Fixed a bug where not the best register was allocated in #== (was allocating a reg based on stackTop value instead of ssValue: 1)
>>>>> 
>>>>> =============== Diff against VMMaker.oscog-tpr.1235 ===============
>>>>> 
>>>>> Item was changed:
>  
> deletia
> -- 
> best,
> Eliot