[squeak-dev] [BUG] Cannot compile cascade sent to block

Eliot Miranda eliot.miranda at gmail.com
Mon Dec 30 16:51:36 UTC 2019


Hi Christoph, Hi Nicolas,

> On Dec 30, 2019, at 5:28 AM, Thiede, Christoph <Christoph.Thiede at student.hpi.uni-potsdam.de> wrote:
> 
> 
> Hi Eliot,
> 
> 
> 
> Of course, a changeset that defers the optimization of ParseNodes until a cascade can be precluded would be perfect - also with respect to compiler performance. I would be very interested to read such a changeset!
> 
> 
> 
> However, if we can't patch this quickly, would you consider my proposal an appropriate workaround? Should we maybe include it in Trunk, just for 5.3, until we have the perfect solution from Nicolas?
> 

As long as you add a test to capture the issue, one that tests both compilation and, if it succeeds, execution, you should add whatever you see fit.  And yes, trunk is the right place, which means the inbox until after the release.
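
Something along these lines would do, purely as a sketch (the selector, the assertion, and the use of Compiler class>>#evaluate: as the entry point are illustrative, not the actual trunk test):

testCascadeSentToBlock
	"Sketch: the cascaded loop sends to a literal block must compile, and if
	 compilation succeeds the result must also execute correctly."
	| result |
	result := Compiler evaluate:
		'| n | n := 0. [n := n + 1. n < 3] whileTrue; yourself. n'.
	self assert: result = 3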

That suggests we switch over to a variation of inbox for the release process.  We could provide releasebox or some such (I would call it release) and only commit things good for the release.  But thinking about it, it’s a bad idea: it means a lot of work to change one’s working images from trunk to releasebox and back.  However, if in 5.3 we provided a bulk change operation for the Monticello browser I’d consider it.  Has anything like this been discussed? (We should change the subject line to continue...)
> 
> Best,
> 
> Christoph
> 
> From: Squeak-dev <squeak-dev-bounces at lists.squeakfoundation.org> on behalf of Eliot Miranda <eliot.miranda at gmail.com>
> Sent: Sunday, December 29, 2019 21:01:04
> To: The general-purpose Squeak developers list
> Subject: Re: [squeak-dev] [BUG] Cannot compile cascade sent to block
>  
> Hi Christoph,
> 
>> On Sun, Dec 29, 2019 at 11:51 AM Thiede, Christoph <Christoph.Thiede at student.hpi.uni-potsdam.de> wrote:
>> Hi Eliot, thank you very much for the detailed and interesting explanation! I think I got a rough idea of the blockExtent concept; certainly, it will become clearer as I continue to work with it.
>> 
>> 
>> 
>> Concretely, regarding this bug: I suppose that optimized blocks don't need their own blockExtent because they store all their temps directly on their declaring context, don't they?
>> 
> 
> Exactly.  They're just basic blocks and jumps, within either proper methods or proper blocks.
>  
>> This would explain the following outputs:
>> 
>> [|x| x := true] whileFalse. "will be optimized"
>> thisContext tempNames. "==> #('x')"
>>   compared to:
>> [|x| x := true] yourself whileFalse. "won't be optimized"
>> thisContext tempNames. "==> #()"
> 
> Exactly.
>  
>> So I would say that blockExtent may indeed be nil; I would at least note this fact in BlockNode >> #blockExtent.
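>> 
>> Something like the following, assuming BlockNode >> #blockExtent is a plain accessor (just a sketch of the note I have in mind):
>> 
>> blockExtent
>> 	"Answer the block extent computed for this block, or nil if the block
>> 	 is optimized (inlined), in which case its temps live in the enclosing
>> 	 scope and it has no extent of its own."
>> 	^ blockExtent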
> 
> And feel free to take as much of my explanation as you like, with rewrites, and add it to the relevant parts of the bytecode compiler.  I like discursive documentation like this in class comments, but that implies that a pointer to the class comment from a method is the right thing to do (as in "See the class comment in ...").
>  
>> However, #computeCopiedValues: is called by #constructClosureCreationNode:, and a closure should only be created for non-optimized BlockNodes. Equally, the senders of #reindexingLocalsDo:encoder: should only be activated for a non-optimized BlockNode ... So better to forget the rest of my first changeset, sigh.
> 
> Right.  So what may be happening is that the system is realizing that it can't inline the block because it is being used in a cascade.  So the attempt to optimize it in the first place should fail.  We should only consider for optimization blocks that are sent one of the inline messages, *but not in a cascade*.  This is probably why Nicolas expects we should have already fixed this bug. Nicolas, if I neglected to integrate your fix, I apologize.  I invite you to commit straight to trunk; maybe ask Christoph for review?
> 
>> It is interesting that in the example, the receiver BlockNode stays flagged as optimized (whereas the bytecode produces a separate closure):
>> 
>> [self confirm: #foo]
>> whileTrue;
>> whileFalse
> 
> Right.  I would guess that the block is marked as being optimizable when the whileTrue is processed, but everything goes pear-shaped when the whileFalse is processed.  What we need to do is forestall the optimization caused by processing the whileTrue send iff the whileTrue is part of a cascade (which is probably exactly what Nicolas' as-yet-unintegrated changes effect).
>  
>> 
>> First, the compiler reads the first message only ("[self confirm: #foo] whileTrue") and optimizes it, and only then finds the second message and discards the optimized bytecode. (See Parser >> #cascade, #messagePart:repeat:, MessageNode >> #receiver:selector:arguments:precedence:from:, #transform:.)
>> But it does not reset the optimized flag of the BlockNode! To me, this appears to be the actual bug!
> 
> You have it.
>  
>> There is #ensureCanCascade:, which should specifically solve this problem by resetting the optimized flag. However, during creation of the MessageNode, originalReceiver is assigned a copy of the actual receiver. This leads to the problem that, while the receiver block is deoptimized in each temporary MessageNode, the temp var rcvr in #cascade holds the copied original receiver, which never gets deoptimized. For what purpose do we copy the receiver into originalReceiver?
>> Did I explain this understandably? :-)
>> 
>> By applying the following single patch, I can fix the problem:
>> 
>> Parser >> #cascade
>> 	" {; message} => CascadeNode."
>> 
>> -	| rcvr msgs |
>> +	| rcvr cascadeRcvr msgs |
>> 	parseNode canCascade ifFalse:
>> 		[^self expected: 'Cascading not'].
>> 	parseNode ensureCanCascade: encoder.
>> 	rcvr := parseNode cascadeReceiver.
>> +	cascadeRcvr := rcvr copy.
>> 	msgs := OrderedCollection with: parseNode.
>> 	[self match: #semicolon]
>> 		whileTrue: 
>> 			[parseNode := rcvr.
>> 			(self messagePart: 3 repeat: false)
>> 				ifFalse: [^self expected: 'Cascade'].
>> 			parseNode canCascade ifFalse:
>> 				[^self expected: '<- No special messages'].
>> 			parseNode ensureCanCascade: encoder.
>> 			parseNode cascadeReceiver.
>> 			msgs addLast: parseNode].
>> -	parseNode := CascadeNode new receiver: rcvr messages: msgs
>> +	parseNode := CascadeNode new receiver: cascadeRcvr messages: msgs
>> 
>> All relevant tests pass (but I don't know how good their coverage is).
>> Is there any need to respect possible side effects on rcvr that might occur while creating the next cascade message (via #messagePart:repeat:)?
>> If not, how would you think about this change? :-)
>> 
>> (Sorry for the long text, I did my best to maximize the compression ratio!)
> 
> Um, it's a pleasure to read your writing, so don't worry about being prolix.  I can hardly complain ;-)
> 
>  
>> 
>> Best,
>> Christoph
>> 
>> From: Squeak-dev <squeak-dev-bounces at lists.squeakfoundation.org> on behalf of Eliot Miranda <eliot.miranda at gmail.com>
>> Sent: Sunday, December 29, 2019 16:59
>> To: The general-purpose Squeak developers list
>> Subject: Re: [squeak-dev] [BUG] Cannot compile cascade sent to block
>>  
>> Hi Christoph,
>> 
>>> On Dec 28, 2019, at 6:53 AM, Thiede, Christoph <Christoph.Thiede at student.hpi.uni-potsdam.de> wrote:
>>> 
>>> 
>>> Hi all! :-)
>>> 
>>> 
>>> 
>>> Steps to reproduce:
>>> 
>>> Do it:
>>> 
>>> [true]
>>> whileTrue;
>>> whileFalse
>>> 
>>> Expected behavior:
>>> The script is compiled and executed (and an infinite loop occurs).
>>> 
>>> Actual behavior:
>>> <pastedImage.png>
>>> 
>>> 
>>> Further investigations:
>>> In #analyseTempsWithin:rootNode:assignmentPools:, the inner BlockNode [true] does not get a blockExtent computed for it, because it is optimized.
>>> This leads to a nil key in the BytecodeEncoder's blockExtentsToLocals.
>>> A quick fix would be to skip nil blockExtents in #noteBlockExtent:hasLocals:, but that's a dirty workaround.
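>>> 
>>> (I.e. something like the following guard at the top of the method, purely as an illustration; the argument names here are made up:)
>>> 
>>> noteBlockExtent: blockExtent hasLocals: tempNodes
>>> 	"Dirty workaround sketch: optimized (inlined) blocks have no extent of
>>> 	 their own, so just ignore them instead of recording a nil key."
>>> 	blockExtent ifNil: [^ self].
>>> 	"... rest of the method unchanged ..."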
>>> 
>>> Similar bug:
>>> Try to do:
>>> Object newSubclass compile: 'iWontCompile: foo
>>> 	foo.
>>> 	[true]
>>> 		whileTrue;
>>> 		whileFalse'
>>> Fails with:
>>> <pastedImage.png>
>>> 
>>> Again, the problem is that blockExtent isNil.
>>> 
>>> Workaround:
>>> Send #yourself to the block before sending the cascade. This prevents the receiver from being optimized.
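>>> 
>>> For example, this compiles (and, once compiled, loops forever as expected):
>>> 
>>> [true] yourself
>>> 	whileTrue;
>>> 	whileFalse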
>>> 
>>> Please note that this problem does not occur only with special blocks. For example, [Sensor anyButtonPressed] whileTrue; whileFalse fails as well.
>>> 
>>> 
>>> How can this be fixed appropriately? I'm not sure whether I understand the whole idea of blockExtent yet.
>> 
>> blockExtents are the model of blocks that the compiler uses to analyze temporary references so that each temporary can be made either local or remote.  Local temporaries are temporaries on a Context’s own stack.  Remote temporaries are temporaries stored in an array (an indirection vector, see below) which is itself on a Context’s own stack.
>> 
>> The local/remote distinction is to ensure that a Context _never_ has to reference another Context’s stack to access an outer temporary.  Why? In the VM, Contexts serve as proxies for active stack frames, and most stack frames do not have a context unless the programmer accesses it by accessing some other context’s sender. This optimization is key to the VM’s performance: Cog is fast because it does context-to-stack mapping and creates contexts lazily.
>> 
>> If the execution model (defined by the bytecode set) implements access to outer temporaries via direct access to outer contexts’ stacks, then there are two states an outer context can be in. The common case is that an outer context is associated with a stack frame and the temporaries exist in the stack frame; any slots in the context object are stale. But if the block is long-lived and runs after its enclosing context has been returned from, then the temporaries will be in the context.  To keep the slots in the context up-to-date, the stack frame slots must be copied to the context on return, and *before* the stack frame is torn down. Every outer temp access must check which state the outer context is in. This is what Peter Deutsch’s first bytecode set for HPS (ObjectWorks 2.5) did (with the optimization that an outer context would be converted into having a stack frame if it didn’t have one, so that temporary access only deals with the common case). And the cost and complexity are very high.  (Forgive the deep dive.)
>> 
>> The alternative, which I introduced in VW 5i and which has been in Cog from the start, is never to have contexts access outer temps directly.  This is done using a standard closure implementation model from Lisp.  Any outer temporary which cannot possibly change during the execution of an inner block is copied into the closure and copied onto the block context/frame stack, where it is effectively read-only (an r-value). These are called “copied values”.
>> 
>> Any outer temp which is written to in the block, or possibly modified after being closed over (i.e. assigned to lexically after the block), must be put in an array (an indirection vector). Since the temporary holding the indirection vector is itself never written to after initialization, indirection vectors are always copied values.
>> 
>> So, for example, in inject:into: nextValue ends up in an indirection vector but aBlock is a copied value (look at the bytecode for the method; I’m on my phone).  Now the VM’s context-to-stack mapping is greatly simplified (no need to intercept returns, no need to check outer temp access) and _much_ faster (and less buggy, known from harsh experience with HPS).
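>> 
>> For reference, the method has roughly this shape (paraphrased from memory, so check the image for the exact source), with the analysis sketched in the comment:
>> 
>> inject: thisValue into: aBlock
>> 	"nextValue is assigned inside the inner block and read again afterwards,
>> 	 so it must live in an indirection vector; aBlock is only read inside the
>> 	 inner block, so it is a plain copied value in that block's closure."
>> 	| nextValue |
>> 	nextValue := thisValue.
>> 	self do: [:each | nextValue := aBlock value: nextValue value: each].
>> 	^ nextValue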
>> 
>> But to do this the bytecode compiler must identify when closed-over temporaries can be treated as copied values and when they must be put in indirection vectors.  Block extents model block nesting and allow the bytecode compiler to check, whenever a temporary is read, whether it is being read in an inner block (i.e. is being closed over), and to check, whenever a temporary is written, whether it is being written to in an inner block, or, if it has been closed over, whether it is being written to after a block that closed over it.  In either of these two cases (written to during the block with that write needing to be visible in the outer context, or written to after being closed over with that write needing to be visible within the block) the temp must live in an indirection vector.
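>> 
>> Here is a tiny made-up doIt illustrating the two cases:
>> 
>> | factor sum block |
>> factor := 2.	"only read inside the blocks below, so it becomes a copied value"
>> sum := 0.	"case 1: written inside a block and read afterwards: indirection vector"
>> (1 to: 3) do: [:i | sum := sum + (i * factor)].
>> block := [sum].
>> sum := 100.	"case 2: written after being closed over: also forces an indirection vector"
>> ^ block value	"answers 100: the block must see the later write"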
>> 
>> Christoph, you *have* to understand this to be able to be effective in working in the execution machinery (e.g. the runUntil... debugger machinery).  If you do not understand this then the fault is mine (ENTIRELY!), and the fault is that my explanation doesn’t make sense. So take as long as you need and ask yourself if you do or do not understand it.  If you don’t, please ask questions and I’ll try my best to explain things better.  I introduced this machinery and it is my responsibility for it not to be a burden.  This is a case of moving complexity out of the VM into the image, and it is justified on performance and reliability grounds.  But it is a severe cognitive burden on those using the bytecode compiler where, at the Smalltalk level, it seems that accessing outer temporaries directly is simple and all the block extent machinery horribly complex.  I assure you that the block extent machinery is trivial compared to the support required in the VM to implement direct access to outer temps as efficiently as possible (which is way less efficiently than our current architecture allows).
>> 
>>> According to the documentation comment, it should be okay for it to be nil; however, the #blockExtent comment does not mention this fact.
>>> Please see the attached changeset for an approach to fixing the issue. All tests from KernelTests-Methods and most tests from Test-Compiler pass, except for the testAllNodePCs* tests, which time out in my fresh image. It would be great to hear whether this is the right way to solve the bug, or just extends a workaround :-)
>> 
>> I’ll take a look some time today.  But I’d be much happier if you can understand the long-winded explanation above and answer your own question.  Great bug though :-)
>> 
>>> Best,
>>> Christoph
>>> <bugfix compiling block cascades.2.cs>
>>> 
>> 
> 
> 
> -- 
> _,,,^..^,,,_
> best, Eliot
> 