[squeak-dev] [BUG] Cannot compile cascade sent to block

Nicolas Cellier nicolas.cellier.aka.nice at gmail.com
Mon Dec 30 23:56:45 UTC 2019


Oops, I probably misunderstood the question.
You mean why you need to copy the cascadeReceiver...

Well, it's simple: the rcvr is shared by all cascade messages, because we do:

    parseNode := rcvr.
    (self messagePart: 3 repeat: false)

So [true] whileTrue is optimized.
Then in #cascade, after #ensureCanCascade:, it is deoptimized (via #deoptimize).
That's when we get the deoptimized rcvr := parseNode cascadeReceiver.

Then for the second message, [true] whileFalse, it gets optimized again.
The receiver of second message is deoptimized again, but it's the
originalReceiver that gets deoptimized (a copy) not the receiver!
That's a change Bert made to #deoptimize in 2015 for some reason (I guess
for some transform, the receiver was changed).

So your rcvr copy is preserved from re-optimization.
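
For reference, a minimal doit showing the failing case and the #yourself
workaround from the bug report (a sketch; the second expression, once it
compiles, is of course an infinite loop):

```smalltalk
"Fails to compile: whileTrue optimizes the block receiver, and the
 cascade then needs the deoptimized form."
[true]
	whileTrue;
	whileFalse.

"Workaround: sending #yourself first keeps the receiver from being
 optimized, so the cascade compiles."
[true] yourself
	whileTrue;
	whileFalse.
```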

On Mon, Dec 30, 2019 at 19:36, Nicolas Cellier <
nicolas.cellier.aka.nice at gmail.com> wrote:

> Hi Christoph,
> Because sending a message pops the receiver off the stack and replaces it
> with the returned object, so we need a dup for each cascade message.
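>
> As an illustration (a hand-written sketch, not actual compiler output),
> the stack shape for a two-message cascade such as "rcvr foo; bar" is
> roughly:
>
> ```
> push rcvr
> dup          "duplicate the receiver for the first cascade message"
> send foo     "consumes the top receiver, pushes foo's result"
> pop          "discard that result; the dup'd receiver is on top again"
> send bar     "the last message's result is the expression's value"
> ```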
>
> On Mon, Dec 30, 2019 at 19:22, Thiede, Christoph <
> Christoph.Thiede at student.hpi.uni-potsdam.de> wrote:
>
>> > As long as you add a test to capture the issue, one that tests both
>> compilation and, if it succeeds, execution, you should add what you see
>> fit.  And yes, trunk is the right place, which means inbox until after the
>> release.
>>
>> Alright, done, hope you like it :)
>>
>> I still don't see why cascadeReceiver is a copy, though ...
>>
>> ------------------------------
>> *From:* Squeak-dev <squeak-dev-bounces at lists.squeakfoundation.org> on
>> behalf of Eliot Miranda <eliot.miranda at gmail.com>
>> *Sent:* Monday, December 30, 2019 17:51:36
>> *To:* The general-purpose Squeak developers list
>> *Subject:* Re: [squeak-dev] [BUG] Cannot compile cascade sent to block
>>
>> Hi Christoph, Hi Nicolas,
>>
>> On Dec 30, 2019, at 5:28 AM, Thiede, Christoph <
>> Christoph.Thiede at student.hpi.uni-potsdam.de> wrote:
>>
>> 
>>
>> Hi Eliot,
>>
>>
>> Of course, a changeset that defers the optimization of ParseNodes until a
>> cascade can be precluded would be perfect - also with respect to the
>> compiler performance. I would be very interested to read such a changeset!
>>
>>
>> However, if we can't patch this quickly, would you consider my proposal
>> an appropriate workaround? Should we maybe include it into Trunk, just for
>> 5.3, until we have the perfect solution from Nicolas?
>>
>>
>> As long as you add a test to capture the issue, one that tests both
>> compilation and, if it succeeds, execution, you should add what you see
>> fit.  And yes, trunk is the right place, which means inbox until after the
>> release.
>>
>> That suggests we switch over to a variation of inbox for the release
>> process.  We could provide releasebox or some such (I would call it
>> release) and only commit things good for the release.  But on thinking
>> about it, it’s a bad idea.  It means a lot of work to change one’s working
>> images from trunk to releasebox and back.  However, if in 5.3 we provided a
>> bulk change operation for the Monticello browser I’d consider it.  Has
>> anything like this been discussed? (We should change the subject to continue...)
>>
>>
>> Best,
>>
>> Christoph
>> ------------------------------
>> *From:* Squeak-dev <squeak-dev-bounces at lists.squeakfoundation.org> on
>> behalf of Eliot Miranda <eliot.miranda at gmail.com>
>> *Sent:* Sunday, December 29, 2019 21:01:04
>> *To:* The general-purpose Squeak developers list
>> *Subject:* Re: [squeak-dev] [BUG] Cannot compile cascade sent to block
>>
>> Hi Christoph,
>>
>> On Sun, Dec 29, 2019 at 11:51 AM Thiede, Christoph <
>> Christoph.Thiede at student.hpi.uni-potsdam.de> wrote:
>>
>>> Hi Eliot, thank you very much for the detailed and interesting
>>> explanation! I think I got a rough idea of the blockExtent concept;
>>> certainly, it will become clearer as I continue to work with it.
>>>
>>>
>>> Concretely to this bug, I suppose that optimized blocks don't need their
>>> own blockExtent, because they store all their temps directly on their
>>> declaring context, don't they?
>>>
>>
>> Exactly.  They're just basic blocks and jumps, within either proper
>> methods or proper blocks.
>>
>>
>>> This would explain the following outputs:
>>>
>>> [|x| x:=true] whileFalse. "will be optimized"
>>> thisContext tempNames. #('x')
>>>
>>>   compared to:
>>>
>>> [|x| x:=true] yourself whileFalse. "won't be optimized"
>>> thisContext tempNames. #()
>>>
>>>
>> Exactly.
>>
>>
>>> So I would say that blockExtent may indeed be nil, *I would at least
>>> note this fact in BlockNode >> #blockExtent.*
>>>
>>
>> And feel free to take as much of my explanation, with rewrites, and add
>> it to the relevant parts of the bytecode compiler.  I like discursive doc
>> like this in class comments, but that implies that a pointer to the class
>> comment from a method is the right thing to do (as in "See the class
>> comment in ...").
>>
>>
>>> However, #computeCopiedValues: is called by #constructClosureCreationNode:,
>>> and a closure should only be created for non-optimized BlockNodes. Equally,
>>> the senders of #reindexingLocalsDo:encoder: should only be activated for
>>> non-optimized BlockNodes ... So better forget the rest of my first
>>> changeset, sigh.
>>>
>>
>> Right.  So what may be happening is that the system is realizing that it
>> can't inline the block because it is being used in a cascade.  So the
>> attempt to optimize it in the first place should fail.  We should only
>> consider for optimization blocks that are sent one of the inline messages,
>> *but not in a cascade*.  This is probably why Nicolas expects we should
>> have already fixed this bug. Nicolas if I neglected to integrate your fix I
>> apologize.  I invite you to commit straight to trunk; maybe ask Christoph
>> for review?.
>>
>> It is interesting that in the example, the receiver BlockNode remains
>>> flagged as optimized (whereas the bytecode produces a separate closure):
>>>
>>> [self confirm: #foo]
>>> whileTrue;
>>> whileFalse
>>>
>>>
>> Right.  I would guess that the block is marked as being optimizable when
>> the whileTrue is processed, but everything goes pear shaped when the
>> whileFalse is processed.  What we need to do is forestall the optimization
>> caused by processing the whileTrue send iff the whileTrue is part of a
>> cascade (which is probably exactly what Nicolas' as-yet-unintegrated
>> changes effect).
>>
>>
>>>
>>> First, the compiler reads the first message only ("[self confirm: #foo]
>>> whileTrue") and optimizes it, and only then finds the second message
>>> and discards the optimized bytecode. (See Parser >> #cascade, #messagePart:repeat:,
>>> MessageNode >> #receiver:selector:arguments:precedence:from:,
>>> #transform:.)
>>> But it does not reset the optimized flag of the BlockNode! To me, this
>>> appears to be the actual bug!
>>>
>>
>> You have it.
>>
>>
>>> There is #ensureCanCascade: which should specifically solve this problem
>>> by resetting the optimized flag. However, during creation of MessageNode,
>>> originalReceiver is assigned a copy of the actual receiver. This leads to
>>> the problem that while in each temporary MessageNode, the receiver block is
>>> deoptimized, the tempvar rcvr in #cascade holds the copied original
>>> receiver which never got deoptimized. For what purpose do we copy the
>>> receiver into originalReceiver?
>>> Did I explain this understandably? :-)
>>>
>>> *By applying the following single patch, I can fix the problem:*
>>>
>>> Parser >> #cascade
>>> " {; message} => CascadeNode."
>>>
>>> - | rcvr msgs |
>>> + | rcvr cascadeRcvr msgs|
>>> parseNode canCascade ifFalse:
>>> [^self expected: 'Cascading not'].
>>> parseNode ensureCanCascade: encoder.
>>> rcvr := parseNode cascadeReceiver.
>>> +  cascadeRcvr := rcvr copy.
>>> msgs := OrderedCollection with: parseNode.
>>> [self match: #semicolon]
>>> whileTrue:
>>> [parseNode := rcvr.
>>> (self messagePart: 3 repeat: false)
>>> ifFalse: [^self expected: 'Cascade'].
>>> parseNode canCascade ifFalse:
>>> [^self expected: '<- No special messages'].
>>> parseNode ensureCanCascade: encoder.
>>> parseNode cascadeReceiver.
>>> msgs addLast: parseNode].
>>> - parseNode := CascadeNode new receiver: rcvr messages: msgs
>>> +  parseNode := CascadeNode new receiver: cascadeRcvr messages: msgs
>>>
>>>
>>> All relevant tests pass (but I don't know how good their coverage is).
>>> Is there any need for respecting possible side-effects on rcvr that
>>> might occur while creating the next cascade message
>>> (via #messagePart:repeat:)?
>>> If not, what would you think about this change? :-)
>>>
>>> (Sorry for the long text, I did my best to maximize the compression
>>> ratio!)
>>>
>>
>> Um, it's a pleasure to read your writing, so don't worry about being
>> prolix.  I can hardly complain ;-)
>>
>>
>>
>>>
>>> Best,
>>> Christoph
>>>
>>> ------------------------------
>>> *From:* Squeak-dev <squeak-dev-bounces at lists.squeakfoundation.org> on
>>> behalf of Eliot Miranda <eliot.miranda at gmail.com>
>>> *Sent:* Sunday, December 29, 2019 16:59
>>> *To:* The general-purpose Squeak developers list
>>> *Subject:* Re: [squeak-dev] [BUG] Cannot compile cascade sent to block
>>>
>>> Hi Christoph,
>>>
>>> On Dec 28, 2019, at 6:53 AM, Thiede, Christoph <
>>> Christoph.Thiede at student.hpi.uni-potsdam.de> wrote:
>>>
>>> 
>>>
>>> Hi all! :-)
>>>
>>>
>>> *Steps to reproduce:*
>>>
>>> Do it:
>>>
>>> [true]
>>> whileTrue;
>>> whileFalse
>>>
>>>
>>> *Expected behavior:*
>>> The script is compiled and executed (and an infinite loop occurs).
>>>
>>> *Actual behavior:*
>>> <pastedImage.png>
>>>
>>>
>>> *Further investigations:*
>>> In #analyseTempsWithin:rootNode:assignmentPools:, no blockExtent is
>>> computed for the inner BlockNode [true], because it is optimized.
>>> This leads to a nil key in the ByteCodeEncoder's blockExtentsToLocals.
>>> A fast solution would be to skip nil blockExtents in #noteBlockExtent:hasLocals:,
>>> but that's a dirty workaround.
>>>
>>> *Similar bug:*
>>> Try to do:
>>>
>>> Object newSubclass compile:  'iWontCompile: foo
>>> foo.
>>> [true]
>>> whileTrue;
>>> whileFalse'
>>>
>>> Fails with:
>>> <pastedImage.png>
>>>
>>> Again, the problem is that blockExtent isNil.
>>>
>>> *Workaround:*
>>> Send #yourself to the block before sending the cascade. This prevents
>>> the receiver from being optimized.
>>>
>>> Please note that this problem does not only occur in case of special
>>> blocks. For example, [Sensor anyButtonPressed] whileTrue; whileFalse
>>> fails as well.
>>>
>>>
>>> How to fix this appropriately? I'm not sure whether I understand
>>> the whole idea of blockExtent so far.
>>>
>>>
>>> blockExtents are the model of blocks that the compiler uses to analyze
>>> temporary references so that temporaries are either local or remote.  Local
>>> temporaries are temporaries on a Context’s own stack.  Remote temporaries
>>> are temporaries stored in an array which is on a Context’s own stack.
>>>
>>> The local/remote distinction is to ensure that a Context _never_ has to
>>> reference another Context’s stack to access an outer temporary.  Why? In
>>> the VM Contexts serve as proxies for active stack frames, with most stack
>>> frames not having a context unless the programmer accesses it by accessing
>>> some other context’s sender. This optimization is key to the VM’s
>>> performance. Cog is fast because it does context-to-stack mapping and
>>> creates contexts lazily.
>>>
>>> If the execution model (defined by the bytecode set) for temporary
>>> access implements access to outer temporaries via direct access to outer
>>> contexts’ stacks, then there are two states an outer context can be in. The
>>> common case is that an outer context is associated with a stack frame and
>>> the temporaries exist in the stack frame; any slots in the context object
>>> are stale. But if the block is long lived and runs after its enclosing
>>> context has been returned from then the temporaries will be in the
>>> context.  To keep the slots in the context up-to-date, the stack frame slots
>>> must be copied to the context on return, and *before* the stack frame is
>>> torn down. Every outer temp access must check what state the outer context
>>> is in. This is what Peter Deutsch’s first bytecode set for HPS (ObjectWorks
>>> 2.5) did (with the optimization that an outer context would be converted
>>> into having a stack frame if it didn’t, so that temporary access only deals
>>> with the common case). And the cost and complexity is very high.  (Forgive
>>> the deep dive).
>>>
>>> The alternative, that I introduced in VW 5i, and that has been in Cog
>>> from the start is to never have contexts access outer temps directly.  This
>>> is done using a standard closure implementation model from lisp.  Any outer
>>> temporary which cannot possibly change during the execution of an inner
>>> block is copied into the closure and copied onto the block context/frame
>>> stack where it is effectively read only (an r-value). These are called
>>> “copied values”.
>>>
>>> Any outer temp which is written to in the block or possibly modified
>>> after being closed over (ie assigned to lexically after the block) must be
>>> put in an array (an indirection vector). Since the temporary holding
>>> the indirection vector itself is not written to after initialization
>>> indirection vectors are always copied values.
>>>
>>> So for example in inject:into: nextValue ends up in an indirection
>>> vector but aBlock is a copied value (look at the bytecode for the method;
>>> I’m on my phone).  Now the VMs context-to-stack mapping is greatly
>>> simplified (no need to intercept returns, no need to check outer temp
>>> access) and _much_ faster (and less buggy, known from hard experience with
>>> HPS).
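>>>
>>> To make the copied-value vs. indirection-vector distinction concrete,
>>> here is Collection>>#inject:into: roughly as it appears in the image
>>> (quoted from memory; check your image for the exact source):
>>>
>>> ```smalltalk
>>> inject: thisValue into: binaryBlock
>>> 	| nextValue |
>>> 	nextValue := thisValue.
>>> 	self do: [:each | nextValue := binaryBlock value: nextValue value: each].
>>> 	^nextValue
>>> ```
>>>
>>> nextValue is assigned inside the do: block, so it must live in an
>>> indirection vector; binaryBlock is only read, so it can be a plain
>>> copied value.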
>>>
>>> But to do this the bytecode compiler must identify when closed over
>>> temporaries can be treated as copied values or put in indirection vectors.
>>> Block extents model block nesting and allow the bytecode compiler to check
>>> whenever a temporary is read if it is being read in an inner block (ie is
>>> being closed over), and check whenever a temporary is being written if it
>>> is being written to in an inner block, or, if it has been closed over is
>>> being written to after a block that closed over it.  In either of these two
>>> cases, written to during the block and that write needing to be visible in
>>> the outer context, or written to after being closed over and that write
>>> needing to be visible within the block, the temp must live in an
>>> indirection vector.
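>>>
>>> A tiny workspace example of the written-after-being-closed-over case (a
>>> sketch; evaluate it to check):
>>>
>>> ```smalltalk
>>> | blk x |
>>> x := 1.
>>> blk := [x].
>>> x := 2. "x is written after the block closed over it, so both must
>>> share it through an indirection vector"
>>> blk value "answers 2, not 1"
>>> ```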
>>>
>>> Christoph, you *have* to understand this to be able to be effective in
>>> working in the execution machinery (eg the runUntil... debugger
>>> machinery).  If you do not understand this then the fault is mine
>>> (ENTIRELY!), and the fault is that my explanation doesn’t make sense. So
>>> take as long as you need and ask yourself if you do or do not understand
>>> it.  If you don’t, please ask questions and I’ll try my best to explain
>>> things better.  I introduced this machinery and it is my responsibility for
>>> it not to be a burden.  This is a case of moving complexity out of the VM
>>> into the image, and is justified on performance and reliability grounds.
>>> But it is a severe cognitive burden on those using the bytecode compiler
>>> where, at the Smalltalk level, it seems that accessing outer temporaries
>>> directly is simple and all the block extent machinery horribly complex.  I
>>> assure you that the block extent machinery is trivial compared to the
>>> support required in the VM to implement direct access to outer temps as
>>> efficiently as possible (which is way less efficient than our current
>>> architecture allows).
>>>
>>> According to the documentation comment, it should be okay to be nil;
>>> however, the #blockExtent comment does not mention this fact.
>>> Please see the attached changeset for an approach to fix the issue. All
>>> tests from KernelTests-Methods and most tests from Test-Compiler pass,
>>> except for the testAllNodePCs* tests, which time out in my fresh image. It would be
>>> great to hear if this is the right way to solve the bug, or just extending
>>> a workaround :-)
>>>
>>>
>>> I’ll take a look some time today.  But I’d be much happier if you can
>>> understand the long winded explanation above and answer your own question.
>>> Great bug though :-)
>>>
>>> Best,
>>> Christoph
>>> <bugfix compiling block cascades.2.cs>
>>>
>>>
>>>
>>
>> --
>> _,,,^..^,,,_
>> best, Eliot
>>
>>
>>