[squeak-dev] [BUG] Cannot compile cascade sent to block

Nicolas Cellier nicolas.cellier.aka.nice at gmail.com
Mon Dec 30 18:36:46 UTC 2019


Hi Christoph,
Because sending a message pops the receiver off the stack and replaces it with the
returned object, we need a dup of the receiver for each message in the cascade.
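
For example (a throwaway method on Object, chosen only for illustration), printing
the symbolic bytecode of a method containing a cascade shows the receiver being
dup'ed for every cascaded message but the last:

    Object compile: 'cascadeExample ^OrderedCollection new add: 1; add: 2; yourself'.
    (Object >> #cascadeExample) symbolic
    "afterwards: Object removeSelector: #cascadeExample"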

On Mon, Dec 30, 2019 at 19:22, Thiede, Christoph <
Christoph.Thiede at student.hpi.uni-potsdam.de> wrote:

> > As long as you add a test to capture the issue, one that tests both
> compilation and, if it succeeds, execution, you should add what you see
> fit.  And yes, trunk is the right place, which means inbox until after the
> release.
>
> Alright, done, hope you like it :)
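>
> For the archives, a simplified sketch of such a test (the selector, names and the
> exact expression are illustrative; the committed version may differ):
>
>     testCascadeSentToBlock
>         "A cascade sent to a literal block used to fail to compile (see this thread)."
>         | result |
>         result := Compiler evaluate:
>             '| i | i := 0. [i := i + 1. i < 3] whileTrue; yourself. i'.
>         self assert: 3 equals: result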
>
> I still don't see why cascadeReceiver is a copy, though ...
>
> ------------------------------
> *From:* Squeak-dev <squeak-dev-bounces at lists.squeakfoundation.org> on
> behalf of Eliot Miranda <eliot.miranda at gmail.com>
> *Sent:* Monday, December 30, 2019 17:51:36
> *To:* The general-purpose Squeak developers list
> *Subject:* Re: [squeak-dev] [BUG] Cannot compile cascade sent to block
>
> Hi Christoph, Hi Nicolas,
>
> On Dec 30, 2019, at 5:28 AM, Thiede, Christoph <
> Christoph.Thiede at student.hpi.uni-potsdam.de> wrote:
>
> 
>
> Hi Eliot,
>
>
> Of course, a changeset that defers the optimization of ParseNodes until a
> cascade can be precluded would be perfect - also with respect to the
> compiler performance. I would be very interested to read such a changeset!
>
>
> However, if we can't patch this quickly, would you consider my proposal an
> appropriate workaround? Should we maybe include it in Trunk, just for
> 5.3, until we have the perfect solution from Nicolas?
>
>
> As long as you add a test to capture the issue, one that tests both
> compilation and, if it succeeds, execution, you should add what you see
> fit.  And yes, trunk is the right place, which means inbox until after the
> release.
>
> That suggests we switch over to a variation of inbox for the release
> process.  We could provide a releasebox or some such (I would call it
> release) and only commit things good for the release.  But thinking about
> it, it’s a bad idea.  It means a lot of work to change one’s working
> images from trunk to releasebox and back.  However, if in 5.3 we provided a
> bulk change operation for the Monticello browser I’d consider it.  Has
> anything like this been discussed? (we should change the subject to continue...)
>
>
> Best,
>
> Christoph
> ------------------------------
> *From:* Squeak-dev <squeak-dev-bounces at lists.squeakfoundation.org> on
> behalf of Eliot Miranda <eliot.miranda at gmail.com>
> *Sent:* Sunday, December 29, 2019 21:01:04
> *To:* The general-purpose Squeak developers list
> *Subject:* Re: [squeak-dev] [BUG] Cannot compile cascade sent to block
>
> Hi Christoph,
>
> On Sun, Dec 29, 2019 at 11:51 AM Thiede, Christoph <
> Christoph.Thiede at student.hpi.uni-potsdam.de> wrote:
>
>> Hi Eliot, thank you very much for the detailed and interesting
>> explanation! I think I got a rough idea of the blockExtent concept;
>> certainly, it will become more clear as I continue to work with it.
>>
>>
>> Concretely to this bug, I suppose that optimized blocks don't need their
>> own blockExtent, because they store all their temps directly on their
>> declaring context, don't they?
>>
>
> Exactly.  They're just basic blocks and jumps, within either proper
> methods or proper blocks.
>
>
>> This would explain the following outputs:
>>
>> [|x| x := true] whileFalse. "will be optimized"
>> thisContext tempNames "==> #('x')"
>>
>>   compared to:
>>
>> [|x| x := true] yourself whileFalse. "won't be optimized"
>> thisContext tempNames "==> #()"
>>
>>
> Exactly.
>
>
>> So I would say that blockExtent may indeed be nil; *I would at least
>> note this fact in BlockNode >> #blockExtent.*
>>
>
> And feel free to take as much of my explanation, with rewrites, and add it
> to the relevant parts of the bytecode compiler.  I like discursive doc like
> this in class comments, but that implies that a pointer to the class
> comment from a method is the right thing to do (as in "See the class
> comment in ...").
>
>
>> However, #computeCopiedValues: is called by #constructClosureCreationNode:,
>> and a closure should only be created for non-optimized BlockNodes. Equally,
>> the senders of #reindexingLocalsDo:encoder: should only be activated for
>> non-optimized BlockNodes ... So better forget the rest of my first
>> changeset, sigh.
>>
>
> Right.  So what may be happening is that the system is realizing that it
> can't inline the block because it is being used in a cascade.  So the
> attempt to optimize it in the first place should fail.  We should only
> consider for optimization blocks that are sent one of the inline messages,
> *but not in a cascade*.  This is probably why Nicolas expects we should
> have already fixed this bug. Nicolas, if I neglected to integrate your fix I
> apologize.  I invite you to commit straight to trunk; maybe ask Christoph
> for review?
>
>> It is interesting that in the example, the receiver BlockNode remains
>> flagged as optimized (even though the bytecode produces a separate closure):
>>
>> [self confirm: #foo]
>> 	whileTrue;
>> 	whileFalse
>>
>>
> Right.  I would guess that the block is marked as being optimizable when
> the whileTrue is processed, but everything goes pear shaped when the
> whileFalse is processed.  What we need to do is forestall the optimization
> caused by processing the whileTrue send iff the whileTrue is part of a
> cascade (which is probably exactly what Nicolas' as-yet-unintegrated
> changes effect).
>
>
>>
>> First, the compiler reads the first message only ("[self confirm: #foo]
>> whileTrue") and optimizes it, and only then finds the second message and
>> discards the optimized bytecode. (See Parser >> #cascade, #messagePart:repeat:,
>> MessageNode >> #receiver:selector:arguments:precedence:from:,
>> #transform:.)
>> But it does not reset the optimized flag of the BlockNode! To me, this
>> appears to be the actual bug!
>>
>
> You have it.
>
>
>> There is #ensureCanCascade:, which should specifically solve this problem
>> by resetting the optimized flag. However, during creation of the MessageNode,
>> originalReceiver is assigned a copy of the actual receiver. This leads to
>> the problem that while the receiver block is deoptimized in each temporary
>> MessageNode, the temp var rcvr in #cascade holds the copied original
>> receiver, which never got deoptimized. For what purpose do we copy the
>> receiver into originalReceiver?
>> Did I explain this understandably? :-)
>>
>> *By applying the following single patch, I can fix the problem:*
>>
>> Parser >> #cascade
>> 	" {; message} => CascadeNode."
>>
>> -	| rcvr msgs |
>> +	| rcvr cascadeRcvr msgs |
>> 	parseNode canCascade ifFalse:
>> 		[^self expected: 'Cascading not'].
>> 	parseNode ensureCanCascade: encoder.
>> 	rcvr := parseNode cascadeReceiver.
>> +	cascadeRcvr := rcvr copy.
>> 	msgs := OrderedCollection with: parseNode.
>> 	[self match: #semicolon]
>> 		whileTrue:
>> 			[parseNode := rcvr.
>> 			(self messagePart: 3 repeat: false)
>> 				ifFalse: [^self expected: 'Cascade'].
>> 			parseNode canCascade ifFalse:
>> 				[^self expected: '<- No special messages'].
>> 			parseNode ensureCanCascade: encoder.
>> 			parseNode cascadeReceiver.
>> 			msgs addLast: parseNode].
>> -	parseNode := CascadeNode new receiver: rcvr messages: msgs
>> +	parseNode := CascadeNode new receiver: cascadeRcvr messages: msgs
>>
>>
>> All relevant tests pass (but I don't know how good their coverage is).
>> Is there any need to respect possible side effects on rcvr that
>> might occur while creating the next cascade message
>> (via #messagePart:repeat:)?
>> If not, how would you think about this change? :-)
>>
>> (Sorry for the long text, I did my best to maximize the compression
>> ratio!)
>>
>
> Um, it's a pleasure to read your writing, so don't worry about being
> prolix.  I can hardly complain ;-)
>
>
>
>>
>> Best,
>> Christoph
>>
>> ------------------------------
>> *From:* Squeak-dev <squeak-dev-bounces at lists.squeakfoundation.org> on
>> behalf of Eliot Miranda <eliot.miranda at gmail.com>
>> *Sent:* Sunday, December 29, 2019 16:59
>> *To:* The general-purpose Squeak developers list
>> *Subject:* Re: [squeak-dev] [BUG] Cannot compile cascade sent to block
>>
>> Hi Christoph,
>>
>> On Dec 28, 2019, at 6:53 AM, Thiede, Christoph <
>> Christoph.Thiede at student.hpi.uni-potsdam.de> wrote:
>>
>> 
>>
>> Hi all! :-)
>>
>>
>> *Steps to reproduce:*
>>
>> Do it:
>>
>> [true]
>> 	whileTrue;
>> 	whileFalse
>>
>>
>> *Expected behavior:*
>> The script is compiled and executed (and an infinite loop occurs).
>>
>> *Actual behavior:*
>> <pastedImage.png>
>>
>>
>> *Further investigations:*
>> In #analyseTempsWithin:rootNode:assignmentPools:, the inner BlockNode
>> [true] does not get a blockExtent computed, because it is optimized.
>> This leads to a nil key in the BytecodeEncoder's blockExtentsToLocals.
>> A quick solution would be to skip nil blockExtents in #noteBlockExtent:hasLocals:,
>> but that's a dirty workaround.
>>
>> *Similar bug:*
>> Try to do:
>>
>> Object newSubclass compile:  'iWontCompile: foo
>> foo.
>> [true]
>> whileTrue;
>> whileFalse'
>>
>> Fails with:
>> <pastedImage.png>
>>
>> Again, the problem is that blockExtent isNil.
>>
>> *Workaround:*
>> Send #yourself to the block before sending the cascade. This prevents the
>> receiver from being optimized.
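>>
>> Applied to the reproduction above, this compiles (and then, as expected,
>> loops forever, so be ready to interrupt it):
>>
>> [true] yourself
>> 	whileTrue;
>> 	whileFalse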
>>
>> Please note that this problem does not occur only in the case of special
>> blocks. For example, [Sensor anyButtonPressed] whileTrue; whileFalse
>> fails as well.
>>
>>
>> How to fix this appropriately? I'm not sure whether I understand
>> the whole idea of blockExtent so far.
>>
>>
>> blockExtents are the model of blocks that the compiler uses to analyze
>> temporary references so that temporaries are either local or remote.  Local
>> temporaries are temporaries on a Context’s own stack.  Remote temporaries
>> are temporaries stored in an array which is on a Context’s own stack.
>>
>> The local/remote distinction is to ensure that a Context _never_ has to
>> reference another Context’s stack to access an outer temporary.  Why? In
>> the VM Contexts serve as proxies for active stack frames, with most stack
>> frames not having a context unless the programmer accesses it by accessing
>> some other context’s sender. This optimization is key to the VM’s
>> performance. Cog is fast because it does context-to-stack mapping and
>> creates contexts lazily.
>>
>> If the execution model (defined by the bytecode set) for temporary access
>> implements access to outer temporaries via direct access to outer contexts'
>> stacks, then there are two states an outer context can be in. The common
>> case is that an outer context is associated with a stack frame and the
>> temporaries exist in the stack frame; any slots in the context object are
>> stale. But if the block is long-lived and runs after its enclosing context
>> has been returned from, then the temporaries will be in the context.  To
>> keep the slots in the context up-to-date, the stack frame slots must be
>> copied to the context on return, and *before* the stack frame is torn down.
>> Every outer temp access must check what state the outer context is in. This
>> is what Peter Deutsch’s first bytecode set for HPS (Objectworks 2.5) did
>> (with the optimization that an outer context would be converted into having
>> a stack frame if it didn’t, so that temporary access only deals with the
>> common case). And the cost and complexity are very high.  (Forgive the deep
>> dive.)
>>
>> The alternative, which I introduced in VW 5i, and which has been in Cog
>> from the start, is to never have contexts access outer temps directly.  This
>> is done using a standard closure implementation model from Lisp.  Any outer
>> temporary which cannot possibly change during the execution of an inner
>> block is copied into the closure and copied onto the block context/frame
>> stack, where it is effectively read-only (an r-value). These are called
>> “copied values”.
>>
>> Any outer temp which is written to in the block, or possibly modified
>> after being closed over (i.e. assigned to lexically after the block), must be
>> put in an array (an indirection vector). Since the temporary holding
>> the indirection vector itself is not written to after initialization,
>> indirection vectors are always copied values.
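>>
>> A small workspace example of the two cases (variable names are arbitrary):
>>
>> | a b blockA blockB |
>> a := 1.
>> b := 1.
>> blockA := [a + 1].       "a is never written inside or after the block: copied value"
>> blockB := [b := b + 1].  "b is written inside the block: indirection vector"
>> blockB value.
>> {blockA value. b}        "==> #(2 2): the write to b is visible outside the block"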
>>
>> So, for example, in inject:into: nextValue ends up in an indirection vector
>> but aBlock is a copied value (look at the bytecode for the method; I’m on
>> my phone).  Now the VM’s context-to-stack mapping is greatly simplified (no
>> need to intercept returns, no need to check outer temp access) and _much_
>> faster (and less buggy, as known from hard experience with HPS).
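>>
>> To look at that bytecode yourself, print:
>>
>> (Collection >> #inject:into:) symbolic
>>
>> You should see a temp vector being created for nextValue and the block being
>> constructed with copied values (the exact mnemonics depend on the installed
>> bytecode set).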
>>
>> But to do this the bytecode compiler must identify when closed-over
>> temporaries can be treated as copied values or must be put in indirection
>> vectors.  Block extents model block nesting and allow the bytecode compiler
>> to check, whenever a temporary is read, whether it is being read in an inner
>> block (i.e. is being closed over), and to check, whenever a temporary is
>> written, whether it is being written to in an inner block, or, if it has
>> been closed over, is being written to after the block that closed over it.
>> In either of these two cases (written to during the block with that write
>> needing to be visible in the outer context, or written to after being closed
>> over with that write needing to be visible within the block), the temp must
>> live in an indirection vector.
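>>
>> A tiny workspace example of the second case (written to after being closed
>> over), just to make it concrete:
>>
>> | t block |
>> t := 1.
>> block := [t].   "t is closed over here..."
>> t := 2.         "...and assigned afterwards, so t must live in an indirection vector"
>> block value     "==> 2: the block sees the later write"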
>>
>> Christoph, you *have* to understand this to be able to be effective in
>> working in the execution machinery (e.g. the runUntil... debugger
>> machinery).  If you do not understand this then the fault is mine
>> (ENTIRELY!), and the fault is that my explanation doesn’t make sense. So
>> take as long as you need and ask yourself if you do or do not understand
>> it.  If you don’t, please ask questions and I’ll try my best to explain
>> things better.  I introduced this machinery and it is my responsibility for
>> it not to be a burden.  This is a case of moving complexity out of the VM
>> into the image, and is justified on performance and reliability grounds.
>> But it is a severe cognitive burden on those using the bytecode compiler,
>> where, at the Smalltalk level, it seems that accessing outer temporaries
>> directly is simple and all the block extent machinery horribly complex.  I
>> assure you that the block extent machinery is trivial compared to the
>> support required in the VM to implement direct access to outer temps as
>> efficiently as possible (which is way less efficient than our current
>> architecture allows).
>>
>> According to the documentation comment, it should be okay for it to be nil;
>> however, the #blockExtent comment does not mention this fact.
>> Please see the attached changeset for an approach to fix the issue. All
>> tests from KernelTests-Methods and most tests from Test-Compiler pass,
>> except for the testAllNodePCs* ones, which time out in my fresh image. It
>> would be great to hear whether this is the right way to solve the bug, or
>> just extends a workaround :-)
>>
>>
>> I’ll take a look some time today.  But I’d be much happier if you can
>> understand the long winded explanation above and answer your own question.
>> Great bug though :-)
>>
>> Best,
>> Christoph
>> <bugfix compiling block cascades.2.cs>
>>
>>
>>
>
> --
> _,,,^..^,,,_
> best, Eliot
>
>
>

