[squeak-dev] [BUG] Cannot compile cascade sent to block

Nicolas Cellier nicolas.cellier.aka.nice at gmail.com
Tue Dec 31 00:04:05 UTC 2019


Ah, yes, here is why:

Name: Compiler-bf.293
Author: bf
Time: 19 February 2015, 6:22:49.349 pm
UUID: 737af1fb-e8f0-4653-851b-0c1b3ed0f65f
Ancestors: Compiler-topa.292

Fix deoptimization of ifNil: etc.

--------

Forgot to say, your changes in Compiler-ct.414 are correct.
But there are other typos in BlockNode comments; I see at least two:

cose over -> close over
throguh -> through

On Tue, 31 Dec 2019 at 00:56, Nicolas Cellier <
nicolas.cellier.aka.nice at gmail.com> wrote:

> Oops, I probably misunderstood the question.
> You mean why you need to copy the cascadeReceiver...
>
> Well, it's simple: the rcvr is shared by all cascade messages, because we
> do:
>
>     parseNode := rcvr.
>     (self messagePart: 3 repeat: false)
>
> So [true] whileTrue is optimized.
> Then in #cascade, after #ensureCanCascade:, it is #deoptimize(d).
> That's when we get the deoptimized receiver via rcvr := parseNode cascadeReceiver.
>
> Then, in the second message [true] whileFalse, it gets optimized again.
> The receiver of the second message is deoptimized again, but it's the
> originalReceiver (a copy) that gets deoptimized, not the receiver!
> That's a change Bert made to #deoptimize in 2015 for some reason (I guess
> for some transform, the receiver was changed).
>
> So your rcvr copy is protected from re-optimization.
>
> On Mon, 30 Dec 2019 at 19:36, Nicolas Cellier <
> nicolas.cellier.aka.nice at gmail.com> wrote:
>
>> Hi Christoph,
>> Because sending a message pops the receiver off the stack and replaces it
>> with the returned object. So we need a dup for each cascaded message.
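>>
>> Conceptually, a cascade such as "3 printString; isNil; yourself" compiles to
>> roughly the following shape (my sketch; the mnemonics are approximate and not
>> tied to a particular bytecode set - compare what CompiledMethod>>#symbolic
>> prints for any method containing a cascade):
>>
>> pushConstant: 3
>> dup                  "keep the receiver around for the next cascaded message"
>> send: printString
>> pop                  "discard the result; the duplicated receiver is underneath"
>> dup
>> send: isNil
>> pop
>> send: yourself       "only the last message's result is kept"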
>>
>> On Mon, 30 Dec 2019 at 19:22, Thiede, Christoph <
>> Christoph.Thiede at student.hpi.uni-potsdam.de> wrote:
>>
>>> > As long as you add a test to capture the issue, one that tests both
>>> > compilation and, if it succeeds, execution, you should add what you see
>>> > fit.  And yes, trunk is the right place, which means inbox until after the
>>> > release.
>>>
>>> Alright, done, hope you like it :)
>>>
>>> I still don't see why cascadeReceiver is a copy, though ...
>>>
>>> ------------------------------
>>> *From:* Squeak-dev <squeak-dev-bounces at lists.squeakfoundation.org> on
>>> behalf of Eliot Miranda <eliot.miranda at gmail.com>
>>> *Sent:* Monday, 30 December 2019 17:51:36
>>> *To:* The general-purpose Squeak developers list
>>> *Subject:* Re: [squeak-dev] [BUG] Cannot compile cascade sent to block
>>>
>>> Hi Christoph, Hi Nicolas,
>>>
>>> On Dec 30, 2019, at 5:28 AM, Thiede, Christoph <
>>> Christoph.Thiede at student.hpi.uni-potsdam.de> wrote:
>>>
>>> 
>>>
>>> Hi Eliot,
>>>
>>>
>>> Of course, a changeset that defers the optimization of ParseNodes until
>>> a cascade can be precluded would be perfect - also with respect to the
>>> compiler performance. I would be very interested to read such a changeset!
>>>
>>>
>>> However, if we can't patch this quickly, would you consider my proposal
>>> an appropriate workaround? Should we maybe include it in Trunk, just for
>>> 5.3, until we have the perfect solution from Nicolas?
>>>
>>>
>>> As long as you add a test to capture the issue, one that tests both
>>> compilation and, if it succeeds, execution, you should add what you see
>>> fit.  And yes, trunk is the right place, which means inbox until after the
>>> release.
>>>
>>> That suggests we switch over to a variation of inbox for the release
>>> process.  We could provide releasebox or some such (I would call it
>>> release) and only commit things good for the release.  But thinking
>>> about it, it’s a bad idea.  It means a lot of work to change one's working
>>> images from trunk to releasebox and back.  However, if in 5.3 we provided a
>>> bulk change operation for the Monticello browser, I’d consider it.  Has
>>> anything like this been discussed? (We should change the subject to continue...)
>>>
>>>
>>> Best,
>>>
>>> Christoph
>>> ------------------------------
>>> *From:* Squeak-dev <squeak-dev-bounces at lists.squeakfoundation.org> on
>>> behalf of Eliot Miranda <eliot.miranda at gmail.com>
>>> *Sent:* Sunday, 29 December 2019 21:01:04
>>> *To:* The general-purpose Squeak developers list
>>> *Subject:* Re: [squeak-dev] [BUG] Cannot compile cascade sent to block
>>>
>>> Hi Christoph,
>>>
>>> On Sun, Dec 29, 2019 at 11:51 AM Thiede, Christoph <
>>> Christoph.Thiede at student.hpi.uni-potsdam.de> wrote:
>>>
>>>> Hi Eliot, thank you very much for the detailed and interesting
>>>> explanation! I think I got a rough idea of the blockExtent concept;
>>>> certainly, it will become more clear as I continue to work with it.
>>>>
>>>>
>>>> Concretely, regarding this bug, I suppose that optimized blocks don't need
>>>> their own blockExtent, because they store all their temps directly on their
>>>> declaring context, don't they?
>>>>
>>>
>>> Exactly.  They're just basic blocks and jumps, within either proper
>>> methods or proper blocks.
>>>
>>>
>>>> This would explain the following outputs:
>>>>
>>>> [|x| x:=true] whileFalse. "will be optimized"
>>>> thisContext tempNames. "==> #('x')"
>>>>
>>>>   compared to:
>>>>
>>>> [|x| x:=true] yourself whileFalse. "won't be optimized"
>>>> thisContext tempNames. "==> #()"
>>>>
>>>>
>>> Exactly.
>>>
>>>
>>>> So I would say that blockExtent may indeed be nil; *I would at least
>>>> note this fact in BlockNode >> #blockExtent.*
>>>>
>>>
>>> And feel free to take as much of my explanation as you like, with rewrites, and
>>> add it to the relevant parts of the bytecode compiler.  I like discursive doc
>>> like this in class comments, but that implies that a pointer to the class
>>> comment from a method is the right thing to do (as in "See the class
>>> comment in ...").
>>>
>>>
>>>> However, #computeCopiedValues: is called by #constructClosureCreationNode:,
>>>> and a closure should only be created for non-optimized BlockNodes. Equally,
>>>> the senders of #reindexingLocalsDo:encoder: should only be activated in
>>>> the case of a non-optimized BlockNode ... So better to forget the rest of my first
>>>> changeset, sigh.
>>>>
>>>
>>> Right.  So what may be happening is that the system is realizing that it
>>> can't inline the block because it is being used in a cascade.  So the
>>> attempt to optimize it in the first place should fail.  We should only
>>> consider for optimization blocks that are sent one of the inline messages,
>>> *but not in a cascade*.  This is probably why Nicolas expects we should
>>> have already fixed this bug. Nicolas, if I neglected to integrate your fix I
>>> apologize.  I invite you to commit straight to trunk; maybe ask Christoph
>>> for a review?
>>>
>>>> It is interesting that in the example, the receiver BlockNode remains
>>>> flagged as optimized (whereas the bytecode produces a separate closure):
>>>>
>>>> [self confirm: #foo]
>>>> whileTrue;
>>>> whileFalse
>>>>
>>>>
>>> Right.  I would guess that the block is marked as being optimizable when
>>> the whileTrue is processed, but everything goes pear-shaped when the
>>> whileFalse is processed.  What we need to do is forestall the optimization
>>> caused by processing the whileTrue send iff the whileTrue is part of a
>>> cascade (which is probably exactly what Nicolas' as-yet-unintegrated
>>> changes effect).
>>>
>>>
>>>>
>>>> First, the compiler reads the first message only ("[self confirm:
>>>> #foo] whileTrue") and optimizes it, and only then finds the second
>>>> message and discards the optimized bytecode. (See Parser >> #cascade, #messagePart:repeat:,
>>>> MessageNode >> #receiver:selector:arguments:precedence:from:,
>>>> #transform:.)
>>>> But it does not reset the optimized flag of the BlockNode! To me, this
>>>> appears to be the actual bug!
>>>>
>>>
>>> You have it.
>>>
>>>
>>>> There is #ensureCanCascade: which should specifically solve this
>>>> problem by resetting the optimized flag. However, during creation of
>>>> MessageNode, originalReceiver is assigned a copy of the actual receiver.
>>>> This leads to the problem that, while the receiver block is deoptimized in
>>>> each temporary MessageNode, the temp var rcvr in #cascade holds the
>>>> copied original receiver, which never got deoptimized. For what purpose do
>>>> we copy the receiver into originalReceiver?
>>>> Did I explain this understandably? :-)
>>>>
>>>> *By applying the following single patch, I can fix the problem:*
>>>>
>>>> Parser >> #cascade
>>>> 	" {; message} => CascadeNode."
>>>>
>>>> -	| rcvr msgs |
>>>> +	| rcvr cascadeRcvr msgs |
>>>> 	parseNode canCascade ifFalse:
>>>> 		[^self expected: 'Cascading not'].
>>>> 	parseNode ensureCanCascade: encoder.
>>>> 	rcvr := parseNode cascadeReceiver.
>>>> +	cascadeRcvr := rcvr copy.
>>>> 	msgs := OrderedCollection with: parseNode.
>>>> 	[self match: #semicolon]
>>>> 		whileTrue:
>>>> 			[parseNode := rcvr.
>>>> 			(self messagePart: 3 repeat: false)
>>>> 				ifFalse: [^self expected: 'Cascade'].
>>>> 			parseNode canCascade ifFalse:
>>>> 				[^self expected: '<- No special messages'].
>>>> 			parseNode ensureCanCascade: encoder.
>>>> 			parseNode cascadeReceiver.
>>>> 			msgs addLast: parseNode].
>>>> -	parseNode := CascadeNode new receiver: rcvr messages: msgs
>>>> +	parseNode := CascadeNode new receiver: cascadeRcvr messages: msgs
>>>>
>>>>
>>>> All relevant tests pass (but I don't know how good their coverage is).
>>>> Is there any need for respecting possible side-effects on rcvr that
>>>> might occur while creating the next cascade message
>>>> (via #messagePart:repeat:)?
>>>> If not, how would you think about this change? :-)
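>>>>
>>>> A test along these lines could capture the issue, checking both that the
>>>> cascade compiles and that it executes (a sketch only: the class, selector and
>>>> source below are invented for illustration, and the cascaded loops are chosen
>>>> so that they terminate):
>>>>
>>>> testCascadeSentToBlock
>>>> 	"Compile a method whose body cascades loop selectors to a block literal,
>>>> 	then run it; both steps should succeed."
>>>> 	[self shouldnt:
>>>> 		[Object compile: 'cascadeToBlockExample
>>>> 			| done |
>>>> 			done := false.
>>>> 			[done := done not] whileFalse; whileTrue']
>>>> 		raise: Error.
>>>> 	self shouldnt: [Object new perform: #cascadeToBlockExample] raise: Error]
>>>> 		ensure: [Object removeSelector: #cascadeToBlockExample]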
>>>>
>>>> (Sorry for the long text, I did my best to maximize the compression
>>>> ratio!)
>>>>
>>>
>>> Um, it's a pleasure to read your writing, so don't worry about being
>>> prolix.  I can hardly complain ;-)
>>>
>>>
>>>
>>>>
>>>> Best,
>>>> Christoph
>>>>
>>>> ------------------------------
>>>> *From:* Squeak-dev <squeak-dev-bounces at lists.squeakfoundation.org> on
>>>> behalf of Eliot Miranda <eliot.miranda at gmail.com>
>>>> *Sent:* Sunday, 29 December 2019 16:59
>>>> *To:* The general-purpose Squeak developers list
>>>> *Subject:* Re: [squeak-dev] [BUG] Cannot compile cascade sent to block
>>>>
>>>> Hi Christoph,
>>>>
>>>> On Dec 28, 2019, at 6:53 AM, Thiede, Christoph <
>>>> Christoph.Thiede at student.hpi.uni-potsdam.de> wrote:
>>>>
>>>> 
>>>>
>>>> Hi all! :-)
>>>>
>>>>
>>>> *Steps to reproduce:*
>>>>
>>>> Do it:
>>>>
>>>> [true]
>>>> whileTrue;
>>>> whileFalse
>>>>
>>>>
>>>> *Expected behavior:*
>>>> The script is compiled and executed (and an infinite loop occurs).
>>>>
>>>> *Actual behavior:*
>>>> <pastedImage.png>
>>>>
>>>>
>>>> *Further investigations:*
>>>> In #analyseTempsWithin:rootNode:assignmentPools:, the inner BlockNode
>>>> [true] does not get a blockExtent computed, because it is optimized.
>>>> This leads to a nil key in the BytecodeEncoder's blockExtentsToLocals.
>>>> A fast solution would be to skip nil blockExtents in #noteBlockExtent:hasLocals:,
>>>> but that's a dirty workaround.
>>>>
>>>> *Similar bug:*
>>>> Try to do:
>>>>
>>>> Object newSubclass compile:  'iWontCompile: foo
>>>> foo.
>>>> [true]
>>>> whileTrue;
>>>> whileFalse'
>>>>
>>>> Fails with:
>>>> <pastedImage.png>
>>>>
>>>> Again, the problem is that blockExtent isNil.
>>>>
>>>> *Workaround:*
>>>> Send #yourself to the block before sending the cascade. This prevents
>>>> the receiver from being optimized.
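>>>> Applied to the original example, that would read, for instance:
>>>>
>>>> [true] yourself
>>>> whileTrue;
>>>> whileFalse
>>>>
>>>> which then compiles (and, as per the expected behavior above, still loops
>>>> forever when run).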
>>>>
>>>> Please note that this problem does not occur only in the case of special
>>>> blocks. For example, [Sensor anyButtonPressed] whileTrue; whileFalse
>>>> fails as well.
>>>>
>>>>
>>>> How can this be fixed appropriately? I'm not sure whether I understand
>>>> the whole idea of blockExtent yet.
>>>>
>>>>
>>>> blockExtents are the model of blocks that the compiler uses to analyze
>>>> temporary references, so that temporaries can be classified as either local or remote.  Local
>>>> temporaries are temporaries on a Context’s own stack.  Remote temporaries
>>>> are temporaries stored in an array which is itself on a Context’s own stack.
>>>>
>>>> The local/remote distinction is to ensure that a Context _never_ has to
>>>> reference another Context’s stack to access an outer temporary.  Why? In
>>>> the VM Contexts serve as proxies for active stack frames, with most stack
>>>> frames not having a context unless the programmer accesses it by accessing
>>>> some other context’s sender. This optimization is key to the VM’s
>>>> performance. Cog is fast because it does context-to-stack mapping and
>>>> creates contexts lazily.
>>>>
>>>> If the execution model (defined by the bytecode set) for temporary
>>>> access implements access to outer temporaries via direct access to outer
>>>> contexts' stacks, then there are two states an outer context can be in. The
>>>> common case is that an outer context is associated with a stack frame and
>>>> the temporaries exist in the stack frame; any slots in the context object
>>>> are stale. But if the block is long-lived and runs after its enclosing
>>>> context has been returned from, then the temporaries will be in the
>>>> context.  To keep the slots in the context up-to-date, the stack frame slots
>>>> must be copied to the context on return, and *before* the stack frame is
>>>> torn down. Every outer temp access must check what state the outer context
>>>> is in. This is what Peter Deutsch’s first bytecode set for HPS (Objectworks
>>>> 2.5) did (with the optimization that an outer context would be converted
>>>> into having a stack frame if it didn’t have one, so that temporary access only deals
>>>> with the common case). And the cost and complexity are very high.  (Forgive
>>>> the deep dive.)
>>>>
>>>> The alternative, which I introduced in VW 5i and which has been in Cog
>>>> from the start, is to never have contexts access outer temps directly.  This
>>>> is done using a standard closure implementation model from Lisp.  Any outer
>>>> temporary which cannot possibly change during the execution of an inner
>>>> block is copied into the closure and copied onto the block context/frame
>>>> stack where it is effectively read only (an r-value). These are called
>>>> “copied values”.
>>>>
>>>> Any outer temp which is written to in the block, or possibly modified
>>>> after being closed over (i.e. assigned to lexically after the block), must be
>>>> put in an array (an indirection vector). Since the temporary holding
>>>> the indirection vector itself is not written to after initialization,
>>>> indirection vectors are always copied values.
>>>>
>>>> So, for example, in inject:into: nextValue ends up in an indirection
>>>> vector but binaryBlock is a copied value (look at the bytecode for the method;
>>>> I’m on my phone).  Now the VM's context-to-stack mapping is greatly
>>>> simplified (no need to intercept returns, no need to check outer temp
>>>> access) and _much_ faster (and less buggy, known from hard experience with
>>>> HPS).
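>>>>
>>>> For reference, the source in question reads roughly like this in a recent
>>>> Squeak (the classification in the trailing comment is my reading of the
>>>> scheme described above, not compiler output):
>>>>
>>>> inject: thisValue into: binaryBlock
>>>> 	| nextValue |
>>>> 	nextValue := thisValue.
>>>> 	self do: [:each | nextValue := binaryBlock value: nextValue value: each].
>>>> 	^ nextValue
>>>>
>>>> "nextValue is assigned inside the inner block and read after it (^ nextValue),
>>>> so it must live in an indirection vector; binaryBlock is only read inside the
>>>> block and never reassigned, so it can be a plain copied value."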
>>>>
>>>> But to do this the bytecode compiler must identify when closed-over
>>>> temporaries can be treated as copied values or must be put in indirection vectors.
>>>> Block extents model block nesting and allow the bytecode compiler to check,
>>>> whenever a temporary is read, if it is being read in an inner block (i.e. is
>>>> being closed over), and to check, whenever a temporary is written, if it
>>>> is being written to in an inner block, or, if it has been closed over, is
>>>> being written to after a block that closed over it.  In either of these two
>>>> cases, written to during the block with that write needing to be visible in
>>>> the outer context, or written to after being closed over with that write
>>>> needing to be visible within the block, the temp must live in an
>>>> indirection vector.
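>>>>
>>>> A minimal pair of examples for the two cases (invented for illustration, not
>>>> taken from the image):
>>>>
>>>> copiedValueExample
>>>> 	| a |
>>>> 	a := 42.
>>>> 	^ [a * 2]	"a is never written after being closed over, so it becomes a copied value"
>>>>
>>>> indirectionVectorExample
>>>> 	| count |
>>>> 	count := 0.
>>>> 	#(1 2 3) do: [:each | count := count + each].
>>>> 	^ count	"count is written inside the block and read afterwards, so it must live in an indirection vector"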
>>>>
>>>> Christoph, you *have* to understand this to be able to be effective in
>>>> working in the execution machinery (e.g. the runUntil... debugger
>>>> machinery).  If you do not understand this then the fault is mine
>>>> (ENTIRELY!), and the fault is that my explanation doesn’t make sense. So
>>>> take as long as you need and ask yourself if you do or do not understand
>>>> it.  If you don’t, please ask questions and I’ll try my best to explain
>>>> things better.  I introduced this machinery and it is my responsibility for
>>>> it not to be a burden.  This is a case of moving complexity out of the VM
>>>> into the image, and is justified on performance and reliability grounds.
>>>> But it is a severe cognitive burden on those using the bytecode compiler
>>>> where, at the Smalltalk level, it seems that accessing outer temporaries
>>>> directly is simple and all the block extent machinery horribly complex.  I
>>>> assure you that the block extent machinery is trivial compared to the
>>>> support required in the VM to implement direct access to outer temps as
>>>> efficiently as possible (which is way less efficiently than our current
>>>> architecture allows).
>>>>
>>>> According to the documentation comment, it should be okay for it to be nil;
>>>> however, the #blockExtent comment does not mention this fact.
>>>> Please see the attached changeset for an approach to fix the issue. All
>>>> tests from KernelTests-Methods and most tests from Test-Compiler pass with it,
>>>> except for the testAllNodePCs* ones, which time out in my fresh image. It would be
>>>> great to hear whether this is the right way to solve the bug, or just extends
>>>> a workaround :-)
>>>>
>>>>
>>>> I’ll take a look some time today.  But I’d be much happier if you could
>>>> understand the long-winded explanation above and answer your own question.
>>>> Great bug though :-)
>>>>
>>>> Best,
>>>> Christoph
>>>> <bugfix compiling block cascades.2.cs>
>>>>
>>>>
>>>>
>>>
>>> --
>>> _,,,^..^,,,_
>>> best, Eliot
>>>
>>>
>>>