[squeak-dev] [BUG] Cannot compile cascade sent to block

Wed Jan 1 19:50:50 UTC 2020

Le mer. 1 janv. 2020 à 20:12, Thiede, Christoph <
Christoph.Thiede at student.hpi.uni-potsdam.de> a écrit :

> Hi Nicolas,
>
>
> this appears to work as well. Could you - or someone else - imagine why
> Bert added a copy in "Compiler-bf.293"? Don't want to break something
> else.
>
>
> I can't speak for Bert, but IMO, either by mimetism with
originalArguments, or fear of possible side effects if sharing.

> Best,
>
> Christoph
> ------------------------------
> *Von:* Squeak-dev <squeak-dev-bounces at lists.squeakfoundation.org> im
> Auftrag von Nicolas Cellier <nicolas.cellier.aka.nice at gmail.com>
> *Gesendet:* Dienstag, 31. Dezember 2019 23:59:40
> *An:* The general-purpose Squeak developers list
> *Betreff:* Re: [squeak-dev] [BUG] Cannot compile cascade sent to block
>
> Hi Christoph,
> what about removing the copy in originalReceiver:
>
> receiver: rcvr arguments: args precedence: p
>      receiver := rcvr.
>      originalReceiver := rcvr.
>      arguments := args.
>      originalArguments := arguments copy.
>      sizes := Array new: arguments size.
>      precedence := p
>
> Le mar. 31 déc. 2019 à 02:08, Thiede, Christoph <
> Christoph.Thiede at student.hpi.uni-potsdam.de> a écrit :
>
>> Ooooh, my patch was erroneous ... I discovered this while working at a
>> total different thing and it took me several hours to trace recurring VM
>> crashes back to this change ...
>>
>>
>> Compare the following before and after loading my commit:
>>
>> PreferenceWizardMorph compile: (PreferenceWizardMorph sourceCodeAt:
>> #updateWindowBounds).
>> (PreferenceWizardMorph >> #updateWindowBounds) symbolic edit
>>
>> There is a difference!
>>
>> (blue is correct)
>> I don't get the full explanation at the moment, but apparently, there
>> *are* made some relevant transformations to the receiver, which is
>> offsetting the temp index because of the nested closure or something.
>> Transformations are more than optimization, I guess.
>>
>> The following change appears to fix the current problem:
>>
>> But I won't commit this to the inbox until I will have made sure that I
>> understand whether and why this is (hopefully) correct. So long, please
>> move Compiler-ct.414 to the Treated Inbox. So sorry!
>>
>>
>> > cose over -> close over
>> > throguh -> through
>>
>> Thank you, will fix them with my next commit.
>>
>> Best,
>> Christoph
>>
>> ------------------------------
>> *Von:* Squeak-dev <squeak-dev-bounces at lists.squeakfoundation.org> im
>> Auftrag von Nicolas Cellier <nicolas.cellier.aka.nice at gmail.com>
>> *Gesendet:* Dienstag, 31. Dezember 2019 01:04 Uhr
>> *An:* The general-purpose Squeak developers list
>> *Betreff:* Re: [squeak-dev] [BUG] Cannot compile cascade sent to block
>>
>> Ah, yes, here is why:
>>
>> Name: Compiler-bf.293
>> Author: bf
>> Time: 19 February 2015, 6:22:49.349 pm
>> UUID: 737af1fb-e8f0-4653-851b-0c1b3ed0f65f
>> Ancestors: Compiler-topa.292
>>
>> Fix deoptimization of ifNil: etc.
>>
>> --------
>>
>> Forgot to say, your changes in Compiler-ct.414 are correct.
>> But there are other typos in BlockNode comments, I see at least 2:
>>
>> cose over -> close over
>> throguh -> through
>>
>> Le mar. 31 déc. 2019 à 00:56, Nicolas Cellier <
>> nicolas.cellier.aka.nice at gmail.com> a écrit :
>>
>>> Oups, I probably missunderstood the question.
>>> You mean why you need to copy the cascadeReceiver...
>>>
>>> Well, it's simple: the rcvr is shared by all cascade message, because we
>>> do:
>>>
>>>     parseNode := rcvr.
>>>     (self messagePart: 3 repeat: false)
>>>
>>> So [true] whileTrue is optimized.
>>> Then in #cascade, after #ensureCanCascade: it is #deoptimize(d).
>>> That's when we get the deoptimized rcvr := parseNode cascadeReceiver.
>>>
>>> Then in second message [true] whileFalse, it gets optimized again.
>>> The receiver of second message is deoptimized again, but it's the
>>> originalReceiver that gets deoptimized (a copy) not the receiver!
>>> That's a change Bert made to #deoptimize in 2015 for some reason (I
>>> guess for some transform, the receiver was changed).
>>>
>>> So your rcvr copy is preserved from re-optimization.
>>>
>>> Le lun. 30 déc. 2019 à 19:36, Nicolas Cellier <
>>> nicolas.cellier.aka.nice at gmail.com> a écrit :
>>>
>>>> Hi Christoph,
>>>> Because sending a message unstack receiver, and replace it with
>>>> returned object. So we need dup for each cascade.
>>>>
>>>> Le lun. 30 déc. 2019 à 19:22, Thiede, Christoph <
>>>> Christoph.Thiede at student.hpi.uni-potsdam.de> a écrit :
>>>>
>>>>> > As long as you add a test to capture the issue, one that tests both
>>>>> compilation and, if it succeeds, execution, you should add what you see
>>>>> fit.  And yes, trunk is the right place, which means inbox until after the
>>>>> release.
>>>>>
>>>>> Alright, done, hope you like it :)
>>>>>
>>>>> I still don't see why cascadeReceiver is a copy, though ...
>>>>>
>>>>> ------------------------------
>>>>> *Von:* Squeak-dev <squeak-dev-bounces at lists.squeakfoundation.org> im
>>>>> Auftrag von Eliot Miranda <eliot.miranda at gmail.com>
>>>>> *Gesendet:* Montag, 30. Dezember 2019 17:51:36
>>>>> *An:* The general-purpose Squeak developers list
>>>>> *Betreff:* Re: [squeak-dev] [BUG] Cannot compile cascade sent to block
>>>>>
>>>>> Hi Christoph, Hi Nicolas,
>>>>>
>>>>> On Dec 30, 2019, at 5:28 AM, Thiede, Christoph <
>>>>> Christoph.Thiede at student.hpi.uni-potsdam.de> wrote:
>>>>>
>>>>> 
>>>>>
>>>>> Hi Eliot,
>>>>>
>>>>>
>>>>> Of course, a changeset that defers the optimization of ParseNodes
>>>>> until a cascade can be precluded would be perfect - also with respect to
>>>>> the compiler performance. I would be very interested to read such a
>>>>> changeset!
>>>>>
>>>>>
>>>>> However, if we can't patch this quickly, would you consider my
>>>>> proposal an appropriate workaround? Should we maybe include it into Trunk,
>>>>> just for 5.3, until we have the perfect solution from Nicolas?
>>>>>
>>>>>
>>>>> As long as you add a test to capture the issue, one that tests both
>>>>> compilation and, if it succeeds, execution, you should add what you see
>>>>> fit.  And yes, trunk is the right place, which means inbox until after the
>>>>> release.
>>>>>
>>>>> That suggests we switch over to a variation of inbox for the release
>>>>> process.  We could provide releasebox or done such (I would call it
>>>>> release) and only commit things good for the release.  But in thinking
>>>>> about it it’s a bad idea.  It means a lot of work to change ones working
>>>>> images from trunk to releasebox and back.  However, if in 5.3 we provided a
>>>>> bulk change operation for the Monticello browser I’d consider it.  Has
>>>>> anything like this been discussed? (we should change subject to continue...)
>>>>>
>>>>>
>>>>> Best,
>>>>>
>>>>> Christoph
>>>>> ------------------------------
>>>>> *Von:* Squeak-dev <squeak-dev-bounces at lists.squeakfoundation.org> im
>>>>> Auftrag von Eliot Miranda <eliot.miranda at gmail.com>
>>>>> *Gesendet:* Sonntag, 29. Dezember 2019 21:01:04
>>>>> *An:* The general-purpose Squeak developers list
>>>>> *Betreff:* Re: [squeak-dev] [BUG] Cannot compile cascade sent to block
>>>>>
>>>>> Hi Christoph,
>>>>>
>>>>> On Sun, Dec 29, 2019 at 11:51 AM Thiede, Christoph <
>>>>> Christoph.Thiede at student.hpi.uni-potsdam.de> wrote:
>>>>>
>>>>>> Hi Eliot, thank you very much for the detailed and interesting
>>>>>> explanation! I think I got a rough idea of the blockExtent concept;
>>>>>> certainly, it will become more clear as I continue to work with it.
>>>>>>
>>>>>>
>>>>>> Concretely to this bug, I suppose that optimized blocks don't need
>>>>>> their own blockExtent, because they store all their temps directly on their
>>>>>> declaring context, don't they?
>>>>>>
>>>>>
>>>>> Exactly.  They're just basic blocks and jumps, within either proper
>>>>> methods or proper blocks.
>>>>>
>>>>>
>>>>>> This would explain the following outputs:
>>>>>>
>>>>>> [|x| x:=true] whileFalse. "will be optimized"
>>>>>> thisContext tempNames. #('x')
>>>>>>
>>>>>>   compared to:
>>>>>>
>>>>>> [|x| x:=true] yourself whileFalse. "wont be optimized"
>>>>>> thisContext tempNames #()
>>>>>>
>>>>>>
>>>>> Exactly.
>>>>>
>>>>>
>>>>>> So I would say that blockExtent may indeed be nil, *I would at least
>>>>>> note this fact in BlockNode >> #blockExtent.*
>>>>>>
>>>>>
>>>>> And feel free to take as much of my explanation, with rewrites, and
>>>>> add it to the relevant parts of the bytecode compiler.  I like discursive
>>>>> doc like this in class comments, but that implies that a pointer to the
>>>>> class comment from a method is the right thing to do (as in "See the class
>>>>> comment in ...").
>>>>>
>>>>>
>>>>>> However, #computeCopiedValues: is called by #constructClosureCreationNode:,
>>>>>> and a closure should be only created of non-optimized BlockNodes. Equally,
>>>>>> the senders of #reindexingLocalsDo:encoder: should be only activated in
>>>>>> case of a non-optimized BlockNode ... So better forget the rest of my first
>>>>>> changeset, sigh.
>>>>>>
>>>>>
>>>>> Right.  So what maybe happening is that the system is realizing that
>>>>> it can't inline the block because it is being used in a cascade.  So the
>>>>> attempt to optimize it in the first place should fail.  We should only
>>>>> consider for optimization blocks that are sent one of the inline messages,
>>>>> *but not in a cascade*.  This is probably why Nicolas expects we should
>>>>> have already fixed this bug. Nicolas if I neglected to integrate your fix I
>>>>> apologize.  I invite you to commit straight to trunk; maybe ask Christoph
>>>>> for review?.
>>>>>
>>>>> It is interesting that in the example, the receiver BlockNode keeps
>>>>>> flagged as optimized (whereas the byte code produces a separate closure):
>>>>>>
>>>>>> [self confirm: #foo]
>>>>>> whileTrue;
>>>>>> whileFalse
>>>>>>
>>>>>>
>>>>> Right.  I would guess that the block is marked as being optimizable
>>>>> when the whileTrue is processed, but everything goes pear shaped when the
>>>>> whileFalse is processed.  What we need to do is forestall the optimization
>>>>> caused by processing the whileTrue send iff the whileTrue is part of a
>>>>> cascade (which is probably exactly what Nicolas' as-yet-unintegrated
>>>>> changes effect).
>>>>>
>>>>>
>>>>>>
>>>>>> First, the compiler reads the first message only ("[self confirm:
>>>>>> #foo] whileTrue") and optimizes it, and only then finds the second
>>>>>> message and discards the optimized bytecode. (See Parser >>
>>>>>> #cascade, #messagePart:repeat, MessageNode >> #receiver:selector:arguments:precedence:from:,
>>>>>> #transform:.)
>>>>>> But it does not reset the optimized flag of the BlockNode! To me,
>>>>>> this appears to be the actual bug!
>>>>>>
>>>>>
>>>>> You have it.
>>>>>
>>>>>
>>>>>> There is #ensureCanCascade: which should specifically solve this
>>>>>> problem by resetting the optimized flag. However, during creation of
>>>>>> MessageNode, originalReceiver is assigned a copy of the actual receiver.
>>>>>> This leads to the problem that while in each temporary MessageNode, the
>>>>>> receiver block is deoptimized, the tempvar rcrv in #cascade holds the
>>>>>> copied original receiver which never got deoptimized. For what purpose do
>>>>>> we copy the receiver into originalReceiver?
>>>>>> Did I explain this understandable? :-)
>>>>>>
>>>>>> *By applying the following single patch, I can fix the problem:*
>>>>>>
>>>>>> Parser >> #cascade
>>>>>> " {; message} => CascadeNode."
>>>>>>
>>>>>> - | rcvr msgs |
>>>>>> + | rcvr cascadeRcvr msgs|
>>>>>> parseNode canCascade ifFalse:
>>>>>> [^self expected: 'Cascading not'].
>>>>>> parseNode ensureCanCascade: encoder.
>>>>>> rcvr := parseNode cascadeReceiver.
>>>>>> +  cascadeRcvr := rcvr copy.
>>>>>> msgs := OrderedCollection with: parseNode.
>>>>>> [self match: #semicolon]
>>>>>> whileTrue:
>>>>>> [parseNode := rcvr.
>>>>>> (self messagePart: 3 repeat: false)
>>>>>> ifFalse: [^self expected: 'Cascade'].
>>>>>> parseNode canCascade ifFalse:
>>>>>> [^self expected: '<- No special messages'].
>>>>>> parseNode ensureCanCascade: encoder.
>>>>>> parseNode cascadeReceiver.
>>>>>> msgs addLast: parseNode].
>>>>>> - parseNode := CascadeNode new receiver: rcvr messages: msgs
>>>>>> +  parseNode := CascadeNode new receiver: cascadeRcvr messages: msgs
>>>>>>
>>>>>>
>>>>>> All relevant tests pass (but I don't know how good their coverage is).
>>>>>> Is there any need for respecting possible side-effects on rcvr that
>>>>>> might occur while creating the next cascade message
>>>>>> (via #messagePart:repeat:)?
>>>>>> If not, how would you think about this change? :-)
>>>>>>
>>>>>> (Sorry for the long text, I did my best to maximize the compression
>>>>>> ratio!)
>>>>>>
>>>>>
>>>>> Um, it's a pleasure to read your writing, so don't worry about being
>>>>> prolix.  I can hardly complain ;-)
>>>>>
>>>>>
>>>>>
>>>>>>
>>>>>> Best,
>>>>>> Christoph
>>>>>>
>>>>>> ------------------------------
>>>>>> *Von:* Squeak-dev <squeak-dev-bounces at lists.squeakfoundation.org> im
>>>>>> Auftrag von Eliot Miranda <eliot.miranda at gmail.com>
>>>>>> *Gesendet:* Sonntag, 29. Dezember 2019 16:59 Uhr
>>>>>> *An:* The general-purpose Squeak developers list
>>>>>> *Betreff:* Re: [squeak-dev] [BUG] Cannot compile cascade sent to
>>>>>> block
>>>>>>
>>>>>> Hi Christoph,
>>>>>>
>>>>>> On Dec 28, 2019, at 6:53 AM, Thiede, Christoph <
>>>>>> Christoph.Thiede at student.hpi.uni-potsdam.de> wrote:
>>>>>>
>>>>>> 
>>>>>>
>>>>>> Hi all! :-)
>>>>>>
>>>>>>
>>>>>> *Steps to reproduce:*
>>>>>>
>>>>>> Do it:
>>>>>>
>>>>>> [true]
>>>>>> whileTrue;
>>>>>> whileFalse
>>>>>>
>>>>>>
>>>>>> *Expected behavior:*
>>>>>> The script is compiled and executed (and an infinite loop occurs).
>>>>>>
>>>>>> *Actual behavior:*
>>>>>> <pastedImage.png>
>>>>>>
>>>>>>
>>>>>> *Further investigations:*
>>>>>> In #analyseTempsWithin:rootNode:assignmentPools:, the inner
>>>>>> BlockNode [true] is not computed a blockExtent, because it is optimized.
>>>>>> This leads to a nil key in the ByteCodeEncoder's blockExtentsToLocals.
>>>>>> A fast solution would be to skip nil blockExtents in #noteBlockExtent:hasLocals:,
>>>>>> but that's a dirty workaround.
>>>>>>
>>>>>> *Similar bug:*
>>>>>> Try to do:
>>>>>>
>>>>>> Object newSubclass compile:  'iWontCompile: foo
>>>>>> foo.
>>>>>> [true]
>>>>>> whileTrue;
>>>>>> whileFalse'
>>>>>>
>>>>>> Fails with:
>>>>>> <pastedImage.png>
>>>>>>
>>>>>> Again, the problem is that blockExtent isNil.
>>>>>>
>>>>>> *Workaround:*
>>>>>> Send #yourself to the block before sending the cascade. This prevents
>>>>>> the receiver from being optimized.
>>>>>>
>>>>>> Please note that this problem does not only occur in case of special
>>>>>> blocks. For example, [Sensor anyButtonPressed] whileTrue; whileFalse
>>>>>> fails as well.
>>>>>>
>>>>>>
>>>>>> How to fix this appropriately? I'm not sure whether I understand
>>>>>> the whole idea of blockExtent so far.
>>>>>>
>>>>>>
>>>>>> blockExtents are the model of blocks that the compiler uses to
>>>>>> analyze temporary references so that temporaries are either local or
>>>>>> remote.  Local temporaries are temporaries on a Context’s own stack.
>>>>>> Remote temporaries are temporaries stored in an array which is on a
>>>>>> Context’s own stack.
>>>>>>
>>>>>> The local/remote distinction is to ensure that a Context _never_ has
>>>>>> to reference another Context’s stack to access an outer temporary.  Why? In
>>>>>> the VM Contexts serve as proxies for active stack frames, with most stack
>>>>>> frames not having a context unless the programmer accesses it by accessing
>>>>>> some other context’s sender. This optimization is key to the VM’s
>>>>>> performance. Cog is fast because it does context-to-stack mapping and
>>>>>> creates contexts lazily.
>>>>>>
>>>>>> If the execution model (defined by the bytecode set) for temporary
>>>>>> access implements access to outer temporaries via direct access to outer
>>>>>> contexts stacks then there are two states an outer context can be in. The
>>>>>> common case is that an outer context is associated with a stack frame and
>>>>>> the temporaries exist in the stack frame; any slots in the context object
>>>>>> are stale. But if the block is long lived and runs after its enclosing
>>>>>> context has been returned from then the temporaries will be in the
>>>>>> context.  To keep the slots in the vm text up-to-date the stack frame slots
>>>>>> must be copied to the context in return, and *before* the stack frame is
>>>>>> torn down. Every outer temp access must check what state the outer context
>>>>>> is in. This is what Peter Deutsch’s first bytecode set for HPS (objectworks
>>>>>> 2.5) did (with the optimization that an outer context would be converted
>>>>>> into having a stack frame if it didn’t, so that temporary access only deals
>>>>>> with the common case). And the cost and complexity is very high.  (Forgive
>>>>>> the deep dive).
>>>>>>
>>>>>> The alternative, that I introduced in VW 5i, and that has been in Cog
>>>>>> from the start is to never have contexts access outer temps directly.  This
>>>>>> is done using a standard closure implementation model from lisp.  Any outer
>>>>>> temporary which cannot possibly change during the execution of an inner
>>>>>> block is copied into the closure and copied onto the block context/frame
>>>>>> stack where it is effectively read only (an r-value). These are called
>>>>>> “copied values”.
>>>>>>
>>>>>> Any outer temp which is written to in the block or possibly modified
>>>>>> after being closed over (ie assigned to lexically after the block) must be
>>>>>> put in an array (an indirection vector). Since the temporary holding
>>>>>> the indirection vector itself is not written to after initialization
>>>>>> indirection vectors are always copied values.
>>>>>>
>>>>>> So for example in inject:into: nextValue ends up in an indirection
>>>>>> vector but aBlock is a copied value (look at the bytecode for the method;
>>>>>> I’m on my phone).  Now the VMs context-to-stack mapping is greatly
>>>>>> simplified (no need to intercept returns, no need to check outer temp
>>>>>> access) and _much_ faster (and less buggy, known from hash experience with
>>>>>> HPS).
>>>>>>
>>>>>> But to do this the bytecode compiler must identify when closed over
>>>>>> temporaries can be treated as copied values or put in indirection vectors.
>>>>>> Block extents model block nesting and allow the bytecode compiler to check
>>>>>> whenever a temporary is read if it is being read in an inner block (ie is
>>>>>> being closed over), and check whenever a temporary is being written if it
>>>>>> is being written to in an inner block, or, if it has been closed over is
>>>>>> being written to after a block that closed over it.  In either of these two
>>>>>> cases, written to during the block and that write needing to be visible in
>>>>>> the outer context, or written to after being closed over and that write
>>>>>> needing to be visible within the block, the temp must live in an
>>>>>> indirection vector.
>>>>>>
>>>>>> Christoph, you *have* to understand this to be able to be effective
>>>>>> in working in the execution machinery (eg the runUntil... debugger
>>>>>> machinery).  If you do not understand this then the fault is mine
>>>>>> (ENTIRELY!), and the fault is that my explanation doesn’t make sense. So
>>>>>> take as long as you need and ask yourself if you do or do not understand
>>>>>> it.  If you don’t, please ask questions and I’ll try my best to explain
>>>>>> things better.  I introduced this machinery and it is my responsibility for
>>>>>> it not to be a burden.  This is a case of moving complexity out of the VM
>>>>>> into the image, and is justified on performance and reliability grounds.
>>>>>> But it is a severe cognitive burden on those using the bytecode compiler
>>>>>> where, yo at the Smalltalk level, it seems they accessing outer temporaries
>>>>>> directly is simple and all the block extent machinery horribly complex.  I
>>>>>> assure you that the block extent machinery is trivial compared to the
>>>>>> support required in the vm to implement direct access to outer temps as
>>>>>> efficiently as possible ;which is way less efficiently than our current
>>>>>> architecture allows).
>>>>>>
>>>>>> According to the documentation comment, it should be okay to be nil;
>>>>>> however, the #blockExtent comment does not mention this fact.
>>>>>> Please see the attached changeset for an approach to fix the issue.
>>>>>> All tests from KernelTests-Methods and most tests from Test-Compiler pass
>>>>>> it, except of the testAllNodePCs* which time out in my fresh image. It
>>>>>> would be great to hear if this is the right way to solve the bug, or just
>>>>>> extending a workaround :-)
>>>>>>
>>>>>>
>>>>>> I’ll take a look some time today.  But I’d be much happier if you can
>>>>>> understand the long winded explanation above and answer your own question.
>>>>>> Great bug though :-)
>>>>>>
>>>>>> Best,
>>>>>> Christoph
>>>>>> <bugfix compiling block cascades.2.cs>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>> --
>>>>> _,,,^..^,,,_
>>>>> best, Eliot
>>>>>
>>>>>
>>>>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.squeakfoundation.org/pipermail/squeak-dev/attachments/20200101/cff233c5/attachment-0001.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: pastedImage.png
Type: image/png
Size: 12558 bytes
Desc: not available
URL: <http://lists.squeakfoundation.org/pipermail/squeak-dev/attachments/20200101/cff233c5/attachment-0002.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: pastedImage.png
Type: image/png
Size: 44509 bytes
Desc: not available
URL: <http://lists.squeakfoundation.org/pipermail/squeak-dev/attachments/20200101/cff233c5/attachment-0003.png>