Hi Jaromir,

this is great! One more time, thanks for your work! I have never found the time or energy to continue working on this issue for more than 2 years and am very happy to see progress here! :) As far as I understand it, repositioning the simulation guards instead of stopping the simulation sounds ingenious.

Regarding other use cases, let me collect a few here:

* Generators: You could play with a simple example like (Generator on: [:gen | gen nextPut: 1]) next or (Generator on: [:gen | gen nextPut: 1; nextPut: 2]) next: 2 and step over the next/nextPuts (or into them and then over some of their parts). Effectively every next/nextPut is performing a coroutine switch, so the control flow will jump from the next back into the generator block and vice versa. I am not even sure what would be the requirements in this example. When stepping over next, do you expect to end in the generator block or after the next? Also consider more sophisticated examples where the generator block is far away from the consumer. E.g., evaluate the following statements separately and debug the last one:

g := Generator on: [:gen |
    gen nextPut: 1.
    "do something interesting here which the debugging person does not want to miss ..."
    gen nextPut: 2].
g := g collect: [:ea | ea * 2].
g next: 2.

Step through would not help you in this situation. I'm not sure what we would want here. Maybe the current behavior is already the best one. :-)

* #jump:

Debug the following and step into #runUntilErrorOrReturnFrom:, then over the #jump at the bottom (WARNING: for me, this even crashes the VM!).

ctxt := [2/3] asContext.
ctxt := ctxt stepToSendOrReturn.
ctxt := ctxt step.
ctxt runUntilErrorOrReturnFrom: ctxt.

As opposed to generators, for an arbitrary #jump we cannot know whether control will ever get back to the original context. Thus, my expectation here would be to stop right behind the context switch, which would be SmallInteger>>/ in this example.

Here is another one - step into contextEnsure:, through, and over the #jump (raises a BCR for me):

thisContext insertSender:
    (Context contextEnsure: [2])

* #swapSender:

You could construct similar situations to #jump here. Exercise left to the reader because my dinner is getting cold. :-)

Regarding performance: Note that #findNextRunUntilErrorOrReturnFromCalleeContextUpTo: is already somewhat optimized as it uses #primitiveFindHandlerContext, so we do not need to walk up the entire context stack. Nevertheless, when there are dozens or hundreds of marker contexts on the stack, the difference might be notable. Plus, my original implementation of this method uses expensive sends such as Context>>#selector (it should better compare CompiledMethods instead, maybe even use pragmas to annotate guard-context installing simulation methods?).

Before doing any optimizations, we should write a few simple benchmarks. Something like:

[[2/0] on: ZeroDivide do: [0]] bench

And at least a couple of others, e.g., for the generator example above etc.
My first experiments suggest a slowdown of this one by >100%, which would most likely be inacceptable.
So probably simulation flags would be required. We would need to find the right place for them. Maybe we could use an environment variable in the activeProcess (possibly a subclass of DynamicVariable or ProcessLocalVariable, which is faster than the former) that points to the existing simulation guard, if any. I don't like this approach very much because I would rather keep independent of processes, but I cannot see a better solution right now. Maybe you have one :)

Again, thanks a lot!

Best,
Christoph

---
Sent from Squeak Inbox Talk

On 2024-01-09T22:02:41+00:00, mail@jaromir.net wrote:

> Hi Christoph, all,
>
> I've studied your proposal.
>
> I've sent Kernel-jar.1551 to share an idea how to fix the stepOver bug:
> instead of recomputing the first unwind context in #resume:through: I
> suggest to simply adjust the position of the simulation guard contexts
> in the sender chain that would reflect the target of the non-local
> return that is about to happen.
>
> It uses your #nextRunUntilErrorOrReturnFromCalleeContext to do the heavy
> lifting - to detect the simulation guard contexts (this is great!) and
> then I just cut the sender chain and sew it together :)
>
> I think this works nicely; however, as you noted, I not sure what would
> be the impact on the runtime performance. In the next step I'll try to
> think about a way to optimize it a bit if possible - e.g. using
> simulation flags.
>
> Regarding the general approach including jumps, sender swaps - could you
> please give me an example (or two) of situations you have in mind? I'm
> not sure #jump needs such a protection but I haven't though about it
> much yet and any example would be very appreciated.
>
> I tried your original approach but the disruptions while debugging felt
> a bit too drastic a solution. I often lost focus when the debugger
> stopped unexpectedly. Sometimes I had to trudge through lots of contexts
> I didn't care about to get where I wanted to be in the first place :)
>
> So my idea is to check what other situations you may have considered as
> dangerous and try to deal with them one by one or possibly in a more
> general manner if a pattern emerges :)
>
> Thanks again, I look forward to your thoughts.
>
> best,
> Jaromir
>
>
> On 30-Dec-23 5:17:14 PM, christoph.thiede(a)student.hpi.uni-potsdam.de
> wrote:
>
> >Hi Jaromir,
> >
> >thanks for the reply!
> >
> >I like about your new solution that it restores the original value of
> >the #aboutToReturn:through:/#resumeEvaluating:through: second argument
> >and thus fixes my SimulationStudio tests. Stepping over the non-local
> >return in your example still seems to work, which is great. :-)
> >I am a bit confused about the added complexity in BlockClosure>>ensure:
> >and BlockClosure>>ifCurtailed:. IMO none of such logic should be aware
> >of the existence of a possible simulator.
> >(Besides, just a small comment, I believe #isSimulationFlagSet is an
> >underspecific name. thisContext sender isSimulationFlagSet in a do-it
> >evaluates to true, which is surprising. Maybe
> >#isUnwindContextSimulationg or something similar would be better. Or
> >just avoid this extra message as you send it only once.)
> >
> >I still wonder whether we need to fix this bug specifically to
> >non-local returns, since the problem occurs for all kinds of irregular
> >context switches while a simulation guard is present. Have you had a
> >chance to take a look at runUntilErrorOrReturnFrom.8.cs? It is similar
> >to your patch in that i recognizes an existing simulation guard and
> >halts the return earlier when present. However, it is more generic and
> >works for #jump and #swapSender too. While yours assumes an existing
> >guard context in #return:from:, mine scans the current context stack
> >for guard contexts: I think your assumption does not always hold true
> >(e.g., Context>>#runSimulated:... does not install a guard context); on
> >the other hand, my approach might be obviously slower (though I did not
> >run benchmarks yet. I might draw inspiration from your approach and
> >store a simulation flag in the Process rather than walking up the
> >entire stack).
> >
> >tl;dr: I like your new solution (Kernel-jar.1545) better than what's
> >currently in the trunk, but I still wonder whether we can refine and
> >merge runUntilErrorOrReturnFrom.8.cs instead, for which my main concern
> >remains performance. And as always, if anyone else is reading this
> >(Eliot? Marcel?), further perspectives would be very welcome. :-)
> >
> >Looking forward to your feedback!
> >
> >Best,
> >Christoph
> >
> >---
> >Sent from Squeak Inbox Talk
> ><https://github.com/hpi-swa-lab/squeak-inbox-talk>
> >
> >On 2023-12-28T18:37:24+00:00, mail(a)jaromir.net wrote:
> >
> > > Hi Christoph,
> > >
> > > Try this please: Kernel-jar.1545
> > >
> > > Now the simulation for stepOver works as expected even without a
> >dirty
> > > trick :)
> > >
> > > Please check whether this solution is consistent with your Simulation
> > > Studio.
> > >
> > > Here's an FYI:
> > >
> > > The whole issue emerged probably in 2012 when #aboutToReturn:through:
> > > started using the `firstUnwindContext` argument, I guess as an
> > > optimization, and forced a starting point to searching for unwind
> > > contexts in the resumtion logic (#resumeEvaluating:through:) - which
> >was
> > > inconsistent with inserting the ensure guard context **before** this
> > > starting point during simulation (by #runUntilErrorOrReturnFrom:
> >called
> > > by stepping methods). This resulted in erroneous skipping over the
> > > inserted guard ensure context during stepOver simulation and the
> > > unexpected behavior.
> > >
> > > The idea is for the resumption logic (#resumeEvaluating:through:) to
> > > recognize a simulated execution and search for a potential guard
> >(i.e.
> > > #ensure:) context inserted during the simulation. I suggest using a
> > > `simulationFlag` tempVar in #ensure: which would be set in
> > > #return:from:, i.e. if, and only if, the `firstUnwindContext` is
> > > supplied in simulaton (i.e. potentially dangerous). When the
> >resumption
> > > logic recognizes a simulation supplied `firstUnwindContext` it
> > > recomputes the real first unwind context at the moment and proceeds
> >as
> > > expected.
> > >
> > > I look forward to your reply.
> > >
> > > Best,
> > > Jaromir
> > >
> > >
> > > On 27-Dec-23 10:41:45 AM, "Jaromir Matas" <mail(a)jaromir.net> wrote:
> > >
> > > >Hi Christoph,
> > > >
> > > >yes, I know it's not a clean solution; I understand your objection
> >and
> > > >agree with it: nilling the firstUnwindContext argument during
> > > >simulation departs from mimicking the VM behavior but sticking with
> >the
> > > >firstUnwindContext breaks the simulation. Please give me a few days
> >to
> > > >revisit the issue again.
> > > >
> > > >Coincidentally, I found another simulation bug in #return:from: -
> >see
> > > >Kernel-jar.1538 (it's a part of a series of changesets
> > > >Kernel-jar.1537-1539). I'd appreciate your feedback; the plan is to
> > > >doublecheck and consolidate the three changesets for final review.
> > > >
> > > >Thanks and best regards,
> > > >Jaromir
> > > >
> > > >On 27-Dec-23 1:53:47 AM,
> >christoph.thiede(a)student.hpi.uni-potsdam.de
> > > >wrote:
> > > >
> > > >>Hi Jaromir,
> > > >>
> > > >>sorry to bring this up again with so much delay, but honestly I
> >don't
> > > >>think this is a good patch and we should revert it in the trunk.
> > > >>
> > > >>One core assumption about the simulation engine is that it mimics
> >the
> > > >>behavior of the VM interpreter in every utmost detail. This
> >includes
> > > >>the arguments sent through special messages sent by the VM such as
> > > >>#aboutToReturn:through:. Just because the current implementation of
> > > >>Context>>aboutToReturn:through: does not require the value of
> > > >>firstUnwindContext, this does not mean that we can just break the
> > > >>existing contract of the VM and provide nil for that argument from
> >the
> > > >>simulator.
> > > >>
> > > >>Besides that theoretical perspective, your change also broke a
> >couple
> > > >>of tests in SimulationStudio as I discovered today, because
> > > >>SimulationStudio relies on many of these assumptions about the code
> > > >>execution. If you would like to, you can also explore the failing
> >test
> > > >>yourself: SimulationContextTest>>#testNonLocalReturn.
> > > >>SimulationContext>>#runSimulated:contextAtEachStep: manually checks
> > > >>the arguments passed to #aboutToReturn:through: (or more precisely,
> >of
> > > >>#resume:through:) to detect when the simulator is unwinding the
> >code
> > > >>provided to a custom simulator. As you niled out that value, I can
> >no
> > > >>longer check that.
> > > >>
> > > >>I think that your patch is just a hack that fixes a symptom rather
> > > >>than resolving the original problem, which is still the guard
> >contexts
> > > >>inserted by Context>>#runUntilErrorOrReturnFrom: that are not fully
> > > >>transparent to the executed process. See
> > >
> > >>https://lists.squeakfoundation.org/archives/list/squeak-dev(a)lists.squeakfoundation.org/message/JXLLHUWWSL4ICZPHMNZVVXPTC2WUHITB/.
> > > >>The changeset that I proposed there
> >(runUntilErrorOrReturnFrom.8.cs),
> > > >>which basically ensures before every irregular context switch that
> >the
> > > >>guard contexts are removed and the debugger is halted again, fixes
> >the
> > > >>original issue in the debugger as well. Try reverting your patch
> >from
> > > >>Kernel-jar.1503 and loading my change set instead and you will end
> >up
> > > >>in Context>>#resume:through: after stepping over the non-local
> >return,
> > > >>which I would consider even favorable to the behavior of your patch
> > > >>which makes the logic from the unwind blocks unaccessible in that
> > > >>situation.
> > > >>
> > > >>So, tl;dr, I see three issues with this patch: (i) it violates the
> >VM
> > > >>contract of #aboutToReturn:through:, which (ii) breaks
> > > >>SimulationStudio due to a missing piece of information, and (iii)
> > > >>stepping over non-local returns no longer allows programmers to
> > > >>observe or step into unwind blocks. Given that, I would ask you for
> > > >>your agreement to reverting that patch and working on fixing the
> > > >>underlying issue with non-transparent guard contexts in general
> > > >>instead.
> > > >>
> > > >>Despite that, please feel not discouraged to continue contributing
> >to
> > > >>this important part of Squeak: It's just that you're sharing so
> >many
> > > >>interesting ideas and Eliot and me and others lack the time to read
> > > >>all of them but certain changes nevertheless should require a
> >proper
> > > >>review ... Have a nice end of Christmas! :-)
> > > >>
> > > >>Best,
> > > >>Christoph
> > > >>
> > > >>---
> > > >>Sent from Squeak Inbox Talk
> > > >><https://github.com/hpi-swa-lab/squeak-inbox-talk>
> > > >>
> > > >>On 2023-02-14T22:08:39+00:00, mail(a)jaromir.net wrote:
> > > >>
> > > >> > Hi,
> > > >> >
> > > >> > I'm enclosing a changeset that entirely removes unwind code
> > > >>duplicity in Context AND at the same time fixes the "stepOver bug"
> > > >>still persisting in the system.
> > > >> >
> > > >> > To refresh what the bug is: You cannot #stepOver '^2' when you
> > > >>debug;
> > > >> > [^2] ensure: [] "step through to ^2 and try step over --> you
> >get
> > > >>BCR"
> > > >> >
> > > >> > The root cause is #return:from: supplies 'firstUnwindContext' to
> > > >>#aboutToReturn:through: but before this gets evaluated the real
> >first
> > > >>unwind context changes during simulation and that causes the bug.
> > > >> >
> > > >> > Originally (prior 2012), #aboutToReturn:through: just called
> >'self
> > > >>methodReturnContext return: result' and didn't use the
> > > >>'firstUnwindContext' but in 2012 the call has been replaced with
> >'self
> > > >>methodReturnContext return: result through: firstUnwindContext' and
> > > >>the bug was born.
> > > >> >
> > > >> > In the changeset I suggest to use one common method replacing
> > > >>#resume:through: and #resumeEvaluating: which solves the bug as a
> > > >>side-effect with only a tiny change in #return:from: (simply supply
> > > >>nil instead of 'firstUnwindContext').
> > > >> >
> > > >> > I will post the changeset to the Inbox soon.
> > > >> >
> > > >> > Best,
> > > >> > Jaromir
> > > >> >
> > > >> >
> > > >> > --
> > > >> >
> > > >> > Jaromír Matas
> > > >> >
> > > >> > mail at jaromir.net
> > > >> >
> > > >> >
> > > >> > From: Jaromir Matas<mailto:mail at jaromir.net>
> > > >> > Sent: Thursday, February 2, 2023 15:07
> > > >> > To: Squeak Dev<mailto:squeak-dev at lists.squeakfoundation.org>
> > > >> > Subject: [squeak-dev] Code duplication around unwinding
> > > >> >
> > > >> > Hi all,
> > > >> >
> > > >> > When I started learning Squeak I wondered why so much code
> > > >>duplication around unwinding: Context>>#restart, #resume:through:,
> > > >>#resumeEvaluating:, #terminate and #unwindTo: all implementing the
> > > >>exact same algorithm. Kernel-jar.1499 suggests removing all this
> > > >>duplicity while keeping the current unwind semantics. All Process
> >and
> > > >>Exceptions tests are green.
> > > >> >
> > > >> > This changeset is the first step in my effort to reduce the
> >"noise"
> > > >>around the "collection" of resume/return methods in Context and
> > > >>Exception. One idea is to return calling the unwind mechanism back
> >to
> > > >>the Exception class, as originally implemented in Squeak 2.x up to
> > > >>3.5; please let me know if you find something fundamentally wrong
> >with
> > > >>this idea.
> > > >> >
> > > >> > Example: I find this call sequence begging for simplification:
> > > >> > Exception>>resume: --> resumeUnchecked: --> resumeEvaluating:
> >-->
> > > >>Context>>returnEvaluating: --> resumeEvaluating:
> > > >> >
> > > >> > Best,
> > > >> >
> > > >> > Jaromir
> > > >> >
> > > >> >
> > > >> > --
> > > >> >
> > > >> > Jaromír Matas
> > > >> >
> > > >> > mail at jaromir.net