[squeak-dev] Fixing the infinite debugger chains? (was: Code simulation error (was Re: I broke the debugger?))

Tue Jan 28 13:47:11 UTC 2020

Hi all,

I took a closer look at the ActiveProcess concept and I think I found an approach to fix all these nasty infinite debugger chains that occurred whenever a bytecode simulation error was raised. Please see the attached changeset.

The key insight is that any bytecode simulation error raised behind an #evaluate:onBehalfOf: call should *not* be debugged on the "behalf-of" process, but on the "basic" active process that called #evaluate:onBehalfOf:. Otherwise, the debugger would not stop the simulating process. See Process >> #step:, for example.

So I added Process >> #basicActiveProcess, which returns the real activeProcess without regard to #effectiveProcess. This method is used by StandardToolSet and the debugger to identify the process to debug. At other places, #effectiveProcess must never be used, as this would take the idea of ActiveProcess ad absurdum and make it impossible to debug certain processing logic.
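
To illustrate the distinction, here is a minimal workspace sketch. It is not taken from the changeset; `behalf` and `seen` are illustrative names, and it assumes that Processor activeProcess answers the effective process and that #evaluate:onBehalfOf: answers the value of its block:

```smalltalk
"Workspace sketch; behalf and seen are illustrative names, not part of the changeset."
| behalf seen |
behalf := [] newProcess.
"Ask for the active process from inside the simulated evaluation."
seen := Processor activeProcess
	evaluate: [Processor activeProcess]
	onBehalfOf: behalf.
"Under the assumptions above, the block sees the behalf-of process,
not the process that is really running the simulation."
seen == behalf
```

The debugger, in contrast, needs the process that really runs the simulation, which is what #basicActiveProcess is meant to answer.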

The changeset also includes a new test method in DebuggerTests that covers this regression.

Here is a list of recent issues that *no longer* crash your image after loading this changeset:

  *   http://forum.world.st/BUG-s-in-Context-control-jump-runUntilErrorOrReturnFrom-td5107263.html (both scenarios)
  *   http://forum.world.st/BUG-REGRESSION-while-debugging-Generator-nextPut-tp5108125p5109109.html
  *   Processor activeProcess
          evaluate: [self error. self inform: #foo]
          onBehalfOf: [] newProcess.
  *   p := [| x |
          [x := 5] value.
          x] newProcess.
      [p suspendedContext selector = thisContext selector]
          whileFalse: [p step].
      10 timesRepeat: [p step].
      p suspendedContext pop.
      p step
      "(I used this one to reproduce the situation fixed by Kernel-ct.1296)"

They may still raise an ordinary error because the context simulation is buggy in some respects, but so far I have not managed to find another bug that crashes your image. And this should make it much easier to debug and fix the remaining simulation bugs!

In any case, please review! :-) I'm almost sure you will have some criticism, but what do you think about the approach in general? I wonder whether there will be any situation where we cannot debug the senders of #basicActiveProcess properly because they don't follow the ActiveProcess concept. But in general, I think it's clearly an improvement over the current state of Trunk. I'm looking forward to your feedback!

Best,
Christoph

________________________________
From: Squeak-dev <squeak-dev-bounces at lists.squeakfoundation.org> on behalf of Thiede, Christoph
Sent: Tuesday, January 28, 2020 09:17
To: The general-purpose Squeak developers list
Subject: Re: [squeak-dev] Code simulation error (was Re: I broke the debugger?)

Hi Tim, excellent work!

Coincidentally, I studied the same problem yesterday, but I had not yet finished writing up my observations for you. So let me do that here:

After many hours of fun debugging, I was able to create this minimal failing example:

Processor activeProcess
evaluate: [self error. self inform: #foo]
onBehalfOf: [] newProcess

Expected behavior: First, a debugger is shown, and after proceeding it, a dialog window is shown.

Actual behavior: Both the debugger and the dialog window are shown asynchronously!

Suspicion of someone who has not yet dived deeply into the activeProcess concept: The debugger resumes the wrong process, because the activeProcess concept simulates a different running process for the error, even towards the debugger.

If my theory is correct, we would need to find a way to look behind the scenes of the activeProcess and use it in the debugging code. But first, I really need to learn more about this concept.

(Connection to our Context >> #at: problems: Probably no primitive issue at all, just the fact that #at: calls itself recursively after the error was proceeded, similar to what #doesNotUnderstand: does.)

This is a work-in-progress message; I just did not want us to do any duplicate or redundant work. I will have a closer look at this disgusting problem ASAP!

And vice versa, it would be very nice if you could keep me/us up-to-date!

(Oh, what fun it is to debug a self-simulating system ...)

Best,

Christoph

________________________________
From: Squeak-dev <squeak-dev-bounces at lists.squeakfoundation.org> on behalf of tim Rowledge <tim at rowledge.org>
Sent: Tuesday, January 28, 2020 03:08
To: The general-purpose Squeak developers list
Subject: [squeak-dev] Code simulation error (was Re: I broke the debugger?)

Something pretty weird is happening when the break is hit. I *finally* got a debugger open on a backtrace that includes the problem with Context>at: failing because the argument is 0. It's all a bit strange and unless I managed to do something very odd it looks like a fairly serious bug.

I actually caught this because something went wrong in code I added to try to log the dan initial error. Despite that it does appear to be a trace on the #break problem.

debugger>doStep called from #stepOver.
#handleLabelUpdatesIn:whenExecuting: used and does [interruptedProcess completeStep: currentContext] which uses...
Process>>evaluate:onBehalfOf:
Context>runUntilErrorOrReturnFrom:

Context>jump
 - *we are checking for stackp = 0 which is the very thing that causes problems later with the #pop*

In the #stepToSendOrReturn we use interpretNextInstructionFor: which leads to InterpretV3ClosuresExtension: 7 in: (Object>>break) for: ( aContext sender #on:do:, pc 24 stackp 0 method Object>>break, etc)
 -> doPop
 -> pop  (presumably stackp was 0 here? See above re: #jump)
 -> at: ... but if so why did the error code appear to skip over the first two tests of it?
        <primitive: 210>
        index = 0 ifTrue:[FileStream newFileNamed: 'squeakBreak.log' do:[:f| self errorReportOn: f]].
        index isInteger ifTrue:
                [self errorSubscriptBounds: index].
        index isNumber
                ifTrue: [^self at: index asInteger]"<--- it went here and on the second go around it picked up that index = 0 properly."
                ifFalse: [self errorNonIntegerIndex]

So I *think* that there is an issue in Context>jump where we explicitly check for stackp = 0 but then call code that carefully does a pop via #at:. Something about the primitive: 210 (maybe?) does something weird and the index is both 0 and not 0 - nor even an Integer.

As an interesting bonus, the clause I added to log things when 'index = 0' went very wrong because the 'f' getting passed to the block is apparently the MultiByteFileStream *class* rather than the opened file!

The bit bothering me at the moment is just how this can be a problem that hasn't whacked us before. I haven't been able to cause it with 'normal' code but all I was doing to fall over this was loading & testing the old Plumbing demo code.

Oh, I did just try to see if a newer vm (I am running the 201912311458 ARMv6) would have a fix for the prim 210 but no newer ARM VM will run at all. The latest Mac vm runs but fails with the same huge list of notifiers reporting the error in Context>at: - the fact that there is a *lot* of #at: on the stack is a little more odd.

tim
--
tim Rowledge; tim at rowledge.org; http://www.rowledge.org/tim
Useful random insult:- If you stand close enough to him, you can hear the ocean

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: fix-infinite-debuggers.2.cs
URL: <http://lists.squeakfoundation.org/pipermail/squeak-dev/attachments/20200128/719a25f0/attachment.ksh>