On Wed, Jan 3 2024 at 04:33:11 PM +0000, "Thiede, Christoph" <Christoph.Thiede@student.hpi.uni-potsdam.de> wrote:

Hi all,

As promised, I would like to illustrate some of the potential of our recently announced TraceDebugger [1, 2] by describing a few possible use cases. In this post, I will explain how we can use the TraceDebugger to debug a bug in the Shout (syntax highlighting as you type) parser.

Let's start with the bug we want to examine: When you type the following expression into a workspace or browser with syntax highlighting enabled, it will be displayed as valid:

self griffle; plonk wiffy.

However, if we try to evaluate it, we can see that it is actually not valid:

self griffle; plonk "Nothing more expected ->"wiffy

We can reproduce this bug in Shout with the following do-it:

SHParserST80 new
    source: 'self griffle; plonk wiffy';
    parse;
    ranges. --> an OrderedCollection(a SHRange(#self, 1, 4) a SHRange(#unary, 6, 12) a SHRange(#cascadeSeparator, 13, 13) a SHRange(#unary, 15, 19) a SHRange(#unary, 21, 25))

As we can see, the last token (wiffy) is parsed as a unary range. What we would want to see instead would be #excessCode like in this example:

SHParserST80 new
    source: 'self griffle ^2';
    parse;
    ranges. --> an OrderedCollection(a SHRange(#self, 1, 4) a SHRange(#unary, 6, 12) a SHRange(#excessCode, 13, 15))
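To compare the two results without the character positions getting in the way, a small do-it like the following could collect only the range types. This is just a sketch: the typesOf block is a helper made up for this post, and I am assuming that SHRange answers its type via #type.

```smalltalk
| typesOf |
"Hypothetical helper: parse a source string with Shout and answer only the range types."
typesOf := [:source |
	(SHParserST80 new
		source: source;
		parse;
		ranges) collect: [:range | range type]].

"Print-it each line separately to compare the buggy case with the reference case."
typesOf value: 'self griffle; plonk wiffy'.
typesOf value: 'self griffle ^2'.
```

Going by the ranges shown above, the first expression should end in two #unary types, while the second one should end in #excessCode.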

So, let's start to debug this bug by selecting the expression and performing a normal debug-it (Cmd + Shift + D). We could now use the regular debugger to step into the different methods of the parser, but this is not without a challenge. For instance, let's assume we have reached SHParserST80>>#parseStatementList, and the next message send is parseStatement:

If we are not sure where the last range is parsed, we might step over this message send. However, this turns out to be the wrong choice: after that message send, all five ranges have already been produced:

Unfortunately, there is no easy way back from here. We might try to restart from this method, but the parser has already changed the value of sourcePosition, so if we just executed this method again, these side effects would not be undone and we could not reproduce the original behavior. Alternatively, we could restart the debugger from the original do-it and tediously navigate back to the previous position in this method. This is a general limitation of traditional debuggers: they cannot go back. For fear of such situations, I often find myself speculatively stepping into messages that are probably irrelevant, just to avoid stepping too far. Obviously, this costs me a lot of time and distracts me with irrelevant implementation details.
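As a toy analogy for why restarting a method does not bring back the original behavior, consider code that reads from a stream. This is not Shout code: the stream and the readToken block are made up for illustration, with the stream position standing in for the parser's sourcePosition.

```smalltalk
| stream readToken first second |
stream := ReadStream on: 'griffle plonk wiffy'.
"Reading a token advances the position of the stream as a side effect."
readToken := [stream upTo: Character space].

first := readToken value.	"'griffle'"
"Running the very same code again does not rewind the stream:"
second := readToken value.	"'plonk'"
{first. second}
```

The second evaluation answers a different token because the position has already moved on, which is the same effect we would get when re-executing a parser method after its side effects have happened.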

After all of this introduction, how can the TraceDebugger help us investigate such situations more conveniently and efficiently? To try that out, let's revert our existing debugger to the beginning again. We can do that by selecting the DoIt method and restarting from there, or just by performing a debug-it on the original snippet again:

From here, we can now press the new Trace It button at the right to convert the existing debugger into a trace debugger:

We can see that the trace debugger looks very similar to a normal debugger, but there are three important differences. First, instead of the context stack, it displays a context tree (which currently looks similar to a stack, but this will change in the next step). Second, there is a new Step Back button (which is disabled right now, but that will change, too). Third, the window title displays the current time index of the execution ("@ 0"). Let's use this trace debugger to perform the same steps we have done in the regular Smalltalk debugger before:

As we stepped into the different parser methods, we can see that the time index in the window title has advanced ("@ 779", the number of instructions performed so far), and the context tree has grown step by step and displays not only the current stack (the black methods) but also all prior method invocations (in gray) that we have stepped over. Note that I have set up the DoIt method as a border context from its context menu to declutter the tree a little bit.

And that tree is not just there for decoration: we can actually use it to explore all these method invocations. For example, we can expand some of the context nodes or hover over one of them to see all its arguments and its return value:

If we find that method interesting, we can select it to explore its execution again:

There are two things to note here. First, the time index in the window title has decreased and has been complemented by the note "[historic]", as we are now watching a method that has been executed in the past. This is also visualized through the background color of the window, which has changed to brown. Second, the color of the methods in the tree has changed: the stack of SHParserST80>>parseVerticalBarForTemporaries: is now displayed as active while SHParserST80>>parseStatementList has turned gray. Although we are now exploring a historic method execution, we can still use the stepping buttons to navigate through this execution and reenact what messages it has sent.

Anyway, the TraceDebugger does not only record past method invocations but also past states. We can see that when we navigate back to SHParserST80>>parseTemporaries and press Restart to rewind this method to its beginning. If we select one of the receiver's instance variables or one of the temporary variables at the bottom of the trace debugger, we can see its prior value at this point in time:

Only once we step past the assignment to our variable is the new value displayed in this inspector:

So, we can retrace all state changes or side effects in the TraceDebugger. And that's not all! Besides selecting variables in the inspector, we can also send any messages to them, right back into the past:

While that's obviously a toy example, it shows that we can execute any code in the historic context of the currently selected time index in the trace debugger, which can be very useful to see earlier printStrings of objects, retrieve information by sending test selectors to them, or add custom inspector fields to ask objects how they would have responded to a certain question in the past. I will expand on that in a later post.

However, after all of that playing around, let's come back to our actual question: why is the wiffy token not parsed as an excessCode range? To answer it, we need to go back (or rather forward) to SHParserST80>>parseStatementList again. We can do that by selecting this method in the tree again, but another way is to press Cmd + Shift + D while the context tree has the focus to jump to the present (you can also find that command, plus a few others, in the context menu of the tree). Note that this also removes the [historic] note in the window title and changes the trace debugger's color back to a vivid red:

At this point, the ranges array of the parser is still empty, so we can step a bit forward again. After stepping over parseStatement, we have that "Oh no, I've stepped too far" situation from above again as all the ranges have already been produced:

But this time, this is not a problem, since we have traced everything inside that message send. We can now press the new Step Back button to revert the trace debugger by just one step, to right before the parseStatement send:

And then finally step into it to retrace its behavior:

So in other words, with the TraceDebugger we can overcome the limitation of unidirectional navigation (i.e., only being able to step forward) of traditional debuggers and navigate multidirectionally instead: forward, backward, or arbitrarily by selecting specific methods in the context tree. In the same fashion, we can continue to descend into the tree by performing something like a hierarchical binary search: we step over or into message sends as they seem interesting, and step back whenever we have missed a relevant message send. A bit later, we discover SHParserST80>>parseCascade, which first parses the receiver with the first message send, then the semicolon (#cascadeSeparator), and finally the last two tokens as unaries as we step over the second #parseKeyword send:

What can this discovery tell us? #parseKeyword can parse an entire chain of messages, which is legal for the first part of the cascade but illegal after the first semicolon. Thus, the Shout parser should likely not send the same parsing method for both cases. A possible fix might extract the first #scanPast: sends from #parseUnary and #parseBinary into separate methods so that #parseCascade could ask for a single message send after the cascade separator.

However, this post is not primarily about a small bug in Shout but about using the TraceDebugger to investigate bugs like it. So to sum up, in this post we have seen how we can invoke a trace debugger, record expressions in it, explore their executions in the context tree, retrace them in a multidirectional fashion using the tree and the stepping buttons, and inspect and interact with the prior state of objects and temporary variables. This allows us to overcome the limitation of regular debuggers, where we can only explore programs in the order of their original execution, and instead to navigate through them with a greater degree of freedom. More broadly, the TraceDebugger provides first-class objects for the execution of programs that can be explored in an unconstrained, declarative fashion.

I hope you like it! This is the first time I have written a detailed tutorial like this on the mailing list, so any feedback is welcome, both regarding the tutorial and the TraceDebugger. :-)

Best,

Christoph

[1] https://github.com/hpi-swa-lab/squeak-tracedebugger

[2] https://lists.squeakfoundation.org/archives/list/squeak-dev@lists.squeakfoundation.org/thread/DTEFS6KIV7PS3FLE4RRNPTYEES52FVKX/