The Trunk: SUnit-cmm.146.mcz

List overview All Threads
Download

newer

older

keyCode vs keyValue

The Trunk: SUnit-cmm.147.mcz

commits＠source.squeak.org

30 Apr 2024 30 Apr '24

7:43 a.m.

Chris Muller uploaded a new version of SUnit to project The Trunk: http://source.squeak.org/trunk/SUnit-cmm.146.mcz

==================== Summary ====================

Name: SUnit-cmm.146 Author: cmm Time: 30 April 2024, 12:43:19.759559 am UUID: 51424d55-94d0-4d2e-b8dc-f470b05cb44d Ancestors: SUnit-mt.145

Let SUnit provide consistent results when running vs. debugging, rather than quietly hiding SUnit timeout configuration bugs when debugging.

=============== Diff against SUnit-mt.145 ===============

Item was changed: ----- Method: TestCase>>debug (in category 'running') ----- debug "Run the receiver and open a debugger on the first failure or error."

+ ^ self assureResourcesDuring: [self runCase]! - ^ self assureResourcesDuring: [self runCaseWithoutTimeout]!

Item was changed: ----- Method: TestCase>>debugAsFailure (in category 'running') ----- debugAsFailure "Spawn a debugger that is ready to debug the receiver."

(Process forBlock: [self debug] runUntil: [:context | context isClosureContext "navigate the process directly to the point where it is about to send #setUp" + and: [context selector = #runCase]]) - and: [context selector = #runCaseWithoutTimeout]]) debug.!

Show replies by date

christoph.thiede＠student.hpi.uni-potsdam.de

30 Apr 30 Apr

5:05 p.m.

Hi Chris,

I think this breaks some of the debugging experience for SUnit significantly. Can we please revise this again? :-)

Here are some constraints that I would expect to work:

1) When I run a test that errors or fails from a system browser (message menu), a debugger is invoked on the error of the test. 2) When I debug a test from a system browser (message menu), a debugger is invoked that's execution is very close to test method, so I need only a few clicks to actually debug the test. (But you still should be able to reach the specific #setUp method from your test class.) 3) While I am debugging a test (i.e.: watching and stepping it in a debugger) and also thereafter, it must not time out. This is because I often need more than 5 seconds to investigate what it does, and it is fatal when a timeout failure error pops out and disallows me to continue stepping through the test execution.

Your change complicates 2) and breaks 3): Regarding 2), navigating to the test method previously required 4 steps (Over, Into, Over, Into), now it requires 7 steps (Over, Over, Through, Proceed, Into, Over, Into). Until I have performed these steps, the test often already has timed out - see 3).

Yet, I think I understand the motivation to your change:

4) When I run a test that times out from a system browser (message menu), I should be informed about the timeout.

Before your patch, this never (at least not for years?) has worked, because SUnit used to first run the test invisibly, catch the timeout failure, and then run the test again without catching exceptions but also without applying a timeout, thus the test would pass in the second run (or block the image forever if the test has an infinite loop). So I like that you have fixed that, this has been confusing me and some other people who I have seen using SUnit in the past.

But I still think we need to fix the above requirements 2) and 3). Especially 3) is critical IMHO. What would be some other strategies to make timeouts more discoverable? Display a pop-up notification when the automatic debugging of a test after it failed (see CodeHolder>>#testRunSuite: and TestRunner>>#debugSuite:) completes WITHOUT having raised a debugger (a simple exception handler should do this)? Automatically unattach the timeout watchdog in the moment the running test is being debugged (I don't know a simple way to do this, and if interrupting a running test would affect its execution semantics, this might cause a lot of confusion*). What else? :-)

*Note that there is a variant of 3) that has never worked so far when you run a test and interrupt it before it succeeds/fails, which already before your patch caused the mentioned timeout failures. This one has probably never been addressed to avoid exactly such kind of confusion. It's still not perfect ...

Please let's get this fixed again. Looking forward to your ideas! :-)

Best, Christoph

--- Sent from Squeak Inbox Talk

On 2024-04-30T05:43:21+00:00, commits@source.squeak.org wrote:

...

Chris Muller uploaded a new version of SUnit to project The Trunk: http://source.squeak.org/trunk/SUnit-cmm.146.mcz

==================== Summary ====================

Name: SUnit-cmm.146 Author: cmm Time: 30 April 2024, 12:43:19.759559 am UUID: 51424d55-94d0-4d2e-b8dc-f470b05cb44d Ancestors: SUnit-mt.145

Let SUnit provide consistent results when running vs. debugging, rather than quietly hiding SUnit timeout configuration bugs when debugging.

=============== Diff against SUnit-mt.145 ===============

Item was changed: ----- Method: TestCase>>debug (in category 'running') ----- debug "Run the receiver and open a debugger on the first failure or error."

^ self assureResourcesDuring: [self runCase]!

^ self assureResourcesDuring: [self runCaseWithoutTimeout]!

Item was changed: ----- Method: TestCase>>debugAsFailure (in category 'running') ----- debugAsFailure "Spawn a debugger that is ready to debug the receiver."

(Process forBlock: [self debug] runUntil: [:context | context isClosureContext "navigate the process directly to the point where it is about to send #setUp"
	and: [context selector = #runCase]])
	and: [context selector = #runCaseWithoutTimeout]])
	debug.!

Chris Muller

10:24 p.m.

Hi Christoph,

...

I think this breaks some of the debugging experience for SUnit significantly. Can we please revise this again? :-)

Of course we can. As you can see from the description, my intent was simply to fix the bug of silently hiding mis-configured SUnit timeouts.

What happened is I wrote a new TestCase and was puzzled as to why I was getting inconsistent results from the SUnit browser. When I ran it by clicking the "Run Selected" button in the lower left corner of the SUnit browser, a yellow failure appeared on a certain test. But clicking on that test would work just fine.

Attempting to use an old-school "halt" in the failing test to reveal the cause of the issue (lack of #defaultTimeout override) works in 5.3. But because you put in a suppression of #allErrors into TestResult>>#runCase: a while back, it didn't help me get to the bottom of the issue in trunk this time.

So, it was a confusing situation caused solely by the tool designed to provide clarity into our code. I'm not against the fancy IDE enhancements as long as the old-school ways of using SUnit can still work too.

...

Here are some constraints that I would expect to work:

When I run a test that errors or fails from a system browser (message menu), a debugger is invoked on the error of the test.

When I debug a test from a system browser (message menu), a debugger is invoked that's execution is very close to test method, so I need only a few clicks to actually debug the test. (But you still should be able to reach the specific #setUp method from your test class.)

I'd like to make sure things are running normally first (e.g., use of #run, #debug, #debugAsFailure from workspace code), THEN working from within the SUnit browser, THEN invocation from the message menu (et al) of regular browsers. I feel SUnit should not be responsible for having to be aware of all of these different ways of being run and, instead, those responsibilities relegated to the runners (more on this below).

...

While I am debugging a test (i.e.: watching and stepping it in a debugger) and also thereafter, it must not time out. This is because I often need more than 5 seconds to investigate what it does, and it is fatal when a timeout failure error pops out and disallows me to continue stepping through the test execution.

It's been an issue for a long time. You can "fix" it by doing what I've had to do over the years ever since "timeouts" were put into SUnit, by adding the following into all of your TestCase subclasses:

defaultTimeout ^ 999

I strongly fought against the introduction of timeouts into SUnit when it was originally proposed. The impetus at the time was a test got stuck in a loop. But all that was needed was for the *runner* of the test, the Jenkins server, to wrap its own running in some sort of onTimeout: [...] block. I tried to explain that it was wrong to write in specific run times into SUnit code, because it couples two worlds at a layer where they shouldn't be coupled. Unfortunately, I was overruled and it was decided to integrate specific timeouts at the TestCase instance level and make the framework aware of them.

Now through the years #defaultTimeout overrides have sprouted like weeds, and the painful symptoms of the masochistic guessing-game have caused absurd salves like #isLowerPerformance to emerge, which make the inappropriate coupling "relatively" less uncomfortable, thereby allowing the flaw to remain in the system.

...

Your change complicates 2) and breaks 3): Regarding 2), navigating to the test method previously required 4 steps (Over, Into, Over, Into), now it requires 7 steps (Over, Over, Through, Proceed, Into, Over, Into). Until I have performed these steps, the test often already has timed out - see 3).

So how do you want to proceed? I'm happy to roll back my change (or for you to) if you have a better fix. You do understand the motivation of my change:

...

Yet, I think I understand the motivation to your change:

When I run a test that times out from a system browser (message menu), I should be informed about the timeout.

Either that, or the concept of "timeouts" should be stripped from SUnit entirely. I mean, honestly, when I read stuff like this...

_____

...

What would be some other strategies to make timeouts more discoverable? Display a pop-up notification when the automatic debugging of a test after it failed (see CodeHolder>>#testRunSuite: and TestRunner>>#debugSuite:) completes WITHOUT having raised a debugger (a simple exception handler should do this)? Automatically unattach the timeout watchdog in the moment the running test is being debugged (I don't know a simple way to do this, and if interrupting a running test would affect its execution semantics, this might cause a lot of confusion*). What else?

____

... I can't help but shake my head at how the scourge of timeouts continues its cancerous growth, consuming and transforming SUnit from something that was really simple back in the day into something virtually incomprehensible today. We really should take a step back and ask ourselves, "How bad does it have to get before we reconsider the wisdom of this decision?"

Best, Chris

christoph.thiede＠student.hpi.uni-potsdam.de

1 May 1 May

5:19 p.m.

Hi Chris, Hi all,

thanks for your reply! Please find the attached changeset for an alternative approach. Does it cover all your use cases, and what do you think about the implementation?

...

Attempting to use an old-school "halt" in the failing test to reveal the cause of the issue (lack of #defaultTimeout override) works in 5.3. But because you put in a suppression of #allErrors into TestResult>>#runCase: a while back, it didn't help me get to the bottom of the issue in trunk this time.

I see, I think this was another trade-off for CI runners. Then again, I also don't think I would like to receive 20 debuggers in an unstoppable cascade when running a faulty suite in my local image. Maybe we should have separate runner( behavior)s for bulk-running and debugging? Open to discussing the inclusion of Halt in #allErrors again.

Also thank you for the background on timeouts. I think practice has shown some kind of real need for them (especially in non-interactive context such as CIs), so it might be a good idea to provide convenient ways through the framework to declare and control them. Not to mention compatibility. But I agree that clock time is an insufficient measurement (but other cost models are significantly more complex) and it increases the effort of writing tests. Maybe time-outs should be opt-in? Maybe "999" can be an effortless-enough trade-off? By the way, I think overriding #defaultTimeout like ^ super defaultTimeout * 10 should be noted a best practice, which allows us to define test complexities at least in relation to each other.

Best, Christoph

=============== Summary ===============

Change Set: SUnit-debug-timeouts Date: 1 May 2024 Author: Christoph Thiede

Revises handling of timeouts and unexpected passes in debugged tests: 1) Do not time out a test case while it is attached to a debugger. For this, store attached debuggers in a process' environment variables. 2) Merge Jaromir's race condition fixes from BlockClosure>>valueWithin:onTimeout: (Chronology-Core-jar.91) into TestCase>>timeout:after:. 3) Revert #debugAsFailure to skip timeouts to streamline stepping to the test method. 4) In #debug, notify the user if the test passed unexpectedly. 5) Document the breaking changes in TestCase>>debug and restore the previous behavior as #debugWithoutTimeout.

=============== Diff ===============

Debugger>>attachToProcess {initialize-release} · ct 5/1/2024 16:14 + attachToProcess + + (interruptedProcess environmentAt: #debuggers ifAbsentPut: [WeakSet new]) add: self.

Debugger>>proceed {context stack menu} · ct 5/1/2024 16:22 (changed) proceed "Proceed from the interrupted state of the currently selected context. The argument is the topView of the receiver. That view is closed. The active process usually suspends (or terminates) after this call."

| processToResume canResume |

Smalltalk okayToProceedEvenIfSpaceIsLow ifFalse: [^ self]. self okToChange ifFalse: [^ self]. self checkContextSelection. processToResume := interruptedProcess. canResume := self interruptedProcessShouldResume. + self unattachFromProcess. interruptedProcess := nil. "Before delete, so release doesn't terminate it" self close. Smalltalk installLowSpaceWatcher. "restart low space handler" canResume ifTrue: [self resumeProcess: processToResume] ifFalse: [self notify: 'This process should not resume.\Debugger will close now.' translated withCRs].

Debugger>>process:context: {initialize-release} · ct 5/1/2024 16:14 (changed) process: aProcess context: aContext

interruptedProcess := aProcess. + self attachToProcess.

self newStack: (aContext stackOfSize: 1). contextStackIndex := 1.

Debugger>>unattachFromProcess {initialize-release} · ct 5/1/2024 16:25 + unattachFromProcess + + (interruptedProcess environmentAt: #debuggers ifAbsent: [^ self]) + remove: self; + ifEmpty: [interruptedProcess environmentRemoveKey: #debuggers ifAbsent: []].

Debugger>>windowIsClosing {initialize-release} · ct 5/1/2024 16:14 (changed) windowIsClosing "My window is being closed; clean up. Restart the low space watcher."

contextStack := nil. receiverInspector := nil. contextVariablesInspector := nil. interruptedProcess == nil ifTrue: [^ self]. + self unattachFromProcess. self flag: #discuss. "mt: Maybe #fork the termination of the process. See - http://lists.squeakfoundation.org/pipermail/squeak-dev/2022-May/220675.html - http://lists.squeakfoundation.org/pipermail/squeak-dev/2022-June/221044.html" interruptedProcess perform: terminateProcessSelector. interruptedProcess := nil. Smalltalk installLowSpaceWatcher. "restart low space handler"

MyTestCase + TestCase subclass: #MyTestCase + instanceVariableNames: '' + classVariableNames: '' + poolDictionaries: '' + category: 'as yet unclassified' + + MyTestCase class + instanceVariableNames: '' + + ""

MyTestCase>>expectedFailures {as yet unclassified} · ct 5/1/2024 16:34 + expectedFailures + + ^ #(testPassUnexpected)

MyTestCase>>testFail {as yet unclassified} · ct 5/1/2024 16:04 + testFail + + self fail

MyTestCase>>testPass {as yet unclassified} · ct 5/1/2024 16:04 + testPass

MyTestCase>>testPassUnexpected {as yet unclassified} · ct 5/1/2024 16:34 + testPassUnexpected

MyTestCase>>testTimeout {as yet unclassified} · ct 5/1/2024 16:05 + testTimeout + <timeout: 0.1> + + 0.2 seconds wait.

MyTestCase>>testTimeoutLong {as yet unclassified} · ct 5/1/2024 16:28 + testTimeoutLong + <timeout: 5> + + Sensor shiftPressed ifTrue: [^ self]. + 10 seconds wait.

Object class>>releaseNotes {documentation} · ct 5/1/2024 16:52 (changed) releaseNotes "This is a scratch pad of release notes for the 6.0 release this version is building towards. Feel free to add to this comment mention of things that should appear in the release notes.

- <tbd>" + * SUnit: TestCase>>debug no longer omits timeouts and notifies the user about unexpected passes. Use #debugWithoutTimeout for the previous behavior."

self error: 'comment only'

Process>>isAttachedToDebugger {*System-debugging support} · ct 5/1/2024 16:40 + isAttachedToDebugger + + ^ (self environmentAt: #debuggers ifAbsent: [^ false]) notEmpty

TestCase>>debug {running} · ct 5/1/2024 16:51 (changed) debug - "Run the receiver and open a debugger on the first failure or error." + "Run the receiver and open a debugger on the first failure or error. Notify the user if the test passes without raising any debugger."

- ^ self assureResourcesDuring: [self runCaseWithoutTimeout] + | result signaled | + signaled := false. + result := self assureResourcesDuring: + [[self runCase] + on: TestResult allErrors , TestFailure ensure: + [signaled := true]]. + signaled ifFalse: [self inform: 'Test passed unexpectedly!' translated]. + ^ result

TestCase>>debugAsFailure {running} · ct 5/1/2024 16:49 (changed) debugAsFailure - "Spawn a debugger that is ready to debug the receiver." + "Spawn a debugger that is ready to debug the receiver without timeouts."

- self assureResourcesDuring: [ "also happens inside #debug but simulation is slower" - (Process - forBlock: [self debug] - runUntil: [:context | context isClosureContext "navigate the process directly to the point where it is about to send #setUp" - and: [context selector = #runCaseWithoutTimeout]]) - debug]. + (Process + forBlock: [self debugWithoutTimeout] "skip timeouts here to streamline stepping to the test method" + runUntil: [:context | context isClosureContext "navigate the process directly to the point where it is about to send #setUp" + and: [context selector = #runCaseWithoutTimeout]]) + debug.

TestCase>>debugWithoutTimeout {running} · ct 5/1/2024 16:07 + debugWithoutTimeout + "Run the receiver without timeout and open a debugger on the first failure or error." + + ^ self assureResourcesDuring: [self runCaseWithoutTimeout]

TestCase>>timeout:after: {private} · ct 5/1/2024 17:00 (changed) timeout: aBlock after: seconds "Evaluate the argument block. Time out if the evaluation is not complete after the given number of seconds. Handle the situation that a timeout may occur after a failure (during debug)"

| theProcess delay watchdog |

"the block will be executed in the current process" theProcess := Processor activeProcess. delay := Delay forSeconds: seconds.

"make a watchdog process" watchdog := [ - delay wait. "wait for timeout or completion" - theProcess ifNotNil:[ theProcess signalException: - (TestFailure new messageText: 'Test timed out' translated) ] + [delay wait] "wait for timeout or completion" + doWhileTrue: [theProcess isAttachedToDebugger]. "if the receiver is being debugged, restart the watchdog -- this prevents interrupting a debugging session but avoids infinite loops after proceeding from a debugger" + theProcess signalException: + (TestFailure new messageText: 'Test timed out' translated) ] newProcess.

"Watchdog needs to run at high priority to do its job (but not at timing priority)" watchdog priority: Processor timingPriority-1.

"catch the timeout signal" - watchdog resume. "start up the watchdog" - ^[aBlock on: TestFailure, TestResult allErrors do: [:ex| - theProcess := nil. - ex pass. - ]] ensure:[ "evaluate the receiver" - theProcess := nil. "it has completed, so ..." - delay delaySemaphore signal. "arrange for the watchdog to exit" - ] + ^ [watchdog resume. "start up the watchdog" + aBlock value] "evaluate the block" + ensure: + [watchdog terminate] "exit the watchdog"

--- Sent from Squeak Inbox Talk

On 2024-04-30T15:24:30-05:00, ma.chris.m@gmail.com wrote:

...

Hi Christoph,

...
I think this breaks some of the debugging experience for SUnit significantly. Can we please revise this again? :-)

Of course we can. As you can see from the description, my intent was simply to fix the bug of silently hiding mis-configured SUnit timeouts.

What happened is I wrote a new TestCase and was puzzled as to why I was getting inconsistent results from the SUnit browser. When I ran it by clicking the "Run Selected" button in the lower left corner of the SUnit browser, a yellow failure appeared on a certain test. But clicking on that test would work just fine.

Attempting to use an old-school "halt" in the failing test to reveal the cause of the issue (lack of #defaultTimeout override) works in 5.3. But because you put in a suppression of #allErrors into TestResult>>#runCase: a while back, it didn't help me get to the bottom of the issue in trunk this time.

So, it was a confusing situation caused solely by the tool designed to provide clarity into our code. I'm not against the fancy IDE enhancements as long as the old-school ways of using SUnit can still work too.

...
Here are some constraints that I would expect to work:

When I run a test that errors or fails from a system browser (message menu), a debugger is invoked on the error of the test.

When I debug a test from a system browser (message menu), a debugger is invoked that's execution is very close to test method, so I need only a few clicks to actually debug the test. (But you still should be able to reach the specific #setUp method from your test class.)

I'd like to make sure things are running normally first (e.g., use of #run, #debug, #debugAsFailure from workspace code), THEN working from within the SUnit browser, THEN invocation from the message menu (et al) of regular browsers. I feel SUnit should not be responsible for having to be aware of all of these different ways of being run and, instead, those responsibilities relegated to the runners (more on this below).

...

While I am debugging a test (i.e.: watching and stepping it in a debugger) and also thereafter, it must not time out. This is because I often need more than 5 seconds to investigate what it does, and it is fatal when a timeout failure error pops out and disallows me to continue stepping through the test execution.

It's been an issue for a long time. You can "fix" it by doing what I've had to do over the years ever since "timeouts" were put into SUnit, by adding the following into all of your TestCase subclasses:

defaultTimeout ^ 999

I strongly fought against the introduction of timeouts into SUnit when it was originally proposed. The impetus at the time was a test got stuck in a loop. But all that was needed was for the *runner* of the test, the Jenkins server, to wrap its own running in some sort of onTimeout: [...] block. I tried to explain that it was wrong to write in specific run times into SUnit code, because it couples two worlds at a layer where they shouldn't be coupled. Unfortunately, I was overruled and it was decided to integrate specific timeouts at the TestCase instance level and make the framework aware of them.

Now through the years #defaultTimeout overrides have sprouted like weeds, and the painful symptoms of the masochistic guessing-game have caused absurd salves like #isLowerPerformance to emerge, which make the inappropriate coupling "relatively" less uncomfortable, thereby allowing the flaw to remain in the system.

...
Your change complicates 2) and breaks 3): Regarding 2), navigating to the test method previously required 4 steps (Over, Into, Over, Into), now it requires 7 steps (Over, Over, Through, Proceed, Into, Over, Into). Until I have performed these steps, the test often already has timed out - see 3).

So how do you want to proceed? I'm happy to roll back my change (or for you to) if you have a better fix. You do understand the motivation of my change:

...
Yet, I think I understand the motivation to your change:

When I run a test that times out from a system browser (message menu), I should be informed about the timeout.

Either that, or the concept of "timeouts" should be stripped from SUnit entirely. I mean, honestly, when I read stuff like this...

...
What would be some other strategies to make timeouts more discoverable? Display a pop-up notification when the automatic debugging of a test after it failed (see CodeHolder>>#testRunSuite: and TestRunner>>#debugSuite:) completes WITHOUT having raised a debugger (a simple exception handler should do this)? Automatically unattach the timeout watchdog in the moment the running test is being debugged (I don't know a simple way to do this, and if interrupting a running test would affect its execution semantics, this might cause a lot of confusion*). What else?

... I can't help but shake my head at how the scourge of timeouts continues its cancerous growth, consuming and transforming SUnit from something that was really simple back in the day into something virtually incomprehensible today. We really should take a step back and ask ourselves, "How bad does it have to get before we reconsider the wisdom of this decision?"

Best, Chris

Chris Muller

11:01 p.m.

New subject: SUnit timeouts considered harmful (was: The Trunk: SUnit-cmm.146.mcz)

Hi Christoph,

This is a really bold solution, and impressive implementation. I wonder, though, if it furthers the growth of the overly-complex "solution" to a not-an-actual-problem. Let's step back and review the actual use-cases again:

1) debug interactively 2) run interactively 3) run in batch (CI server)

For both the interactive cases, the "perfect timeout" is no timeout. If there's an issue in the time domain running interactively, the user will use the tool designed for that -- the interrupt key. So, interactive = no timeout. That leaves use-case 3.

...

I also don't think I would like to receive 20 debuggers in an unstoppable

cascade when running a faulty suite in my local image.

You just (inadvertently) made the key point right there -- the *suite* is where the fault lies, and is where the fix should be made, not in SUnit's base code. Test suite code IS part of SUnit just as much as its base code is. When that one test locked up the CI server 25 years ago, it should've just been fixed. Instead, we let it change how we think about SUnit from an integrated "tool" that serves developers, to an "App" that they have to serve, and maintain, even at cost to its original purpose.

And if the CI server trusting all TestCases to behave was its first "mistake," why is trusting the SUnit framework to behave not its second? Shouldn't it need to bother to take care of ITSELF properly, constructing TestCase instances one by one via the #selector: constructor if necessary, and running them each wrapped in its own onTimeout: block? It could start each test at 5 seconds and give the ones that time out an additional 5 seconds for next time until they have enough. This reassignment of responsibility alleviates developers from ever having to fiddle with timeouts, and the dynamic calculation allows it to work across different speed platforms seamlessly. It maybe even facilitates calculation of something useful like if a subsequent run of the test requires significantly longer...

If we want to make Squeak smaller and simpler, ridding SUnit of all notion of timeouts is a low-hanging fruit. It would result in the clean out of dozens of unnecessary methods, classes, preferences, and pragmas. Can anyone else support the idea?

Until then, I've gone ahead and reverted my change.

Regards, Chris

On Wed, May 1, 2024 at 10:20 AM christoph.thiede@student.hpi.uni-potsdam.de wrote:

...

Hi Chris, Hi all,

thanks for your reply! Please find the attached changeset for an alternative approach. Does it cover all your use cases, and what do you think about the implementation?

...
Attempting to use an old-school "halt" in the failing test to reveal the

cause of the issue (lack of #defaultTimeout override) works in 5.3. But because you put in a suppression of #allErrors into TestResult>>#runCase: a while back, it didn't help me get to the bottom of the issue in trunk this time.

I see, I think this was another trade-off for CI runners. Then again, I also don't think I would like to receive 20 debuggers in an unstoppable cascade when running a faulty suite in my local image. Maybe we should have separate runner( behavior)s for bulk-running and debugging? Open to discussing the inclusion of Halt in #allErrors again.

Also thank you for the background on timeouts. I think practice has shown some kind of real need for them (especially in non-interactive context such as CIs), so it might be a good idea to provide convenient ways through the framework to declare and control them. Not to mention compatibility. But I agree that clock time is an insufficient measurement (but other cost models are significantly more complex) and it increases the effort of writing tests. Maybe time-outs should be opt-in? Maybe "999" can be an effortless-enough trade-off? By the way, I think overriding #defaultTimeout like ^ super defaultTimeout * 10 should be noted a best practice, which allows us to define test complexities at least in relation to each other.

Best, Christoph

*=============== Summary ===============*

Change Set: SUnit-debug-timeouts Date: 1 May 2024 Author: Christoph Thiede

Revises handling of timeouts and unexpected passes in debugged tests:

Do not time out a test case while it is attached to a debugger. For

this, store attached debuggers in a process' environment variables. 2) Merge Jaromir's race condition fixes from BlockClosure>>valueWithin:onTimeout: (Chronology-Core-jar.91) into TestCase>>timeout:after:. 3) Revert #debugAsFailure to skip timeouts to streamline stepping to the test method. 4) In #debug, notify the user if the test passed unexpectedly. 5) Document the breaking changes in TestCase>>debug and restore the previous behavior as #debugWithoutTimeout.

*=============== Diff ===============*

*Debugger>>attachToProcess {initialize-release} · ct 5/1/2024 16:14*
attachToProcess
(interruptedProcess environmentAt: #debuggers ifAbsentPut: [WeakSet
new]) add: self.

*Debugger>>proceed {context stack menu} · ct 5/1/2024 16:22 (changed)* proceed "Proceed from the interrupted state of the currently selected context. The argument is the topView of the receiver. That view is closed. The active process usually suspends (or terminates) after this call."
| processToResume canResume |

Smalltalk okayToProceedEvenIfSpaceIsLow ifFalse: [^ self].

self okToChange ifFalse: [^ self].
self checkContextSelection.

processToResume := interruptedProcess.
canResume := self interruptedProcessShouldResume.
self unattachFromProcess.
interruptedProcess := nil. "Before delete, so release doesn't
terminate it" self close.
Smalltalk installLowSpaceWatcher. "restart low space handler"

canResume
    ifTrue: [self resumeProcess: processToResume]
    ifFalse: [self notify: 'This process should not resume.\Debugger
will close now.' translated withCRs].

*Debugger>>process:context: {initialize-release} · ct 5/1/2024 16:14 (changed)* process: aProcess context: aContext
interruptedProcess := aProcess.
self attachToProcess.
self newStack: (aContext stackOfSize: 1). contextStackIndex := 1.
*Debugger>>unattachFromProcess {initialize-release} · ct 5/1/2024 16:25*
unattachFromProcess
(interruptedProcess environmentAt: #debuggers ifAbsent: [^ self])
    remove: self;
    ifEmpty: [interruptedProcess environmentRemoveKey: #debuggers
ifAbsent: []].

*Debugger>>windowIsClosing {initialize-release} · ct 5/1/2024 16:14 (changed)* windowIsClosing "My window is being closed; clean up. Restart the low space watcher."
contextStack := nil.
receiverInspector := nil.
contextVariablesInspector := nil.

interruptedProcess == nil ifTrue: [^ self].
self unattachFromProcess.
self flag: #discuss. "mt: Maybe #fork the termination of the process. See -
http://lists.squeakfoundation.org/pipermail/squeak-dev/2022-May/220675.html - http://lists.squeakfoundation.org/pipermail/squeak-dev/2022-June/221044.html " interruptedProcess perform: terminateProcessSelector. interruptedProcess := nil.
Smalltalk installLowSpaceWatcher. "restart low space handler"
*MyTestCase*
TestCase subclass: #MyTestCase
instanceVariableNames: ''
classVariableNames: ''
poolDictionaries: ''
category: 'as yet unclassified'
MyTestCase class
instanceVariableNames: ''
""
*MyTestCase>>expectedFailures {as yet unclassified} · ct 5/1/2024 16:34*
expectedFailures
^ #(testPassUnexpected)
*MyTestCase>>testFail {as yet unclassified} · ct 5/1/2024 16:04*
testFail
self fail
*MyTestCase>>testPass {as yet unclassified} · ct 5/1/2024 16:04*

testPass

*MyTestCase>>testPassUnexpected {as yet unclassified} · ct 5/1/2024 16:34*

testPassUnexpected

*MyTestCase>>testTimeout {as yet unclassified} · ct 5/1/2024 16:05*
testTimeout
<timeout: 0.1>
0.2 seconds wait.
*MyTestCase>>testTimeoutLong {as yet unclassified} · ct 5/1/2024 16:28*
testTimeoutLong
<timeout: 5>
Sensor shiftPressed ifTrue: [^ self].
10 seconds wait.
*Object class>>releaseNotes {documentation} · ct 5/1/2024 16:52 (changed)* releaseNotes "This is a scratch pad of release notes for the 6.0 release this version is building towards. Feel free to add to this comment mention of things that should appear in the release notes.
<tbd>"
* SUnit: TestCase>>debug no longer omits timeouts and notifies the
user about unexpected passes. Use #debugWithoutTimeout for the previous behavior."
self error: 'comment only'
*Process>>isAttachedToDebugger {*System-debugging support} · ct 5/1/2024 16:40*
isAttachedToDebugger
^ (self environmentAt: #debuggers ifAbsent: [^ false]) notEmpty
*TestCase>>debug {running} · ct 5/1/2024 16:51 (changed)* debug
"Run the receiver and open a debugger on the first failure or error."
"Run the receiver and open a debugger on the first failure or error.
Notify the user if the test passes without raising any debugger."
^ self assureResourcesDuring: [self runCaseWithoutTimeout]
| result signaled |
signaled := false.
result := self assureResourcesDuring:
    [[self runCase]
        on: TestResult allErrors , TestFailure ensure:
            [signaled := true]].
signaled ifFalse: [self inform: 'Test passed unexpectedly!'
translated].
^ result
*TestCase>>debugAsFailure {running} · ct 5/1/2024 16:49 (changed)* debugAsFailure
"Spawn a debugger that is ready to debug the receiver."
"Spawn a debugger that is ready to debug the receiver without
timeouts."
self assureResourcesDuring: [ "also happens inside #debug but
simulation is slower"
    (Process
        forBlock: [self debug]
        runUntil: [:context | context isClosureContext "navigate the
process directly to the point where it is about to send #setUp"
            and: [context selector = #runCaseWithoutTimeout]])
                debug].
(Process
    forBlock: [self debugWithoutTimeout] "skip timeouts here to
streamline stepping to the test method"
    runUntil: [:context | context isClosureContext "navigate the
process directly to the point where it is about to send #setUp"
        and: [context selector = #runCaseWithoutTimeout]])
            debug.
*TestCase>>debugWithoutTimeout {running} · ct 5/1/2024 16:07*
debugWithoutTimeout
"Run the receiver without timeout and open a debugger on the first
failure or error."
^ self assureResourcesDuring: [self runCaseWithoutTimeout]
*TestCase>>timeout:after: {private} · ct 5/1/2024 17:00 (changed)* timeout: aBlock after: seconds "Evaluate the argument block. Time out if the evaluation is not complete after the given number of seconds. Handle the situation that a timeout may occur after a failure (during debug)"
| theProcess delay watchdog |

"the block will be executed in the current process"
theProcess := Processor activeProcess.
delay := Delay forSeconds: seconds.

"make a watchdog process"
watchdog := [
    delay wait.     "wait for timeout or completion"
    theProcess ifNotNil:[ theProcess signalException:
        (TestFailure new messageText: 'Test timed out' translated) ]
    [delay wait]     "wait for timeout or completion"
        doWhileTrue: [theProcess isAttachedToDebugger].    "if the
receiver is being debugged, restart the watchdog -- this prevents interrupting a debugging session but avoids infinite loops after proceeding from a debugger"
    theProcess signalException:
        (TestFailure new messageText: 'Test timed out' translated)
] newProcess.

"Watchdog needs to run at high priority to do its job (but not at
timing priority)" watchdog priority: Processor timingPriority-1.
"catch the timeout signal"
watchdog resume.                "start up the watchdog"
^[aBlock on: TestFailure, TestResult allErrors do: [:ex|
    theProcess := nil.
    ex pass.
]] ensure:[                            "evaluate the receiver"
    theProcess := nil.                "it has completed, so ..."
    delay delaySemaphore signal.    "arrange for the watchdog to
exit"
]
^ [watchdog resume.                "start up the watchdog"
 aBlock value]                        "evaluate the block"
    ensure:
        [watchdog terminate]        "exit the watchdog"
*Sent from **Squeak Inbox Talk https://github.com/hpi-swa-lab/squeak-inbox-talk*

On 2024-04-30T15:24:30-05:00, ma.chris.m@gmail.com wrote:

...
Hi Christoph,

...
I think this breaks some of the debugging experience for SUnit

significantly. Can we please revise this again? :-)

...
Of course we can. As you can see from the description, my intent was simply to fix the bug of silently hiding mis-configured SUnit timeouts.

What happened is I wrote a new TestCase and was puzzled as to why I was getting inconsistent results from the SUnit browser. When I ran it by clicking the "Run Selected" button in the lower left corner of the SUnit browser, a yellow failure appeared on a certain test. But clicking on that test would work just fine.

Attempting to use an old-school "halt" in the failing test to reveal the cause of the issue (lack of #defaultTimeout override) works in 5.3. But because you put in a suppression of #allErrors into TestResult>>#runCase: a while back, it didn't help me get to the bottom of the issue in trunk this time.

So, it was a confusing situation caused solely by the tool designed to provide clarity into our code. I'm not against the fancy IDE enhancements as long as the old-school ways of using SUnit can still work too.

...
Here are some constraints that I would expect to work:

When I run a test that errors or fails from a system browser

(message menu), a debugger is invoked on the error of the test.

...
...

When I debug a test from a system browser (message menu), a

debugger is invoked that's execution is very close to test method, so I need only a few clicks to actually debug the test. (But you still should be able to reach the specific #setUp method from your test class.)

...
I'd like to make sure things are running normally first (e.g., use of #run, #debug, #debugAsFailure from workspace code), THEN working from within the SUnit browser, THEN invocation from the message menu (et al) of regular browsers. I feel SUnit should not be responsible for having to be aware of all of these different ways of being run and, instead, those responsibilities relegated to the runners (more on this below).

...

While I am debugging a test (i.e.: watching and stepping it in a

debugger) and also thereafter, it must not time out. This is because I often need more than 5 seconds to investigate what it does, and it is fatal when a timeout failure error pops out and disallows me to continue stepping through the test execution.

...
It's been an issue for a long time. You can "fix" it by doing what I've had to do over the years ever since "timeouts" were put into SUnit, by adding the following into all of your TestCase subclasses:

defaultTimeout ^ 999

I strongly fought against the introduction of timeouts into SUnit when it was originally proposed. The impetus at the time was a test got stuck in a loop. But all that was needed was for the *runner* of the test, the Jenkins server, to wrap its own running in some sort of onTimeout: [...] block. I tried to explain that it was wrong to write in specific run times into SUnit code, because it couples two worlds at a layer where they shouldn't be coupled. Unfortunately, I was overruled and it was decided to integrate specific timeouts at the TestCase instance level and make the framework aware of them.

Now through the years #defaultTimeout overrides have sprouted like weeds, and the painful symptoms of the masochistic guessing-game have caused absurd salves like #isLowerPerformance to emerge, which make the inappropriate coupling "relatively" less uncomfortable, thereby allowing the flaw to remain in the system.

...
Your change complicates 2) and breaks 3): Regarding 2), navigating to

the test method previously required 4 steps (Over, Into, Over, Into), now it requires 7 steps (Over, Over, Through, Proceed, Into, Over, Into). Until I have performed these steps, the test often already has timed out - see 3).

...
So how do you want to proceed? I'm happy to roll back my change (or for you to) if you have a better fix. You do understand the motivation of my change:

...
Yet, I think I understand the motivation to your change:

When I run a test that times out from a system browser (message

menu), I should be informed about the timeout.

...
Either that, or the concept of "timeouts" should be stripped from SUnit entirely. I mean, honestly, when I read stuff like this...

...
What would be some other strategies to make timeouts more

discoverable? Display a pop-up notification when the automatic debugging of a test after it failed (see CodeHolder>>#testRunSuite: and TestRunner>>#debugSuite:) completes WITHOUT having raised a debugger (a simple exception handler should do this)? Automatically unattach the timeout watchdog in the moment the running test is being debugged (I don't know a simple way to do this, and if interrupting a running test would affect its execution semantics, this might cause a lot of confusion*). What else?

...

... I can't help but shake my head at how the scourge of timeouts continues its cancerous growth, consuming and transforming SUnit from something that was really simple back in the day into something virtually incomprehensible today. We really should take a step back and ask ourselves, "How bad does it have to get before we reconsider the wisdom of this decision?"

Best, Chris

Taeumel, Marcel

2 May 2 May

10:36 a.m.

New subject: SUnit timeouts considered harmful (was: The Trunk: SUnit-cmm.146.mcz)

Hi Chris, hi Christoph, hi all --

Task-specific upper bounds for space and/or time are practical to get *some* insight in batch runs. They are annoying in single-task (or selected-tasks), interactive runs. In Squeak, we have the user-interrupt for suspiciously long-running or memory-hogging tasks.

Magical, static numbers for such upper bounds are bad as we cannot foresee the circumstances where such tasks will be executed (e.g., locally vs. CI). Such numbers will likely not be "correct" for the average use case (or run).

For a given platform, I like the idea of dynamically managed upper bounds (e.g., time-outs). Still, for example, GitHub Actions might not be fit to update such dynamic information.

In general, I prefer simpler, cleaner implementations over the attempts that try to "cover every use case", but lead to quite complex implementations. Sometimes, one should re-design the underlying concepts to then get a clean and simple implementation again. :-) ... while caring for backwards compatibility if feasible ;-)

Best, Marcel

Am 01.05.2024 23:17:59 schrieb Chris Muller asqueaker@gmail.com:

Hi Christoph,

1) debug interactively 2) run interactively 3) run in batch (CI server)

...

I also don't think I would like to receive 20 debuggers in an unstoppable cascade when running a faulty suite in my local image.

You just (inadvertently) made the key point right there -- the suite is where the fault lies, and is where the fix should be made, not in SUnit's base code. Test suite code IS part of SUnit just as much as its base code is. When that one test locked up the CI server 25 years ago, it should've just been fixed. Instead, we let it change how we think about SUnit from an integrated "tool" that serves developers, to an "App" that they have to serve, and maintain, even at cost to its original purpose.

Until then, I've gone ahead and reverted my change.

Regards, Chris

On Wed, May 1, 2024 at 10:20 AM <christoph.thiede@student.hpi.uni-potsdam.demailto:christoph.thiede@student.hpi.uni-potsdam.de> wrote: Hi Chris, Hi all,

thanks for your reply! Please find the attached changeset for an alternative approach. Does it cover all your use cases, and what do you think about the implementation?

...

Attempting to use an old-school "halt" in the failing test to reveal the cause of the issue (lack of #defaultTimeout override) works in 5.3. But because you put in a suppression of #allErrors into TestResult>>#runCase: a while back, it didn't help me get to the bottom of the issue in trunk this time.

Best, Christoph

=============== Summary ===============

Change Set: SUnit-debug-timeouts Date: 1 May 2024 Author: Christoph Thiede

=============== Diff ===============

Debugger>>attachToProcess {initialize-release} · ct 5/1/2024 16:14 + attachToProcess + + (interruptedProcess environmentAt: #debuggers ifAbsentPut: [WeakSet new]) add: self.

| processToResume canResume |

Smalltalk okayToProceedEvenIfSpaceIsLow ifFalse: [^ self].

self okToChange ifFalse: [^ self]. self checkContextSelection.

processToResume := interruptedProcess. canResume := self interruptedProcessShouldResume.

+ self unattachFromProcess. interruptedProcess := nil. "Before delete, so release doesn't terminate it" self close.

Smalltalk installLowSpaceWatcher. "restart low space handler"

canResume ifTrue: [self resumeProcess: processToResume] ifFalse: [self notify: 'This process should not resume.\Debugger will close now.' translated withCRs].

Debugger>>process:context: {initialize-release} · ct 5/1/2024 16:14 (changed) process: aProcess context: aContext

interruptedProcess := aProcess. + self attachToProcess.

self newStack: (aContext stackOfSize: 1). contextStackIndex := 1.

Debugger>>windowIsClosing {initialize-release} · ct 5/1/2024 16:14 (changed) windowIsClosing "My window is being closed; clean up. Restart the low space watcher."

contextStack := nil. receiverInspector := nil. contextVariablesInspector := nil.

interruptedProcess == nil ifTrue: [^ self]. + self unattachFromProcess. self flag: #discuss. "mt: Maybe #fork the termination of the process. See - http://lists.squeakfoundation.org/pipermail/squeak-dev/2022-May/220675.html - http://lists.squeakfoundation.org/pipermail/squeak-dev/2022-June/221044.html" interruptedProcess perform: terminateProcessSelector. interruptedProcess := nil.