[squeak-dev] Re: [Pharo-project] SUnit Time out
Andreas Raab
andreas.raab at gmx.de
Wed Jun 2 04:36:48 UTC 2010
Hi Chris -
Let me comment on this from a more general point of view first, before
going into the specifics. I've spent the last five years building a
distributed system and during this time I've learned a couple of things
about the value of timeouts :-) One thing that I've come to understand
is that *no* operation is unbounded. We may leisurely talk about "just
wait until it's done" but the reality is that regardless of what the
operation is we never actually wait forever. At some point we *will*
give up no matter what you may think. This is THE fundamental point
here. Everything else is basically haggling about what the right timeout is.
As for the right timeout, the second fundamental thing to understand is
that if there's any question of whether the operation "maybe" completed,
then your timeout is too short. Period. The timeout's value is not to
indicate that "maybe" the operation completed; it is there to say
unequivocally that something caused it to not complete and that it DID fail.
Obviously, introducing timeouts will create some initial false
positives. But it may be interesting to be a bit more precise about what
we're talking about. To do this I instrumented TestRunner to measure the
time it takes to run each test, and then ran all the tests in 4.2 to see
where that leads us. As you might expect, the distribution is extremely
uneven. Out of 2681 tests run, 2588 execute in < 500 msecs (approx. 1800
execute with no measurable time); 2630 execute in less than one second,
leaving a total of 51 that take more than a second, and only three tests
actually take longer than 5 seconds, and they are all tagged as such.
As you can see, the vast majority of tests have a "safety margin" of 10x
or more between the time the test usually takes and its timeout value.
Generally speaking, this margin is sufficient to compensate for "other"
effects that might rightfully delay the completion of the test in time.
If you have tests that commonly vary by 10x I'd be interested in finding
out more about what makes them so unpredictable.
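The distribution above was gathered by timing each test run. A minimal sketch of the bucketing, in Python rather than Smalltalk and with made-up durations (the real measurement came from the instrumented TestRunner), could look like this:

```python
# Illustrative sketch only: the durations below are invented for the
# example; the bucket boundaries mirror the counts quoted above.
def bucket_durations(durations_ms):
    """Count tests by runtime bucket: < 500 ms, < 1 s, 1-5 s, > 5 s."""
    buckets = {"<500ms": 0, "<1s": 0, "1-5s": 0, ">5s": 0}
    for d in durations_ms:
        if d < 500:
            buckets["<500ms"] += 1
        elif d < 1000:
            buckets["<1s"] += 1
        elif d <= 5000:
            buckets["1-5s"] += 1
        else:
            buckets[">5s"] += 1
    return buckets

print(bucket_durations([0, 120, 480, 700, 3000, 6500]))
```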
So if your question is "are my timeouts too tight", one thing we could do
is to introduce the 10x as a more or less general guideline for
executing tests, and perhaps add a transcript notifier if a test ever
comes closer than 1/3rd of its specified timeout value (i.e., indicating that
something in the nature of the test has changed that should be reflected
in its timeout). This would give you ample warning that you need to
adjust your test even if it isn't (yet) failing on the timeout.
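A hedged sketch of that guideline in Python (the function names and the 10x default are illustrative, not SUnit API):

```python
# Sketch of the proposed margin check: warn when a test's runtime
# creeps past one third of its timeout, even though it still passes.
def check_margin(elapsed_s, timeout_s):
    """Classify one test run against its timeout."""
    if elapsed_s >= timeout_s:
        return "timed out"  # the test DID fail, unequivocally
    if elapsed_s > timeout_s / 3.0:
        return "warn: runtime within 1/3 of timeout, adjust the timeout"
    return "ok"

def suggested_timeout(typical_runtime_s, margin=10):
    """10x safety margin over the typical runtime, as proposed above."""
    return typical_runtime_s * margin
```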
That said, a couple of concrete comments to your post:
On 5/30/2010 11:52 AM, Chris Muller wrote:
> (Copying squeak-dev too).
>
> I'm not sold on the whole test timeout thing. When I run tests, I
> want to know the answer to the question, "is the software working?"
Correct.
> Putting a timeout on tests trades a slower, but definitive, "yes" or
> "no" for a supposedly-faster "maybe". But is getting a "maybe" back
> really faster? I've just incurred the cost of running a test suite,
> but left without my answer. I get a "maybe", what am I supposed to do
> next? Find a faster machine? Hack into the code to fiddle with a
> timeout pragma? That's not faster..
See above. If you're thinking "maybe", then the timeout is too short.
> But, the reason given for the change was not for running tests
> interactively (the 99% case), rather, all tests from the beginning of
> time are now saddled with a timeout for the 1% case:
As the data shows, this is already the case. It may be interesting to
note that so far there were a total of 5 (five) places that had to be
adjusted in Squeak. One was a general place (the default timeout for the
decompiler tests) and four were individual methods. Considering that
computers usually don't become slower over time, it seems unlikely that
further adjustments will be necessary here. So the bottom line is that
the changes required aren't exactly excessive.
> "The purpose of the timeout is to catch issues like infinite loops,
> unexpected user input etc. in automated test environments."
>
> If tests are supposed to be quick (and deterministic) anyway, wouldn't
> an infinite loop or user-input be caught the first time the test was
> run (interactively)? Seriously, when you make software changes, we
> run the tests interactively first, and then the purpose of night-time
> automated test environment is to catch regressions on the merged
> code.
These changes are largely intended for automated integration testing. I
am hoping to automate the tests for community supported packages to a
point where there will be no user in front of the system. Even if there
were, it's not clear whether that person could fix the issue immediately,
or whether the entire process would be stuck because nobody can fix the
problem at hand right away, so the tests would never run to completion
and produce any useful result.
So the idea here is not that unit tests are *only* to catch regressions
in previously manually tested (combinations of) code. The idea is to
catch interaction and integration bugs, and to be able to produce a result
even if there is no user to watch the particular combination of packages
being loaded together in this particular form.
Perhaps that is our problem here? It seems to me that you're taking a
view that says unit tests are exclusively for regression testing and
consequently there is no way a previously successful test would suddenly
become unsuccessful in a way that makes it time out ... but you know,
having written this sentence, that view makes no sense to me. If we knew
beforehand that tests fail only in particular, known ways, we wouldn't
have to run them to begin with. The whole idea of running the tests is to
catch *unexpected* situations, and as a consequence there is value in
capturing these situations instead of hanging and producing no useful
result.
> In that case, the high-level test-controller which spits out the
> results could and should be responsible for handling "unexpected user
> input" and/or putting in a timeout, not each and every last test
> method..
Do you have such a "high-level test-controller"? Or do you mean a human
being spending their time watching the tests run to completion? If the
former, I'm curious as to how it would differ from what I did. If the
latter, are you volunteering? ;-)
> IMO, we want short tests, so let's just write them to be short. If
> they're too long, then the encouragement to shorten them comes from
> our own impatience of running them interactively. Running them in
> batch at night requires no patience, because we're sleeping, and
> besides, the batch processor should take responsibility for handling
> those rare scenarios at a higher-level..
The goal for the timeouts is *not* to cause you to write shorter tests.
If you're looking at it this way you're looking at it from the wrong
angle. Raise your timeout to whatever you feel is sensible for trusting
the results of the tests. As I said earlier, I'm quite happy to discuss
the default timeout; it's simply that with some 95% of tests covered by a
10x safety margin, it feels to me that we're playing it safe enough, with
explicit timeouts for the remaining cases.
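To illustrate the intended semantics (a per-test timeout overriding a suite default, and an unequivocal failure once it elapses), here is a rough Python analogue; in Squeak/Pharo SUnit the per-test value is given by a <timeout: seconds> pragma on the test method, and everything else below is illustrative rather than SUnit's actual implementation:

```python
# Sketch, not SUnit: run a test body under a timeout, where an explicit
# per-test timeout overrides the suite-wide default.
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

DEFAULT_TIMEOUT_S = 5  # assumed suite-wide default for the sketch

def run_with_timeout(test_fn, explicit_timeout_s=None):
    """Run test_fn, failing unequivocally once the timeout elapses."""
    timeout = explicit_timeout_s if explicit_timeout_s is not None \
        else DEFAULT_TIMEOUT_S
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(test_fn)
        try:
            future.result(timeout=timeout)
            return "passed"
        except FutureTimeout:
            # No "maybe": the timeout means the test DID fail.
            return "failed: timed out after %ss" % timeout
```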
Cheers,
- Andreas