[squeak-dev] Re: [Pharo-project] SUnit Time out
Andreas Raab
andreas.raab at gmx.de
Wed Jun 2 04:36:48 UTC 2010
Hi Chris -
Let me comment on this from a more general point of view first, before
going into the specifics. I've spent the last five years building a
distributed system and during this time I've learned a couple of things
about the value of timeouts :-) One thing that I've come to understand
is that *no* operation is unbounded. We may leisurely talk about "just
wait until it's done" but the reality is that regardless of what the
operation is we never actually wait forever. At some point we *will*
give up no matter what you may think. This is THE fundamental point
here. Everything else is basically haggling about what the right timeout is.
As for the right timeout, the second fundamental thing to understand is
that if there's any question of whether the operation "maybe" completed,
then your timeout is too short. Period. The timeout's value is not to
indicate that "maybe" the operation completed; it is there to say
unequivocally that something caused it to not complete and that it DID fail.
Obviously, introducing timeouts will create some initial false
positives. But it may be interesting to be a bit more precise about what
we're talking about. To do this I instrumented TestRunner to measure the
time it takes to run each test, and then ran all the tests in 4.2 to see
where that leads us. As you might expect, the distribution is extremely
uneven. Out of 2681 tests run, 2588 execute in < 500 msecs (approx. 1800
execute with no measurable time); 2630 execute in less than one second,
leaving a total of 51 that take more than a second, and only three tests
actually take longer than 5 seconds, and they are all tagged as such.
As you can see, the vast majority of tests have a "safety margin" of 10x
or more between the time the test usually takes and its timeout value.
Generally speaking, this margin is sufficient to compensate for "other"
effects that might rightfully delay the completion of the test in time.
If you have tests that commonly vary by 10x I'd be interested in finding
out more about what makes them so unpredictable.
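The distribution above was gathered by timing each test run. A minimal sketch of the bucketing, in Python rather than Smalltalk and with made-up durations (the real measurement came from the instrumented TestRunner), could look like this:

```python
# Illustrative sketch only: the durations below are invented for the
# example; the bucket boundaries mirror the counts quoted above.
def bucket_durations(durations_ms):
    """Count tests by runtime bucket: < 500 ms, < 1 s, 1-5 s, > 5 s."""
    buckets = {"<500ms": 0, "<1s": 0, "1-5s": 0, ">5s": 0}
    for d in durations_ms:
        if d < 500:
            buckets["<500ms"] += 1
        elif d < 1000:
            buckets["<1s"] += 1
        elif d <= 5000:
            buckets["1-5s"] += 1
        else:
            buckets[">5s"] += 1
    return buckets

print(bucket_durations([0, 120, 480, 700, 3000, 6500]))
```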
So if your question is "are my timeouts too tight", one thing we could do
is to introduce the 10x as a more or less general guideline for
executing tests, and perhaps add a transcript notifier if a test ever
comes closer than 1/3rd of its specified timeout value (i.e., indicating that
something in the nature of the test has changed that should be reflected
in its timeout). This would give you ample warning that you need to
adjust your test even if it isn't (yet) failing on the timeout.
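A hedged sketch of that guideline in Python (the function names and the 10x default are illustrative, not SUnit API):

```python
# Sketch of the proposed margin check: warn when a test's runtime
# creeps past one third of its timeout, even though it still passes.
def check_margin(elapsed_s, timeout_s):
    """Classify one test run against its timeout."""
    if elapsed_s >= timeout_s:
        return "timed out"  # the test DID fail, unequivocally
    if elapsed_s > timeout_s / 3.0:
        return "warn: runtime within 1/3 of timeout, adjust the timeout"
    return "ok"

def suggested_timeout(typical_runtime_s, margin=10):
    """10x safety margin over the typical runtime, as proposed above."""
    return typical_runtime_s * margin
```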
That said, a couple of concrete comments to your post:
On 5/30/2010 11:52 AM, Chris Muller wrote:
> (Copying squeak-dev too).
>
> I'm not sold on the whole test timeout thing. When I run tests, I
> want to know the answer to the question, "is the software working?"
Correct.
> Putting a timeout on tests trades a slower, but definitive, "yes" or
> "no" for a supposedly-faster "maybe". But is getting a "maybe" back
> really faster? I've just incurred the cost of running a test suite,
> but left without my answer. I get a "maybe", what am I supposed to do
> next? Find a faster machine? Hack into the code to fiddle with a
> timeout pragma? That's not faster..
See above. If you're thinking "maybe", then the timeout is too short.
> But, the reason given for the change was not for running tests
> interactively (the 99% case), rather, all tests from the beginning of
> time are now saddled with a timeout for the 1% case:
As the data shows, this is already the case. It may be interesting to
note that so far there were a total of 5 (five) places that had to be
adjusted in Squeak. One was a general place (the default timeout for the
decompiler tests) and four were individual methods. Considering that
computers usually don't become slower over time, it seems unlikely that
further adjustments will be necessary here. So the bottom line is that
the changes required aren't exactly excessive.
> "The purpose of the timeout is to catch issues like infinite loops,
> unexpected user input etc. in automated test environments."
>
> If tests are supposed to be quick (and deterministic) anyway, wouldn't
> an infinite loop or user-input be caught the first time the test was
> run (interactively)? Seriously, when you make software changes, we
> run the tests interactively first, and then the purpose of night-time
> automated test environment is to catch regressions on the merged
> code.
These changes are largely intended for automated integration testing. I
am hoping to automate the tests for community supported packages to a
point where there will be no user in front of the system. Even if there
were, it's not clear whether that person could fix the issue immediately,
or whether the entire process would be stuck because nobody can fix the
problem at hand right away, so the tests would never run to completion
and produce any useful result.
So the idea here is not that unit tests are *only* to catch regressions
in previously manually tested (combinations of) code. The idea is to
catch interaction and integration bugs, and to be able to produce a result
even if there is no user to watch the particular combination of packages
being loaded together in this particular form.
Perhaps that is our problem here? It seems to me that you're taking a
view that says unit tests are exclusively for regression testing and
consequently there is no way a previously successful test would suddenly
become unsuccessful in a way that makes it time out ... but you know,
having written this sentence, that view makes no sense to me. If we knew
beforehand that tests fail only in particular, known ways, we wouldn't
have to run them to begin with. The whole idea of running the tests is to
catch *unexpected* situations, and as a consequence there is value in
capturing these situations instead of hanging and producing no useful
result.
> In that case, the high-level test-controller which spits out the
> results could and should be responsible for handling "unexpected user
> input" and/or putting in a timeout, not each and every last test
> method..
Do you have such a "high-level test-controller"? Or do you mean a human
being spending their time watching the tests run to completion? If the
former, I'm curious as to how it would differ from what I did. If the
latter, are you volunteering? ;-)
> IMO, we want short tests, so let's just write them to be short. If
> they're too long, then the encouragement to shorten them comes from
> our own impatience of running them interactively. Running them in
> batch at night requires no patience, because we're sleeping, and
> besides, the batch processor should take responsibility for handling
> those rare scenarios at a higher-level..
The goal for the timeouts is *not* to cause you to write shorter tests.
If you're looking at it this way you're looking at it from the wrong
angle. Raise your timeout to whatever you feel is sensible for trusting
the results of the tests. As I said earlier, I'm quite happy to discuss
the default timeout; it's simply that with some 95% of tests covered by a
10x safety margin, it feels to me that we're playing it safe enough, with
explicit timeouts for the remaining cases.
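To illustrate the intended semantics (a per-test timeout overriding a suite default, and an unequivocal failure once it elapses), here is a rough Python analogue; in Squeak/Pharo SUnit the per-test value is given by a <timeout: seconds> pragma on the test method, and everything else below is illustrative rather than SUnit's actual implementation:

```python
# Sketch, not SUnit: run a test body under a timeout, where an explicit
# per-test timeout overrides the suite-wide default.
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

DEFAULT_TIMEOUT_S = 5  # assumed suite-wide default for the sketch

def run_with_timeout(test_fn, explicit_timeout_s=None):
    """Run test_fn, failing unequivocally once the timeout elapses."""
    timeout = explicit_timeout_s if explicit_timeout_s is not None \
        else DEFAULT_TIMEOUT_S
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(test_fn)
        try:
            future.result(timeout=timeout)
            return "passed"
        except FutureTimeout:
            # No "maybe": the timeout means the test DID fail.
            return "failed: timed out after %ss" % timeout
```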
Cheers,
- Andreas