Tests and software process
daniel.vainsencher at gmail.com
Wed Nov 1 14:11:57 UTC 2006
Of course you're right, this has been an issue for quite a while. I
think the problem is that tests have diverse domains of validity, and
there are neither abstractions nor infrastructure in place to support them.
In theory (and often in practice) you run "the" test suite every few
minutes, and a test fails iff some code is broken. Wonderful!
Unfortunately, in a large-scale, distributed, diverse effort like
Squeak, things are more complicated.
- Platform specific tests.
- Very long-running tests, which for most people don't yield enough
value to justify the machine time.
- Non-self-contained tests, for example ones that require external files
to be present.
- Performance tests, which are only valid on reasonably fast machines
(and what counts as "fast" changes over time...)
All of these do have some value in some context, but some cannot be
expected to be always green, and some aren't even worth running most of
the time. And the problem is that our choice about "where/when should
this test run" is binary - everywhere, or nowhere. You say we should be
more aggressive in making this binary decision, but the reason that
isn't happening is that sometimes neither option is quite right.
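To make the idea of "domains of validity" concrete, here is a minimal sketch of what such non-binary infrastructure could look like. It uses Python's unittest skip decorators purely as an analogy (SUnit in Squeak had no equivalent mechanism at the time); the SLOW_MACHINE flag is hypothetical and would in practice be measured or configured per image.

```python
import sys
import time
import unittest

# Hypothetical configuration flag: on a real system this would be
# measured once per machine or set in the image's preferences.
SLOW_MACHINE = False

class PlatformTests(unittest.TestCase):
    # A platform-specific test: skipped, not failed, off its platform.
    @unittest.skipUnless(sys.platform.startswith("linux"),
                         "only meaningful on Linux")
    def test_linux_specific_behaviour(self):
        self.assertTrue(sys.platform.startswith("linux"))

class PerformanceTests(unittest.TestCase):
    # A performance test: only run where its timing assumptions hold.
    @unittest.skipIf(SLOW_MACHINE, "not meaningful on slow machines")
    def test_sorting_is_fast_enough(self):
        start = time.perf_counter()
        sorted(range(100_000))
        self.assertLess(time.perf_counter() - start, 5.0)
```

The point is that each test declares the conditions under which it is valid, and the runner reports "skipped" rather than "failed" outside that domain - so the suite can still be expected to be green everywhere it actually ran.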
The community has gone back and forth on extracting some or all tests
into an optional package, but in practice that just means they never get run.
Do you know of some set of abstractions/practices/frameworks to deal with this?
Ralph Johnson wrote:
> Squeak comes with a large set of SUnit tests. Unfortunately, some of
> them don't work. As far as I can tell, there is NO recent version of
> Squeak in which all the tests work.
> This is a sign that something is wrong. The main purpose of shipping
> tests with code is so that people making changes can tell when they
> break things. If the tests don't work then people will not run them.
> If they don't run the tests then the tests are useless. The current
> set of tests are useless because of the bad tests. Nobody complains
> about them, which tells me that nobody runs them. So, it is all a
> waste of time.
> If the tests worked then it would be easy to make a new version.
> Every bug fix would have to come with a test that illustrates the bug
> and shows that it has been fixed. The group that makes a new version
> would check that all tests continue to work after the bug fix.
> An easy way to make all the tests run is to delete the ones that don't
> work. There are thousands of working tests and, depending on the
> version, dozens of non-working tests. Perhaps the non-working tests
> indicate bugs, perhaps they indicate bad tests. It seems a shame to
> delete tests that are illustrating bugs. But if these tests don't
> work, they keep the other tests from being useful. Programmers need
> to know that all the tests worked in the virgin image, and that if the
> tests quit working, it is their own fault.
> No development image should ever be shipped with any failing tests.
> -Ralph Johnson