Hi Folks -
I am in the interesting situation that I'm writing a few tests that require large data sets for input, and where I don't want to require people to download the data sets. My problem is that while it's easy to determine that the data is missing and skip the test, there isn't a good way of relaying this to the user. From the user's point of view "all tests are green" even though that statement is completely meaningless, and I'd rather communicate that in a way that says "X tests skipped" so that one can look at it and decide whether it's useful to re-run the tests with the data sets or not.
Another place where I've seen this happen is when platform specific tests are involved. A test which cannot be run on some platform should be skipped meaningfully (e.g., by telling the user it was skipped) rather than appear green and working.
Any ideas?
Cheers, - Andreas
Hi Andreas,
I am in the interesting situation that I'm writing a few tests that require large data sets for input, and where I don't want to require people to download the data sets. My problem is that while it's easy to determine that the data is missing and skip the test, there isn't a good way of relaying this to the user. From the user's point of view "all tests are green" even though that statement is completely meaningless, and I'd rather communicate that in a way that says "X tests skipped" so that one can look at it and decide whether it's useful to re-run the tests with the data sets or not.
Another place where I've seen this happen is when platform specific tests are involved. A test which cannot be run on some platform should be skipped meaningfully (e.g., by telling the user it was skipped) rather than appear green and working.
Any ideas?
Cheers,
- Andreas
If it's not possible to put the data zipped into a method because it would be too big somehow, I'd consider your two examples logically equivalent to "If the moon is made out of green cheese anything is allowed". So it is kind of ok that these tests are green. But you are right, one usually does not think of tests as having prerequisites; one thinks of them as commands which "always" bring their necessary context.
And you are suggesting to indicate clearly, which tests depend on some external resource? I'd suggest using (and introducing into Squeak in general) preconditions based on blocks (*1), like:
testCroquetOnXBox
	self precondition: [SmalltalkImage current platformName = 'XBox'].
	(...)
Having that in place, one could easily collect and indicate all the tests whose precondition failed. They should be rare, and all depend on an external resource which is too cumbersome to recreate as a scenario.
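To make that concrete, here is a minimal sketch of what such a helper could look like; the TestSkipped class, the #precondition: selector and the sample file name are illustrative assumptions, not existing SUnit code:

Error subclass: #TestSkipped
	instanceVariableNames: ''
	classVariableNames: ''
	poolDictionaries: ''
	category: 'SUnit-Preconditions'

TestCase >> precondition: aBlock
	"Abort the running test when the block answers false; a runner that knows
	about TestSkipped can then report the test as skipped instead of green."
	aBlock value ifFalse: [TestSkipped signal: 'precondition not satisfied']

FloatPatternTest >> testFloatBitPatterns
	self precondition: [FileDirectory default fileExists: 'float-samples.dat'].
	"... assertions against the sample data ..."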
Cheers,
Markus
(*1) Aside from backwards compatibility, I wouldn't have a problem using method properties/pragmas for the introduction of pre- and postconditions either.
Markus Gaelli wrote:
If it's not possible to put the data zipped into a method because it would be too big somehow, I'd consider your two examples logically equivalent to "If the moon is made out of green cheese anything is allowed". So it is kind of ok that these tests are green.
It's 8MB a pop so no, I think it's not really feasible to stick that test data into a method ;-)
And you are suggesting to indicate clearly, which tests depend on some external resource?
Well, really, what I'm looking for is something that instead of saying "all tests are green, everything is fine" says "all the tests we ran were green, but there were various that were *not* run so YMMV". I think what I'm really looking for is something that instead of saying "x tests, y passed" either says "x tests, y passed, z skipped" or simply doesn't include the "skipped" ones in the number of tests being run. In either case, looking at something that says "19 tests, 0 passed, 19 skipped" or simply "0 tests, 0 passed" is vastly more explicit than "19 tests, 19 passed" where in reality 0 were run.
Like, what if a test which doesn't have any assertion is simply not counted? Doesn't make sense to begin with, and then all the preconditions need to do is to bail out and the test doesn't count...
In any case, my complaint here is more about the *perception* of "these tests are all green, everything must be fine" when in fact, none of them have tested anything.
Cheers, - Andreas
Maybe the "expected failures" feature of SUnit would do the job? You let the tests in question fail but mark them as expected failures depending on whether the resources are loaded or not. Visually, the test runner will run yellow but explicitly state that it expected to so.
Adrian
On Mar 27, 2006, at 11:18 AM, Adrian Lienhard wrote:
Maybe the "expected failures" feature of SUnit would do the job? You let the tests in question fail but mark them as expected failures depending on whether the resources are loaded or not. Visually, the test runner will run yellow but explicitly state that it expected to so.
...which also could be denoted in the test by
BarTest >> testFoo
	self precondition: [Bar includesSelector: #foo]
Tests which are known not to be implemented yet could then easily be selected by asking for preconditions that include #includesSelector:...
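Collecting them could be as simple as a reflective query along these lines; a sketch only, using BarTest from the example above and standard method reflection:

"all test methods in BarTest whose precondition mentions #includesSelector:"
BarTest testSelectors select: [:each |
	(BarTest compiledMethodAt: each) hasLiteral: #includesSelector:]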
Cheers,
Markus
On Mon, Mar 27, 2006 at 11:18:58AM +0200, Adrian Lienhard wrote:
Maybe the "expected failures" feature of SUnit would do the job? You let the tests in question fail but mark them as expected failures depending on whether the resources are loaded or not. Visually, the test runner will run yellow but explicitly state that it expected to so.
Can you give an example of how to mark a test that is expected to fail? I'm looking for something like the following, but I must be missing something obvious.
(SmalltalkImage current platformName = 'unix') ifFalse: [self expectFailure]
Thanks,
Dave
You can override TestCase>>#expectedFailures in the test case.
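For instance, something along these lines should work with that hook (the class, the test selectors and the file name are made up for the example):

FloatPatternTest >> expectedFailures
	"Mark the data-driven tests as expected failures whenever the sample file is missing."
	^(FileDirectory default fileExists: 'float-samples.dat')
		ifTrue: [#()]
		ifFalse: [#(testAddPatterns testMulPatterns)]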
HTH, Adrian
Thanks!
On Mar 27, 2006, at 11:09 AM, Andreas Raab wrote:
Markus Gaelli wrote:
If it's not possible to put the data zipped into a method because it would be too big somehow, I'd consider your two examples logically equivalent to "If the moon is made out of green cheese anything is allowed". So it is kind of ok that these tests are green.
It's 8MB a pop so no, I think it's not really feasible to stick that test data into a method ;-)
And you are suggesting to indicate clearly, which tests depend on some external resource?
Well, really, what I'm looking for is something that instead of saying "all tests are green, everything is fine" says "all the tests we ran were green, but there were various that were *not* run so YMMV". I think what I'm really looking for is something that instead of saying "x tests, y passed" either says "x tests, y passed, z skipped" or simply doesn't include the "skipped" ones in the number of tests being run. In either case, looking at something that says "19 tests, 0 passed, 19 skipped" or simply "0 tests, 0 passed" is vastly more explicit than "19 tests, 19 passed" where in reality 0 were run.
Yeah, and I think my precondition mechanism could do just that. I mean, you want to annotate your tests somehow as being this kind of beast, so that the TestRunner can know about them and indicate them as you suggest, no? I was banging on the "external resource" point a bit, because until now this is the only reason I can see for writing such kinds of tests, and I wanted to make that very explicit... ;-)
Like, what if a test which doesn't have any assertion is simply not counted? Doesn't make sense to begin with, and then all the preconditions need to do is to bail out and the test doesn't count...
I don't understand this remark within that context.
I know a guy who is using that shouldnt: aBlock raise: anExceptionalEvent idiom a lot ;-) , which is good for knowing what is really tested ;-) but otherwise does not provide any real assertion in the test. (See most of the BitBltClipBugs tests, which should be platform independent)
Also, tests without any assertions could still execute lots of methods which have nice postconditions with them. So besides being good smoke tests, they also could be seen as tests of those very methods.
In any case, my complaint here is more about the *perception* of "these tests are all green, everything must be fine" when in fact, none of them have tested anything.
Fine for me, all I proposed was a mechanism to denote them. :-)
Cheers,
Markus
Markus Gaelli wrote:
Like, what if a test which doesn't have any assertion is simply not counted? Doesn't make sense to begin with, and then all the preconditions need to do is to bail out and the test doesn't count...
I don't understand this remark within that context.
I know a guy who is using that shouldnt: aBlock raise: anExceptionalEvent idiom a lot ;-) , which is good for knowing what is really tested ;-) but otherwise does not provide any real assertion in the test. (See most of the BitBltClipBugs tests, which should be platform independent)
But that is a very valuable assertion! It means that we expect that code to run without an error where (by contradiction) it used to raise an exception. And not surprisingly, #shouldnt:raise: (and its other variants) is implemented as an assertion:
shouldnt: aBlock raise: anExceptionalEvent
	^self assert: (self executeShould: aBlock inScopeOf: anExceptionalEvent) not
Makes perfect sense and is a perfectly valid statement for a test. What I was referring to is a test like this:
MyTestCase>>testDummy "Really just a dummy, does nothing"
or like this:
MyTestCase>>windowsTestOnly "A test that only executes on windows" Smalltalk platformName = 'Win32' ifFalse:[^nil]. self assert: " ...something... ".
or any other form that does not exercise any assertion in the SUnit framework. It seems to me that such tests are really "empty tests", e.g., by having no assertions there isn't really a statement made whether this test succeeded or not (one might equally claim "it failed" - namely "to test anything"). In any case, I think a good solution would be to simply disregard any tests that don't assert anything.
Also, tests without any assertions could still execute lots of methods which have nice postconditions with them. So besides being good smoke tests, they also could be seen as tests of those very methods.
If you mean to do this, you could just add a "self assert: true" at the end of it. At least that's a statement that the test is indeed doing something useful, like exercising code and asserting that it really does run through without any other exceptions.
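If one really wanted the runner to disregard assertion-free tests automatically, a crude sketch could count assertions in a TestCase subclass; everything below, including the class name and the assertionCount instance variable, is invented for illustration and is not existing SUnit:

CountingTestCase >> setUp
	assertionCount := 0

CountingTestCase >> assert: aBoolean
	"Count every assertion so a runner can tell 'ran and asserted' from 'ran vacuously'."
	assertionCount := assertionCount + 1.
	^super assert: aBoolean

CountingTestCase >> hasAssertions
	^assertionCount > 0

A runner could then report tests answering false to #hasAssertions as "empty" or skipped rather than counting them as passed.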
Cheers, - Andreas
or any other form that does not exercise any assertion in the SUnit framework. It seems to me that such tests are really "empty tests", e.g., by having no assertions there isn't really a statement made whether this test succeeded or not (one might equally claim "it failed" - namely "to test anything"). In any case, I think a good solution would be to simply disregard any tests that don't assert anything.
For testing some XMI stuff I could not write tests the way I wanted without having to patch the library. So one test really just checks that performing an action breaks nothing. I did not use self assert: true, but it could be a good idea.
If you mean to do this, you could just add a "self assert: true" at the end of it. At least that's a statement that the test is indeed doing something useful, like exercising code and asserting that it really does run through without any other exceptions.
Cheers,
- Andreas
On Mar 28, 2006, at 2:43 AM, Andreas Raab wrote:
Markus Gaelli wrote:
Like, what if a test which doesn't have any assertion is simply not counted? Doesn't make sense to begin with, and then all the preconditions need to do is to bail out and the test doesn't count...
I don't understand this remark within that context. I know a guy who is using that shouldnt: aBlock raise: anExceptionalEvent idiom a lot ;-) , which is good for knowing what is really tested ;-) but otherwise does not provide any real assertion in the test. (See most of the BitBltClipBugs tests, which should be platform independent)
But that is a very valuable assertion! It means that we expect that code to run without an error where (by contradiction) it used to raise an exception.
Let me cite you: ;-)
self shouldnt:[bb copyBits] raise: Error.
Aehm, isn't any unit test supposed to not throw an Error? So, in addition to providing me the information of what the method under test was (which is good :-) ), the only other information you give is that the test was throwing an Error. For the hardcore test-driven, this should always be the case in the beginning, no? ;-)
And not surprisingly, #shouldnt:raise: (and its other variants) is implemented as an assertion:
shouldnt: aBlock raise: anExceptionalEvent
	^self assert: (self executeShould: aBlock inScopeOf: anExceptionalEvent) not
Sure. But: When nil and friends don't understand a thing again, developers get yellow failures in statements bracketed like yours and red errors for all other statements. It might well be confusing to get two different results for the same error with the only difference that the test developer was saying "no errors expected here".
Makes perfect sense and is a perfectly valid statement for a test. What I was referring to is a test like this:
MyTestCase>>testDummy "Really just a dummy, does nothing"
or like this:
MyTestCase>>windowsTestOnly "A test that only executes on windows" Smalltalk platformName = 'Win32' ifFalse:[^nil]. self assert: " ...something... ".
or any other form that does not exercise any assertion in the SUnit framework. It seems to me that such tests are really "empty tests", e.g., by having no assertions there isn't really a statement made whether this test succeeded or not (one might equally claim "it failed" - namely "to test anything"). In any case, I think a good solution would be to simply disregard any tests that don't assert anything.
Also, tests without any assertions could still execute lots of methods which have nice postconditions with them. So besides being good smoke tests, they also could be seen as tests of those very methods.
If you mean to do this, you could just add a "self assert: true" at the end of it. At least that's a statement that the test is indeed doing something useful, like exercising code and asserting that it really does run through without any other exceptions.
I'd find an empty assert statement at the end of a test confusing, too. I also fail to see how this would help a TestRunner categorize skipped tests, unless we tweaked the implementation of TestCase >> assert: a bit.
I think what we both want is a visual separation of tests, which have assertions, from examples, which don't. I'd even say that a command which executes methods having postconditions or checking invariants, but does not provide any assertion itself, should be treated as a test of all those methods.
As I am also a big fan of examples, I'd really prefer to let developers write them without any bad conscience. Examples are smoke tests, the more the merrier, and they should simply be run, without developers being forced to tag them as such. Empty test cases are empty and can easily be removed and avoided.
I am all for a test framework which makes that visual difference and we are about to build one here roughly based on my taxonomy of unit tests: http://www.iam.unibe.ch/~scg/Archive/Papers/Gael05aTowardsATaxonomyOfUnitTests.pdf
Cheers,
Markus
Am 27.03.2006 um 11:09 schrieb Andreas Raab:
Markus Gaelli wrote:
If it's not possible to put the data zipped into a method because it would be too big somehow, I'd consider your two examples logically equivalent to "If the moon is made out of green cheese anything is allowed". So it is kind of ok that these tests are green.
It's 8MB a pop so no, I think it's not really feasible to stick that test data into a method ;-)
And you are suggesting to indicate clearly, which tests depend on some external resource?
Well, really, what I'm looking for is something that instead of saying "all tests are green, everything is fine" says "all the tests we ran were green, but there were various that were *not* run so YMMV". I think what I'm really looking for is something that instead of saying "x tests, y passed" either says "x tests, y passed, z skipped" or simply doesn't include the "skipped" ones in the number of tests being run. In either case, looking at something that says "19 tests, 0 passed, 19 skipped" or simply "0 tests, 0 passed" is vastly more explicit than "19 tests, 19 passed" where in reality 0 were run.
Like, what if a test which doesn't have any assertion is simply not counted? Doesn't make sense to begin with, and then all the preconditions need to do is to bail out and the test doesn't count...
In any case, my complaint here is more about the *perception* of "these tests are all green, everything must be fine" when in fact, none of them have tested anything.
Other Unit Test frameworks support skipping tests. One pattern is to raise a SkipTest exception, in which case the test is added to the "skipped" list.
The good thing about implementing this with exceptions is that it would work nicely even if the particular test runner does not yet know about skipping.
Another nice XPish thing is to mark tests as ToDo - it's an expected failure, but you communicate that you intend to fix it soon.
See, e.g., http://twistedmatrix.com/projects/core/documentation/howto/policy/test-standard.html#auto6
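In Squeak terms the same pattern might look roughly like this; the TestSkipped exception class and the extra skipped collection on TestResult are assumed here, and the usual failure/error handling is elided:

TestCase >> skip: aString
	"Raise the skip signal; a runner that is unaware of it would at least surface
	it as an error rather than silently counting the test as passed."
	TestSkipped signal: aString

TestResult >> runCase: aTestCase
	[aTestCase runCase. self passed add: aTestCase]
		on: TestSkipped
		do: [:ex | self skipped add: aTestCase]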
- Bert -
On Mar 27, 2006, at 12:28 PM, Bert Freudenberg wrote:
And you are suggesting to indicate clearly, which tests depend on some external resource?
Well, really, what I'm looking for is something that instead of saying "all tests are green, everything is fine" says "all the tests we ran were green, but there were various that were *not* run so YMMV". I think what I'm really looking for is something that instead of saying "x tests, y passed" either says "x tests, y passed, z skipped" or simply doesn't include the "skipped" ones in the number of tests being run. In either case, looking at something that says "19 tests, 0 passed, 19 skipped" or simply "0 tests, 0 passed" is vastly more explicit than "19 tests, 19 passed" where in reality 0 were run.
Like, what if a test which doesn't have any assertion is simply not counted? Doesn't make sense to begin with, and then all the preconditions need to do is to bail out and the test doesn't count...
In any case, my complaint here is more about the *perception* of "these tests are all green, everything must be fine" when in fact, none of them have tested anything.
Other Unit Test frameworks support skipping tests. One pattern is to raise a SkipTest exception, in which case the test is added to the "skipped" list.
The good thing about implementing this with exceptions is that it would work nicely even if the particular test runner does not yet know about skipping.
Another nice XPish thing is to mark tests as ToDo - it's an expected failure, but you communicate that you intend to fix it soon.
See, e.g., http://twistedmatrix.com/projects/core/documentation/howto/policy/test-standard.html#auto6
So the circle is closing... exceptions and preconditions again! ;-) So Andreas, want to introduce some ResourceNotAvailable and ToDo exceptions ;-) , or do we get away without them and just throw a PreconditionError that I was suggesting in an earlier thread?
As said in the previous mail, ToDos could easily be identified by just sticking to the convention of not even starting the method under test, which is a good idea in that case anyhow. As a nice effect one would not even have to touch the tests later when the method under test gets implemented.
Cheers,
Markus
Am 27.03.2006 um 12:54 schrieb Markus Gaelli:
On Mar 27, 2006, at 12:28 PM, Bert Freudenberg wrote:
And you are suggesting to indicate clearly, which tests depend on some external resource?
Well, really, what I'm looking for is something that instead of saying "all tests are green, everything is fine" says "all the tests we ran were green, but there were various that were *not* run so YMMV". I think what I'm really looking for is something that instead of saying "x tests, y passed" either says "x tests, y passed, z skipped" or simply doesn't include the "skipped" ones in the number of tests being run. In either case, looking at something that says "19 tests, 0 passed, 19 skipped" or simply "0 tests, 0 passed" is vastly more explicit than "19 tests, 19 passed" where in reality 0 were run.
Like, what if a test which doesn't have any assertion is simply not counted? Doesn't make sense to begin with, and then all the preconditions need to do is to bail out and the test doesn't count...
In any case, my complaint here is more about the *perception* of "these tests are all green, everything must be fine" when in fact, none of them have tested anything.
Other Unit Test frameworks support skipping tests. One pattern is to raise a SkipTest exception, in which case the test is added to the "skipped" list.
The good thing about implementing this with exceptions is that it would work nicely even if the particular test runner does not yet know about skipping.
Another nice XPish thing is to mark tests as ToDo - it's an expected failure, but you communicate that you intend to fix it soon.
See, e.g., http://twistedmatrix.com/projects/core/documentation/howto/policy/test-standard.html#auto6
So the circle is closing... exceptions and preconditions again! ;-) So Andreas, want to introduce some ResourceNotAvailable and ToDo exceptions ;-) , or do we get away without them and just throw a PreconditionError that I was suggesting in an earlier thread?
It's all about communicating the test writer's intent to the test runner. And I think I'd prefer "x tests, y passed, z skipped" as Andreas suggested.
As said in the previous mail, ToDos could easily be identified by just sticking to the convention of not even starting the method under test, which is a good idea in that case anyhow. As a nice effect one would not even have to touch the tests later when the method under test gets implemented.
However, you wouldn't get the "unexpected success" mentioned in the link above.
- Bert -
Hi!
Bert Freudenberg bert@impara.de wrote:
It's all about communicating the test writer's intent to the test runner. And I think I'd prefer "x tests, y passed, z skipped" as Andreas suggested.
But as Andres Valloud implied, there is a difference between:
- "skipped because I could not run them due to missing resources but I wanted to run them" - "skipped because the tests do not even apply to this platform and there is no point in even trying to run them"
So perhaps even:
"x tests, y passed, z skipped, k not applicable"
:)
regards, Göran
On Mar 27, 2006, at 1:23 PM, Bert Freudenberg wrote:
So the circle is closing... exceptions and preconditions again! ;-) So Andreas, want to introduce some ResourceNotAvailable and ToDo exceptions ;-) , or do we get away without them and just throw a PreconditionError that I was suggesting in an earlier thread?
It's all about communicating the test writer's intent to the test runner. And I think I'd prefer "x tests, y passed, z skipped" as Andreas suggested.
Right. I still fail to see why this wouldn't be possible using preconditions and basically putting all tests into the skipped section where the precondition fails.
As said in the previous mail, ToDos could easily be identified by just sticking to the convention of not even starting the method under test, which is a good idea in that case anyhow. As a nice effect one would not even have to touch the tests later when the method under test gets implemented.
However, you wouldn't get the "unexpected success" mentioned in the link above.
Hmmm, right! Maybe these to-do tests should not be treated by using failed preconditions but by some idiom like:
FooTest >> testBar
	self should: [Foo new bar] stillRaiseButIDoPromiseToFixItReallySoonNowTM: Error

TestRunner could be tweaked so that failing tests sending the above message - or a shorter one ;-) - land in a special section, which would be "not yet implemented" / "unexpected success" respectively.
Just trying to keep the number of concepts small.
Cheers,
Markus
On 3/27/06, Andreas Raab andreas.raab@gmx.de wrote:
It's 8MB a pop so no, I think it's not really feasible to stick that test data into a method ;-)
A test that needs 8MB of data doesn't look like a unit test to me. Why do you need 8MB? Can't you test the same with a few bytes only?
I think that the problem is that sometimes it is very useful to use the assertion framework and the test runner to make other kinds of tests ("user story tests", "stress tests", etc.). But the SUnit framework depends on classification to detect all the test cases available (unit tests or not). When I run all the unit tests on the system, I expect unit tests, that is, small and fast tests.
Maybe we can have a Test trait or something like that to reuse the assertion framework, and the test runners, and keep the TestCase hierarchy as a classification for unit tests.
Diego Fernandez wrote:
On 3/27/06, Andreas Raab <andreas.raab@gmx.de> wrote:
It's 8MB a pop so no, I think it's not really feasible to stick that test data into a method ;-)
A test that needs 8MB of data doesn't look like a unit test to me. Why do you need 8MB? Can't you test the same with a few bytes only?
It's test data to guarantee that floating point operations create the same bit patterns across platforms (1 million samples per operation). BTW, there is a "common" variant of those tests that runs with a few bytes only (by MD5-hashing and comparing it to the expected result), but that doesn't help you understand what is going wrong and where. In any case, my inquiry wasn't about the concrete test but rather about the issue of skipped tests in general. This example just reminded me of the issue again (that I had thought about before when I wrote certain platform tests).
Cheers, - Andreas
On 3/27/06, Andreas Raab andreas.raab@gmx.de wrote:
It's test data to guarantee that floating point operations create the same bit patterns across platforms (1 million samples per operation). BTW, there is a "common" variant of those tests that runs with a few bytes only (by MD5-hashing and comparing it to the expected result), but that doesn't help you understand what is going wrong and where. In any case, my inquiry wasn't about the concrete test but rather about the issue of skipped tests in general. This example just reminded me of the issue again (that I had thought about before when I wrote certain platform tests).
Yes I know that your inquiry was about skipping test cases in general. (sorry if I was rude trying to explain myself, my English is not so good)
But why would I want to skip a test case? Only two cases come to my mind:
(1) Unfinished work: I have detected some bug or something must be done to make the test pass.
(2) The test needs resources that are not available to everyone (the example that you give).
The second case is sometimes caused by a badly written test case, or a design failure (the objects are too coupled and you can't use mock objects). But there are cases where the test is not "unitary" because you want to test the interaction with other systems or a complete user story. In those cases the problem is that I want to use the SUnit framework to make the assertions and run the tests, but SUnit depends on classification to build the suite of all the test cases, so there is no way to make another "classification" for tests; all the tests are at the same level for the test runner. That was the point that I was trying to explain in my first mail.
Cheers, Diego.-
Maybe we could use method annotations to carry this kind of behavior. Lukas started to do something in that direction.
Well, really, what I'm looking for is something that instead of saying "all tests are green, everything is fine" says "all the tests we ran were green, but there were various that were *not* run so YMMV". I think what I'm really looking for is something that instead of saying "x tests, y passed" either says "x tests, y passed, z skipped" or simply doesn't include the "skipped" ones in the number of tests being run. In either case, looking at something that says "19 tests, 0 passed, 19 skipped" or simply "0 tests, 0 passed" is vastly more explicit than "19 tests, 19 passed" where in reality 0 were run.
Like, what if a test which doesn't have any assertion is simply not counted? Doesn't make sense to begin with, and then all the preconditions need to do is to bail out and the test doesn't count...
In any case, my complaint here is more about the *perception* of "these tests are all green, everything must be fine" when in fact, none of them have tested anything.
For that I would extend SUnit because this idea of skipped tests is nice to have.
Maybe we could use method annotations to carry this kind of behavior. Lukas started to do something in that direction.
Yes, but what I did was in the context of SmallLint, so that methods could be annotated to expect or ignore certain SmallLint rules. The class LintTestCase (a subclass of TestCase) then queries these pragmas when performing the rules and raises errors only in appropriate cases.
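Just to illustrate the flavour of such queries (the pragma name #ignoreLintRule: and the rule symbol are made up here; the Pragma reflection call should be available in images that have pragmas, i.e. Squeak 3.9 and later):

"all lint rules that aClass declares it deliberately ignores"
(Pragma allNamed: #ignoreLintRule: in: aClass)
	collect: [:pragma | pragma arguments first]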
For that I would extend SUnit because this idea of skipped tests is nice to have.
While developing the new test-runner and this SmallLint extension I struggled with SUnit several times. Even though there are always ways to subclass and configure it to suit special needs, it very soon gets ugly and cumbersome.
After 3 years of not having touched Java, I decided to have a quick look at JUnit 4 [1] and I must say that they changed a lot to the positive. It makes me sad to see that SUnit still looks more or less the same as the first time I used it about 4 years ago. We should really try to improve it! Imagine a newbie coming from Java and seeing a testing framework that looks like JUnit 1.0 :-/
A new test runner should be the second step (I've done that already, because the old one was simply causing too much pain), an improved test-model should be the first step! ;-)
Lukas
[1] http://junit.sourceforge.net/javadoc_40
-- Lukas Renggli http://www.lukas-renggli.ch
While developing the new test-runner and this SmallLint extension I struggled with SUnit several times. Even though there are always ways to subclass and configure it to suit special needs, it very soon gets ugly and cumbersome.
After 3 years of not having touched Java, I decided to have a quick look at JUnit 4 [1] and I must say that they changed a lot to the positive. It makes me sad to see that SUnit still looks more or less the same as the first time I used it about 4 years ago. We should really try to improve it! Imagine a newbie coming from Java and seeing a testing framework that looks like JUnit 1.0 :-/
Excellent idea. Any takers? I think that backward compatibility between the dialects is a good idea as long as it does not get in our way.
So if you have suggestions, please share them.
A new test runner should be the second step (I've done that already, because the old one was simply causing too much pain), an improved test-model should be the first step! ;-)
While developing the new test-runner and this SmallLint extension I struggled with SUnit several times. Even though there are always ways to subclass and configure it to suit special needs, it very soon gets ugly and cumbersome.
It would be good to fix that. SUnit is not carved in stone.
After 3 years of not having touched Java, I decided to have a quick look at JUnit 4 [1] and I must say that they changed a lot to the positive.
I looked at it and I have the impression that this is rather complex. With the after/before (I can understand that you want to have a setup shared by a bunch of tests of the same test case, but more...) I think that the descriptions in SUnit are good.
It makes me sad to see that SUnit still looks more or less the same as the first time I used it about 4 years ago. We should really try to improve it! Imagine a newbie coming from Java and seeing a testing framework that looks like JUnit 1.0 :-/
I have the impression that he would be scared, now that I have had a look at it. Can you tell us what you liked? It seems to me that annotations could help us there, but annotations should only be used to convey optimizations (sharing or expected failure...), else the tests would lose their value when porting applications between dialects.
A new test runner should be the second step (I've done that already, because the old one was simply causing too much pain), an improved test-model should be the first step! ;-)
Lukas
[1] http://junit.sourceforge.net/javadoc_40
-- Lukas Renggli http://www.lukas-renggli.ch
After 3 years of not having touched Java, I decided to have a quick look at JUnit 4 [1] and I must say that they changed a lot to the positive.
I looked at it and I have the impression that this is rather complex. With the after/before (I can understand that you want to have a setup shared by a bunch of tests of the same test case, but more...) I think that the descriptions in SUnit are good.
I find it rather cool to be able to put tests anywhere I want: In Magritte/Pier I was forced to duplicate most of the model hierarchy for the test-cases (some tests are designed to run on whole hierarchies), which is rather annoying to maintain. Being able to put the tests anywhere would make it much easier to navigate, browse and maintain the tests.
I think the way JUnit is doing the tests is simpler than that of SUnit; naming conventions and declarations are rather difficult to understand. Basically JUnit knows 3 different kinds of annotations; translated to Smalltalk this would look like:
- test-methods can be implemented anywhere in the system and are tagged with the annotation <test>.
- setUp- and tearDown-methods can be implemented anywhere in the system and are tagged with the annotations <setUp> and <tearDown>.
- resources are being implemented anywhere in the system and are tagged with the annotations <begin> and <end> (it took me 3 years to understand how resources work in the current implementation).
Of course one could implement a facade to make the tests run the old way. And subclassing TestCase would still work, for those tests that need a more complex object setup for the tests.
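A rough sketch of how a runner could then collect the <test>-tagged methods image-wide; CompiledMethod>>pragmas as in Squeak 3.9 is assumed, class-side methods and performance are ignored here:

| tagged |
tagged := OrderedCollection new.
Smalltalk allClasses do: [:class |
	class selectors do: [:selector |
		((class compiledMethodAt: selector) pragmas
			detect: [:pragma | pragma keyword == #test]
			ifNone: [nil]) notNil
				ifTrue: [tagged add: class -> selector]]].
tagged	"class -> selector associations a test runner could wrap and run"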
Lukas
-- Lukas Renggli http://www.lukas-renggli.ch
I find it rather cool to be able to put tests anywhere I want:
I missed that point. You do not have to write tests in TestCase classes?
Can you explain that? We can define tests anywhere because of the @ usage?
In Magritte/Pier I was forced to duplicate most of the model hierarchy for the test-cases (some tests are designed to run on whole hierarchies), which is rather annoying to maintain. Being able to put the tests anywhere would make it much easier to navigate, browse and maintain the tests.
I think the way JUnit is doing the tests is simpler than that of SUnit; naming conventions and declarations are rather difficult to understand. Basically JUnit knows 3 different kinds of annotations; translated to Smalltalk this would look like:
- test-methods can be implemented anywhere in the system and are tagged with the annotation <test>.
I guess only if you have a test without setup.
- setUp- and tearDown-methods can be implemented anywhere in the system and are tagged with the annotations <setUp> and <tearDown>.
- resources are being implemented anywhere in the system and are tagged with the annotations <begin> and <end> (it took me 3 years to understand how resources work in the current implementation).
And they are not really satisfactory, since you cannot control when exactly they are set up. I wanted to have tests that all share the same setup, run only once for all the tests.
Of course one could implement a facade to make the tests run the old way. And subclassing TestCase would still work, for those tests that need a more complex object setup for the tests.
I do not know, but indeed rethinking SUnit based on the needs we have would be good.
Lukas
-- Lukas Renggli http://www.lukas-renggli.ch
Just a general remark - there are other folks thinking about refreshing SUnit, like Travis:
http://www.cincomsmalltalk.com/userblogs/travis/blogView?showComments=true&entry=3277650953
Also, there seems to be a VW package called SUnitToo but I couldn't find much info about it.
- Bert -
On 28 mars 06, at 11:04, Bert Freudenberg wrote:
Just a general remark - there are other folks thinking about refreshing SUnit, like Travis:
http://www.cincomsmalltalk.com/userblogs/travis/blogView?showComments=true&entry=3277650953
Yes, and this is interesting even if SUnitToo has some drawbacks (I do not remember which ones exactly).
I think it is really important that we adapt the tools to our needs.
Also, there seems to be a VW package called SUnitToo but I couldn't find much info about it.
- Bert -
On Mar 28, 2006, at 4:04 AM, Bert Freudenberg wrote:
Just a general remark - there are other folks thinking about refreshing SUnit, like Travis:
http://www.cincomsmalltalk.com/userblogs/travis/blogView?showComments=true&entry=3277650953
Also, there seems to be a VW package called SUnitToo but I couldn't find much info about it.
I've played with SUnitToo a bit. It's pretty cool. Travis doesn't try to make SUnit more like JUnit, but rather strips out the cross-Smalltalk compatibility layer of SUnit and re-implements it to take best advantage of modern VisualWorks. For me the interesting things were:
- Factored out the class/selector related stuff into a separate class, the test token. This is really handy if you've got a large suite, since it means the test runner doesn't have to keep the test case instances around just to keep track of what tests were run. That can save a lot of memory. (The work-around for memory issues in SUnit is to nil out all your ivars in #tearDown, but that's a real pain to do over and over again, and points to a flaw in the framework.)
- Instead of just producing a set of TestResults, suites broadcast the results via #triggerEvent:. Very handy for implementing UI, logging etc.
- Tests are always run in randomized order. Not sure I like this one. The intent is great - to help expose cross-test dependencies - but I find it makes *debugging* those cross-test dependencies really hard. What would be better is to randomize test order when a suite is created, then always run that suite in the same order. This would still let you detect cross-test interaction, but allow you to reproduce it reliably when it does crop up (a small sketch of this follows the list).
- General simplification. The first two points above allowed the implementation to get simpler, and SUnitToo has much less coupling between its classes.
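The "randomize once, then replay" idea could be as small as a Fisher-Yates shuffle over the suite with a remembered seed; a throwaway sketch, assuming Squeak's Random can be seeded with #seed: and with the seed value and the suite variable as placeholders:

| random tests |
random := Random new.
random seed: 42.	"keep the seed (or the shuffled order) so a failing run can be replayed"
tests := suite tests asArray.
tests size to: 2 by: -1 do: [:k |
	tests swap: k with: (k atRandom: random)].
"now run 'tests' in this fixed order; the same seed yields the same order"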
For more detail, see:
http://www.cincomsmalltalk.com/userblogs/travis/blogView?showComments=true&entry=3278236086
http://www.cincomsmalltalk.com/userblogs/travis/blogView?showComments=true&entry=3278236086
Very interesting, thanks for the link.
Lukas
-- Lukas Renggli http://www.lukas-renggli.ch
- test-methods can be implemented anywhere in the system and are tagged with the annotation <test>.
I guess only if you have a test without setup.
[...]
And they are not really satisfactory, since you cannot control when exactly they are set up. I wanted to have tests that all share the same setup, run only once for all the tests.
Well, you also tag setup and tear down methods.
The big advantage of this approach is that you already have an instance of your model, and since this is self you are also able to directly access the i-vars, which can be very useful for some kinds of (internal) tests:
OrderedCollection>>beginOfTest
	<begin>
	self add: 1; add: 2

Point>>test
	<test>
	self assert: lastIndex - firstIndex = 1.
	self assert: (array includes: 1).
	self assert: (array includes: 2).
	...
I do not know, but indeed rethinking SUnit based on the needs we have would be good.
Another idea I saw in a Java testing framework is the possibility to group tests, so you can add an annotation like <group: #windows> or <group: #slow> and the test runner would allow running (or filtering out) specific groups.
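Filtering on such a group tag could then be a small query over the suite; a hedged sketch only, since the <group:> pragma is the proposal above rather than something that exists, and CompiledMethod>>pragmas as in Squeak 3.9 is assumed:

"keep only the tests whose test method carries <group: #slow>"
slowTests := suite tests select: [:each |
	((each class lookupSelector: each selector) pragmas
		detect: [:pragma | pragma keyword == #group: and: [pragma arguments first == #slow]]
		ifNone: [nil]) notNil]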
I wonder where I can find some documentation about SUnitToo, I am really interested to see what they did there. I started to play a bit with the existing framework and see what tools could be useful ...
Lukas
-- Lukas Renggli http://www.lukas-renggli.ch
OrderedCollection>>beginOfTest
	<begin>
	self add: 1; add: 2

Point>>test
	<test>
	self assert: lastIndex - firstIndex = 1.
	self assert: (array includes: 1).
	self assert: (array includes: 2).
	...
I do not know, but indeed rethinking SUnit based on the needs we have would be good.
Another idea I saw in a Java testing framework is the possibility to group tests, so you can add an annotation like <group: #windows> or <group: #slow> and the test runner would allow running (or filtering out) specific groups.
I see, indeed this way you can test without violating encapsulation.
I wonder where I can find some documentation about SUnitToo, I am really interested to see what they did there. I started to play a bit with the existing framework and see what tools could be useful ...
You should ask Travis on the VW list and discuss with Adrian Kuhn, since he is using it in Moose.
Stef
Lukas
-- Lukas Renggli http://www.lukas-renggli.ch
On 3/28/06, Lukas Renggli renggli@gmail.com wrote:
The big advantage of this approach is that you already have an instance of your model, and since this is self you are also able to directly access the i-vars, which can be very useful for some kinds of (internal) tests:
OrderedCollection>>beginOfTest
	<begin>
	self add: 1; add: 2

Point>>test
	<test>
	self assert: lastIndex - firstIndex = 1.
	self assert: (array includes: 1).
	self assert: (array includes: 2).
	...
For me it isn't a good idea to do "white box" testing. In the OrderedCollection example given, I don't care how the items are handled in the collection object, and I couldn't find any good reason to make a test like that. Because if I do a refactoring, maybe I have to modify all the tests written in that way :(
Another idea I saw in a Java testing framework is the possibility to group-tests, so you can add an annotation like <group: #windows> or <group: #slow> and the test runner would allows to run (or filter out) specific groups.
I like the idea of test groups (that's I was trying to say in my previous mail :) ). But I don't like the idea of having "annotations"... I hate them :P Because an annotation for me is like an "attribute", it's only data without behavior, but in the system is necessary to take decisions based on this "attribute value". For example if the "test group" is reified, you can have platform specific tests and skip the tests that doesn't applies to the platform, this decision to skip the test could be made in the test group. (anyway a TestSuite is a "test group", the problem for me is that SUnit builds the "all tests" test suite collecting #allSubclasses of TestCase, maybe if the construction of the "all tests" is a little more "intelligent" we can have "test groups" without adding annotations to the test methods)
I wonder where I can find some documentation about SUnitToo, I am
really interested to see what they did there. I started to play a bit with the existing framework and see what tools could be useful ...
This is the first time that I heard about SUnitToo... I will take a look, thanks :)
Cheers, Diego.-
OrderedCollection>>beginOfTest <begin> self add: 1; add: 2
Point>>test <test> self assert: lastIndex - firstIndex = 1. self assert: (array includes: 1). self assert: (array includes: 2). ...
For me it isn't a good idea to do "white box" testing. In the OrderedCollection example given, I don't care how the items are handled in the collection object, and I couldn't find any good reason to make a test like that. Because if I do a refactoring, maybe I have to modify all the tests written in that way :(
- If you rename #add: to #addItem: you have to re-factor your tests as well.
- If you have something more complex such as an AVL-, B-, R-Tree, etc. you certainly want to test some internal state, simply iterating over the elements cannot reveal all bugs. I don't see a reason why the internal state of OrderedCollection shouldn't be testable as well.
But I don't like the idea of having "annotations"... I hate them :P
There are some problems in software engineering (mainly related to extending) that cannot be cleanly solved otherwise.
Because an annotation for me is like an "attribute": it's only data without behavior, but in the system it is necessary to take decisions based on this "attribute value".
Wrong.
- No need to take decisions, the trick is to dispatch annotations onto objects that understand these messages.
- Annotated methods do have behavior; see Tweak, which is using annotations to define event-handlers.
Cheers, Lukas
-- Lukas Renggli http://www.lukas-renggli.ch
--- In squeak@yahoogroups.com, "Lukas Renggli" <renggli@...> wrote:
OrderedCollection>>beginOfTest
	<begin>
	self add: 1; add: 2

Point>>test
	<test>
	self assert: lastIndex - firstIndex = 1.
	self assert: (array includes: 1).
	self assert: (array includes: 2).
	...
For me it isn't a good idea to do "white box" testing. In the OrderedCollection example given, I don't care how the items are handled in the collection object, and I couldn't find any good reason to make a test like that. Because if I do a refactoring, maybe I have to modify all the tests written in that way :(
- If you rename #add: to #addItem: you have to re-factor your tests
as well.
- If you have something more complex such as an AVL-, B-, R-Tree, etc.
you certainly want to test some internal state, simply iterating over the elements cannot reveal all bugs. I don't see a reason why the internal state of OrderedCollection shouldn't be testable as well.
I think that Diego is saying that it is better not to test the implementation but the behavior of an object. If the implementation changes and you have implementation tests, you will have to change them. On the other hand, if you only test behavior and you change the implementation, you should not have to change the tests. I understand that you would want to see that your implementation is right, but isn't it enough to test the behavior? What implementation mistake couldn't you find by testing the behavior? If there is an implementation error, for sure it will pop up in a behavior test. If not, I believe that you are not covering 100% of your code. Here at Mercap, we have more than 7000 unit tests and we do not test implementation at all, and when we found a bug, it was because there was no "behavior" test to cover that situation.
But I don't like the idea of having "annotations"... I hate them :P
There are some problems in software engineering (mainly related to extending) that cannot be cleanly solved otherwise.
Because an annotation for me is like an "attribute": it's only data without behavior, but in the system it is necessary to take decisions based on this "attribute value".
Wrong.
- No need to take decisions, the trick is to dispatch annotations onto
objects that understand these messages.
- Annotated methods do have behavior; see Tweak, which is using annotations to define event-handlers.
We have talked with Diego many times about annotations and our opinion is that annotations are a tool that does not follow the idea of having only "objects and messages". It is syntax sugar that makes the language more difficult to use. It is a matter of taste and I don't like sugar :-)... (For sure you have discussed this on the list many times...)

I haven't read the whole thread, but I understand the first question of the thread was, in essence, how to categorize tests. We did have the same problem here and we created different test suites. So we have a test suite for the unit tests, we have a test suite for functional tests (they take more time), we have a test suite for those objects we call "systems", we have a test suite to check coverage and SmallLint compliance, and we have a test suite for architecture issues (it is important when you work with Envy and GemStone). So, we run each suite at different stages of the development process. For example, we run the unit tests all the time, but we run the architecture tests during the integration phase, before versioning. We run the coverage and SmallLint compliance tests before integration and during integration, etc.

Anyway, we like the idea of keeping things simple and trying to solve all the problems with objects and messages, no more.... I'm not saying you don't, but annotations, for us, look "outside" that idea... I guess time will show if they are a good idea or not; we may be wrong.
Hernan
Cheers, Lukas
-- Lukas Renggli http://www.lukas-renggli.ch
+1!
I think the block constructs can be used in many cases to achieve the same goal as annotations. Looking at all the should: and shouldnt: constructs can give you an idea.
The nice thing is that you can browse for them using your standard "browse senders". I, for example, want to denote the "unit under test" of the unit test, which turns out to be a method in most of the cases.
For achieving this goal, one can just bracket the method under test with a self test: [someMethod] construct.
This also gives nice backwards and sidewards compatibility to Squeak and other dialects. As long as Smalltalk already provides me with some means to achieve a goal in a simple way, I am all for exploiting that before introducing another concept.
Looking at Tweak (and being a great fan of Etoys) I like the idea of denoting on which event a method gets triggered, though debugging such an event-based system might be hard. I haven't thought about using the block idiom described above for achieving this goal, so I don't know if that would be possible also, but my feeling tells me that it should be the case.
Cheers,
Markus
I think the block construct can be used in many cases to achieve the same goal as annotations. Looking at all the should: and shouldnt: constructs can give you an idea.
[snip]
I don't understand a single word in your e-mail. What are you talking about? What are you using blocks for? What is the purpose of #test:, #should:, #shouldnt: ...?
Can you elaborate what you mean?
Cheers, Lukas
-- Lukas Renggli http://www.lukas-renggli.ch
On Mar 29, 2006, at 5:16 PM, Lukas Renggli wrote:
I think the block construct can be used in many cases to achieve the same goal as annotations. Looking at all the should: and shouldnt: constructs can give you an idea.
[snip]
I don't understand a single word in your e-mail. What are you talking about? What are you using blocks for? What is the purpose of #test:, #should:, #shouldnt: ...?
Not a single word. Bummer! ;-)
Can you elaborate what you mean?
Sure. Let me give you three examples.
==Logging==

Let's stick to the canonical example for aspects: Logging. You could describe this with an annotation "Log this".
But on the other hand you could also wrap the code you want to log with a

Bar >> foo
	self log: [body of foo]
construction. Whether your class browser displays the actual source or filters out the log: could depend on some setting of your browser. If it filters, it could at least provide you with some indicator close to the method that logging is on. Yes, I mean it, the possible filter options of our class browsers for code are highly underestimated currently... ;-)
Implementing Object >> log: aBlock could do all the tricks you need for logging, no?
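A minimal sketch of such a method, assuming a Transcript-based body (a real implementation would of course do something smarter), could be:

Object >> log: aBlock
	"Evaluate the block, write receiver and result to the Transcript, and answer the result."
	| result |
	result := aBlock value.
	Transcript show: self printString , ' -> ' , result printString; cr.
	^ result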
How would I manage all the aspects? I guess I'd try to get away with some stupid method category naming convention a la Monticello, as I don't know anything better: just some method category named "aspects", or maybe even an aspect-dependent name like "aspects-loggingOut" or whatever. This is where I would store the log: method, of course.
Having that in place one could easily iterate over all classes, collect the appropriate "aspects" (with all the what-comes-first-and-last problems; one would need a decent tool for that) and apply or remove them.
==Tweak Events==

Let me try it with Tweak:
MyMorph >> makeYellow
	<on: mouseDown>
	color := Color yellow.
This method should not be called by some other method, but only when the mouse is pressed. Another way of tagging this could be to write something like
MyMorph >> makeYellow
	self onMouseDown: [color := Color yellow]
and
Object(Tweak-MouseEvents) >> onMouseDown: aBlock
	self precondition: [detect if mouse is really pressed].
	^ aBlock value
Again, some mechanism could be installed to both put those "aspects" in and take them out of the code, and also to manage the mouse-down scheduler to call all interested parties when appropriate.
Or just switch it off globally at runtime, like a possible solution for Object >> assert: aBlock (to be found in our scg VW library):

Object >> assert: aBlock
	AssertionsOn ifTrue: [aBlock value]
==Testing==

"Unit test" is a funny name, as nobody seems to agree on or make explicit what the unit under test is. Having pursued the endeavor of categorizing all unit tests of Squeak 3.7, we came to the conclusion that most of the tests written in SUnit focus on a single method as the unit under test. This is no wonder, as the naming convention of SUnit makes that approach natural. (There are certainly other and better ones; the best ones I find are those that check whether the inverse function applied to the function applied to a parameter delivers the parameter in the end.)
All unit tests are decomposable into examples that focus on single methods (I call these guys "method examples") and tests that focus on single methods ("method tests"). So method examples and method tests are basically the building blocks for our test framework.
It would be nice to navigate between tests and units under test, which is futile if nobody makes that relationship explicit, and making relationships explicit is usually our main job as object-oriented developers. Romain Robbes did a nice first shot at navigating between tests and code, but there every method called in a test is basically treated as being somehow tested/exemplified.
On the other hand some philosophers like Wittgenstein or linguists like Lakoff would say that there are better and worse examples for a given concept - you would not explain the concept of a bird to a child with a penguin first...
So how do I make explicit which "animal" I am really interested in within a test?
Again, just use the block concept.

FooTest >> testBar
	"Our tool is actually able to detect that this is an InverseTest"
	"..someSetupCode for blaBla." (...)
	self test: [aResult := blaBla bar inverseBar]	"All methods called in the test block are methods under test"
	"some assertions to make this a test and not a mere example"
	self assert: (aResult = blaBla)

Java folks would have to denote the method under test using annotations; we don't have to, having our wonderful universal acid of lambda calculus.
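The #test: marker itself could be as simple as an identity wrapper that tools can look for (a sketch of the idea, assuming it lives on TestCase):

TestCase >> test: aBlock
	"Evaluate the block; browsers and analysis tools can inspect the block's compiled method to find the messages sent inside it, i.e. the methods under test."
	^ aBlock value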
I hope these three examples clarify my sympathy for using blocks instead of annotations more than they confuse, as I bring in my idea of examples too. ;-) I shouldn't have given the examples of should: and shouldnt:, as they might not have been the best ones.
Cheers,
Markus
On Mar 29, 2006, at 7:35 PM, Markus Gaelli wrote:
On the other hand some philosophers like Wittgenstein or linguists like Lakoff would say that there are better and worse examples for a given concept - you would not explain the concept of a bird to a child with a penguin first...
So how do I make explicit which "animal" I am really interested in within a test?
Again, just use the block concept.
FooTest >> testBar
	"Our tool is actually able to detect that this is an InverseTest"
	"..someSetupCode for blaBla." (...)
	self test: [aResult := blaBla bar inverseBar]	"All methods called in the test block are methods under test"
	"some assertions to make this a test and not a mere example"
	self assert: (aResult = blaBla)
Having said all this, I have to retract my arguments against "self shouldnt: [bb copyBits] raise: Error" from my morning mail, of course...
If it's always Error that I don't expect in my methods under test, I could rewrite
self test: [aResult := blaBla bar inverseBar] "All methods called in the test block are methods under test"
from above easily into
self shouldntRaiseError: [aResult := blaBla bar inverseBar]	"All methods called in the test block are methods under test"

;-) The only difference to Andreas' solution would be that I would bracket the methods under test like this, even if there are some assertions down below in the code.
Cheers,
Markus
I think that Diego is saying that it is better not to test the implementation but the behavior of an object. If the implementation changes and you have implementation tests, you will have to change them. On the other hand, if you only test behavior and you change the implementation, you should not have to change the tests.
But this is the same: if you change the behavior or interface, you have to adapt the tests as well, so this does not count ;-)
I understand that you would want to see that your implementation is right, but isn't it enough to test the behavior? What implementation mistake couldn't you find by testing the behavior?
You are probably right for most cases.
How do you test class invariants in your test-cases, without exposing the internal state of the object?
If there is an implementation error, for sure it will pop up in a behavior test. If not, I believe that you are not covering 100% of your code.
Again, you are probably right, but it might be not that obvious and/or only appear later on.
Here at Mercap, we have more than 7000 unit tests and we do not test implementation at all; when we have found a bug, it was because there was no "behavior" test to cover that situation.
Sure, I do the same in Magritte and Pier with almost 3000 tests.
We have talked with Diego many times about annotations and our opinion is that annotations are a tool that does not follow the idea of having only "objects and messages". It is syntactic sugar that makes the language more difficult to use. It is a matter of taste and I don't like sugar :-)... (For sure you have discussed this on the list many times...)
- An annotation is an object, as everything is.
- An annotation is a message(-send); you can send it to any object and have behavior and/or returned objects as an effect.
Cheers, Lukas
-- Lukas Renggli http://www.lukas-renggli.ch
--- In squeak@yahoogroups.com, "Lukas Renggli" <renggli@...> wrote:
I think that Diego is saying that it is better not to test the implementation but the behavior of an object. If the implementation changes and you have implementation tests, you will have to change them. On the other hand, if you only test behavior and you change the implementation, you should not have to change the tests.
But this is the same: if you change the behavior or interface, you have to adapt the tests as well, so this does not count ;-)
Ha, ha, this is a good one... but only if you take it out of context... Of course if B depends on A and A changes, then B will have to change; that is not the issue. I think the issue here is: "What's more important, implementation or behavior?" and I believe the answer is behavior. Implementation can change many times, but the responsibilities of an object, its essence, should change less than its implementation... (I'm not saying it is always like this, but that's the idea...) If I have tests based on behavior, then I can reuse them for different implementations. For example, a Stack could be implemented with a List, with an Array, etc.; there could even be the two implementations, but I could use the same tests... Anyway, having said this, I guess I can answer your "so this does not count ;-)" with a "so this does not count ;-)" ;-) (hey, just kidding, don't get mad!).
I understand that you would want to see that your implementation is right, but isn't it enough to test the behavior? What implementation mistake couldn't you find by testing the behavior?
You are probably right for most cases.
How do you test class invariants in your test-cases, without exposing the internal state of the object?
Well, I believe that an object's invariant is based on its behavior, not its implementation. For example, in a CircularList the last and first elements are linked (if there are elements, of course). That's the invariant, no matter whether the CircularList uses a DoubleLinkedList, an OrderedCollection or whatever structure you want to use to implement the circular list. I don't think you have to expose the internal state of an object to test its invariant... The idea of an invariant, as I understand it, is tied to the concept itself, and if you want to test the invariant, that check should happen every time something changes in an instance of that concept. That's what Eiffel does, but in Eiffel you write the invariant as part of the class, not as a separate test.
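As an illustration (the CircularList class and its #successorOf: message are hypothetical, just to make the point concrete), such an invariant can be checked purely through public protocol:

CircularListTest >> testLastIsLinkedToFirst
	"Check the 'last wraps around to first' invariant without touching the internal representation."
	| list |
	list := CircularList new.
	list add: 1; add: 2; add: 3.
	self assert: (list successorOf: list last) = list first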
[snip]
We have talked with Diego many times about annotations and our opinion is that annotations are a tool that does not follow the idea of having only "objects and messages". It is syntactic sugar that makes the language more difficult to use. It is a matter of taste and I don't like sugar :-)... (For sure you have discussed this on the list many times...)
An annotation is an object, as everything is.
An annotation is a message(-send); you can send it to any object and have behavior and/or returned objects as an effect.
I have to admit that I have not read much about annotations, just saw a few examples, but I'd like you to think about this: 1) When I see an annotation I don't see an object, nor a message, nor a message send; I just see "<anAnnotation>". 2) This means that I need to learn something new that is not about objects and messages (at least not explicitly) but about syntax. This may be a good price to pay, but only if this new tool gives me something that is hard to do in other ways.
When I see an annotation I see a "tag" (and this is personal; maybe if I read more about them I'll see other things), which reminds me of and transports me to the "relational paradigm", where every time you need to "tag" something you just create a new "field" on the "row". But what is the essence of doing this? The essence is to create a categorization: in the case of annotations, to categorize a method using different "tags". Well, I believe categorization can be achieved in a much simpler and more flexible way using sets (I mean sets in the abstract sense), which in the "relational paradigm" would mean creating a new table, and in the case of tests using SUnit, creating different suites. Forgive my ignorance, but how would you categorize a test as being, let's say, both an architecture test and an integration test with annotations? I want to be able to run only architecture tests, or integration tests, or both, or neither. I can think of two ways of annotating:

1) aClass >> test1
	<architectureTest>
	<integrationTest>
	... the test
2) aClass >> test1
	<architectureTest>
	<integrationTest>
	<architectureTestAndIntegrationTest>
	... the test
But I don't know how to obtain which ones are architecture tests, etc. Using sets and suites, it is simple:
Set architectureTests := Set with: aClass>>test1 with: ...
Set integrationTests := Set with: aClass>>test1 with: ...
Set architectureAndIntegrationTests := architectureTests union: integrationTests.
architectureTestSuite := (TestSuite named: ...) addTests: architectureTests.
integrationTestSuite := (TestSuite named: ...) addTests: integrationTests.
etc.
If I need to change the categorization, I change the objects (the sets), not a method, and the info about the categories is in one place, not spread all over.
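In concrete Squeak terms the pseudo-code above could look roughly like this (the test class and selectors are made up for illustration; TestSuite named:, #addTest: and TestCase class >> #selector: are standard SUnit):

| architectureTests architectureSuite |
architectureTests := Set with: #testLayering with: #testPackageDependencies.	"selectors on the hypothetical MyArchitectureTest"
architectureSuite := TestSuite named: 'Architecture'.
architectureTests do: [:each | architectureSuite addTest: (MyArchitectureTest selector: each)].
architectureSuite run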
Bye, Hernan.
I can think of two ways of annotating:
aClass >> test1
	<architectureTest>
	<integrationTest>
	... the test
aClass >> test1
	<architectureTest>
	<integrationTest>
	<architectureTestAndIntegrationTest>
	... the test

But I don't know how to obtain which ones are architecture tests, etc. Using sets and suites, it is simple:
Set architectureTests := Set with: aClass>>test1 with: ...
Set integrationTests := Set with: aClass>>test1 with: ...
Set architectureAndIntegrationTests := architectureTests union: integrationTests.
architectureTestSuite := (TestSuite named: ...) addTests: architectureTests.
integrationTestSuite := (TestSuite named: ...) addTests: integrationTests.
etc.
If I need to change the categorization, I change the objects (the sets), not a method, and the info about the categories is in one place, not spread all over.
Mhh, I see that differently ... ;-)
The problem with your approach is that you categorize your methods at a different place than where you define them. Basically this means I have to accept two methods if I add, change or remove a test.
I very much prefer your annotation example (1), where I directly tag the method concerned with the necessary meta-information. For me this is much easier to maintain, because when browsing the method I can immediately see what category it belongs to. Moreover, I can quickly query all the currently tagged methods by browsing the senders of #architectureTest. Then I get the suite with one line of code, not a method full of selectors defined somewhere throughout the image:
Set architectureTests := MethodAnnotation allNamed: #architectureTest from: aClass.
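Building the suite from that query could then look roughly like this (a sketch only; it assumes the objects answered by MethodAnnotation know the selector of the method they annotate, and MyArchitectureTest is a made-up test class):

| suite |
suite := TestSuite named: 'Architecture'.
(MethodAnnotation allNamed: #architectureTest from: MyArchitectureTest)
	do: [:annotation | suite addTest: (MyArchitectureTest selector: annotation selector)].
suite run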
Cheers, Lukas
-- Lukas Renggli http://www.lukas-renggli.ch
Lukas Renggli wrote:
Mhh, I see that differently ... ;-)
The problem with your approach is that you categorize your methods at a different place than where you define them. Basically this means I have to accept two methods if I add, change or remove a test.
I very much prefer your annotation example (1), where I directly tag the method concerned with the necessary meta-information. For me this is much easier to maintain, because when browsing the method I can immediately see what category it belongs to. Moreover, I can quickly query all the currently tagged methods by browsing the senders of #architectureTest. Then I get the suite with one line of code, not a method full of selectors defined somewhere throughout the image:
Set architectureTests := MethodAnnotation allNamed: #architectureTest from: aClass.
I have to agree on that, it is much easier the way you propose...
Cheers, Lukas
-- Lukas Renggli http://www.lukas-renggli.ch
I have to agree on that, it is much easier the way you propose...
Gosh, really? I haven't even brought up the best argument yet ... ;-)
Imagine several packages with tests being categorized as #architectureTests and #integrationTests. Now not all the developers want to always load all the packages (e.g. maybe packages provide extensions you don't need in your particular image, or there are extra packages from other people not under your control). Defining a global method returning all the #architectureTests simply doesn't work with such a setup; querying the meta-information of methods, however, does. This is exactly the problem I was confronted with when developing Magritte/Pier, and this is the reason why I implemented and introduced MethodAnnotations for Squeak 3.9.
Thanks for the interesting discussion!
Cheers, Lukas
-- Lukas Renggli http://www.lukas-renggli.ch
Hello Andreas,
Sunday, March 26, 2006, 5:08:17 PM, you wrote:
AR> I am in the interesting situation that I'm writing a few tests AR> that require large data sets for input and where I don't want AR> people to require to download the data sets. My problem is while AR> it's easy to determine that the data is missing and skip the test AR> there isn't a good way of relaying this to the user. From the AR> user's point of view "all tests are green" even though that AR> statement is completely meaningless and I'd rather communicate AR> that in a way that says "X tests skipped" so that one can look at AR> and decide whether it's useful to re-run the tests with the data AR> sets or not.
Would it make sense to make separate test case classes for the tests that take a lot of data, and then users would only get those test cases if they download the large data sets?
AR> Another place where I've seen this to happen is when platform AR> specific tests are involved. A test which cannot be run on some AR> platform should be skipped meaningfully (e.g., by telling the user AR> it was skipped) rather than to appear green and working.
Personally, I'd make a subclass to hold platform-specific tests and make the whole thing pass if the platform does not match the one for which the tests were designed. If the test does not even apply, why should I be concerned about why something was skipped when it's ok?
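A minimal sketch of that idea (the class and test names are made up; SmalltalkImage current platformName and Display depth are standard Squeak):

XBoxSpecificTest >> testDisplayHasPositiveDepth
	"Pass quietly when we are not on the platform these tests were written for."
	SmalltalkImage current platformName = 'XBox' ifFalse: [^ self].
	self assert: Display depth > 0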