Hi all, sometimes a build fails because of just one test...
Here is an example on squeak.stack.v3: https://travis-ci.com/github/OpenSmalltalk/opensmalltalk-vm/jobs/468407844
RenderBugz
✗ #testSetForward (7ms)
TestFailure: Block evaluation took more than the expected 0:00:00:00.004
RenderBugz(TestCase)>>assert:description:
RenderBugz(TestCase)>>should:notTakeMoreThan:
RenderBugz(TestCase)>>should:notTakeMoreThanMilliseconds:
RenderBugz>>shouldntTakeLong:
RenderBugz>>testSetForward ...shouldntTakeLong: [ t forwardDirection: 180.0 . self assert: ( t forwardDirection = 0.0 ) ]
RenderBugz(TestCase)>>performTest
4 ms, really? On C.I. infrastructure, anything can happen... Do we really want to keep this kind of test? We could eventually keep it once startup performance is known (see the isLowerPerformance discussion on squeak-dev), but in the interim, I suggest we neutralize this specific test in Smalltalk-CI.
Hi Nicolas.
> Do we really want to keep this kind of test?
Such benchmarks (and benchmark-like tests) should at least average over several runs and only fail as a test if something actually got slower on average. Or something like that. A single misbehaving run should not be the reason for such a test failure.
Maybe we can tweak #should:notTakeMoreThan: to evaluate the block several times? But then it could no longer fail early, as it does now... Hmmm...
Best,
Marcel

In reply to Nicolas Cellier (nicolas.cellier.aka.nice@gmail.com), 05.01.2021 09:08:46.
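A minimal sketch of Marcel's "several runs" idea, written as an extra TestCase method (it could live in RenderBugz or be added to TestCase as an extension). The selector #should:notTakeMoreThanMilliseconds:medianOf: is hypothetical, not part of SUnit. It uses the median rather than the average Marcel mentions, so that a single GC pause or scheduler hiccup on a C.I. worker cannot drag the result over the limit. As Marcel points out, the price is that every run is measured to completion, so this version cannot fail early on a block that hangs.

```smalltalk
should: aBlock notTakeMoreThanMilliseconds: maxMilliseconds medianOf: runCount
	"Hypothetical helper, not part of SUnit. Evaluate aBlock runCount times (runCount >= 1)
	 and fail only if the median run time exceeds maxMilliseconds. Assertions inside
	 aBlock still fail the test on any run."
	| timings median |
	timings := (1 to: runCount) collect: [:each | Time millisecondsToRun: aBlock].
	"Middle element of the sorted timings (the lower middle one for an even runCount)."
	median := timings asSortedCollection asArray at: runCount + 1 // 2.
	self
		assert: median <= maxMilliseconds
		description: 'Median block evaluation time of ', median printString,
			' ms exceeded the expected ', maxMilliseconds printString,
			' ms (all runs: ', timings printString, ')'
```

With such a helper, RenderBugz>>shouldntTakeLong: could, for example, send `self should: aBlock notTakeMoreThanMilliseconds: 4 medianOf: 5` instead of the current single-run check.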
Seems more like a warning than a failure.
All the best,
Ron Teitelbaum
In reply to Marcel Taeumel (marcel.taeumel@hpi.de), Tue, Jan 5, 2021 at 3:22 AM.
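Ron's view, that a slow run is more of a warning than a failure, could be sketched as follows: the block's own assertions still decide whether the test passes, while exceeding the time budget is only reported on the Transcript. The selector #warnIf:takesMoreThanMilliseconds: is hypothetical, and whether Transcript output actually ends up in the C.I. log depends on how the image is run there.

```smalltalk
warnIf: aBlock takesMoreThanMilliseconds: maxMilliseconds
	"Hypothetical TestCase helper, not part of SUnit. Evaluate aBlock once; any
	 assertion inside it still fails the test, but a slow run only logs a warning."
	| elapsed |
	elapsed := Time millisecondsToRun: aBlock.
	elapsed > maxMilliseconds ifTrue: [
		Transcript
			show: 'WARNING: ', self printString, ' took ', elapsed printString,
				' ms (expected at most ', maxMilliseconds printString, ' ms)';
			cr]
```

For the failing test above, that would read `self warnIf: [ t forwardDirection: 180.0 . self assert: ( t forwardDirection = 0.0 ) ] takesMoreThanMilliseconds: 4`.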
Here is another source of frequent C.I. failures:
MCMethodDefinitionTest
✗ #testLoadAndUnload (20255ms)
TestFailure: Test timed out
Presumably not a lean and mean test...
In reply to Ron Teitelbaum (ron@usmedrec.com), Tue, Jan 5, 2021 at 17:59.
Yet another one (stack.v3)
SUnitToolBuilderTests 837fef_b498
✗ #testHandlingNotification (18863ms)
In reply to Nicolas Cellier (nicolas.cellier.aka.nice@gmail.com), Tue, Jan 12, 2021 at 14:18.
And the fun part: each time I retry, I see a different random failure...
#########################
# 1 tests did not pass: #
#########################
CompiledMethodTest 16ccae_ca85
✗ #testCopyWithTrailerBytes (11332ms)
In reply to Nicolas Cellier (nicolas.cellier.aka.nice@gmail.com), Tue, Jan 12, 2021 at 15:23.
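To tell a flaky test apart from a genuinely broken one, one can rerun it in isolation many times and count the failures. A workspace sketch, using CompiledMethodTest>>#testCopyWithTrailerBytes from the report above as the example; the run count of 20 is arbitrary, and a quiet local machine may not reproduce the load on the C.I. workers at all.

```smalltalk
"Rerun one suspicious test many times and count how many runs fail.
 Each iteration builds a fresh TestCase instance and runs it on its own."
| runs failures |
runs := 20.
failures := (1 to: runs) count: [:each |
	(CompiledMethodTest selector: #testCopyWithTrailerBytes) run hasPassed not].
Transcript show: failures printString, ' of ', runs printString, ' runs failed'; cr.
```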
Hmm, for the sake of documenting the randomly failing tests, here are two others:
######################################################
# Squeak-4.6 on Travis CI (2361.31)                  #
# 3396 Tests with 2 Failures and 0 Errors in 158.13s #
######################################################

#########################
# 2 tests did not pass: #
#########################
PureBehaviorTest 8401de_4bcf
✗ #testMethodCategoryReorganization (20517ms)
SecureHashAlgorithmTest b63682_4bcf
✗ #testEmptyInput (12145ms)
In reply to Nicolas Cellier (nicolas.cellier.aka.nice@gmail.com), Tue, Jan 12, 2021 at 15:41.
vm-dev@lists.squeakfoundation.org