Hi Ross, a few things:
The last visible line in _magmaTestConductor.image is remote performing startAddStrings: with arguments #(true) in client 1.
Client 1 shows "ConnectionTimedOut: Data receive timed out". submit is executing:
    submitGuard critical: [ "ensure connected" self protocolEstablished ifFalse: [ self connect ]. self primSubmit: aMaClientServerRequest ]
Deeper into the evaluation, it's waiting for data with a 4s timeout.
This particular method is called from several tests in the suite. Only two pass a true argument, #testForwardRecovery and #verifyAddToNode; the rest pass false. The former occurs fairly early in the test suite, the latter near the end (at least 45 minutes in on a fast laptop). I'm not sure which one you hit, but it doesn't matter. If you look at that method, #startAddStrings:, you'll see that the code attempts to handle ConnectionTimedOut (actually NetworkError) so it can gracefully exit that loop. In the case where you experienced the deadlock, the argument was set to true, so we are expecting a NetworkError. But there may still be a timing issue in the test suite itself, not in Magma, related to the ordering of network events, and that is what results in the deadlock.
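To illustrate the kind of handling I mean, here is a minimal sketch you can evaluate in a workspace. It is not the actual body of #startAddStrings:. The loop, the counter, and the manually signalled timeout are all made up for illustration; only NetworkError and ConnectionTimedOut are the real exception classes from the 3.9 image.

```smalltalk
"Hypothetical sketch, NOT the real #startAddStrings: code.
 A loop keeps working until a NetworkError arrives, then exits cleanly."
| keepGoing count |
keepGoing := true.
count := 0.
[ keepGoing ] whileTrue: [
	[ count := count + 1.
	  "Stand-in for real network work: pretend the 4th attempt times out."
	  count > 3 ifTrue: [ ConnectionTimedOut new signal: 'Data receive timed out' ] ]
		on: NetworkError
		do: [ :err | "Expected failure; stop looping instead of opening a debugger."
			keepGoing := false ] ].
count
```

Because ConnectionTimedOut is a subclass of NetworkError, the single handler catches it. If the error is raised somewhere this handler does not cover, you get the debugger instead, which is the situation you ran into.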
The solution is simply to abandon the debugger in client 1. The other process waiting on the Mutex will resume immediately and the tests will continue normally.
This was using the 3.9 image distributed in Debian.
Now, another thing you need to be sure of: update your 3.9 image with important fixes, or you may experience a total image lock-up.
It's not a Magma issue. Perhaps you remember the discussions on the list about 18 months ago about Semaphore / critical / Delay problems that were locking up Seaside and other server-based images? Well, Magma was also affected.
Andreas provided great fixes which I have rolled up into a new package on SqueakMap:
Introducing Ma3.9FixPack.
Please load this package before kicking off the Magma test suite.
- Chris
magma@lists.squeakfoundation.org