On Sun, Mar 10, 2013 at 12:10 PM, Nicolas Cellier nicolas.cellier.aka.nice@gmail.com wrote:
OK, see the VM thread. I now think that the problem does not come from COG, but from ClassBuilder, which in some cases fails to clean a cache (primitive 116). The problem does not show up in the interpreter VM thanks to primitive 119 (this primitive does not unlink sends in the Cogit).
it does unlink sends, but only for that selector. But is it really the case that it is a missing cache flush or is it a bug in Cog with its cache flushing? I realised the way to test this is to try the Stack VM and see if it crashes or not. I just tried that but now neither Cog nor the Stack VM crash although both fail the load with an MNU of #do: to UndefinedObject in Environment>>bindingOf:ifAbsent:. So how do I get the system back to a state where I can reproduce the Cog crash to compare the Stack and Cog VMs with each other?
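To make the distinction concrete, here is a toy model of the failure mode under discussion. It is not the VM's code: `MethodCache`, `ToyClass`, `flush_selector` and `flush_method` are hypothetical names, and the sketch only assumes a lookup cache keyed by (class, selector) that can be flushed either per selector or per method. It shows how a recompile without a flush keeps answering the obsolete method.

```python
class ToyClass:
    """Hypothetical stand-in for a class with a method dictionary."""

    def __init__(self, name, methods):
        self.name = name
        self.methods = dict(methods)


class MethodCache:
    """Toy lookup cache keyed by (class name, selector). Not the real VM cache."""

    def __init__(self):
        self._entries = {}

    def lookup(self, cls, selector):
        # Fill the cache on a miss; afterwards the cached method wins,
        # even if the class's method dictionary has changed since.
        key = (cls.name, selector)
        if key not in self._entries:
            self._entries[key] = cls.methods[selector]
        return self._entries[key]

    def flush_selector(self, selector):
        # Selective flush: drop every entry for this one selector only.
        for key in [k for k in self._entries if k[1] == selector]:
            del self._entries[key]

    def flush_method(self, method):
        # Flush by method: drop every entry that still points at this method.
        for key in [k for k, m in self._entries.items() if m is method]:
            del self._entries[key]


def demo():
    cache = MethodCache()

    def old_parse():
        return "old"

    def new_parse():
        return "new"

    parser = ToyClass("Parser", {"parse": old_parse})
    cache.lookup(parser, "parse")        # warm the cache
    parser.methods["parse"] = new_parse  # "recompile" without flushing
    stale = cache.lookup(parser, "parse")()
    cache.flush_method(old_parse)        # the flush that was missing
    fresh = cache.lookup(parser, "parse")()
    return stale, fresh
```

In this model `demo()` gets the obsolete method back until the flush happens. Whether the real problem is a flush that is never requested or a flush that Cog mishandles is exactly the open question above.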
(Apologies for being unresponsive; I've just moved into a new apartment and only got my internet connection yesterday afternoon; at least it's fast (for the States) :) ).
I have attempted a ClassBuilder fix and posted new updates from nice-222 to cwp-227.
Can I please ask for our testers' contribution once again?
Nicolas
2013/3/8 Nicolas Cellier nicolas.cellier.aka.nice@gmail.com:
2013/3/8 Bert Freudenberg bert@freudenbergs.de:
On 2013-03-08, at 10:55, Frank Shearar frank.shearar@gmail.com wrote:
On 7 March 2013 23:25, Frank Shearar frank.shearar@gmail.com wrote:
On 7 March 2013 23:11, Bert Freudenberg bert@freudenbergs.de wrote:
On 2013-03-07, at 23:42, Frank Shearar frank.shearar@gmail.com wrote:
>>> On 6 March 2013 15:59, Ken G. Brown kbrown@mac.com wrote:
>>>> Running on COG 2397, and after updating fresh Squeak4.4-12327 Release to
>>>> 12332, updating to Trunk fails at first attempt in the same place, then by
>>>> abandoning and trying the update again, it apparently completes to 12511.
>>>>
>>>> Ken G. Brown
>>>>
>>>>> With COG 2678, pretty well the same. First attempt it timed out during
>>>>> the same update-nice-223, then trying again from what had already been
>>>>> loaded, got the following during the same update, during compiling
>>>>> SMLoader-fbs-78 as before:
>
> What I find strange about all this is that we take a 4.4-12327 image
> and whatever the latest Cog is and update it all the way without any
> problems quite a few times a day on the CI server.
>
> frank
Looks like it's an intermittent problem, unfortunately:
I just updated the new all-in-one-cog to latest trunk, no problem. This is a 4.4-12327 image with Cog VM 2697.
I then tried what Ken described: update the fresh image first from the squeak44 stream, then switch to trunk, then update again.
BOOM. Cog crash. Didn't save the log unfortunately.
Tried again. Update, switch to trunk, update again. No crash. What?!
Once more. Update, switch to trunk, update. Crash! See below.
Tried yet again, with switching to trunk immediately in a fresh image. Crashes, too, same place.
So it does crash, just not always. But it's been more than 50% in my case.
Ah, interesting. The CI jobs, naturally, don't update from squeak44; they switch to trunk and update just like that. Which I would have thought would make no difference...
Actually, I lie. Here's an example of the CI jobs hitting the same issue: http://build.squeak.org/job/SqueakTrunk/204/console And further if you look at http://build.squeak.org/job/SqueakTrunk/ and choose to see the failing tests you'll see times (say around build #184) where the test failure count is unusually low. And http://build.squeak.org/job/SqueakTrunk/buildTimeTrend shows grey streaks where builds die.
Curious that it still runs the tests at all if the update failed ...
So Cog crashes, but has someone tried to replicate this on an interpreter?
- Bert -
I think that the problem comes from COG, which tries to use an obsolete method, sent AFTER the recompilation of Parser, which is not the expected behavior. I have triggered this kind of strange behavior, which does not happen on an Interpreter VM; see the thread opened by Jeff Gonis, '[Vm-dev] Cog VM Crash on Windows'. For me, it must be related to a cache that is not cleaned up; I don't know why.
Nicolas
vm-dev@lists.squeakfoundation.org