SqueakSource is down again

Lukas Renggli renggli at gmail.com
Mon Dec 24 08:20:27 UTC 2007


> > I loaded all your semaphore related patches a couple of months ago and
> > squeaksource.com ran quietly and happily up to a few weeks ago. Then
> > suddenly we got many processes hanging in Semaphore>>#critical:.
>
> If you could send a couple of complete stack dumps from the affected
> image it might be interesting. There is a possibility you were affected
> by the problem of primitiveSuspend (which we discussed earlier) but
> that's difficult to tell from a stack dump. Much much easier if you can
> go into the image and check whether the doIt I sent comes up empty or not.

The doIt you sent comes out empty, I've never seen a case where it
actually returned a process. For the stack dumps I've got only the
attached screenshot from the process browser that I took on December 5,
roughly a month after loading your patches.

> What we've experienced was basically that after the first commit, when
> our image went to saving the data model in a reference stream (via
> SSFileSystem; takes about two minutes or so), a second commit would
> wreak havoc on the system. You can probably simulate this by generating
> enough load from different clients on the network with or without
> SSFileSystem. And I don't like the idea of saving the image very much
> because it's probably not feasible to save multiple versions of that
> image which ultimately means that any data corruption kills the whole
> data model.

We save the image every hour, which only takes a couple of seconds. We
also recently fixed some bugs that caused it to block for minutes
afterwards.

> Interesting thought. It may be possible for some strange things to
> happen if Seaside doesn't take the precaution of not accepting connections
> while in the midst of a save. The problem is that the image save/startup
> runs with whatever priority it's being issued at, so if there's another
> process running at the same time there is a chance this process
> interrupts the image save with the potential for strange things
> happening. Here is one way in which I could see this happening: A
> critical lock held by a process waiting for network traffic to occur
> when the image is saved. When the image is restored later on, that
> socket is no longer valid but the process could still wait on the
> semaphore, blocking the critical section for all other uses.

Current versions of the Kom server adapter for Seaside stop listening
while saving the image, but I have to check if this is also the case
with the version of Seaside used in squeaksource.com.

Cheers,
Lukas

-- 
Lukas Renggli
http://www.lukas-renggli.ch
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Picture 1.png
Type: image/png
Size: 13375 bytes
Desc: not available
Url : http://lists.squeakfoundation.org/pipermail/squeak-dev/attachments/20071224/634e252f/Picture1.png

