[SqueakSource] SSFilesystem woes

Andreas Raab andreas.raab at gmx.de
Fri Aug 3 06:49:51 UTC 2007


Hi -

[I couldn't find a mailing list dedicated to SqueakSource discussions so 
I'm abusing Squeak-dev here. If there is one, please point me to it]

We've been using SqueakSource at Qwaq for our internal projects and 
unfortunately it works only so-so. Mostly it works but every other day 
it basically spins out at 100% CPU and needs to be killed for good. 
Since this usually loses the last checkin(s) it's a major annoyance 
which we work around by sending an email message which includes a 
portion of the log so that at least you have a chance to see if it was 
"your" checkin that was just lost.

That was until two days ago. About a week or so ago, we ran out of disk 
space on the box and after restoring it the server was working quite 
well until it again spun out at 100% CPU. After the restart we noticed 
that we hadn't lost just one but several dozen checkins - basically 
everything that happened after we ran out of disk space didn't show up.

Since this smelled like a major disaster I actually dug into the 
SqueakSource code to see what could be done to restore our data 
(fortunately, I could see that all of the data was actually on the 
server). This immediately showed a couple of major issues:

1) When SSFilesystem saves a repository it uses a mutex to serialize 
access but it doesn't protect a client from modifying the repository 
*while* it is saving. Since the save is a process running at background 
priority, two commits in quick succession will lead to the second commit 
modifying the repository that the first save is still trying to write to 
disk. 
And indeed, looking at our problems, many of them show a pattern of two 
commits closely together like here:

2007-07-10T21:09:57+00:00 PUT /Qwaq/QwaqForums-1.0.42.mcm (qwaq)
2007-07-10T21:09:57+00:00 MODIFIED by SSSession>>putRequest:
2007-07-10T21:09:57+00:00 BEGIN SAVING
2007-07-10T21:11:03+00:00 PUT /Qwaq/QwaqForums-1.0.41.mcm (qwaq)
2007-07-10T21:11:03+00:00 MODIFIED by SSSession>>putRequest:

(Note that the "END SAVING" is missing before the second PUT.) So it 
seems like one of the failure modes is that the repository is being 
modified *while* it is being saved. In addition, I think that one of the 
reasons why so many of the saved snapshots are "kaputt" is simply that 
they are broken by the same concurrent modification.
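
For anyone who wants to check their own logs for the same pattern, here 
is a rough sketch (in Python, purely for illustration; it is not part of 
SqueakSource). It assumes the log format shown above, i.e. a timestamp, 
a space, then the message, and it assumes that a completed save logs a 
line whose message is exactly "END SAVING". The function name is made up.

```python
def find_puts_during_save(log_lines):
    """Return the PUT lines logged after a BEGIN SAVING with no END SAVING yet.

    Assumes each line is "<timestamp> <message>" as in the excerpt above,
    and that a finished save logs the message "END SAVING".
    """
    saving = False
    suspicious = []
    for raw in log_lines:
        line = raw.strip()
        # Split off the timestamp; the rest of the line is the message.
        _timestamp, _, message = line.partition(" ")
        if message == "BEGIN SAVING":
            saving = True
        elif message == "END SAVING":
            saving = False
        elif message.startswith("PUT ") and saving:
            suspicious.append(line)
    return suspicious


# The excerpt from above, used as sample input.
SAMPLE_LOG = """\
2007-07-10T21:09:57+00:00 PUT /Qwaq/QwaqForums-1.0.42.mcm (qwaq)
2007-07-10T21:09:57+00:00 MODIFIED by SSSession>>putRequest:
2007-07-10T21:09:57+00:00 BEGIN SAVING
2007-07-10T21:11:03+00:00 PUT /Qwaq/QwaqForums-1.0.41.mcm (qwaq)
2007-07-10T21:11:03+00:00 MODIFIED by SSSession>>putRequest:
""".splitlines()

if __name__ == "__main__":
    for hit in find_puts_during_save(SAMPLE_LOG):
        print(hit)
```

Run on the excerpt, this flags only the second PUT, the one that arrived 
while the first save was still in progress.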

I'd appreciate some insight from the authors (or anyone else 
knowledgeable) into what the right fix for this problem might be. I have no 
idea how Seaside in general deals with these concurrency issues but it 
seems pretty clear that SSFilesystem is *not* safe in the face of 
concurrent modifications of the repository.
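
One possible direction, offered only as a sketch and not as a claim 
about how SqueakSource should be fixed: have writers and the saver share 
one lock, let the saver take a private copy of the state while holding 
it, and do the slow disk write from that copy after releasing the lock. 
The example is in Python for illustration; the Repository class, its 
methods and the JSON file format are all hypothetical stand-ins, not 
SSFilesystem's actual structure.

```python
import copy
import json
import threading


class Repository:
    """Hypothetical stand-in for the in-memory repository model."""

    def __init__(self):
        self._lock = threading.Lock()
        self._versions = {}

    def put(self, name, info):
        # Writers take the same lock the saver uses for its copy.
        with self._lock:
            self._versions[name] = info

    def frozen_copy(self):
        # The lock is held only while copying, not during the disk write.
        with self._lock:
            return copy.deepcopy(self._versions)


def save(repository, path):
    """Write a snapshot that later put() calls cannot alter mid-write."""
    state = repository.frozen_copy()
    with open(path, "w") as f:
        json.dump(state, f)
```

The trade-off is extra memory for the copy in exchange for commits never 
being blocked for the full duration of a save.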

2) Much to my surprise I found that SSFilesystem actually *has* code 
that can be used to recover versions if any of the above happens 
(SSFilesystem>>importVersionsFor:) but it seems to be pretty much unused 
and affected by some bit rot. One of the things I did for our version is 
to hook this code up with the case that the last snapshot is kaputt, so 
that if there is a broken snapshot SSFilesystem automatically imports 
all the versions that aren't currently present in the repository. I'm 
attaching the recovery code in case anyone else has had similar problems.
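
The idea behind that recovery step can be sketched as follows. This is 
Python for illustration only and is not a translation of the attached 
Smalltalk; the function names, the plain dictionary used as the 
repository index, and the assumption that versions sit as .mcz files in 
one directory per project are all hypothetical.

```python
from pathlib import Path


def missing_versions(project_dir, known_names):
    """Names of .mcz files on disk that the repository does not know about."""
    on_disk = {p.name for p in Path(project_dir).glob("*.mcz")}
    return sorted(on_disk - set(known_names))


def recover(project_dir, repository_index):
    """Add every version found on disk but absent from repository_index.

    repository_index maps a version file name to its path; it is mutated
    in place and the list of newly imported names is returned.
    """
    imported = missing_versions(project_dir, repository_index)
    for name in imported:
        repository_index[name] = Path(project_dir) / name
    return imported
```

Versions already present in the index are left alone, so running the 
recovery a second time imports nothing.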

Question: Does anyone use similar/other changes like those? If so I'd be 
interested in learning about them.

3) The speed (and snapshot size) of SSFilesystem is pretty abysmal (on 
our box a repository snapshot takes about 4 minutes and is about 4MB). 
Looking at what it's writing it seems that most of it is information 
that is easily available from the .MCZs and really doesn't need to be 
kept in the snapshot.

Question: Is anyone using alternative storage mechanisms (lightweight & 
fast perhaps)? If so, what do you use and how does it work out? 
Generally speaking, what *do* people use for SqueakSource storage given 
that SSFilesystem is generally quite unreliable?

I'd appreciate any help on the above issues.

Cheers,
   - Andreas

-------------- next part --------------
A non-text attachment was scrubbed...
Name: SSRecovery.1.cs
Type: text/x-csharp
Size: 2730 bytes
Desc: not available
Url : http://lists.squeakfoundation.org/pipermail/squeak-dev/attachments/20070802/4bd2bf9f/SSRecovery.1.bin

