[Box-Admins] Re: [squeak-dev] SqueakSource.com home page (was: Fix for OSProcess - Where to commit?)

Frank Shearar frank.shearar at gmail.com
Fri Nov 15 20:13:33 UTC 2013


On 15 November 2013 18:11, David T. Lewis <lewis at mail.msen.com> wrote:
> On Fri, Nov 15, 2013 at 09:31:18AM -0800, Eliot Miranda wrote:
>> Hi David,
>>
>> On Thu, Nov 14, 2013 at 6:00 PM, David T. Lewis <lewis at mail.msen.com> wrote:
>>
>> > Attached is a screen shot of the process browser in the squeaksource.com
>> > image, showing the excess SSSession processes. They are deadlocked on
>> > accessing DateAndTime now, which contains a critical section using the
>> > LastTickSemaphore in class DateAndTime.
>> >
>> > In the squeaksource.com image, LastTickSemaphore has 0 excess signals,
>> > whereas in other images I look at, it has 1 excess signal. This looks
>> > to me like a mutex that has gotten confused.
>> >
>> > I sent a signal to LastTickSemaphore in class DateAndTime, and now it
>> > looks like a mutex again. Let's see if that clears the problem.
>> >
>> > This certainly has a bad smell about it :-(  But I note also that
>> > we are running our SqueakSource services on older images, and a number
>> > of changes have been made to DateAndTime since then.
>> >
>> > Nicolas, I will send private email to give you the VNC password for access
>> > to the squeaksource.com image in case you need it (I am going to get some
>> > sleep soon).
>> >
>>
>> All this LastTickSemaphore stuff is complete nonsense, wasting on average
>> 1/2 a second on startup spinning until the clock rolls over.  If we move to
>> the 64-bit microsecond timebase which is provided by the Cog time
>> primitives we don't need to sync the second and the millisecond clocks
>> because they are replaced by a single microsecond clock.  If the current
>> Interpreter VMs do support the 64-bit microsecond primitives I suggest we
>> move ASAP.  We've already done this in our images at Cadence and been
>> running happily with it for several months.  Would this help?
>>
>
> Yes, it probably would help, in the sense that it would make this particular
> failure scenario impossible.
>
> But I think that something else must be going on here, and it would be
> worth getting to the bottom of it. The particular deadlock we are seeing
> here should be impossible, regardless of how nonsensical the LastTickSemaphore
> stuff may be. We are looking at a small section of code evaluated within
> a critical section. If semaphores and process scheduling are working
> correctly, it should be impossible for two different processes to deadlock
> on that section.
>
> I recall that Andreas made an important fix to process scheduling perhaps
> a couple of years ago, but I can't remember the details. I wonder if our
> SqueakSource images may be lacking that fix?
>
> Also, Chris Muller pointed out that he has seen similar symptoms related
> to Seaside:
>
>   This might be a problem I think I observed with using Seaside's
>   #returnResponse: inside a Mutex's critical: block.  The block is
>   entered, the Sema waited but never resignaled, leaving all subsequent
>   processes stuck waiting.
>
> So I am concerned about the following: How is it possible for a semaphore
> that is used privately by a small section of uncomplicated code to ever
> get itself into a state where it has missed a signal and no longer
> functions as a mutex? In normal operation this never happens, but is
> there some scenario related to Seaside operations, socket timeouts,
> process scheduling, or image save and restart that might lead to this
> condition?

I really don't know if this is anything more than a wild shot in the
dark, but Seaside does (or used to) perform stack slicing. It was
famed as being based on continuations. So anyway, is this Seaside
installation using continuations? I'm pretty sure Seaside's
continuations play correctly with #ensure: and stuff (being based on
resumable exceptions), but you never know...?
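
To make the suspected failure mode concrete, here is a minimal sketch in
Python (not the actual Squeak or Seaside code). It assumes a semaphore with
one excess signal is used as a mutex, and that some non-local exit, such as
an exception or a captured continuation that is never resumed, leaves the
critical section without running the matching signal. The names
SemaphoreMutex, critical_unprotected, escaping_block and demo are all
hypothetical and exist only for this illustration.

```python
import threading


class Escape(Exception):
    """Hypothetical non-local exit out of a critical section."""


class SemaphoreMutex:
    """Illustrative stand-in for a semaphore used as a mutex."""

    def __init__(self):
        # One excess signal means "unlocked", as in the healthy images.
        self._sem = threading.Semaphore(1)

    def critical_unprotected(self, block):
        # Wait, run the block, signal. If block() exits non-locally,
        # the release below is skipped and the signal is lost.
        self._sem.acquire()
        result = block()
        self._sem.release()
        return result

    def critical(self, block):
        # Same as above, but the signal is guaranteed by try/finally
        # (the rough analogue of #ensure:).
        self._sem.acquire()
        try:
            return block()
        finally:
            self._sem.release()

    def is_available(self):
        # Non-blocking probe: True if an excess signal is present.
        if self._sem.acquire(blocking=False):
            self._sem.release()
            return True
        return False

    def signal(self):
        # Manual repair: send one signal to the semaphore.
        self._sem.release()


def escaping_block():
    raise Escape()


def demo():
    mutex = SemaphoreMutex()
    states = [mutex.is_available()]      # fresh mutex
    try:
        mutex.critical(escaping_block)   # protected exit
    except Escape:
        pass
    states.append(mutex.is_available())
    try:
        mutex.critical_unprotected(escaping_block)  # unprotected exit
    except Escape:
        pass
    states.append(mutex.is_available())  # signal was lost here
    mutex.signal()                       # manual signal
    states.append(mutex.is_available())
    return states
```

After the unprotected exit the semaphore has no excess signal, so every
later process would block on it, which matches the symptom Dave saw. One
manual signal makes it behave as a mutex again. Whether anything like this
actually happens in the squeaksource.com image is still only a guess.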

frank

> BTW, squeaksource.com seems to be working nicely since I signalled
> that semaphore yesterday to break things loose. I uploaded a few packages
> today without problems. But the problem will be back, I am certain
> of that.
>
> Dave
>
>

