Hi -
We recently had some "fun" chasing server lockups (with truly awful uptimes of about a day or less before things went downhill) and were finally able to track a huge portion of it down to problems with Delay. The effect we were seeing on our servers was that the system would randomly lock up and either go down to 0% CPU or 100% CPU.
After poking it with a USR1 signal (which, in our VMs, is hooked up such that it prints all the call stacks in the image; it's a life-saver if you need to debug these issues) we usually found that all processes were waiting on Delay's AccessProtect (0%), or alternatively found that a particular process (the event tickler) would sit in a tight loop swallowing repeated errors complaining that "this delay is already scheduled" (100%).
After hours and hours of testing, debugging, and a little stroke of luck we finally found out that all of these issues were caused by the fact that Delay's internal structures are updated by the calling process (insertion into and removal from SuspendedDelays) which renders the process susceptible to being terminated in the midst of updating these structures.
If you look at the code, this is obviously an issue because if (for example) the calling process gets terminated while it's re-sorting SuspendedDelays, the result is unpredictable. This is a particular problem because the calling process is often running at a relatively low priority, so interruption by other, higher-priority processes is a common case. And if any of these higher-priority processes kills the one that just happens to be executing SortedCollection>>remove:, anything can happen: from leaving a later delay in front of an earlier one (one of the cases we had indicated that this was just what had happened), to errors when doing the next insert/remove ("trying to evaluate a block that is already evaluated"), to many more weirdnesses. Unfortunately, it is basically impossible to recreate this problem under any kind of controlled circumstances, mostly because you need a source of events that is truly independent from your time source.
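To make the failure mode concrete, here is a minimal Python sketch (a model only, not Squeak code and not how SortedCollection is actually implemented). It assumes a caller-side insert that appends and then swaps the new entry towards the front, and it simulates the caller being terminated partway through. The names `Terminated`, `insert_in_place` and `terminate_after_swaps` are made up for the illustration.

```python
class Terminated(Exception):
    """Stands in for a process being terminated at an arbitrary point."""


def insert_in_place(queue, resumption_time, terminate_after_swaps=None):
    """Caller-side insert: append, then swap the new entry towards the front.

    If terminate_after_swaps is given, raise Terminated once that many swaps
    have been done, leaving the list half-updated.
    """
    queue.append(resumption_time)
    i = len(queue) - 1
    swaps = 0
    while i > 0 and queue[i - 1] > queue[i]:
        if terminate_after_swaps is not None and swaps >= terminate_after_swaps:
            raise Terminated()
        queue[i - 1], queue[i] = queue[i], queue[i - 1]
        i -= 1
        swaps += 1


def is_sorted(queue):
    """True if every entry is <= the one after it."""
    return all(a <= b for a, b in zip(queue, queue[1:]))


suspended = [100, 200, 300]          # resumption times, earliest first
try:
    insert_in_place(suspended, 50, terminate_after_swaps=1)
except Terminated:
    pass                             # the caller is gone; nobody finishes the insert

print(suspended, is_sorted(suspended))   # [100, 200, 50, 300] False
```

The list is left with a later entry (200) in front of an earlier one (50), which is the kind of state described above; every later insert or remove then works on a structure that no longer satisfies its own invariant.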
As a consequence of our findings we rewrote Delay to deal with these issues properly and, having deployed the changes about ten days ago on our servers, all of these sources of problems simply vanished. We haven't had a single server problem which we couldn't attribute to our own stupidity (such as running out of disk space ;-)
The changes will in particular be helpful to you if you:
- run network servers
- fork processes to handle network requests
- terminate these processes explicitly (on error conditions for example)
- use Semaphore>>waitTimeoutMSecs: (all socket functions use this)
If you have seen random, unexplained lockups of your server (0% CPU load while being locked up is a dead giveaway[*]) I'd recommend using the attached changes (which work best on top of a VM with David Lewis' 64bit fixes applied) and see if that helps. For us, they made the difference between running the server in Squeak and rewriting it in Java.
I've also filed this as http://bugs.squeak.org/view.php?id=6576
[*] The 0% CPU lockups have sometimes been attributed to issues with Linux wait functions. After having seen the havoc that Delay wreaks on the system I don't buy these explanations any longer. A much simpler (and more likely) explanation is that Delay went wild.
Cheers, - Andreas
Hi Andreas,
That's a very important patch and very interesting to me too, because I'm just deciding whether to move some of my public Aida/Web websites from VW to Squeak, and I was afraid of just such issues as the one you solved.
Is there any chance that this patch goes to 3.10?
Best regards Janko
'From Croquet1.0beta of 11 April 2006 [latest update: #1] on 23 July 2007 at 11:53:23 pm'!
"Change Set: SafeDelay
Date: 23 July 2007
Author: Andreas Raab
This change set fixes a set of severe problems with concurrent use of Delay. Previously, many of the delay-internal structures were modified by the calling process, which made it susceptible to being terminated in the middle of manipulating these structures, leaving Delay (and consequently the entire system) in an inconsistent state.
This change set fixes this problem by moving *all* manipulation of Delay's internal structures out of the calling process. As a side-effect it also removes the requirement of Delays being limited to SmallInteger range; the new code has no limitation on the duration of a delay.
No tests are provided since outside of true asynchronous environments (networks) it is basically impossible to recreate the situation reliably."!

!Delay methodsFor: 'private' stamp: 'ar 7/10/2007 21:24'!
activate
    "Private!! Make the receiver the Delay to be awoken when the next timer interrupt occurs. This method should only be called from a block protected by the AccessProtect semaphore."
    TimerEventLoop ifNotNil:[^nil].
    ActiveDelay := self.
    ActiveDelayStartTime := Time millisecondClockValue.
    ActiveDelayStartTime > resumptionTime ifTrue:[
        ActiveDelay signalWaitingProcess.
        SuspendedDelays isEmpty ifTrue:[
            ActiveDelay := nil.
            ActiveDelayStartTime := nil.
        ] ifFalse:[SuspendedDelays removeFirst activate].
    ] ifFalse:[
        TimingSemaphore initSignals.
        Delay primSignal: TimingSemaphore atMilliseconds: resumptionTime.
    ].! !

!Delay methodsFor: 'private' stamp: 'ar 7/10/2007 21:55'!
schedule
    "Private!! Schedule this Delay, but return immediately rather than waiting. The receiver's semaphore will be signalled when its delay duration has elapsed."
    beingWaitedOn ifTrue: [self error: 'This Delay has already been scheduled.'].
    TimerEventLoop ifNotNil:[^self scheduleEvent].
    AccessProtect critical: [
        beingWaitedOn := true.
        resumptionTime := Time millisecondClockValue + delayDuration.
        ActiveDelay == nil
            ifTrue: [self activate]
            ifFalse: [
                resumptionTime < ActiveDelay resumptionTime
                    ifTrue: [
                        SuspendedDelays add: ActiveDelay.
                        self activate]
                    ifFalse: [SuspendedDelays add: self]]].
! !

!Delay methodsFor: 'private' stamp: 'ar 7/10/2007 22:33'!
scheduleEvent
    "Schedule this delay"
    resumptionTime := Time millisecondClockValue + delayDuration.
    AccessProtect critical:[
        ScheduledDelay := self.
        TimingSemaphore signal.
    ].! !

!Delay methodsFor: 'private' stamp: 'ar 7/10/2007 21:55'!
unschedule
    "Unschedule this Delay. Do nothing if it wasn't scheduled."
    | done |
    TimerEventLoop ifNotNil:[^self unscheduleEvent].
    AccessProtect critical: [
        done := false.
        [done] whileFalse:
            [SuspendedDelays remove: self ifAbsent: [done := true]].
        ActiveDelay == self ifTrue: [
            SuspendedDelays isEmpty
                ifTrue: [
                    ActiveDelay := nil.
                    ActiveDelayStartTime := nil]
                ifFalse: [
                    SuspendedDelays removeFirst activate]]].
! !

!Delay methodsFor: 'private' stamp: 'ar 7/10/2007 21:56'!
unscheduleEvent
    AccessProtect critical:[
        FinishedDelay := self.
        TimingSemaphore signal.
    ].! !

!Delay methodsFor: 'public' stamp: 'ar 7/10/2007 21:49'!
beingWaitedOn
    "Answer whether this delay is currently scheduled, e.g., being waited on"
    ^beingWaitedOn! !

!Delay methodsFor: 'public' stamp: 'ar 7/10/2007 21:49'!
beingWaitedOn: aBool
    "Indicate whether this delay is currently scheduled, e.g., being waited on"
    beingWaitedOn := aBool! !

!Delay methodsFor: 'public' stamp: 'ar 7/10/2007 20:56'!
delayDuration
    ^delayDuration! !

!Delay class methodsFor: 'timer process' stamp: 'ar 7/11/2007 10:35'!
handleTimerEvent
    "Handle a timer event; which can be either:
        - a schedule request (ScheduledDelay notNil)
        - an unschedule request (FinishedDelay notNil)
        - a timer signal (not explicitly specified)
    We check for timer expiry every time we get a signal."
    | nextTick |
    "Wait until there is work to do."
    TimingSemaphore wait.

    "Process any schedule requests"
    ScheduledDelay ifNotNil:[
        "Schedule the given delay"
        self scheduleDelay: ScheduledDelay.
        ScheduledDelay := nil.
    ].

    "Process any unschedule requests"
    FinishedDelay ifNotNil:[
        self unscheduleDelay: FinishedDelay.
        FinishedDelay := nil.
    ].

    "Check for clock wrap-around."
    nextTick := Time millisecondClockValue.
    nextTick < ActiveDelayStartTime ifTrue: [
        "clock wrapped"
        self saveResumptionTimes.
        self restoreResumptionTimes.
    ].
    ActiveDelayStartTime := nextTick.

    "Signal any expired delays"
    [ActiveDelay notNil and:[ Time millisecondClockValue >= ActiveDelay resumptionTime]] whileTrue:[
        ActiveDelay signalWaitingProcess.
        SuspendedDelays isEmpty
            ifTrue: [ActiveDelay := nil]
            ifFalse:[ActiveDelay := SuspendedDelays removeFirst].
    ].

    "And signal when the next request is due. We sleep at most 1sec here as a soft busy-loop so that we don't accidentally miss signals."
    nextTick := Time millisecondClockValue + 1000.
    ActiveDelay ifNotNil:[nextTick := nextTick min: ActiveDelay resumptionTime].
    nextTick := nextTick min: SmallInteger maxVal.

    "Since we have processed all outstanding requests, reset the timing semaphore so that only new work will wake us up again. Do this RIGHT BEFORE setting the next wakeup call from the VM because it is only signaled once so we mustn't miss it."
    TimingSemaphore initSignals.
    Delay primSignal: TimingSemaphore atMilliseconds: nextTick.
! !

!Delay class methodsFor: 'timer process' stamp: 'ar 7/11/2007 09:04'!
runTimerEventLoop
    "Run the timer event loop."
    [
        [RunTimerEventLoop] whileTrue: [self handleTimerEvent]
    ] on: Error do:[:ex|
        "Clear out the process so it doesn't get killed"
        TimerEventLoop := nil.
        "Launch the old-style interrupt watcher"
        self startTimerInterruptWatcher.
        "And pass the exception on"
        ex pass.
    ].! !

!Delay class methodsFor: 'timer process' stamp: 'ar 7/10/2007 22:32'!
scheduleDelay: aDelay
    "Private. Schedule this Delay."
    aDelay beingWaitedOn: true.
    ActiveDelay ifNil:[
        ActiveDelay := aDelay
    ] ifNotNil:[
        aDelay resumptionTime < ActiveDelay resumptionTime ifTrue:[
            SuspendedDelays add: ActiveDelay.
            ActiveDelay := aDelay.
        ] ifFalse: [SuspendedDelays add: aDelay].
    ].
! !

!Delay class methodsFor: 'timer process' stamp: 'ar 7/11/2007 10:18'!
startTimerEventLoop
    "Start the timer event loop"
    "Delay startTimerEventLoop"
    self stopTimerEventLoop.
    self stopTimerInterruptWatcher.
    AccessProtect := Semaphore forMutualExclusion.
    ActiveDelayStartTime := Time millisecondClockValue.
    SuspendedDelays :=
        Heap withAll: (SuspendedDelays ifNil:[#()])
            sortBlock: [:d1 :d2 | d1 resumptionTime <= d2 resumptionTime].
    TimingSemaphore := Semaphore new.
    RunTimerEventLoop := true.
    TimerEventLoop := [self runTimerEventLoop] newProcess.
    TimerEventLoop priority: Processor timingPriority.
    TimerEventLoop resume.
    TimingSemaphore signal. "get going"
! !

!Delay class methodsFor: 'timer process' stamp: 'ar 7/10/2007 22:32'!
startTimerInterruptWatcher
    "Reset the class variables that keep track of active Delays and re-start the timer interrupt watcher process. Any currently scheduled delays are forgotten."
    "Delay startTimerInterruptWatcher"
    | p |
    self stopTimerEventLoop.
    self stopTimerInterruptWatcher.
    TimingSemaphore := Semaphore new.
    AccessProtect := Semaphore forMutualExclusion.
    SuspendedDelays :=
        SortedCollection sortBlock:
            [:d1 :d2 | d1 resumptionTime <= d2 resumptionTime].
    ActiveDelay := nil.
    p := [self timerInterruptWatcher] newProcess.
    p priority: Processor timingPriority.
    p resume.
! !

!Delay class methodsFor: 'timer process' stamp: 'ar 7/10/2007 21:26'!
stopTimerEventLoop
    "Stop the timer event loop"
    RunTimerEventLoop := false.
    TimingSemaphore signal.
    TimerEventLoop := nil.! !

!Delay class methodsFor: 'timer process' stamp: 'ar 7/10/2007 21:32'!
stopTimerInterruptWatcher
    "Stop the timer interrupt watcher process: cancel the pending VM timer signal and terminate the process waiting on TimingSemaphore."
    "Delay stopTimerInterruptWatcher"
    self primSignal: nil atMilliseconds: 0.
    TimingSemaphore ifNotNil:[TimingSemaphore terminateProcess].! !

!Delay class methodsFor: 'timer process' stamp: 'ar 7/10/2007 22:33'!
unscheduleDelay: aDelay
    "Private. Unschedule this Delay."
    ActiveDelay == aDelay ifTrue: [
        SuspendedDelays isEmpty ifTrue:[
            ActiveDelay := nil.
        ] ifFalse: [
            ActiveDelay := SuspendedDelays removeFirst.
        ]
    ] ifFalse:[
        SuspendedDelays remove: aDelay ifAbsent: [].
    ].
    aDelay beingWaitedOn: false.! !

!Delay class methodsFor: 'class initialization' stamp: 'ar 7/11/2007 18:16'!
initialize
    "Delay initialize"
    self startTimerEventLoop.! !

Delay initialize!
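
As a rough model of what the change set above does, the following Python sketch keeps the delay queue private to a single timer thread; callers only post schedule/unschedule requests and wait on their own event. This is an analogy under stated assumptions, not a translation of the Squeak code: `TimerLoop`, its method names, and the use of `queue.Queue` in place of ScheduledDelay/FinishedDelay plus TimingSemaphore are all made up for the illustration.

```python
import heapq
import itertools
import queue
import threading
import time


class TimerLoop:
    """Single owner of the delay heap; callers only enqueue requests."""

    def __init__(self):
        self._requests = queue.Queue()  # request mailbox, safe to use from any thread
        self._heap = []                 # (due, delay_id); touched by the timer thread only
        self._live = {}                 # delay_id -> Event; touched by the timer thread only
        self._ids = itertools.count()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def schedule(self, seconds):
        """Post a schedule request; return (id, Event) for the caller to wait on."""
        done = threading.Event()
        delay_id = next(self._ids)
        due = time.monotonic() + seconds
        self._requests.put(("schedule", delay_id, due, done))
        return delay_id, done

    def unschedule(self, delay_id):
        """Post an unschedule request; the caller never touches the heap."""
        self._requests.put(("unschedule", delay_id, None, None))

    def stop(self):
        self._requests.put(("stop", None, None, None))
        self._thread.join()

    def _run(self):
        while True:
            # Sleep until the next delay is due, but for at most one second.
            timeout = 1.0
            if self._heap:
                timeout = min(timeout, max(0.0, self._heap[0][0] - time.monotonic()))
            try:
                kind, delay_id, due, done = self._requests.get(timeout=timeout)
            except queue.Empty:
                kind = None             # plain timer tick, no request
            if kind == "stop":
                return
            if kind == "schedule":
                self._live[delay_id] = done
                heapq.heappush(self._heap, (due, delay_id))
            elif kind == "unschedule":
                self._live.pop(delay_id, None)
            # Signal every expired delay that has not been unscheduled.
            now = time.monotonic()
            while self._heap and self._heap[0][0] <= now:
                _, expired_id = heapq.heappop(self._heap)
                expired = self._live.pop(expired_id, None)
                if expired is not None:
                    expired.set()


loop = TimerLoop()
_, short = loop.schedule(0.05)
cancel_id, cancelled = loop.schedule(0.2)
loop.unschedule(cancel_id)
fired = short.wait(2.0)                 # the short delay should be signalled
cancelled_fired = cancelled.wait(0.5)   # the unscheduled one should not
loop.stop()
print(fired, cancelled_fired)
```

The point of the design is that a caller going away at any moment can at worst leave an unclaimed event behind; it cannot leave the heap half-updated, because it never holds a reference to it.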
Hi,
2007/7/24, Janko Mivšek janko.mivsek@eranova.si:
That's a very important patch and very interesting to me too, because I'm just deciding whether to move some of my public Aida/Web websites from VW to Squeak, and I was afraid of just such issues as the one you solved.
Is there any chance that this patch goes to 3.10?
Chance would be greater if unit tests were included.
Damien Cassou wrote:
Is there any chance that this patch goes to 3.10?
Chance would be greater if unit tests were included.
Good luck with that. I tried for a couple of hours to find a reliable way of creating this problem with not even so much as a hint of being able to make it happen. The problem is that on any local machine you're never completely independent from the time source of that machine and if you are dependent on the time source you are in sync with Delay and everything will be fine. You need an independent source of events and I've yet to find someone who shows me how to write unit tests across multiple machines (and no, running multiple images on the same machine doesn't work because your process scheduler uses the same time source that your image uses so it's not independent).
Cheers, - Andreas
I don't know if this would help, but I used the ProcessSpecific package to warp the clock, so that you can specify your own DateAndTime implementation on a per-process basis. With this technique you can run the clock at 2x speed or even backwards, so you may be able to schedule specific event times to recreate the bug.
The code is in the Monticello repository http://gjallar.krampe.se/ in the ProcessSpecific package.
Keith
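
The per-process clock idea can be sketched in Python with a context-local clock. This illustrates only the general technique and is not the ProcessSpecific package; `WarpedClock`, `run_with_clock` and `has_expired` are hypothetical names. It also does not address the objection above that a warped clock is still not an independent source of events.

```python
import contextvars
import time

# Context-local clock: code asks now() instead of reading the system clock directly.
_clock = contextvars.ContextVar("clock", default=time.monotonic)


def now():
    return _clock.get()()


class WarpedClock:
    """A manually advanced clock that can run faster, slower or backwards."""

    def __init__(self, start=0.0, rate=1.0):
        self.value = start
        self.rate = rate

    def advance(self, seconds):
        self.value += seconds * self.rate

    def __call__(self):
        return self.value


def has_expired(resumption_time):
    """The check under test: has the current clock reached the resumption time?"""
    return now() >= resumption_time


def run_with_clock(clock, fn, *args):
    """Run fn in a copied context where only that context sees the given clock."""
    def runner():
        _clock.set(clock)
        return fn(*args)
    return contextvars.copy_context().run(runner)


fast = WarpedClock(rate=2.0)
fast.advance(1.0)                 # one "real" second moves this clock by two
backwards = WarpedClock(start=10.0, rate=-1.0)
backwards.advance(3.0)            # this clock now reads 7.0

results = (
    run_with_clock(fast, has_expired, 1.5),
    run_with_clock(backwards, has_expired, 8.0),
    _clock.get() is time.monotonic,   # the outer context keeps the real clock
)
print(results)   # (True, False, True)
```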
On Jul 24, 2007, at 10:16 , Damien Cassou wrote:
Chance would be greater if unit tests were included.
No. It just takes a couple of people filing this in and using the image for a while. Preferably on servers. And then reporting their findings.
- Bert -
On 7/24/07 5:40 AM, "Bert Freudenberg" bert@freudenbergs.de wrote:
No. It just takes a couple of people filing this in and using the image for a while. Preferably on servers. And then reporting their findings.
- Bert -
Damien and Janko, I am watching your findings. This is the last week of vacation, so next Monday, when the students come back, I will put them to work testing on Mac, Windows XP and 98, and I hope on Ubuntu as well.
Edgar
On 24-Jul-07, at 1:40 AM, Bert Freudenberg wrote:
No. It just takes a couple of people filing this in and using the image for a while. Preferably on servers. And then reporting their findings.
Whilst it may be effectively impossible to write unit tests that cover the problem this code is intended to fix, it would be quite nice to have some Delay tests in the image (caveat: I've just checked my *working* image, which is a Sophie development image and thus 3.8-based) to make sure that Delay and related classes work the way we expect after this fix is applied. 'Twould be tragic to find that it stops a major lockup but breaks some small routine function.
tim -- tim Rowledge; tim@rowledge.org; http://www.rowledge.org/tim Strange OpCodes: BPB: Branch on Program Bug
You can apply the fix yourself; it works in all Squeak versions that I'm aware of (and if not, you'll find out really quickly ;-) This is just the kind of thing for which I wanted to see some sort of "standard package", so that people across various Squeak versions can benefit from it.
Cheers, - Andreas
On Tue, 24 Jul 2007 01:17:59 -0700, Andreas Raab andreas.raab@gmx.de wrote:
You can apply the fix yourself; it works in all Squeak versions that I'm aware of (and if not, you'll find out really quickly ;-) This is just the kind of thing for which I wanted to see some sort of "standard package", so that people across various Squeak versions can benefit from it.
I tried filing it into my 3.8 (#6665) image, and the following variables are undeclared:
TimerEventLoop ScheduledDelay FinishedDelay
Perhaps there was supposed to be a class definition included?
Later, Jon
-------------------------------------------------------------- Jon Hylands Jon@huv.com http://www.huv.com/jon
Project: Micro Raptor (Small Biped Velociraptor Robot) http://www.huv.com/blog
And RunTimerEventLoop.
Good work though. May explain a few mysteries! (using quite a lot of processes with Delays...)
Ouch. You are right. Here is a variant with the class definition included.
Cheers, - Andreas
On Tue, 24 Jul 2007 09:03:44 -0700, Andreas Raab andreas.raab@gmx.de wrote:
Ouch. You are right. Here is a variant with the class definition included.
Thanks, that installs much better. I'll let you know how it works once I start testing again (I'm kinda down with pneumonia right now) - my server uses a bunch of processes and a lot of delays. In the current configuration, it is all on one machine, so I probably wouldn't run into the issue right now, but I also run with part of the server running on my PC, and the other part running on a gumstix, and they do socket communications in both directions. This may possibly explain why I have seen my gumstix system stop responding on occasion.
Later, Jon
-------------------------------------------------------------- Jon Hylands Jon@huv.com http://www.huv.com/jon
Project: Micro Raptor (Small Biped Velociraptor Robot) http://www.huv.com/blog
On Jul 24, 2007, at 9:03 AM, Andreas Raab wrote:
Ouch. You are right. Here is a variant with the class definition
OK, well, I'm wondering then if I should close
http://bugs.squeak.org/view.php?id=4882
which sounds similar, no cpu, nothing works. In looking at some stacks on the mac when this happens (very rare) everything seems to be waiting on Delays of some sort, and just the idle loop process is running (well sleeping...)
-- ======================================================================== === John M. McIntosh johnmci@smalltalkconsulting.com Corporate Smalltalk Consulting Ltd. http://www.smalltalkconsulting.com ======================================================================== ===
John M McIntosh wrote:
OK, well, I'm wondering then if I should close
http://bugs.squeak.org/view.php?id=4882
which sounds similar, no cpu, nothing works. In looking at some stacks on the mac when this happens (very rare) everything seems to be waiting on Delays of some sort, and just the idle loop process is running (well sleeping...)
Precisely. Those are the exact symptoms of the problem.
Cheers, - Andreas
Hi Andreas
I was reading your code to exercise my poor concurrency skills. I guess that I'm hopeless at that, but I always try :)
"This change set fixes this problem by moving *all* manipulation of Delay's internal structures out of the calling process."
Is this statement implemented by the AccessProtect critical...?
stef
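For readers following Stef's question, here is a minimal workspace sketch of the hand-off idea. It assumes the shape suggested by scheduleEvent in the change set, where the calling process only stores the delay in a slot inside the AccessProtect critical section and signals a semaphore, while the high-priority timer process does all the inserting and removing. Every name below (accessProtect, wakeUp, slot, queue, owner, request) is hypothetical and is not taken from the change set.

```smalltalk
"Hypothetical sketch; every name below is made up for illustration."
| accessProtect wakeUp slot queue owner request |
accessProtect := Semaphore forMutualExclusion.	"guards only the hand-off slot"
wakeUp := Semaphore new.	"signalled when the slot has been filled"
queue := SortedCollection sortBlock: [:a :b | a <= b].
slot := nil.

"The owner is the only process that ever modifies 'queue'."
owner := [[
	wakeUp wait.
	queue add: slot.
	slot := nil] repeat] newProcess.
owner priority: Processor timingPriority.
owner resume.

"A caller fills the slot and wakes the owner; it never touches 'queue'.
 The owner runs at a higher priority, so it empties the slot before
 the caller leaves the critical section."
request := [:time |
	accessProtect critical: [
		slot := time.
		wakeUp signal]].

#(300 100 200) do: [:each | request value: each].
owner terminate.
queue asArray	"the three times, kept in order by the owner"
```

In this arrangement, terminating a caller at any point can at worst lose that caller's own request; the sorted structure is only ever touched by the owner process. The critical section itself still has to survive termination of the caller.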
Andreas Raab wrote:
Ouch. You are right. Here is a variant with the class definition included.
Cheers,
- Andreas
Andreas,
Are you sure that this is the complete patch? We are currently having a very similar problem: VNC doesn't respond to UI events, 0% cpu usage, several processes frozen in Delay although our Seaside server still responds. I tried your patch against Squeak3.9 but it had no effect on the problem. It could be that our problem isn't Delay but I was hoping it was ;-)
David
I'm pretty sure it's complete. If you want some help do this:
- Launch the VM with output redirected to a file
- Wait until it locks up
- Attach gdb to the running process, e.g.: gdb --pid=<pid of vm>
- Have it print all the call stacks, e.g.: p (int)printAllStacks()
- Look at the resulting output file.
Cheers, -Andreas
Andreas Raab wrote:
I'm pretty sure it's complete. If you want some help do this:
- Launch the VM with output redirected to a file
- Wait until it locks up
- Attach gdb to the running process, e.g.: gdb --pid=<pid of vm>
- Have it print all the call stacks, e.g.: p (int)printAllStacks()
- Look at the resulting output file.
Thanks for the gdb tip. I can look at the processes via my Seaside server as well since it is still responding. Anyway the debugging output is below. The TdTimer processes are not making progress although, in one case, the sleep should only be for 60 seconds. The VNC server accepts connections but isn't responding to user input including alt-. (although the VNC cursor tracks and there is sometimes UI activity if, for example, the Transcript window is open). If I enter (Delay forSeconds: 5) wait in a web-browser based workspace it will hang forever although, as I mentioned, I am able to interact with the image in other ways through this workspace.
It seems like the list below isn't complete since I have a web server process blocked waiting for connections...but anyway the image exhibits this behavior with and without your Delay patch applied.
David
Process
2064888972 >idleProcess
2064858556 [] in >startUp
2064858648 [] in BlockContext>newProcess

Process
2064885004 >finalizationProcess
2064884820 [] in >restartFinalizationProcess
2064884912 [] in BlockContext>newProcess

Process
2085972252 Semaphore>critical:
2085972068 Delay>scheduleEvent
2085971932 Delay>schedule
2085971840 Delay>wait
2085971748 WorldState>interCyclePause:
2085971656 WorldState>doOneCycleFor:
2085971564 PasteUpMorph>doOneCycle
2054607980 [] in >spawnNewProcess
2054608164 [] in BlockContext>newProcess

Process
2085972988 Semaphore>critical:
2085972804 Delay>scheduleEvent
2085972712 Delay>schedule
2085972620 Delay>wait
2085972436 [] in EventSensor>eventTickler
2085972344 BlockContext>on:do:
2064857692 EventSensor>eventTickler
2064857416 [] in EventSensor>installEventTickler
2064857600 [] in BlockContext>newProcess

Process
2085973768 Semaphore>critical:
2085973584 Delay>scheduleEvent
2085973448 Delay>schedule
2085973356 Delay>wait
2085973264 [] in ApplicationService>sleepFor:
2085973172 >terminationOkDuring:
2085973080 ApplicationService>sleepFor:
2065158156 TdTimer>runWhile:
2065157788 [] in ApplicationService>start
2065157972 BlockContext>ensure:
2064894444 [] in ApplicationService>start
2065157696 BlockContext>on:do:
2065157512 BlockContext>valueWithBindingsContext:
2065157420 BlockContext>valueWithBindings:
2064894536 [] in BlockContext>newProcessWithBindings:
2064894628 [] in BlockContext>newProcess

Process
2086064536 Semaphore>critical:
2086064308 Delay>scheduleEvent
2086064184 Delay>schedule
2086064092 Delay>wait
2086063816 [] in Semaphore>waitTimeoutMSecs:
2086064000 [] in BlockContext>newProcess

Process
2086089116 Semaphore>critical:
2086088932 Delay>scheduleEvent
2086088796 Delay>schedule
2086088612 Delay>wait
2086088704 [] in ApplicationService>sleepFor:
2086088520 >terminationOkDuring:
2086088428 ApplicationService>sleepFor:
2064888880 TdTimer>runWhile:
2064888512 [] in ApplicationService>start
2064888696 BlockContext>ensure:
2064886188 [] in ApplicationService>start
2064888420 BlockContext>on:do:
2064888236 BlockContext>valueWithBindingsContext:
2064888144 BlockContext>valueWithBindings:
2064886280 [] in BlockContext>newProcessWithBindings:
2064886372 [] in BlockContext>newProcess

Process
2092063728 Semaphore>critical:
2092063544 Delay>scheduleEvent
2092063408 Delay>schedule
2092063200 Delay>wait
2092063316 [] in ApplicationService>sleepFor:
2092063092 >terminationOkDuring:
2092063000 ApplicationService>sleepFor:
2065160244 TdTimer>runWhile:
2065157144 [] in ApplicationService>start
2065157328 BlockContext>ensure:
2064893660 [] in ApplicationService>start
2065157052 BlockContext>on:do:
2065156868 BlockContext>valueWithBindingsContext:
2065156776 BlockContext>valueWithBindings:
2064893752 [] in BlockContext>newProcessWithBindings:
2064893844 [] in BlockContext>newProcess

Process
2099204904 Semaphore>critical:
2099204676 Delay>scheduleEvent
2099204552 Delay>schedule
2099204460 Delay>wait
2099187636 [] in Semaphore>waitTimeoutMSecs:
2099187820 [] in BlockContext>newProcess

Process
2099550248 >handleTimerEvent
2059283788 [] in >runTimerEventLoop
2059283368 BlockContext>on:do:
2059283140 >runTimerEventLoop
2059283572 [] in >startTimerEventLoop
2059283664 [] in BlockContext>newProcess

Process
2064857200 InputSensor>userInterruptWatcher
2064857016 [] in InputSensor>installInterruptWatcher
2064857108 [] in BlockContext>newProcess

Process
2064858136 SystemDictionary>lowSpaceWatcher
2064858228 [] in SystemDictionary>installLowSpaceWatcher
2064858320 [] in BlockContext>newProcess

Process
2086063724 Semaphore>waitTimeoutMSecs:
2086063632 Socket>waitForConnectionFor:ifTimedOut:
2086063448 Socket>waitForConnectionFor:
2086063172 [] in Socket>waitForAcceptFor:
2086063356 BlockContext>on:do:
2086063080 Socket>waitForAcceptFor:
2086062896 [] in RFBServer>runLoop
2086062804 BlockContext>on:do:
2064890492 RFBServer>runLoop
2064890612 [] in RFB
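The symptom David describes, where (Delay forSeconds: 5) wait never returns, can be checked without hanging the calling process. The following is a minimal sketch of such a probe, not something posted in this thread, and the variable names are hypothetical. It waits on the Delay in a forked process and polls the millisecond clock instead of using a timeout, since the stack dump shows Semaphore>>waitTimeoutMSecs: itself ending up in Delay>>wait.

```smalltalk
"Hypothetical probe, not part of any posted change set."
| finished deadline |
finished := false.
"Wait on the Delay in a separate process so the caller cannot get stuck."
[(Delay forMilliseconds: 200) wait.
 finished := true] fork.
"Poll the clock for up to two seconds, letting the forked process run."
deadline := Time millisecondClockValue + 2000.
[finished or: [Time millisecondClockValue > deadline]]
	whileFalse: [Processor yield].
finished	"true if the Delay fired, false if it was still blocked after two seconds"
```

The loop busy-waits, so it is only suitable as a one-off diagnostic. If the result is false, the forked process stays blocked, and a stack dump should show it in Delay>>wait like the processes listed above.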
Ah yes, of course. You're missing another batch of fixes that we have long applied to our servers. In this case it's the handling of Semaphore>>critical: (which is broken in all Squeak versions). Give the attached changes a try, and if it still doesn't work I'm sure there are more fixes that we've applied in the meantime ;-)
Cheers, - Andreas
Mmm, I couldn't help but notice this is different code than the tweak code we have in sophie, even adjusting for the difference in tweak logic versus squeak logic. So do you have tweak updates for Semaphore too?
Oddly, these are all timestamped today. Are these new, or have they been in use for months?
On Jul 27, 2007, at 6:27 PM, Andreas Raab wrote:
<SemaphoreCritical-ar.1.cs>
-- ======================================================================== === John M. McIntosh johnmci@smalltalkconsulting.com Corporate Smalltalk Consulting Ltd. http://www.smalltalkconsulting.com ======================================================================== ===
John M McIntosh wrote:
Mmm, I couldn't help but notice this is different code than the tweak code we have in sophie, even adjusting for the difference in tweak logic versus squeak logic. So do you have tweak updates for Semaphore too?
Yes, see attachment.
Oddly these are all timestamped today, are these new? Or have been in use for months?
They have new timestamps only because I had to twiddle the changes to make them work in a non-Tweak environment.
Cheers, - Andreas
We have seen this exact problem as well, although not often. An interesting observation was that you could bring the image back to life through the Seaside screenshot application...
Nice to see these kinds of production issues fixed. Andreas, if you have more patches lying around, let us know ;)
I think it would make sense to have a wiki page to keep track of important fixes and their associated Mantis reports, as well as instructions for debugging (like gdb, VM instrumentation).
Cheers, Adrian
Adrian Lienhard wrote:
Nice to see that these kind of production issues fixed. Andreas, if you have more patches laying around, let us know ;)
You may want to check out the Croquet repositories[1][2]. We've posted quite a few changes there that helped with general robustness issues. The last round [3] had various interesting fixes, some of which helped with reliability in general (like the handling of out-of-memory conditions).
[1] http://hedgehog.software.umn.edu:8888/
[2] http://jabberwocky.croquetproject.org:8889/
[3] https://lists.duke.edu/sympa/arc/croquet-dev/2007-05/msg00035.html
Cheers, - Andreas
Andreas Raab wrote:
Adrian Lienhard wrote:
Nice to see that these kind of production issues fixed. Andreas, if you have more patches laying around, let us know ;)
You may want to check out the Croquet repositories[1][2]. We've posted quite a few changes there that helped with general robustness issues. The last round [3] had various interesting fixes some that helped with reliability in general (like the handling of out-of-memory conditions).
[1] http://hedgehog.software.umn.edu:8888/ [2] http://jabberwocky.croquetproject.org:8889/ [3] https://lists.duke.edu/sympa/arc/croquet-dev/2007-05/msg00035.html
Great! Maybe we could reuse some of your modifications in the Squeak packages.
-- Serge Stinckwich http://doesnotunderstand.free.fr/
Andreas Raab wrote:
Ah yes, of course. You're missing another batch of fixes that we have long applied to our servers. In this case it's the handling of Semaphore>>critical: (which is broken in all Squeak versions). Give the attached changes a try and if it still don't work I'm sure there are more fixes that we've applied in the meantime ;-)
Cheers,
- Andreas
Well it's been about 12 hours since I've patched and everything seems to be chugging along. I'll stress it a little this afternoon to be sure. Thanks for the help!
David
I created a Mantis report for this bug here: http://bugs.squeak.org/view.php?id=6588
I suggest closing report "0004882: VM lockup" (http://bugs.squeak.org/view.php?id=4882). The problems described in it seem to be either this bug (#6588), or the other freezing bug http://bugs.squeak.org/view.php?id=6581.
Cheers, Adrian
Yes. I can imagine the pain you went through chasing it... This kind of bug is a huge pain.
Stef
On 24 Jul 07, at 10:17, Andreas Raab wrote:
You can apply the fix yourself; it works in all Squeak versions that I'm aware of (and if not, you'll find out really quickly ;-) This is just the kind of thing for which I wanted to see some sort of "standard package", so that people across various Squeak versions can benefit from it.
Cheers,
- Andreas
Janko Mivšek wrote:
Hi Andreas,
That's a very important patch, and very interesting to me too, because I'm just deciding whether to move some of my public Aida/Web websites from VW to Squeak, and I was afraid of issues like the one you just solved. Is there any chance that this patch goes into 3.10?
Best regards
Janko
Andreas Raab wrote:
Hi -
We recently had some "fun" chasing server lockups (with truly awful uptimes of about a day or less before things went downhill) and were finally able to track a huge portion of it down to problems with Delay. The effect we were seeing on our servers was that the system would randomly lock up and either go down to 0% CPU or 100% CPU.
After poking it with a USR1 signal (which, in our VMs is hooked up such that it prints all the call stacks in the image; it's a life-safer if you need to debug these issues) we usually found that all processes were waiting on Delay's AccessProtect (0%) or alternatively found that a particular process (the event tickler) would sit in a tight loop swallowing repeated errors complaining that "this delay is already scheduled".
After hours and hours of testing, debugging, and a little stroke of luck we finally found out that all of these issues were caused by the fact that Delay's internal structures are updated by the calling process (insertion into and removal from SuspendedDelays) which renders the process susceptible to being terminated in the midst of updating these structures.
If you look at the code, this is obviously an issue because if (for example) the calling process gets terminated while it's resorting SuspendedDelays the result is unpredictable. This is in particular an issue because the calling process is often running at a relatively low priority so interruption by other, high-priority processes is a common case. And if any of these higher priority processes kills the one that just happens to execute SortedCollection>>remove: anything can happen - from leaving a later delay in front of an earlier one (one of the cases we had indicated that this was just what had happened) to errors when doing the next insert/remove ("trying to evaluate a block that is already evaluated") to many more weirdnesses. Unfortunately, it is basically impossible to recreate this problem under any kind of controlled circumstances, mostly because you need a source of events that is truly independent from your time source.
As a consequence of our findings we rewrote Delay to deal with these issues properly and, having deployed the changes about ten days ago on our servers, all of these sources of problems simply vanished. We haven't had a single server problem which we couldn't attribute to our own stupidity (such as running out of disk space ;-)
The changes will in particular be helpful to you if you:
- run network servers
- fork processes to handle network requests
- terminate these processes explicitly (on error conditions for example)
- use Semaphore>>waitTimeoutMSecs: (all socket functions use this)
If you have seen random, unexplained lockups of your server (0% CPU load while being locked up is a dead giveaway[*]) I'd recommend using the attached changes (which work best on top of a VM with David Lewis' 64bit fixes applied) and see if that helps. For us, they made the difference between running the server in Squeak and rewriting it in Java.
I've also filed this as http://bugs.squeak.org/view.php?id=6576
[*] The 0% CPU lockups have sometimes been attributed to issues with Linux wait functions. After having seen the havoc that Delay wreaks on the system I don't buy these explanations any longer. A much simpler (and more likely) explanation is that Delay went wild.
Cheers,
- Andreas
'From Croquet1.0beta of 11 April 2006 [latest update: #1] on 23 July 2007 at 11:53:23 pm'!
"Change Set:		SafeDelay
Date:			23 July 2007
Author:			Andreas Raab
This change set fixes a set of severe problems with concurrent use of Delay. Previously, many of the delay-internal structures were modified by the calling process which made it susceptible to being terminated in the middle of manipulating these structures and leave Delay (and consequently the entire system) in an inconsistent state.
This change set fixes this problem by moving *all* manipulation of Delay's internal structures out of the calling process. As a side-effect it also removes the requirement of Delays being limited to SmallInteger range; the new code has no limitation on the duration of a delay.
No tests are provided since outside of true asynchronous environments (networks) it is basically impossible to recreate the situation reliably."!
!Delay methodsFor: 'private' stamp: 'ar 7/10/2007 21:24'!
activate
    "Private!! Make the receiver the Delay to be awoken when the next timer interrupt occurs. This method should only be called from a block protected by the AccessProtect semaphore."
    TimerEventLoop ifNotNil:[^nil].
    ActiveDelay := self.
    ActiveDelayStartTime := Time millisecondClockValue.
    ActiveDelayStartTime > resumptionTime ifTrue:[
        ActiveDelay signalWaitingProcess.
        SuspendedDelays isEmpty ifTrue:[
            ActiveDelay := nil.
            ActiveDelayStartTime := nil.
        ] ifFalse:[SuspendedDelays removeFirst activate].
    ] ifFalse:[
        TimingSemaphore initSignals.
        Delay primSignal: TimingSemaphore atMilliseconds: resumptionTime.
    ].! !
!Delay methodsFor: 'private' stamp: 'ar 7/10/2007 21:55'!
schedule
    "Private!! Schedule this Delay, but return immediately rather than waiting. The receiver's semaphore will be signalled when its delay duration has elapsed."

    beingWaitedOn ifTrue: [self error: 'This Delay has already been scheduled.'].
    TimerEventLoop ifNotNil:[^self scheduleEvent].
    AccessProtect critical: [
        beingWaitedOn := true.
        resumptionTime := Time millisecondClockValue + delayDuration.
        ActiveDelay == nil
            ifTrue: [self activate]
            ifFalse: [
                resumptionTime < ActiveDelay resumptionTime
                    ifTrue: [
                        SuspendedDelays add: ActiveDelay.
                        self activate]
                    ifFalse: [SuspendedDelays add: self]]].
! !
!Delay methodsFor: 'private' stamp: 'ar 7/10/2007 22:33'!
scheduleEvent
    "Schedule this delay"
    resumptionTime := Time millisecondClockValue + delayDuration.
    AccessProtect critical:[
        ScheduledDelay := self.
        TimingSemaphore signal.
    ].! !
!Delay methodsFor: 'private' stamp: 'ar 7/10/2007 21:55'!
unschedule
    "Unschedule this Delay. Do nothing if it wasn't scheduled."

    | done |
    TimerEventLoop ifNotNil:[^self unscheduleEvent].
    AccessProtect critical: [
        done := false.
        [done] whileFalse:
            [SuspendedDelays remove: self ifAbsent: [done := true]].
        ActiveDelay == self ifTrue: [
            SuspendedDelays isEmpty
                ifTrue: [
                    ActiveDelay := nil.
                    ActiveDelayStartTime := nil]
                ifFalse: [
                    SuspendedDelays removeFirst activate]]].
! !
!Delay methodsFor: 'private' stamp: 'ar 7/10/2007 21:56'!
unscheduleEvent
    AccessProtect critical:[
        FinishedDelay := self.
        TimingSemaphore signal.
    ].! !
!Delay methodsFor: 'public' stamp: 'ar 7/10/2007 21:49'!
beingWaitedOn
    "Answer whether this delay is currently scheduled, e.g., being waited on"
    ^beingWaitedOn! !

!Delay methodsFor: 'public' stamp: 'ar 7/10/2007 21:49'!
beingWaitedOn: aBool
    "Indicate whether this delay is currently scheduled, e.g., being waited on"
    beingWaitedOn := aBool! !

!Delay methodsFor: 'public' stamp: 'ar 7/10/2007 20:56'!
delayDuration
    ^delayDuration! !
!Delay class methodsFor: 'timer process' stamp: 'ar 7/11/2007 10:35'!
handleTimerEvent
    "Handle a timer event; which can be either:
        - a schedule request (ScheduledDelay notNil)
        - an unschedule request (FinishedDelay notNil)
        - a timer signal (not explicitly specified)
    We check for timer expiry every time we get a signal."
    | nextTick |
    "Wait until there is work to do."
    TimingSemaphore wait.

    "Process any schedule requests"
    ScheduledDelay ifNotNil:[
        "Schedule the given delay"
        self scheduleDelay: ScheduledDelay.
        ScheduledDelay := nil.
    ].

    "Process any unschedule requests"
    FinishedDelay ifNotNil:[
        self unscheduleDelay: FinishedDelay.
        FinishedDelay := nil.
    ].

    "Check for clock wrap-around."
    nextTick := Time millisecondClockValue.
    nextTick < ActiveDelayStartTime ifTrue: [
        "clock wrapped"
        self saveResumptionTimes.
        self restoreResumptionTimes.
    ].
    ActiveDelayStartTime := nextTick.

    "Signal any expired delays"
    [ActiveDelay notNil and:[ Time millisecondClockValue >= ActiveDelay resumptionTime]] whileTrue:[
        ActiveDelay signalWaitingProcess.
        SuspendedDelays isEmpty
            ifTrue: [ActiveDelay := nil]
            ifFalse:[ActiveDelay := SuspendedDelays removeFirst].
    ].

    "And signal when the next request is due. We sleep at most 1sec here as a soft busy-loop so that we don't accidentally miss signals."
    nextTick := Time millisecondClockValue + 1000.
    ActiveDelay ifNotNil:[nextTick := nextTick min: ActiveDelay resumptionTime].
    nextTick := nextTick min: SmallInteger maxVal.

    "Since we have processed all outstanding requests, reset the timing semaphore so that only new work will wake us up again. Do this RIGHT BEFORE setting the next wakeup call from the VM because it is only signaled once so we mustn't miss it."
    TimingSemaphore initSignals.
    Delay primSignal: TimingSemaphore atMilliseconds: nextTick.
! !
!Delay class methodsFor: 'timer process' stamp: 'ar 7/11/2007 09:04'!
runTimerEventLoop
    "Run the timer event loop."
    [
        [RunTimerEventLoop] whileTrue: [self handleTimerEvent]
    ] on: Error do:[:ex|
        "Clear out the process so it doesn't get killed"
        TimerEventLoop := nil.
        "Launch the old-style interrupt watcher"
        self startTimerInterruptWatcher.
        "And pass the exception on"
        ex pass.
    ].! !
!Delay class methodsFor: 'timer process' stamp: 'ar 7/10/2007 22:32'!
scheduleDelay: aDelay
    "Private. Schedule this Delay."
    aDelay beingWaitedOn: true.
    ActiveDelay ifNil:[
        ActiveDelay := aDelay
    ] ifNotNil:[
        aDelay resumptionTime < ActiveDelay resumptionTime ifTrue:[
            SuspendedDelays add: ActiveDelay.
            ActiveDelay := aDelay.
        ] ifFalse: [SuspendedDelays add: aDelay].
    ].
! !
!Delay class methodsFor: 'timer process' stamp: 'ar 7/11/2007 10:18'!
startTimerEventLoop
    "Start the timer event loop"
    "Delay startTimerEventLoop"
    self stopTimerEventLoop.
    self stopTimerInterruptWatcher.
    AccessProtect := Semaphore forMutualExclusion.
    ActiveDelayStartTime := Time millisecondClockValue.
    SuspendedDelays :=
        Heap withAll: (SuspendedDelays ifNil:[#()])
            sortBlock: [:d1 :d2 | d1 resumptionTime <= d2 resumptionTime].
    TimingSemaphore := Semaphore new.
    RunTimerEventLoop := true.
    TimerEventLoop := [self runTimerEventLoop] newProcess.
    TimerEventLoop priority: Processor timingPriority.
    TimerEventLoop resume.
    TimingSemaphore signal. "get going"
! !
!Delay class methodsFor: 'timer process' stamp: 'ar 7/10/2007 22:32'!
startTimerInterruptWatcher
    "Reset the class variables that keep track of active Delays and re-start the timer interrupt watcher process. Any currently scheduled delays are forgotten."
    "Delay startTimerInterruptWatcher"
    | p |
    self stopTimerEventLoop.
    self stopTimerInterruptWatcher.
    TimingSemaphore := Semaphore new.
    AccessProtect := Semaphore forMutualExclusion.
    SuspendedDelays :=
        SortedCollection sortBlock:
            [:d1 :d2 | d1 resumptionTime <= d2 resumptionTime].
    ActiveDelay := nil.
    p := [self timerInterruptWatcher] newProcess.
    p priority: Processor timingPriority.
    p resume.
! !
!Delay class methodsFor: 'timer process' stamp: 'ar 7/10/2007 21:26'!
stopTimerEventLoop
    "Stop the timer event loop"
    RunTimerEventLoop := false.
    TimingSemaphore signal.
    TimerEventLoop := nil.! !

!Delay class methodsFor: 'timer process' stamp: 'ar 7/10/2007 21:32'!
stopTimerInterruptWatcher
    "Reset the class variables that keep track of active Delays and re-start the timer interrupt watcher process. Any currently scheduled delays are forgotten."
    "Delay startTimerInterruptWatcher"
    self primSignal: nil atMilliseconds: 0.
    TimingSemaphore ifNotNil:[TimingSemaphore terminateProcess].! !
!Delay class methodsFor: 'timer process' stamp: 'ar 7/10/2007 22:33'!
unscheduleDelay: aDelay
    "Private. Unschedule this Delay."
    ActiveDelay == aDelay ifTrue: [
        SuspendedDelays isEmpty ifTrue:[
            ActiveDelay := nil.
        ] ifFalse: [
            ActiveDelay := SuspendedDelays removeFirst.
        ]
    ] ifFalse:[
        SuspendedDelays remove: aDelay ifAbsent: [].
    ].
    aDelay beingWaitedOn: false.! !

!Delay class methodsFor: 'class initialization' stamp: 'ar 7/11/2007 18:16'!
initialize
    "Delay initialize"
    self startTimerEventLoop.! !
Delay initialize!
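The hand-off idea used by scheduleEvent, unscheduleEvent and handleTimerEvent can be tried in isolation with a toy workspace model. This is a simplified sketch, not the change set's code: request, wake, guard, pending, post and owner are made-up names, and plain integers stand in for Delays. It assumes, as the change set does, that the owning process runs at a higher priority than every caller, so each request is consumed before the caller leaves the critical section.

```smalltalk
"Toy model of the single-owner hand-off; all names here are hypothetical."
| request wake guard pending running owner post |
wake := Semaphore new.
guard := Semaphore forMutualExclusion.
pending := SortedCollection sortBlock: [:a :b | a <= b].
running := true.

"Only this process ever touches 'pending'."
owner := [
    [running] whileTrue: [
        wake wait.
        request ifNotNil: [
            request key == #add
                ifTrue: [pending add: request value].
            request key == #remove
                ifTrue: [pending remove: request value ifAbsent: []].
            request key == #stop
                ifTrue: [running := false].
            request := nil]]
] forkAt: Processor timingPriority.

"Callers only store one request and signal; they never sort or remove."
post := [:req | guard critical: [request := req. wake signal]].

post value: #add -> 300.
post value: #add -> 100.
post value: #remove -> 300.
post value: #stop -> nil.
pending asArray
```

A caller terminated inside post can at worst lose its own request; the sorted structure is only ever modified by the owner, which nobody terminates.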
Hi Andreas,
This is Terrific!! Thank you for doing it!!
Ron
-----Original Message-----
From: squeak-dev-bounces@lists.squeakfoundation.org [mailto:squeak-dev-bounces@lists.squeakfoundation.org] On Behalf Of Andreas Raab
Sent: Tuesday, July 24, 2007 3:46 AM
To: The general-purpose Squeak developers list
Subject: Delay and Server reliability
Could this be related to why monticello goes to sleep and never wakes up until you wiggle the mouse?
On Jul 24, 2007, at 9:34 AM, Bert Freudenberg wrote:
On Jul 24, 2007, at 18:30 , Steven W Riggins wrote:
Could this be related to why monticello goes to sleep and never wakes up until you wiggle the mouse?
I hope so (although you mean squeaksource).
Yeah I suppose, I only ever see MC, not the back end, except when I am wiggling the mouse!
If this fixes the squeaksource bug, I will personally buy dinner for whomever was involved with fixing this, unless it's like 100 people or something.
Dinner is much cheaper than the nearly 4 LCD screens I have smashed when trying to check something in 5 mins before I had to leave the house. :)
+1 (not nice to have to rebuild...)
-----Original Message-----
From: squeak-dev-bounces@lists.squeakfoundation.org [mailto:squeak-dev-bounces@lists.squeakfoundation.org] On Behalf Of Bert Freudenberg
Sent: 24 July 2007 5:34 pm
To: The general-purpose Squeak developers list
Subject: Re: Delay and Server reliability
On Jul 24, 2007, at 18:30 , Steven W Riggins wrote:
Could this be related to why monticello goes to sleep and never wakes up until you wiggle the mouse?
I hope so (although you mean squeaksource).
- Bert -