[squeak-dev] Re: [Pharo-project] #ensure: issues

Thu Mar 4 05:29:46 UTC 2010

On 3/3/2010 6:32 PM, Eliot Miranda wrote:
> No, it is /broken/.  What exception handlers are in effect when an
> unwind is run from another process?  It's not just that process identity
> isn't being preserved (which I would argue is important but not
> essential), it's that unwinds are being run in an entirely arbitrary
> context which has nothing to do with their running normally.  e.g.
>
> givethAndTakethAway: aCollection
>       [[aCollection do: [:k| mydict at: k put: self thing].
>         self doStuff] ensure: [aCollection do: [:k| mydict removeKey: k]]]
>          on: KeyNotFoundError
>          do: [:ex| ex proceedWith: nil]
>
> will remove whatever keys in mydict are in aCollection when run
> normally, but when terminated from another process it will break if
> any keys in aCollection have already been removed from mydict.

Yes, but my claim was that this is a feature, not a bug, for a value of
"feature" which means that the terminator (<- I really wanted to use 
that :-) is capable of controlling the exception context of the ensure 
block.

> Running unwinds in another process is /broken/, period.

The reason why I'm not convinced of that, period or not, repetitions or 
not, is because we've seen no evidence of it ever being a problem. If 
it's not a problem, neither in our production environments that have 
triggered all the problems and the fixes for process, semaphore, mutex, 
termination etc. handling over the last few years, nor in messages or bug
reports that I've seen on this or other lists, I'm just not buying the 
strong claim you are making.

Where is the *evidence* to support your claim? Not the theoretical 
discussion of what could possibly go wrong, but rather the actual, 
day-to-day things that people really get bitten by.

> You could do this better by saying e.g.
>
>        p terminateOn: Error do: [:ex| ex return "skip it"]
>
> and have terminateOn:do: et al install the exception handler at the
> bottom of stack.

Well, not really. In a realistic situation you'd probably want to handle 
multiple conditions, timeouts etc. and stitching the stack together for 
that is likely way beyond your average Squeaker :-)

[BTW, I don't get why you call this "better"; if you call it "better"
because it uses in-process termination, then your reasoning is
circular]

>     There are obviously issues if the unwind actions refer to process
>     local state; but then again perhaps providing said context for the
>     duration of the unwind is the right thing, along the lines of:
>
>     Process>>terminate
>
>       self isActiveProcess ifFalse:[
>         Processor activeProcess evaluate: [self unwind] onBehalfOf: self.
>       ].
>
>     This for example would guard errors during unwind as initially
>     intended while ensuring that process relative state is retrieved
>     correctly.
>
>
> Yes, but what if I do something like
>
>           [processMemo add: Processor activeProcess.
>            self doStuff]
>                  ensure: [processMemo remove: Processor activeProcess]
> ?

It would work fine, why? Did you not notice that I used 
#evaluate:onBehalfOf: in the above for precisely this reason?

> It may have facets but that doesn't mean that running unwinds in other
> than the current process isn't badly broken, and isn't relatively easy
> to fix ;)

I don't buy this, and claiming it repeatedly doesn't make it any more 
real unless you put up or shut up. I think you're vastly underestimating 
the resulting fallout (does that sound like something I've said before? ;-)

For example, here is an interesting one (and please use VW for reference 
here if you'd like; I'd actually be interested in finding a *small* 
download myself to try some of this stuff): One of the issues in 
Levente's example of terminating a process halfway through an unwind
block is that if the unwind block is ill-behaved you can really be in 
for trouble. Let's say you have this:

[
   "just to have something to interrupt"
   [(Delay forMilliseconds: 100) wait. true] whileTrue.
] ensure:[
   "this is the actual killer"
   [(Delay forMilliseconds: 100) wait. true] whileTrue.
].

If you wish to support completing a well-behaved ensure block that has
been started, it is hard to see how this process would actually be
terminated, and how you'd find out that it wasn't. In Squeak it's simple
- the fact that the process sending #terminate is still in this code 
tells you it's not finished so one can (for example) associate a timeout 
with #terminate and catch that. Or, one can just interrupt that process 
and close the debugger. Both are quite reasonable options to deal with 
the invariant.
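
As a rough, untested sketch of the timeout option: it assumes that 
BlockClosure>>valueWithin:onTimeout: behaves as it does in current 
Squeak images, and the variable name "victim" is made up purely for 
illustration.

```smalltalk
| victim |
"A process whose unwind block never returns, as in the example above."
victim := [
    [[(Delay forMilliseconds: 100) wait. true] whileTrue]
        ensure: [[(Delay forMilliseconds: 100) wait. true] whileTrue]
] forkAt: Processor userBackgroundPriority.

"Let the victim enter its first loop, then terminate it under a timeout.
 With out-of-process termination the unwind block runs in the process
 sending #terminate, so the timeout is raised in the sender."
(Delay forSeconds: 1) wait.
[victim terminate]
    valueWithin: 5 seconds
    onTimeout: [Transcript show: 'terminate did not finish; unwind block is stuck'; cr]
```

With a well-behaved unwind block, #terminate would presumably return 
before the timeout and the timeout block would never run.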

Under the assumption that you'd like to support the well-behaved case, 
what do you do when using in-process termination in a non-well-behaved 
case? How do you find out that a termination request didn't finish, and 
what do you do in practice to kill it?

Also, what are the priority implications of the different schemes? One
of the more interesting issues here is that in cases like those on our
servers I could see how one of the lower priority processes times out
due to starvation and is then terminated, but the termination may never
complete if it's run at the priority of the terminated process instead
of that of the terminator process. Does VW, for example, bump the
priority of the terminated process? How does that interact when the
terminated process isn't well-behaved and runs wild?

I think there are numerous interesting aspects for which we have somewhat
reasonable answers in Squeak today, and I don't even know what the
implications would be if you went for in-process termination.

Cheers,
   - Andreas