[squeak-dev] Re: [Pharo-project] #ensure: issues

Eliot Miranda eliot.miranda at gmail.com
Mon Mar 8 00:22:20 UTC 2010


On Sun, Mar 7, 2010 at 3:33 PM, Andreas Raab <andreas.raab at gmx.de> wrote:

> On 3/7/2010 3:17 PM, Igor Stasenko wrote:
>
>> My 2 cents.
>>
>> Process termination should run in the same context as the process itself.
>>
>
> Just FYI, Eliot and I had a very good discussion about this on Friday. We
> both agree that it is advantageous to run unwind blocks inside the
> terminating process. We also both agree that the resulting consequences and
> subtle breakages are extremely hard to predict.
>

+1 :)


> Cheers,
>  - Andreas
>
>
>> Some simple analogy:
>> Imagine a black box with two buttons, 'start' and 'stop', and a big red
>> warranty label which covers the bolts and reads: warranty void if you
>> attempt to disassemble the box.
>> You push start, and it starts making noise; you can hear it doing
>> something inside, but you can't see what, because it's boxed.
>> You push stop, and it stops, but you notice that it doesn't stop
>> immediately; it takes a few seconds to come to a complete stop.
>>
>> So all the control you have from outside is the start and stop buttons.
>> Nothing else.
>> Unless you take your wrench (or reflection hammer) and break in to
>> see what it does.
>> But once you do that, the warranty will be void!
>>
>> That's the point. If you think of a Process as an object, not as a
>> complex conglomerate of different pieces (contexts/scheduler), then it's
>> obvious that you have to honor the encapsulation rules and not
>> break in unless it is necessary.
>>
>> Handling unwinding outside of the process breaks encapsulation,
>> because you leak object references from one process to another. And we
>> know very well what this means: it immediately leads to various
>> security issues and/or an inability to contain the growth of garbage
>> in the image, garbage which can't be collected because some other
>> process got a reference to an object which was not meant for its eyes.
>>
>> I could continue with other examples, but I think the above makes my
>> point: an outside process usually has minimal knowledge of what is
>> going on inside another process. Because of that, it is wrong to
>> assume that running code in the scope of a different process won't have
>> unwanted and unpredictable side effects. So the best way to avoid these
>> issues is to keep thinking of it as a black box, with which you
>> communicate using a standard protocol, and at the same time to be free
>> of responsibility for handling its exceptional situations: all
>> exceptions should be handled inside the process where they appear.
>>
>> On 7 March 2010 22:37, Eliot Miranda <eliot.miranda at gmail.com> wrote:
>>
>>>
>>>
>>> On Wed, Mar 3, 2010 at 9:29 PM, Andreas Raab <andreas.raab at gmx.de> wrote:
>>>
>>>>
>>>> On 3/3/2010 6:32 PM, Eliot Miranda wrote:
>>>>
>>>>>
>>>>> No, it is /broken/.  What exception handlers are in effect when an
>>>>> unwind is run from another process?  It's not just that process identity
>>>>> isn't being preserved (which I would argue is important but not
>>>>> essential); it's that unwinds are being run in an entirely arbitrary
>>>>> context which has nothing to do with the one they would run in normally.  e.g.
>>>>>
>>>>> givethAndTakethAway: aCollection
>>>>>      [[aCollection do: [:k| mydict at: k put: self thing].
>>>>>        self doStuff]
>>>>>            ensure: [aCollection do: [:k| mydict removeKey: k]]]
>>>>>         on: KeyNotFoundError
>>>>>         do: [:ex| ex proceedWith: nil]
>>>>>
>>>>> will remove whatever keys in mydict are in aCollection when run
>>>>> normally, but if the process is terminated from another process it will
>>>>> break if any keys in aCollection have already been removed from mydict.
>>>>>
>>>>
>>>> Yes, but my claim was that this is a feature, not a bug, for a value of
>>>> "feature" which means that the terminator (<- I really wanted to use that :-)
>>>> is capable of controlling the exception context of the ensure block.
>>>>
>>>>> Running unwinds in another process is /broken/, period.
>>>>>
>>>>
>>>> The reason I'm not convinced of that, period or not, repetitions or
>>>> not, is that we've seen no evidence of it ever being a problem. Since
>>>> it hasn't been a problem, either in our production environments (which
>>>> have triggered all the problems and fixes for process, semaphore, mutex,
>>>> termination etc. handling over the last years) or in the messages and
>>>> bug reports that I've seen on this or other lists, I'm just not buying
>>>> the strong claim you are making.
>>>>
>>>> Where is the *evidence* to support your claim? Not the theoretical
>>>> discussion of what could possibly go wrong, but rather the actual,
>>>> day-to-day things that people really get bitten by.
>>>>
>>>
>>> Evidence will be hard to come by.  In Squeak people will have coded
>>> around the limitation.  I don't have access to the server applications
>>> in VW which might or might not be affected by this.  But I can say that
>>> VW implements unwinds on termination in a way that preserves both process
>>> identity and exception context, and that if this hadn't been considered
>>> important, that effort would not have been made in the first place.
>>>
>>>>
>>>>
>>>>> You could do this better by saying e.g.
>>>>>
>>>>>       p terminateOn: Error do: [:ex| ex return "skip it"]
>>>>>
>>>>> and have terminateOn:do: et al. install the exception handler at the
>>>>> bottom of the stack.
>>>>>
>>>>
>>>> Well, not really. In a realistic situation you'd probably want to handle
>>>> multiple conditions, timeouts etc., and stitching the stack together for
>>>> that is likely way beyond your average Squeaker :-)
>>>>
>>>> [BTW, I don't get why you call this "better"; if you call it "better"
>>>> because it uses in-process termination, it would seem that your
>>>> reasoning is circular]
>>>>
>>>
>>> Better means it's a form which would work with either approach.
>>>
>>>
>>>>>    There are obviously issues if the unwind actions refer to process-local
>>>>>    state; but then again perhaps providing said context for the
>>>>>    duration of the unwind is the right thing, along the lines of:
>>>>>
>>>>>    Process>>terminate
>>>>>
>>>>>      self isActiveProcess ifFalse:[
>>>>>        Processor activeProcess evaluate: [self unwind] onBehalfOf: self.
>>>>>      ].
>>>>>
>>>>>    This, for example, would guard against errors during unwind as
>>>>>    initially intended, while ensuring that process-relative state is
>>>>>    retrieved correctly.
>>>>>
>>>>>
>>>>> Yes, but what if I do something like
>>>>>
>>>>>          [processMemo add: Processor activeProcess.
>>>>>           self doStuff]
>>>>>                 ensure: [processMemo remove: Processor activeProcess]
>>>>> ?
>>>>>
>>>>
>>>> It would work fine; why wouldn't it? Did you not notice that I used
>>>> #evaluate:onBehalfOf: in the above for precisely this reason?
>>>>
>>>
>>> OK.  But evaluate:onBehalfOf: only fixes one of the two issues.  It fixes
>>> process identity but it doesn't handle exception context.  [BTW, I wrote
>>> evaluate:onBehalfOf: only to get process identity right when simulating
>>> execution in the debugger; I didn't envisage it being used here, but I
>>> agree it would be an improvement].
>>>
>>>
>>>>> It may have facets, but that doesn't mean that running unwinds in anything
>>>>> other than the current process isn't badly broken, or that it isn't
>>>>> relatively easy to fix ;)
>>>>>
>>>>
>>>> I don't buy this, and claiming it repeatedly doesn't make it any more
>>>> real unless you put up or shut up. I think you're vastly underestimating
>>>> the resulting fallout (does that sound like something I've said before? ;-)
>>>>
>>>> For example, here is an interesting one (and please use VW for reference
>>>> here if you'd like; I'd actually be interested in finding a *small*
>>>> download myself to try some of this stuff): One of the issues in
>>>> Levente's example of terminating a process halfway through an unwind
>>>> block is that if the unwind block is ill-behaved you can really be in
>>>> for trouble. Let's say you have this:
>>>>
>>>> [
>>>>  "just to have something to interrupt"
>>>>  [(Delay forMilliseconds: 100) wait. true] whileTrue.
>>>> ] ensure:[
>>>>  "this is the actual killer"
>>>>  [(Delay forMilliseconds: 100) wait. true] whileTrue.
>>>> ].
>>>>
>>>
>>> The issue of well-behavedness is orthogonal to in-process or
>>> out-of-process termination.  We can know whether an unwind block has
>>> been started or not, and on repeating an attempt to terminate, e.g. after
>>> a timeout, one can choose whether to restart an in-progress unwind, skip
>>> it and start with the next one, or skip them altogether and abort
>>> unwinding.  But these choices are not tied to in-process or
>>> out-of-process termination except perhaps in implementation details.
>>>
>>>
>>>> If you wish to support completing a well-behaved ensure block that has
>>>> been started, it's hard to see how this process would actually be
>>>> terminated, and how you'd find out that it wasn't. In Squeak it's simple:
>>>> the fact that the process sending #terminate is still in this code tells
>>>> you it's not finished, so one can (for example) associate a timeout with
>>>> #terminate and catch that. Or, one can just interrupt that process and
>>>> close the debugger. Both are quite reasonable options for dealing with
>>>> the invariant.
>>>>
>>>
>>> In VW an unwind block is always executed from a marked sender method, so
>>> one can find out whether an unwind is running by looking at its sender:
>>> BlockClosure methods for unwinding
>>>
>>>     valueAsUnwindBlockFrom: aContextOrNil
>>>         "Unwind blocks are evaluated using this wrapper.
>>>         This method is marked as special.  When the
>>>         system searches for unwind blocks, it skips over
>>>         all contexts between this context and the context
>>>         passed in as an argument.  If the argument is
>>>         nil, it skips over the sender of this context.
>>>         The purpose of this is that errors inside an
>>>         unwind block should not evaluate that unwind
>>>         block, and they should not circumvent the
>>>         running of other unwind blocks lower on the
>>>         stack."
>>>
>>>         <exception: #unwindInAction>
>>>         | shouldTerminate |
>>>         "long comment elided"
>>>         shouldTerminate == nil ifTrue: [shouldTerminate := false].
>>>         self value.
>>>         shouldTerminate
>>>             ifTrue: [Processor activeProcess terminate].
>>>
>>> MarkedMethod methods for testing
>>>
>>>     isMarkedForUnwindInAction
>>>         "Answer true if method is marked for unwinding in action."
>>>
>>>         ^markType == #unwindInAction
>>>
>>> and exception handling skips over these when looking for handlers.
>>>
>>> Context methods for handler search
>>>
>>>     skipOverUnwindingBlocks
>>>         "If the method is marked for unwindInAction, check the first argument
>>>         of the method. If it is nil, we just skip to the next context in the
>>>         chain which will be one of #ifCurtailed: or #ensure:. If it is not nil,
>>>         we skip to the context's (represented by the argument to the method)
>>>         sender."
>>>
>>>         ^self method isMarkedForUnwindInAction
>>>             ifTrue: [(self localAt: 1) == nil
>>>                 ifTrue: [self sender sender]
>>>                 ifFalse: [(self localAt: 1) sender]]
>>>             ifFalse: [self]
>>>
>>> and termination by another process (which causes it to execute
>>> auto-termination) also skips over them, to avoid restarting unwinds
>>> already in progress.  [Note there's a per-process interruptProtect
>>> semaphore used to coordinate termination].
>>>
>>>
>>>> Under the assumption that you'd like to support the well-behaved case,
>>>> what do you do when using in-process termination in a non-well-behaved
>>>> case? How do you find out that a termination request didn't finish, and
>>>> what do you do in practice to kill it?
>>>>
>>>
>>> I would expect to be able to write something like
>>>        [process terminate]
>>>              valueIfLongerThanMilliseconds: 1000
>>>              evaluate: [process terminateWithExtremePrejudice]
>>> i.e. nuke it if it doesn't terminate within a specific time, or
>>>       [[:exit|
>>>           [process terminateAbortingCurrentUnwind]
>>>              valueIfLongerThanMilliseconds: 100
>>>              evaluate: exit] valueWithExit.
>>>         process isTerminated] whileFalse
>>> i.e. abort any unwind if it doesn't execute within a specific time but at
>>> least attempt to run all unwinds.
>>>
>>>>
>>>> Also, what are the priority implications of the different schemes? One
>>>> of the more interesting issues here is that in cases like those on our
>>>> servers I could see how one of the lower-priority processes times out
>>>> due to starvation and is then terminated, but the termination may never
>>>> complete if it's run at the priority of that process instead of the
>>>> terminator's. Does VW, for example, bump the priority of the terminated
>>>> process? How does that interact with a terminated process that isn't
>>>> well-behaved and runs wild?
>>>>
>>>
>>> Priority inversion is again an orthogonal issue.  It would be great to
>>> have priority-boosting semaphores.  But in any case one could write
>>>      process
>>>             makePriorityAtLeast: Processor activePriority;
>>>             terminate
>>>
>>>>
>>>> I think there are numerous interesting aspects for which we have somewhat
>>>> reasonable answers in Squeak today, and for which I don't even know what
>>>> the implications would be if you went for in-process termination.
>>>>
>>>
>>> Do you agree that the right approach is one of specification?  What is
>>> the correct behaviour, and how close to implementing that can one
>>> reasonably get?  I think from a specification POV one wants both process
>>> identity and exception context to be respected when running unwinds
>>> during termination, and the more straightforward implementation is to
>>> arrange that a process terminates itself when requested to terminate by
>>> some other process.  The orthogonal issues of what happens to unwinds in
>>> progress when repeated termination requests are made (from within the
>>> process or without) and of what priority termination runs at should be
>>> specified and then implemented.  It's reasonable for the implementation
>>> to fall short of an ideal if that is documented.  It's at least
>>> regrettable to have the system specified by an implementation.
>>> As far as the implications of in-process termination go, I don't know of
>>> a better way (short of being able to inspect all current Squeak
>>> applications) than voicing a specification and getting people who have
>>> written applications affected by this functionality to comment on it.
>>> I think I get why the Squeak implementation is as it is (there's a /lot/
>>> of implementation work involved in adding unwinds to the language, and
>>> it's unreasonable to expect that all of this work will get done at the
>>> first attempt).  But the fact that the implementation is not ideal
>>> shouldn't discourage us from improving it over time.
>>>
>>>>
>>>> Cheers,
>>>>  - Andreas
>>>>
>>>
>>> best
>>> Eliot
>>>
>>>
>>>
>>>
>>>
>>
>>
>>
>
>
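A minimal Squeak workspace sketch of the question at the heart of this thread: when one process terminates another that is blocked inside an #ensure: block, which process does `Processor activeProcess` answer while the unwind block runs? It assumes a closure-enabled Squeak or Pharo image; `gate`, `seenBy`, `victim` and `verdict` are illustrative names only, and the answer depends on how the image implements Process>>terminate, so no particular output is claimed.

```smalltalk
"Illustrative check (assumed closure-enabled image; names are hypothetical):
 which process does an #ensure: block see when its process is terminated
 from outside? The result depends on the image's Process>>terminate."
| gate seenBy victim verdict |
gate := Semaphore new.     "never signalled, so the victim blocks on it"
seenBy := nil.
victim := [[gate wait]
    ensure: [seenBy := Processor activeProcess]] fork.
Processor yield.           "let the victim run until it blocks on gate"
victim terminate.          "terminate it from this (the terminating) process"
verdict := seenBy isNil
    ifTrue: ['unwind block has not run (yet)']
    ifFalse: [seenBy == victim
        ifTrue: ['unwind block saw the terminated process']
        ifFalse: ['unwind block saw the terminating process']].
Transcript show: verdict; cr.
```

Even a report of "the terminated process" only shows that process identity was preserved (for instance via #evaluate:onBehalfOf:); as discussed above, it says nothing about which exception handlers were in effect while the unwind ran.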