[squeak-dev] My solution to handling errors in system-critical processes

Thu May 19 10:56:24 UTC 2011

Hello,

If you remember, there was a discussion about how to efficiently
solve the following problem:

We have critical processes running in the system, which usually run
an infinite loop and provide some service(s) that are triggered
periodically.
The most common case is weak finalization. We need to ensure that
finalization works, and if finalizing some object causes an error,
it should affect neither the finalization of the remaining objects
nor the finalization process itself (for example, by suspending or
terminating it).
Also, in another discussion about Announcements, we found that the
Announcements framework should actually fulfill the same requirement:
delivering an announcement to a single subscriber may fail, but
regardless of such a failure, the other subscribers should still
receive the announcement.

Different solutions were proposed, like finalizing each object in a
separate, forked process, so that even if it fails with an error, the
rest of the finalizers are not affected and the finalization process
continues to run normally.
However, that is not very efficient, because you pay the cost of
forking a process each time you need to finalize an object. There was
an optimized solution, but it doesn't change the idea: finalize each
item in a separate process.
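For comparison, the fork-per-object approach might look roughly like the following. This is a minimal sketch: the method name finalizeAll: and the #finalize message are hypothetical, chosen for illustration, and are not the actual WeakRegistry code.

```smalltalk
finalizeAll: aCollection
	"Hypothetical sketch: one forked process per object. An error in one
	 finalizer stops only that process, never this loop, but a Process is
	 created for every object, even when nothing fails."
	aCollection do: [:each | [each finalize] fork]
```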

Another (a bit lame) approach is to simply swallow any errors. While
it guarantees that your critical process won't be terminated due to
errors, it also makes it impossible to detect and fix the error(s),
which of course should be fixed to prevent them from appearing in the
future.
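A minimal sketch of the swallow-everything approach, using a hypothetical finalizeAll: method that sends #finalize to each object (illustrative names only, not existing code):

```smalltalk
finalizeAll: aCollection
	"Hypothetical sketch: every error is discarded, so the loop always
	 survives, but nobody ever learns that a finalizer failed."
	aCollection do: [:each |
		[each finalize] on: Error do: [:ex | ex return: nil]]
```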

So, my idea is to fork only if an error happens.
Add a new protocol to BlockClosure, which allows us to handle errors
in a special manner:

[self doSomething] on: Error fork: [:ex | "handle the error here"].

It is similar to #on:do:, except that in case of an error, the error
handler is invoked in a separate, forked process, while the original
process simply returns from the closure activation with nil as the
return value.
But don't think that it is implemented as simply as:
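Assuming #on:fork: behaves as described and is loaded from the attached changeset, a finalization loop could use it as sketched below. finalizeAll: and #finalize are hypothetical names for illustration, not existing code.

```smalltalk
finalizeAll: aCollection
	"Hypothetical sketch: no process is forked while finalizers succeed.
	 When one fails, its handler runs in a new process and this loop
	 continues with the next object."
	aCollection do: [:each |
		[each finalize]
			on: Error
			fork: [:ex | ex pass "left unhandled there, so a debugger opens"]]
```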

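on: exceptionClass fork: handlerAction
  "Naive version: the forked handler no longer has the contexts that
   signalled the error on its stack. The trailing nil makes the original
   process answer nil rather than the forked Process."
  ^ self on: exceptionClass do: [:ex | [handlerAction cull: ex] fork. nil]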

That would be too easy and therefore useless. :)
When you do it like that, the debugger window which opens on the
error will not show you the stack contents you want to see. You would
then have to manually inspect the exception, then inspect the
exception's context, and so on, in order to determine what caused the
error.

What my implementation actually does is split the stack of the
current process: all contexts above the #on:fork: method go to the
forked process, while the original process simply returns to the
sender of #on:fork: with nil as the return value.

So, consider the original stack of a single process:

a. <bottom>
a. ...
a. ... sender of #on:fork:
b. <#on:fork:>
b. #on:do:
b. ...
b. ...
b. ...
b. SomeError signal.
b. ...
b. ...
b. error handler.

Then, in case of an exception, all contexts labeled (a) are left in
the original process, which continues execution from the sender of
#on:fork:, while the contexts labeled (b) are transferred to a newly
forked process.

In this way, if the exception is unhandled (if you put 'ex pass'
there), the debugger will show the usual stack trace you normally see
when an error occurs, except that you don't see the stack below the
#on:do: method (which is the context next to #on:fork:).
But that's already enough information to determine what is causing
the error, and even to fix it and complete the action!
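A quick way to try this in a Workspace, assuming the attached #on:fork: is loaded. Per the description above, the original process should get nil back immediately, while the forked process resumes in the handler; because ex pass leaves the ZeroDivide unhandled there, a debugger should open on the forked process.

```smalltalk
| result |
result := [1/0] on: ZeroDivide fork: [:ex | ex pass].
result isNil
	"Expected to answer true in the original process; the debugger
	 belongs to the forked process, not to this one."
```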

So, we can have our cake (fault-tolerant critical services) and eat
it too (conveniently debug errors that happen there), without the
overhead of forking a process when nothing fails.

Please review my implementation. I tried it with a few different
exceptions and it works fine, including on Cog.
However, there could be some caveats.

If everything there is OK, then we can start using this method in
finalization and in announcements, which will improve our systems'
stability and make it much easier to deal with errors there :)
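For announcements, delivery to each subscriber could be wrapped in #on:fork:. This sketch assumes subscribers are plain one-argument blocks; announce:to: is a hypothetical method, not the API of any existing Announcements implementation.

```smalltalk
announce: anAnnouncement to: subscriberBlocks
	"Hypothetical sketch: a failing subscriber gets its own process (and a
	 debugger, via ex pass); the remaining subscribers still receive the
	 announcement."
	subscriberBlocks do: [:each |
		[each value: anAnnouncement]
			on: Error
			fork: [:ex | ex pass]].
	^ anAnnouncement
```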

-- 
Best regards,
Igor Stasenko AKA sig.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: BlockClosure-onfork.st
Type: application/octet-stream
Size: 1413 bytes
Desc: not available
Url : http://lists.squeakfoundation.org/pipermail/squeak-dev/attachments/20110519/643af8c6/BlockClosure-onfork.obj