Finalization (was: Re: [Seaside] WeakArray (again))

Sat Mar 25 03:23:12 UTC 2006

David Shaffer wrote:
> \begin{amateurHour}
> 
> It seems to me that the notification needs to be changed to actually
> queueing information about the objects which the GC deems
> un(strongly)reachable.  I spent some time staring at
> ObjectMemory>>sweepPhase, #finalizeReference: and #signalFinalization:
> which seem to be the cornerstones of this process.  All that
> #signalFinalization: is currently doing is signaling a semaphore (well,
> indicating that one should be "signaled" later).  Why not keep a list of
> (oop,i) [i is the offset of the weak reference in the oop] pairs and
> somehow communicate those back to a Smalltalk object?  As a total VM
> novice it just seems too simple ;-)  What I think I would do is
> associate a queue like thing with every weak reference container.  Then
> when an object becomes GC-able I'd place the (oop,i) pair in that shared
> queue.  What I need is someone to hold my hand through...
> 
> ...designing this "queue like thing".  How about a circular array which
> can only be "read" (move the read index) by ST code and only be written
> by the VM code?  This avoids a lot of concurrency issues.  Are there any
> examples like this in the VM?
> 
> \end{amateurHour}

What you've described is not a bad idea in general (and it's probably 
what VW does) but there are things that I don't like about it. For 
example, part of why the finalization process takes so much time is that 
there are so many weak references lost that we don't care about - the 
whole idea that just because you use a weak array you need to know when 
its contents go away is just bogus. Secondly, once you start relying 
on "accurate" finalization information you should really make sure it's 
accurate (e.g., one signal/entry per finalized object). And once you do 
that you need to deal with the ugly corner cases of an overflow of the 
finalization queue (and the fact that you probably can't allocate a 
larger one because the GC you're currently in was triggered by a 
low-space condition to begin with ;-). Nasty, nasty issues.
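
To make the overflow problem concrete, here is a toy model of the fixed-size circular array from the quoted proposal. It is only a sketch in Python, not VM code: the class name FinalizationRing and the methods vm_record and image_take are made up for illustration, and the "VM" and the "image" are just two callers of the same object.

```python
class FinalizationRing:
    """Fixed-capacity ring: the 'VM' side writes, the 'image' side reads."""

    def __init__(self, capacity):
        self.slots = [None] * capacity  # allocated once, never grown
        self.read = 0                   # advanced only by the reader
        self.write = 0                  # advanced only by the writer
        self.dropped = 0                # entries lost to overflow

    def vm_record(self, oop, index):
        """Called during GC for each weak slot that lost its referent."""
        nxt = (self.write + 1) % len(self.slots)
        if nxt == self.read:
            # Ring is full, and a GC triggered by low space cannot grow it.
            self.dropped += 1
            return False
        self.slots[self.write] = (oop, index)
        self.write = nxt
        return True

    def image_take(self):
        """Called by the finalization process; returns None when empty."""
        if self.read == self.write:
            return None
        entry = self.slots[self.read]
        self.read = (self.read + 1) % len(self.slots)
        return entry


ring = FinalizationRing(capacity=4)  # one slot stays empty, so 3 usable entries
results = [ring.vm_record("weakArray", i) for i in range(5)]
```

With five lost references and room for three, the last two are silently dropped, which is exactly the "one entry per finalized object" guarantee going out the window.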

Having said that, let me propose a mechanism that (I think) is 
fundamentally different and fundamentally simpler. Namely, to require 
that you only get notifications for the finalization of objects that 
you explicitly register by creating a "finalizer" object, i.e., an 
observer which is allocated before it's ever needed. 
This simple change avoids both the problem of GC needing to allocate 
memory when there is none as well as sending notifications about 
finalizations that nobody cares about, which are both very desirable 
properties. When the object becomes eligible for garbage collection, the 
finalizer is then put into a list of objects that have indeed been 
finalized and the finalization process simply pulls them out of the 
queue and sends #finalize to them.

In its simplest form, this could mean a finalizer is a structure with 
(besides the prev and next links for putting it into a structure) two 
slots: a "weak" slot for the object being guarded and a "strong" slot 
for the object performing the finalization (its #finalizer). When the 
garbage collector runs across a Finalizer and notices its observed value 
is being collected, it can simply put the finalizer into the 
finalization list and is done. (btw, this scheme is *vastly* easier to 
implement than your proposed scheme since everything is pre-allocated 
and you only move the object from one list to another).
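
A rough sketch of that simplest form, as a toy simulation in Python rather than anything resembling the real ObjectMemory code. Everything named here (ToyHeap, register, collect, run_finalization, CloseLog) is hypothetical, the "weak" slot is an ordinary attribute that the toy collector simply ignores, and Python lists stand in for the prev/next links a VM would relink in place.

```python
class Finalizer:
    def __init__(self, guarded, handler):
        self.weak = guarded    # toy weak slot: not counted as a reference by collect()
        self.strong = handler  # the object that performs the finalization


class ToyHeap:
    def __init__(self):
        self.registered = []  # finalizers whose guarded object is still alive
        self.finalized = []   # finalizers waiting for the finalization process

    def register(self, guarded, handler):
        f = Finalizer(guarded, handler)  # allocated up front, never during GC
        self.registered.append(f)
        return f

    def collect(self, reachable_ids):
        """Toy GC pass; reachable_ids holds id() of strongly reachable objects."""
        for f in list(self.registered):
            if id(f.weak) not in reachable_ids:
                f.weak = None              # post-mortem: the object is gone
                self.registered.remove(f)  # move from one list ...
                self.finalized.append(f)   # ... to the other, nothing new allocated

    def run_finalization(self):
        """The finalization process: drain the list and send finalize."""
        while self.finalized:
            self.finalized.pop(0).strong.finalize()


class CloseLog:
    """Hypothetical handler that just records that it ran."""
    def __init__(self, name, log):
        self.name, self.log = name, log

    def finalize(self):
        self.log.append("closed " + self.name)


log = []
heap = ToyHeap()
a, b = object(), object()
heap.register(a, CloseLog("a", log))
heap.register(b, CloseLog("b", log))
heap.collect(reachable_ids={id(a)})  # only a is still strongly reachable
heap.run_finalization()
```

Only the finalizer guarding b fires; the one guarding a stays registered, and the collector never had to allocate anything to report the event.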

But while we're at it, we could also shoot a little bit further and get 
away from post-mortem finalization (which I find a highly overrated 
concept in practice). The only thing we'd change in the above is that 
the garbage collector would now also transfer the object from the "weak" 
into the "strong" slot[*1]. This makes the finalizer the sole last 
reference to the object. If the finalizer drops it, it's gone. If the 
finalizer decides to store it, it will survive. Lots of interesting 
possibilities and much cleaner since you gain access to the full context 
of the object and its state.

[*1] The easiest way to do this would be to simply clone the object but 
unfortunately this also has the unbounded memory problem so something a 
bit more clever might be required. Basically we really want *all* 
references to the object except from the finalizer to be cleaned up.
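
The variant without post-mortem finalization could be sketched along the same lines. Again this is a toy in Python under loud assumptions: the names collect, run_finalization, on_finalize and Conn are invented, the handler is kept in a separate on_finalize attribute so the strong slot is free to receive the object, and the sketch does not model the hard part from [*1] (cleaning up every other reference to the object).

```python
class Finalizer:
    def __init__(self, guarded, on_finalize):
        self.weak = guarded             # toy weak slot, ignored by collect() when tracing
        self.strong = None              # filled in by the GC, not by the user
        self.on_finalize = on_finalize  # hypothetical callback standing in for #finalize


def collect(registered, finalized, reachable_ids):
    """Toy GC pass: unreachable guarded objects move from weak to strong slot."""
    for f in list(registered):
        if id(f.weak) not in reachable_ids:
            f.strong, f.weak = f.weak, None  # finalizer now holds the last reference
            registered.remove(f)
            finalized.append(f)


def run_finalization(finalized):
    while finalized:
        f = finalized.pop(0)
        obj, f.strong = f.strong, None  # finalizer lets go of the object
        f.on_finalize(obj)              # handler sees the object with its full state


class Conn:
    """Hypothetical resource with state worth inspecting at finalization."""
    def __init__(self, name):
        self.name, self.open = name, True


closed, rescued = [], []

def close(conn):
    conn.open = False          # uses the object's own state, then drops it
    closed.append(conn.name)

def rescue(conn):
    rescued.append(conn)       # storing the object lets it survive


c1, c2 = Conn("c1"), Conn("c2")
registered = [Finalizer(c1, close), Finalizer(c2, rescue)]
finalized = []
collect(registered, finalized, reachable_ids=set())  # nothing else refers to them
run_finalization(finalized)
```

The first handler closes the connection and lets it go; the second stores it, so it survives. That is the "if the finalizer drops it, it's gone; if it stores it, it will survive" behaviour in miniature.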

Note that weak arrays or other weak classes wouldn't be affected at all 
by this since only Finalizers get the notifications - all other weak 
classes would simply drop the references when they get collected and 
never get notified about anything.

Cheers,
   - Andreas