[squeak-dev] Ephemerons explained [was Re: Improving WeakKeyDictionary comment]

Wed Jun 15 22:13:53 UTC 2022

Wow, thank you so much for that fantastic description, Eliot!  I believe I
understand it!

It helped immensely that you clarified the *motivation* behind their
invention:

> What we want is not to have to send release.
> This needs only one more thing to solve the collection of these isolated
cycles reachable from ephemerons, and
> relieve the programmer of the task of sending release.

This was very helpful.  My brain has to believe there's a good reason to
learn something before it'll make space for it.  :)

Unwanted, non-isolated cycles are not something I want either but, I must
admit, I haven't done enough ToolBuilder / MVC programming to understand
why sending #release is such a big deal.  Just due to the UI being made up
of so many objects connected to so many model objects, I guess.  But I tend
to prefer a rather *buttoned down* style of design and programming.  For
example, I've never used finalization in any of my designs, because I only
ever design for explicit release of resources.

   - If I open a file, I explicitly close it, rather than use finalization.
   - If I fork a Process, I ensure I make a way for it to "fall out the
bottom" gracefully, rather than #terminate at some random point.
   - If I ever send #addDependent: I ensure I have a corresponding
#breakDependents.
   - etc.

That doesn't mean I don't respect or wouldn't try a more aggressive style.
I do appreciate being able to create objects with impunity and rely on the
garbage collector rather than having to delete them like in C++.  :)  This
feels like an extension of that.  I can see *some* appeal of not wanting to
have to send #release, it creates more code that does nothing for the
user, just the computer's memory management.  Still, it seems like it could
be written just once and tucked somewhere into the "Close" code of
SystemWindow, and be a unified part of the abstraction handling the
unregistration of the model objects.

Despite that, something deep suggests to me that there may be
certain designs that can't be so buttoned down which these features
(ephemerons and/or finalization) could simplify from a point of
infeasibility to a point of feasibility.  It's good to know this capability
is in the toolbox, I've starred this description in my email client, just
in case.  Thank you so much.

 - Chris

On Tue, Jun 7, 2022 at 5:47 PM Eliot Miranda <eliot.miranda at gmail.com>
wrote:

>
> On Sun, Jun 5, 2022 at 10:51 PM Chris Muller <ma.chris.m at gmail.com> wrote:
>
>> [snip]
>>
>> > If you need accurate information, you must check against #slowSize,
>> which
>> > means O(n) runtime. Using ephemerons may help with that.
>>
>> I hope to one day understand ephemerons and why they're useful,
>> because I sense they provide some kind of magic that could help my
>> stack.  But, I never grok'd them so far.
>>
>
> Let me see if I can explain them coherently.  I'll start top-down.
>
> Ephemerons are Associations that interact with the garbage collector; they
> kill two birds with one stone.  First, they enable the collection of
> non-isolated cycles. Second, they enable the identification of objects that
> are finalizable.
>
> Taking the first topic first, what's a non-isolated cycle?  Before that
> let's understand what an isolated cycle is.  An isolated cycle is just some
> cycle of objects that are not reachable from the system roots.  The system
> roots are effectively the Smalltalk dictionary and the current context of
> the active process.  For example, all runnable processes are reachable from
> Processor, the singleton instance of ProcessorScheduler, and it is
> reachable from Smalltalk since there is an association in Smalltalk with
> key #Processor and value the singleton instance of ProcessorScheduler.  All
> classes and global variables are held onto by associations in the Smalltalk
> dictionary.
>
> So an isolated cycle is some cycle that isn't reachable from Smalltalk or
> the current process.  Let's create a couple.
>
>     | cyclicArrays |
>     cyclicArrays := Array new: 1.
>     cyclicArrays at: 1 put: (Array with: cyclicArrays).
>     cyclicArrays := nil
>
> This creates two arrays that point to each other. The first is reachable
> from the temporary cyclicArrays, until cyclicArrays is assigned nil, at
> which point the two arrays are no longer reachable from the current process
> (or the roots) and will; be collected soon.
>
> Here's a more interesting example that comes up a lot.
>
>     | process |
>     process := [Semaphore new wait] newProcess.
>     process resume.
>     process := nil
>
> This reduces to:
>
>     [Semaphore new wait] fork
>
> When the process waits on the Semaphore it blocks, because the semaphore
> has no excess signals. When the process blocks it is removed from the set
> of runnable processes and added to the list of processes waiting on the
> Semaphore (actually the Semaphore *is* that list). So there's now an
> isolated cycle of the process and the semaphore, and it will get collected
> soon.  It is very similar to the cyclic arrays; just different classes and
> a few more objects, like the chain of contexts and the block with which the
> process was started, all wrapped by the process instance.
>
> OK, so what's a non-isolated cycle?  The classic example is that of some
> model object in MVC.  In MVC each view adds itself as a dependent of the
> model.  If the model inherits from Model then it will have its own
> dependents (an instance of Array, in its first instance variable).  So if
> the SystemWindow containing the views is deleted (removed from the world's
> set of morphs) then we have another isolated cycle between the views which
> refer directly to the model, and the model's dependents which point back to
> the views.  But if the model does *not* inherit from Model then its
> dependents are attached to it in the DependentsFields dictionary, a class
> variable of Object, and Object is in Smalltalk.  So
> the DependentsFields dictionary is effectively a root.
>
> Object methods for dependents
> addDependent: anObject
>     "Make the given object one of the receiver's dependents."
>
>     | dependents |
>     dependents := self dependents.
>     (dependents includes: anObject) ifFalse:
>         [self myDependents: (dependents copyWithDependent: anObject)].
>     ^ anObject
>
> dependents
>     "Answer a collection of objects that are 'dependent' on the receiver;
>     that is, all objects that should be notified if the receiver changes."
>
>     ^ self myDependents ifNil: [#()]
>
> myDependents
>     "Private. Answer a list of all the receiver's dependents."
>
>     ^ DependentsFields at: self ifAbsent: []
>
> Model methods for dependents
> myDependents
>     ^ dependents
>
> So if some object (not inheriting from Model) gets some views attached to
> it as dependents and these views refer to the model we have a cycle that is
> reachable from the roots via a strong association in DependentsFields.  So
> this cycle will not be collected when the SystemWindow containing those
> views is deleted because there is still a reference to the object in the
> association in DependentsFields; the object is the key of the association.
> The array of dependents is the value of the association.  Even if we were
> to make the array of dependents weak, this allows the views to be
> collected, but not the model object itself.  This is a problem, and is the
> reason the release method exists:
>
> Object methods for dependents
> release
>     "Remove references to objects that may refer to the receiver. This
> message
>     should be overridden by subclasses with any cycles, in which case the
>     subclass should also include the expression super release."
>
>     self breakDependents
>
> Now we understand the difference between isolated cycles and non-isolated
> cycles we can understand what an ephemeron is, or at least how
> it came to be invented. An ephemeron is an association designed to be used
> with dictionaries such as DependentsFields, which are used to add
> properties to objects.  What we want is not to have to send release.  We'd
> like the system to be able to figure out that the cycle between the object
> and its views would be isolated were it not for the reference from
> DependentsFields.  Let's make sure we get the point here. When the views
> exist within a SystemWindow that is on-screen (part of the world's morphs)
> the model is reachable both from the world's morphs (transitive closure
> from the world's submorphs through the morphs' model inst var) *and* from
> DependentsFields.  When the SystemWindow is deleted it is removed from the
> world's submorphs so the model and its cycle is *only* reachable from
> DependentsFields.  Other than for DependentsFields the cycle would be
> isolated and would be collected.  The reference from DependentsFields keeps
> the cycle around (prevents it from being collected).
>
> But an ephemeron is a smart association, known to the garbage collector.
> When the garbage collector traverses the objects reachable from the roots
> and finds an ephemeron it doesn't follow the key or the value of the
> ephemeron unless the key has already been reached.  The ephemeron acts like
> a guard.  If its key has already been reached it is reachable from the
> roots, and there is no possibility of an isolated cycle existing, and in
> this case an ephemeron is exactly like an association, and does nothing
> special.  But if the key hasn't been reached then the ephemeron may be
> referencing an isolated cycle.
>
> Now because traversing the object graph is done one object at a time, when
> an ephemeron is reached which has an as-yet-unreached key the garbage
> collector doesn't know if the key might be reached later in the traversal
> following the path from some other root, so it puts the ephemeron to one
> side, in the set of ephemerons with as-yet-unreachable keys.  Once the
> traversal is finished then it examines that set and discards any ephemerons
> in the set that do have reachable keys. These were objects reachable from
> the roots other than through ephemerons.  Now the set contains only
> ephemerons whose keys are *only* reachable from ephemerons reachable from
> the roots.  The garbage collector can now finish its traversal by
> traversing the ephemerons keys and values.  But it does this *after*
> identifying the set of ephemerons whose keys are only reachable from
> ephemerons.  These objects would be collectable other than being reachable
> from ephemerons.
>
> So let's go back up to the system level for a moment.  If we use Ephemeron
> instead of Association as the class of associations in various property
> dictionaries such as DependentsFields, then the garbage collector can
> identify these isolated cycles reachable from ephemerons.  This needs only
> one more thing to solve the collection of these isolated cycles reachable
> from ephemerons, and relieve the programmer of the task of sending
> release.  And that thing is to queue the set of ephemerons with keys
> reachable only from ephemerons, and signal a semaphore.  Then a process in
> the image waiting on that semaphore can fetch the ephemerons from the queue
> and send each such ephemeron a message.  I like to use the selector #mourn,
> because the ephemeron's key is about to die, and the ephemeron is mourning
> the demise of its key.  In response to #mourn the ephemeron can remove
> itself from its dictionary (which will result in the ephemeron getting
> collected soon) and send #finalize to its key, because other than the
> reference from the ephemeron the key would have been collected.
>
> So this is what the garbage collector does at the end of its traversal.
> It removes each ephemerons in the set of ephemerons with keys only
> reachable from ephemerons and adds it to the finalization queue. And this
> is the second bird. Because ephemerons delay scanning isolated cycles until
> the end of the garbage collector's traversal, they can be used to do proper
> finalization. The key of an ephemeron is identified as being collectable
> without having been collected yet.  This is very useful.  Compare this with
> the way one has to do finalization with weak arrays.  Here, the death of an
> object is noted by a weak array having the reference to the collected
> object set to nil.  To have something not-yet-collected to finalize, the
> system has to keep a copy of the object around and finalize the copy;
> keeping the copy up-to-date with its twin has complexity and performance
> overheads.  Ephemerons are much nicer.
>
>
> Now Chris if you still don't understand ephemerons (no shame in that; this
> is tricky stuff) please take time to tell me what doesn't make sense and
> I'll try again, editing this message until you do understand it.  Then it
> can serve as a learning document/help page which explains ephemerons well.
>
>
> There's one more thing to say about ephemerons at the image level. One can
> have any number of ephemerons on a single key.  You could have one in
> DependentsFields, and another in some dictionary of finalizable objects,
> and so on. So we can use ephemerons for more than one thing.  And because
> what marks an ephemeron as an ephemeron is its format (defined by its class
> definition, i.e. Association ephemeronSubclass: #Ephemeron) we can have one
> ephemeron class just for preventing non-isolated cycles, for
> DependentsFields, and another for finalization.  One class would have a
> mourn method that just removes itself from its dictionary, another would
> have a morn method that sends #finalize to its key, etc.
>
> Finally (sorry, couldn't resist ;-) ) some of you will have noticed
> something interesting about the scanning of ephemerons. This is of great
> importance to a correct implementation.  At the end of the garbage
> collector's traversal once it has collected the set of ephemerons with keys
> only reachable from ephemerons it can queue these ephemerons for
> finalization.  But then it does traverse the objects reachable from these
> ephemerons, and in doing so *it may encounter more, as-yet-unreached,
> ephemerons* and these may have as-yet-unreached keys.  So the queueing and
> traversing of ephemerons continues until the set of ephemerons with
> unreachable keys is empty, and all such ephemerons have been added to the
> finalization queue.  At this point, just as in a system without ephemerons,
> all reachable objects have been reached, but those only reachable through
> ephemerons have been identified and their guardian ephemerons notified of
> that fact by virtue of being queued for finalization.
>
>
> And if this is enough explanation for someone itching to implement
> ephemerons  I have a change set with some of the code already written.
> Find attached.
>
> _,,,^..^,,,_
> best, Eliot
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.squeakfoundation.org/pipermail/squeak-dev/attachments/20220615/49d52725/attachment.html>