[squeak-dev] Ephemerons explained [was Re: Improving WeakKeyDictionary comment]

Eliot Miranda eliot.miranda at gmail.com
Tue Jun 7 22:46:46 UTC 2022

On Sun, Jun 5, 2022 at 10:51 PM Chris Muller <ma.chris.m at gmail.com> wrote:

> [snip]
> > If you need accurate information, you must check against #slowSize, which
> > means O(n) runtime. Using ephemerons may help with that.
> I hope to one day understand ephemerons and why they're useful,
> because I sense they provide some kind of magic that could help my
> stack.  But, I never grok'd them so far.

Let me see if I can explain them coherently.  I'll start top-down.

Ephemerons are Associations that interact with the garbage collector; they
kill two birds with one stone.  First, they enable the collection of
non-isolated cycles. Second, they enable the identification of objects that
are finalizable.

Taking the first topic first, what's a non-isolated cycle?  Before that
let's understand what an isolated cycle is.  An isolated cycle is just some
cycle of objects that are not reachable from the system roots.  The system
roots are effectively the Smalltalk dictionary and the current context of
the active process.  For example, all runnable processes are reachable from
Processor, the singleton instance of ProcessorScheduler, and it is
reachable from Smalltalk since there is an association in Smalltalk with
key #Processor and value the singleton instance of ProcessorScheduler.  All
classes and global variables are held onto by associations in the Smalltalk

So an isolated cycle is some cycle that isn't reachable from Smalltalk or
the current process.  Let's create a couple.

    | cyclicArrays |
    cyclicArrays := Array new: 1.
    cyclicArrays at: 1 put: (Array with: cyclicArrays).
    cyclicArrays := nil

This creates two arrays that point to each other. The first is reachable
from the temporary cyclicArrays, until cyclicArrays is assigned nil, at
which point the two arrays are no longer reachable from the current process
(or the roots) and will; be collected soon.

Here's a more interesting example that comes up a lot.

    | process |
    process := [Semaphore new wait] newProcess.
    process resume.
    process := nil

This reduces to:

    [Semaphore new wait] fork

When the process waits on the Semaphore it blocks, because the semaphore
has no excess signals. When the process blocks it is removed from the set
of runnable processes and added to the list of processes waiting on the
Semaphore (actually the Semaphore *is* that list). So there's now an
isolated cycle of the process and the semaphore, and it will get collected
soon.  It is very similar to the cyclic arrays; just different classes and
a few more objects, like the chain of contexts and the block with which the
process was started, all wrapped by the process instance.

OK, so what's a non-isolated cycle?  The classic example is that of some
model object in MVC.  In MVC each view adds itself as a dependent of the
model.  If the model inherits from Model then it will have its own
dependents (an instance of Array, in its first instance variable).  So if
the SystemWindow containing the views is deleted (removed from the world's
set of morphs) then we have another isolated cycle between the views which
refer directly to the model, and the model's dependents which point back to
the views.  But if the model does *not* inherit from Model then its
dependents are attached to it in the DependentsFields dictionary, a class
variable of Object, and Object is in Smalltalk.  So
the DependentsFields dictionary is effectively a root.

Object methods for dependents
addDependent: anObject
    "Make the given object one of the receiver's dependents."

    | dependents |
    dependents := self dependents.
    (dependents includes: anObject) ifFalse:
        [self myDependents: (dependents copyWithDependent: anObject)].
    ^ anObject

    "Answer a collection of objects that are 'dependent' on the receiver;
    that is, all objects that should be notified if the receiver changes."

    ^ self myDependents ifNil: [#()]

    "Private. Answer a list of all the receiver's dependents."

    ^ DependentsFields at: self ifAbsent: []

Model methods for dependents
    ^ dependents

So if some object (not inheriting from Model) gets some views attached to
it as dependents and these views refer to the model we have a cycle that is
reachable from the roots via a strong association in DependentsFields.  So
this cycle will not be collected when the SystemWindow containing those
views is deleted because there is still a reference to the object in the
association in DependentsFields; the object is the key of the association.
The array of dependents is the value of the association.  Even if we were
to make the array of dependents weak, this allows the views to be
collected, but not the model object itself.  This is a problem, and is the
reason the release method exists:

Object methods for dependents
    "Remove references to objects that may refer to the receiver. This
    should be overridden by subclasses with any cycles, in which case the
    subclass should also include the expression super release."

    self breakDependents

Now we understand the difference between isolated cycles and non-isolated
cycles we can understand what an ephemeron is, or at least how
it came to be invented. An ephemeron is an association designed to be used
with dictionaries such as DependentsFields, which are used to add
properties to objects.  What we want is not to have to send release.  We'd
like the system to be able to figure out that the cycle between the object
and its views would be isolated were it not for the reference from
DependentsFields.  Let's make sure we get the point here. When the views
exist within a SystemWindow that is on-screen (part of the world's morphs)
the model is reachable both from the world's morphs (transitive closure
from the world's submorphs through the morphs' model inst var) *and* from
DependentsFields.  When the SystemWindow is deleted it is removed from the
world's submorphs so the model and its cycle is *only* reachable from
DependentsFields.  Other than for DependentsFields the cycle would be
isolated and would be collected.  The reference from DependentsFields keeps
the cycle around (prevents it from being collected).

But an ephemeron is a smart association, known to the garbage collector.
When the garbage collector traverses the objects reachable from the roots
and finds an ephemeron it doesn't follow the key or the value of the
ephemeron unless the key has already been reached.  The ephemeron acts like
a guard.  If its key has already been reached it is reachable from the
roots, and there is no possibility of an isolated cycle existing, and in
this case an ephemeron is exactly like an association, and does nothing
special.  But if the key hasn't been reached then the ephemeron may be
referencing an isolated cycle.

Now because traversing the object graph is done one object at a time, when
an ephemeron is reached which has an as-yet-unreached key the garbage
collector doesn't know if the key might be reached later in the traversal
following the path from some other root, so it puts the ephemeron to one
side, in the set of ephemerons with as-yet-unreachable keys.  Once the
traversal is finished then it examines that set and discards any ephemerons
in the set that do have reachable keys. These were objects reachable from
the roots other than through ephemerons.  Now the set contains only
ephemerons whose keys are *only* reachable from ephemerons reachable from
the roots.  The garbage collector can now finish its traversal by
traversing the ephemerons keys and values.  But it does this *after*
identifying the set of ephemerons whose keys are only reachable from
ephemerons.  These objects would be collectable other than being reachable
from ephemerons.

So let's go back up to the system level for a moment.  If we use Ephemeron
instead of Association as the class of associations in various property
dictionaries such as DependentsFields, then the garbage collector can
identify these isolated cycles reachable from ephemerons.  This needs only
one more thing to solve the collection of these isolated cycles reachable
from ephemerons, and relieve the programmer of the task of sending
release.  And that thing is to queue the set of ephemerons with keys
reachable only from ephemerons, and signal a semaphore.  Then a process in
the image waiting on that semaphore can fetch the ephemerons from the queue
and send each such ephemeron a message.  I like to use the selector #mourn,
because the ephemeron's key is about to die, and the ephemeron is mourning
the demise of its key.  In response to #mourn the ephemeron can remove
itself from its dictionary (which will result in the ephemeron getting
collected soon) and send #finalize to its key, because other than the
reference from the ephemeron the key would have been collected.

So this is what the garbage collector does at the end of its traversal.  It
removes each ephemerons in the set of ephemerons with keys only reachable
from ephemerons and adds it to the finalization queue. And this is the
second bird. Because ephemerons delay scanning isolated cycles until the
end of the garbage collector's traversal, they can be used to do proper
finalization. The key of an ephemeron is identified as being collectable
without having been collected yet.  This is very useful.  Compare this with
the way one has to do finalization with weak arrays.  Here, the death of an
object is noted by a weak array having the reference to the collected
object set to nil.  To have something not-yet-collected to finalize, the
system has to keep a copy of the object around and finalize the copy;
keeping the copy up-to-date with its twin has complexity and performance
overheads.  Ephemerons are much nicer.

Now Chris if you still don't understand ephemerons (no shame in that; this
is tricky stuff) please take time to tell me what doesn't make sense and
I'll try again, editing this message until you do understand it.  Then it
can serve as a learning document/help page which explains ephemerons well.

There's one more thing to say about ephemerons at the image level. One can
have any number of ephemerons on a single key.  You could have one in
DependentsFields, and another in some dictionary of finalizable objects,
and so on. So we can use ephemerons for more than one thing.  And because
what marks an ephemeron as an ephemeron is its format (defined by its class
definition, i.e. Association ephemeronSubclass: #Ephemeron) we can have one
ephemeron class just for preventing non-isolated cycles, for
DependentsFields, and another for finalization.  One class would have a
mourn method that just removes itself from its dictionary, another would
have a morn method that sends #finalize to its key, etc.

Finally (sorry, couldn't resist ;-) ) some of you will have noticed
something interesting about the scanning of ephemerons. This is of great
importance to a correct implementation.  At the end of the garbage
collector's traversal once it has collected the set of ephemerons with keys
only reachable from ephemerons it can queue these ephemerons for
finalization.  But then it does traverse the objects reachable from these
ephemerons, and in doing so *it may encounter more, as-yet-unreached,
ephemerons* and these may have as-yet-unreached keys.  So the queueing and
traversing of ephemerons continues until the set of ephemerons with
unreachable keys is empty, and all such ephemerons have been added to the
finalization queue.  At this point, just as in a system without ephemerons,
all reachable objects have been reached, but those only reachable through
ephemerons have been identified and their guardian ephemerons notified of
that fact by virtue of being queued for finalization.

And if this is enough explanation for someone itching to implement
ephemerons  I have a change set with some of the code already written.
Find attached.

best, Eliot
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.squeakfoundation.org/pipermail/squeak-dev/attachments/20220607/a2e67b11/attachment-0001.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Ephemerons.st
Type: application/octet-stream
Size: 10593 bytes
Desc: not available
URL: <http://lists.squeakfoundation.org/pipermail/squeak-dev/attachments/20220607/a2e67b11/attachment-0001.obj>

More information about the Squeak-dev mailing list