[squeak-dev] Ephemerons explained [was Re: Improving WeakKeyDictionary comment]

Herbert König herbertkoenig at gmx.net
Wed Jun 8 03:30:08 UTC 2022


Thanks Eliot,

at least to me it read coherent and the topic is nothing I ever dealt
with. All I knew before was that gc removes objects that are not
referenced any more and that the chain of references has to go down to
the system root.

Cheers,

Herbert

Am 08.06.2022 um 00:46 schrieb Eliot Miranda:
>
> On Sun, Jun 5, 2022 at 10:51 PM Chris Muller <ma.chris.m at gmail.com> wrote:
>
>     [snip]
>
>     > If you need accurate information, you must check against
>     #slowSize, which
>     > means O(n) runtime. Using ephemerons may help with that.
>
>     I hope to one day understand ephemerons and why they're useful,
>     because I sense they provide some kind of magic that could help my
>     stack.  But, I never grok'd them so far.
>
>
> Let me see if I can explain them coherently.  I'll start top-down.
>
> Ephemerons are Associations that interact with the garbage collector;
> they kill two birds with one stone.  First, they enable the collection
> of non-isolated cycles. Second, they enable the identification of
> objects that are finalizable.
>
> Taking the first topic first, what's a non-isolated cycle?  Before
> that let's understand what an isolated cycle is.  An isolated cycle is
> just some cycle of objects that are not reachable from the system
> roots.  The system roots are effectively the Smalltalk dictionary and
> the current context of the active process.  For example, all runnable
> processes are reachable from Processor, the singleton instance of
> ProcessorScheduler, and it is reachable from Smalltalk since there is
> an association in Smalltalk with key #Processor and value the
> singleton instance of ProcessorScheduler.  All classes and global
> variables are held onto by associations in the Smalltalk dictionary.
>
> So an isolated cycle is some cycle that isn't reachable from Smalltalk
> or the current process.  Let's create a couple.
>
>     | cyclicArrays |
> cyclicArrays := Array new: 1.
> cyclicArrays at: 1 put: (Array with: cyclicArrays).
> cyclicArrays := nil
>
> This creates two arrays that point to each other. The first is
> reachable from the temporary cyclicArrays, until cyclicArrays is
> assigned nil, at which point the two arrays are no longer reachable
> from the current process (or the roots) and will; be collected soon.
>
> Here's a more interesting example that comes up a lot.
>
>     | process |
>     process := [Semaphore new wait] newProcess.
>     process resume.
>     process := nil
>
> This reduces to:
>
> [Semaphore new wait] fork
>
> When the process waits on the Semaphore it blocks, because the
> semaphore has no excess signals. When the process blocks it is removed
> from the set of runnable processes and added to the list of processes
> waiting on the Semaphore (actually the Semaphore *is* that list). So
> there's now an isolated cycle of the process and the semaphore, and it
> will get collected soon.  It is very similar to the cyclic arrays;
> just different classes and a few more objects, like the chain of
> contexts and the block with which the process was started, all wrapped
> by the process instance.
>
> OK, so what's a non-isolated cycle?  The classic example is that of
> some model object in MVC.  In MVC each view adds itself as a dependent
> of the model.  If the model inherits from Model then it will have its
> own dependents (an instance of Array, in its first instance
> variable).  So if the SystemWindow containing the views is deleted
> (removed from the world's set of morphs) then we have another isolated
> cycle between the views which refer directly to the model, and the
> model's dependents which point back to the views.  But if the model
> does *not* inherit from Model then its dependents are attached to it
> in the DependentsFields dictionary, a class variable of Object, and
> Object is in Smalltalk.  So the DependentsFields dictionary is
> effectively a root.
>
> Object methods for dependents
> addDependent: anObject
> "Make the given object one of the receiver's dependents."
>
> | dependents |
> dependents := self dependents.
> (dependents includes: anObject) ifFalse:
> [self myDependents: (dependents copyWithDependent: anObject)].
> ^ anObject
>
> dependents
> "Answer a collection of objects that are 'dependent' on the receiver;
> that is, all objects that should be notified if the receiver changes."
>
> ^ self myDependents ifNil: [#()]
>
> myDependents
> "Private. Answer a list of all the receiver's dependents."
>
> ^ DependentsFields at: self ifAbsent: []
>
> Model methods for dependents
> myDependents
> ^ dependents
>
> So if some object (not inheriting from Model) gets some views attached
> to it as dependents and these views refer to the model we have a cycle
> that is reachable from the roots via a strong association in
> DependentsFields.  So this cycle will not be collected when the
> SystemWindow containing those views is deleted because there is still
> a reference to the object in the association in DependentsFields; the
> object is the key of the association.  The array of dependents is the
> value of the association.  Even if we were to make the array of
> dependents weak, this allows the views to be collected, but not the
> model object itself.  This is a problem, and is the reason the release
> method exists:
>
> Object methods for dependents
> release
> "Remove references to objects that may refer to the receiver. This
> message
> should be overridden by subclasses with any cycles, in which case the
> subclass should also include the expression super release."
>
> self breakDependents
>
> Now we understand the difference between isolated cycles and
> non-isolated cycles we can understand what an ephemeron is, or at
> least how it came to be invented. An ephemeron is an association
> designed to be used with dictionaries such as DependentsFields, which
> are used to add properties to objects.  What we want is not to have to
> send release.  We'd like the system to be able to figure out that the
> cycle between the object and its views would be isolated were it not
> for the reference from DependentsFields.  Let's make sure we get the
> point here. When the views exist within a SystemWindow that is
> on-screen (part of the world's morphs) the model is reachable both
> from the world's morphs (transitive closure from the world's submorphs
> through the morphs' model inst var) *and* from DependentsFields.  When
> the SystemWindow is deleted it is removed from the
> world's submorphs so the model and its cycle is *only* reachable from
> DependentsFields.  Other than for DependentsFields the cycle would be
> isolated and would be collected.  The reference from DependentsFields
> keeps the cycle around (prevents it from being collected).
>
> But an ephemeron is a smart association, known to the garbage
> collector.  When the garbage collector traverses the objects reachable
> from the roots and finds an ephemeron it doesn't follow the key or the
> value of the ephemeron unless the key has already been reached.  The
> ephemeron acts like a guard. If its key has already been reached it is
> reachable from the roots, and there is no possibility of an isolated
> cycle existing, and in this case an ephemeron is exactly like an
> association, and does nothing special.  But if the key hasn't been
> reached then the ephemeron may be referencing an isolated cycle.
>
> Now because traversing the object graph is done one object at a time,
> when an ephemeron is reached which has an as-yet-unreached key the
> garbage collector doesn't know if the key might be reached later in
> the traversal following the path from some other root, so it puts the
> ephemeron to one side, in the set of ephemerons with
> as-yet-unreachable keys.  Once the traversal is finished then it
> examines that set and discards any ephemerons in the set that do have
> reachable keys. These were objects reachable from the roots other than
> through ephemerons.  Now the set contains only ephemerons whose keys
> are *only* reachable from ephemerons reachable from the roots.  The
> garbage collector can now finish its traversal by traversing the
> ephemerons keys and values.  But it does this *after* identifying the
> set of ephemerons whose keys are only reachable from ephemerons. 
> These objects would be collectable other than being reachable from
> ephemerons.
>
> So let's go back up to the system level for a moment.  If we use
> Ephemeron instead of Association as the class of associations in
> various property dictionaries such as DependentsFields, then the
> garbage collector can identify these isolated cycles reachable from
> ephemerons.  This needs only one more thing to solve the collection of
> these isolated cycles reachable from ephemerons, and relieve the
> programmer of the task of sending release.  And that thing is to queue
> the set of ephemerons with keys reachable only from ephemerons, and
> signal a semaphore.  Then a process in the image waiting on that
> semaphore can fetch the ephemerons from the queue and send each such
> ephemeron a message.  I like to use the selector #mourn, because the
> ephemeron's key is about to die, and the ephemeron is mourning the
> demise of its key.  In response to #mourn the ephemeron can remove
> itself from its dictionary (which will result in the ephemeron getting
> collected soon) and send #finalize to its key, because other than the
> reference from the ephemeron the key would have been collected.
>
> So this is what the garbage collector does at the end of its
> traversal.  It removes each ephemerons in the set of ephemerons with
> keys only reachable from ephemerons and adds it to the finalization
> queue. And this is the second bird. Because ephemerons delay scanning
> isolated cycles until the end of the garbage collector's traversal,
> they can be used to do proper finalization. The key of an ephemeron is
> identified as being collectable without having been collected yet. 
> This is very useful.  Compare this with the way one has to do
> finalization with weak arrays.  Here, the death of an object is noted
> by a weak array having the reference to the collected object set to
> nil.  To have something not-yet-collected to finalize, the system has
> to keep a copy of the object around and finalize the copy; keeping the
> copy up-to-date with its twin has complexity and performance
> overheads.  Ephemerons are much nicer.
>
>
> Now Chris if you still don't understand ephemerons (no shame in that;
> this is tricky stuff) please take time to tell me what doesn't make
> sense and I'll try again, editing this message until you do understand
> it.  Then it can serve as a learning document/help page which explains
> ephemerons well.
>
>
> There's one more thing to say about ephemerons at the image level. One
> can have any number of ephemerons on a single key.  You could have one
> in DependentsFields, and another in some dictionary of finalizable
> objects, and so on. So we can use ephemerons for more than one thing. 
> And because what marks an ephemeron as an ephemeron is its format
> (defined by its class definition, i.e. Association ephemeronSubclass:
> #Ephemeron) we can have one ephemeron class just for
> preventing non-isolated cycles, for DependentsFields, and another for
> finalization.  One class would have a mourn method that just removes
> itself from its dictionary, another would have a morn method that
> sends #finalize to its key, etc.
>
> Finally (sorry, couldn't resist ;-) ) some of you will have noticed
> something interesting about the scanning of ephemerons. This is of
> great importance to a correct implementation. At the end of the
> garbage collector's traversal once it has collected the set of
> ephemerons with keys only reachable from ephemerons it can queue these
> ephemerons for finalization.  But then it does traverse the objects
> reachable from these ephemerons, and in doing so *it may encounter
> more, as-yet-unreached, ephemerons* and these may have
> as-yet-unreached keys.  So the queueing and traversing of ephemerons
> continues until the set of ephemerons with unreachable keys is empty,
> and all such ephemerons have been added to the finalization queue.  At
> this point, just as in a system without ephemerons, all reachable
> objects have been reached, but those only reachable through ephemerons
> have been identified and their guardian ephemerons notified of that
> fact by virtue of being queued for finalization.
>
>
> And if this is enough explanation for someone itching to implement
> ephemerons  I have a change set with some of the code already
> written.  Find attached.
>
> _,,,^..^,,,_
> best, Eliot
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.squeakfoundation.org/pipermail/squeak-dev/attachments/20220608/b24e6e86/attachment.html>


More information about the Squeak-dev mailing list