<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
Thanks Eliot,<br>
<br>
at least to me it read coherent and the topic is nothing I ever
dealt with. All I knew before was that gc removes objects that are
not referenced any more and that the chain of references has to go
down to the system root.<br>
<br>
Cheers,<br>
<br>
Herbert<br>
<br>
<div class="moz-cite-prefix">Am 08.06.2022 um 00:46 schrieb Eliot
Miranda:<br>
</div>
<blockquote type="cite"
cite="mid:CAC20JE1QNuPcMzvFvk2KeoOk-o-c52T6puEVfYVYwH1iZtniVA@mail.gmail.com">
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
<div dir="ltr">
<div dir="ltr">
<div class="gmail_default" style="font-size:small"><br>
</div>
</div>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Sun, Jun 5, 2022 at 10:51
PM Chris Muller <<a href="mailto:ma.chris.m@gmail.com"
moz-do-not-send="true" class="moz-txt-link-freetext">ma.chris.m@gmail.com</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><span
class="gmail_default" style="font-size:small">[snip]</span><br>
<br>
> If you need accurate information, you must check
against #slowSize, which<br>
> means O(n) runtime. Using ephemerons may help with
that.<br>
<br>
I hope to one day understand ephemerons and why they're
useful,<br>
because I sense they provide some kind of magic that could
help my<br>
stack. But, I never grok'd them so far.<br>
</blockquote>
<div><br>
</div>
<div class="gmail_default" style="font-size:small">Let me see
if I can explain them coherently. I'll start top-down.</div>
<div class="gmail_default" style="font-size:small"><br>
</div>
<div class="gmail_default" style="font-size:small">Ephemerons
are Associations that interact with the garbage collector;
they kill two birds with one stone. First, they enable the
collection of non-isolated cycles. Second, they enable the
identification of objects that are finalizable.</div>
<div class="gmail_default" style="font-size:small"><br>
</div>
<div class="gmail_default" style="font-size:small">Taking the
first topic first, what's a non-isolated cycle? Before that
let's understand what an isolated cycle is. An isolated
cycle is just some cycle of objects that are not reachable
from the system roots. The system roots are effectively the
Smalltalk dictionary and the current context of the active
process. For example, all runnable processes are reachable
from Processor, the singleton instance of
ProcessorScheduler, and it is reachable from Smalltalk since
there is an association in Smalltalk with key #Processor and
value the singleton instance of ProcessorScheduler. All
classes and global variables are held onto by associations
in the Smalltalk dictionary.</div>
<div class="gmail_default" style="font-size:small"><br>
</div>
<div class="gmail_default" style="font-size:small">So an
isolated cycle is some cycle that isn't reachable from
Smalltalk or the current process. Let's create a couple.</div>
<div class="gmail_default" style="font-size:small"><br>
</div>
<div class="gmail_default" style="font-size:small"> |
cyclicArrays |</div>
<div class="gmail_default" style="font-size:small">
cyclicArrays := Array new: 1.</div>
<div class="gmail_default" style="font-size:small">
cyclicArrays at: 1 put: (Array with: cyclicArrays).</div>
<div class="gmail_default" style="font-size:small">
cyclicArrays := nil</div>
<div class="gmail_default" style="font-size:small"><br>
</div>
<div class="gmail_default" style="font-size:small">This
creates two arrays that point to each other. The first is
reachable from the temporary cyclicArrays, until
cyclicArrays is assigned nil, at which point the two arrays
are no longer reachable from the current process (or the
roots) and will; be collected soon.</div>
</div>
<div><br>
</div>
<div class="gmail_default" style="font-size:small">Here's a more
interesting example that comes up a lot.</div>
<div class="gmail_default" style="font-size:small"><br>
</div>
<div class="gmail_default" style="font-size:small"> | process
|</div>
<div class="gmail_default" style="font-size:small"> process
:= [Semaphore new wait] newProcess.</div>
<div class="gmail_default" style="font-size:small"> process
resume.</div>
<div class="gmail_default" style="font-size:small"> process
:= nil</div>
<div class="gmail_default" style="font-size:small"><br>
</div>
<div class="gmail_default" style="font-size:small">This reduces
to:</div>
<div class="gmail_default" style="font-size:small"><br>
</div>
<div class="gmail_default" style="font-size:small">
[Semaphore new wait] fork<br>
</div>
<div><br>
</div>
<div class="gmail_default" style="font-size:small">When the
process waits on the Semaphore it blocks, because the
semaphore has no excess signals. When the process blocks it is
removed from the set of runnable processes and added to the
list of processes waiting on the Semaphore (actually the
Semaphore *is* that list). So there's now an isolated cycle of
the process and the semaphore, and it will get collected
soon. It is very similar to the cyclic arrays; just different
classes and a few more objects, like the chain of contexts and
the block with which the process was started, all wrapped by
the process instance.</div>
<div class="gmail_default" style="font-size:small"><br>
</div>
<div class="gmail_default" style="font-size:small">OK, so what's
a non-isolated cycle? The classic example is that of some
model object in MVC. In MVC each view adds itself as a
dependent of the model. If the model inherits from Model then
it will have its own dependents (an instance of Array, in its
first instance variable). So if the SystemWindow containing
the views is deleted (removed from the world's set of morphs)
then we have another isolated cycle between the views which
refer directly to the model, and the model's dependents which
point back to the views. But if the model does *not* inherit
from Model then its dependents are attached to it in
the DependentsFields dictionary, a class variable of Object,
and Object is in Smalltalk. So
the DependentsFields dictionary is effectively a root.</div>
<div><br>
</div>
<div class="gmail_default" style="font-size:small">Object
methods for dependents</div>
addDependent: anObject<br>
<span class="gmail-Apple-converted-space"> </span>"Make the
given object one of the receiver's dependents."<br>
<br>
<span class="gmail-Apple-converted-space"> </span>|
dependents |<br>
<span class="gmail-Apple-converted-space"> </span>dependents
:= self dependents.<br>
<span class="gmail-Apple-converted-space"> </span>(dependents
includes: anObject) ifFalse:<br>
<span class="gmail-Apple-converted-space"> </span> <span
class="gmail-Apple-converted-space"> </span>[self
myDependents: (dependents copyWithDependent: anObject)].<br>
<div class="gmail_default" style="font-size:small"> <span
class="gmail-Apple-converted-space"> </span>^ anObject</div>
<div><br>
</div>
dependents<br>
<span class="gmail-Apple-converted-space"> </span>"Answer a
collection of objects that are 'dependent' on the receiver;<br>
<span class="gmail-Apple-converted-space"> </span>that is,
all objects that should be notified if the receiver changes."<br>
<br>
<span class="gmail-Apple-converted-space"> </span>^ self
myDependents ifNil: [#()]
<div><br>
</div>
<div>myDependents<br>
<span class="gmail-Apple-converted-space"> </span>"Private.
Answer a list of all the receiver's dependents."<br>
<br>
<span class="gmail-Apple-converted-space"> </span>^
DependentsFields at: self ifAbsent: []</div>
<div><br>
</div>
<div>
<div class="gmail_default" style="font-size:small">Model
methods for dependents</div>
myDependents<br>
<div class="gmail_default" style="font-size:small"> <span
class="gmail-Apple-converted-space"> </span>^ dependents</div>
<div><br>
</div>
<div class="gmail_default" style="font-size:small">So if some
object (not inheriting from Model) gets some views attached
to it as dependents and these views refer to the model we
have a cycle that is reachable from the roots via a strong
association in DependentsFields. So this cycle will not be
collected when the SystemWindow containing those views is
deleted because there is still a reference to the object in
the association in DependentsFields; the object is the key
of the association. The array of dependents is the value of
the association. Even if we were to make the array of
dependents weak, this allows the views to be collected, but
not the model object itself. This is a problem, and is the
reason the release method exists:</div>
<div class="gmail_default" style="font-size:small"><br>
</div>
<div class="gmail_default" style="font-size:small">Object
methods for dependents</div>
<div class="gmail_default" style="font-size:small">release<br>
<span class="gmail-Apple-converted-space"> </span>"Remove
references to objects that may refer to the receiver. This
message <br>
<span class="gmail-Apple-converted-space"> </span>should
be overridden by subclasses with any cycles, in which case
the <br>
<span class="gmail-Apple-converted-space"> </span>subclass
should also include the expression super release."<br>
<br>
<span class="gmail-Apple-converted-space"> </span>self
breakDependents<br>
</div>
<div class="gmail_default" style="font-size:small"><br>
</div>
<div class="gmail_default" style="font-size:small">Now we
understand the difference between isolated cycles and
non-isolated cycles we can understand what an ephemeron
is, or at least how it came to be invented. An ephemeron is
an association designed to be used with dictionaries such as
DependentsFields, which are used to add properties to
objects. What we want is not to have to send release. We'd
like the system to be able to figure out that the cycle
between the object and its views would be isolated were it
not for the reference from DependentsFields. Let's make
sure we get the point here. When the views exist within a
SystemWindow that is on-screen (part of the world's morphs)
the model is reachable both from the world's morphs
(transitive closure from the world's submorphs through the
morphs' model inst var) *and* from DependentsFields. When
the SystemWindow is deleted it is removed from the
world's submorphs so the model and its cycle is *only*
reachable from DependentsFields. Other than for
DependentsFields the cycle would be isolated and would be
collected. The reference from DependentsFields keeps the
cycle around (prevents it from being collected).</div>
<div class="gmail_default" style="font-size:small"><br>
</div>
<div class="gmail_default" style="font-size:small">But an
ephemeron is a smart association, known to the garbage
collector. When the garbage collector traverses the objects
reachable from the roots and finds an ephemeron it doesn't
follow the key or the value of the ephemeron unless the key
has already been reached. The ephemeron acts like a guard.
If its key has already been reached it is reachable from the
roots, and there is no possibility of an isolated cycle
existing, and in this case an ephemeron is exactly like an
association, and does nothing special. But if the key
hasn't been reached then the ephemeron may be referencing an
isolated cycle.</div>
<div class="gmail_default" style="font-size:small"><br>
</div>
<div class="gmail_default" style="font-size:small">Now because
traversing the object graph is done one object at a time,
when an ephemeron is reached which has an as-yet-unreached
key the garbage collector doesn't know if the key might be
reached later in the traversal following the path from some
other root, so it puts the ephemeron to one side, in the set
of ephemerons with as-yet-unreachable keys. Once the
traversal is finished then it examines that set and discards
any ephemerons in the set that do have reachable keys. These
were objects reachable from the roots other than through
ephemerons. Now the set contains only ephemerons whose keys
are *only* reachable from ephemerons reachable from the
roots. The garbage collector can now finish its traversal
by traversing the ephemerons keys and values. But it does
this *after* identifying the set of ephemerons whose keys
are only reachable from ephemerons. These objects would be
collectable other than being reachable from ephemerons.</div>
<div class="gmail_default" style="font-size:small"><br>
</div>
<div class="gmail_default" style="font-size:small">So let's go
back up to the system level for a moment. If we use
Ephemeron instead of Association as the class of
associations in various property dictionaries such as
DependentsFields, then the garbage collector can identify
these isolated cycles reachable from ephemerons. This needs
only one more thing to solve the collection of these
isolated cycles reachable from ephemerons, and relieve the
programmer of the task of sending release. And that thing
is to queue the set of ephemerons with keys reachable only
from ephemerons, and signal a semaphore. Then a process in
the image waiting on that semaphore can fetch the ephemerons
from the queue and send each such ephemeron a message. I
like to use the selector #mourn, because the ephemeron's key
is about to die, and the ephemeron is mourning the demise of
its key. In response to #mourn the ephemeron can remove
itself from its dictionary (which will result in the
ephemeron getting collected soon) and send #finalize to its
key, because other than the reference from the ephemeron the
key would have been collected.</div>
<div class="gmail_default" style="font-size:small"><br>
</div>
<div class="gmail_default" style="font-size:small">So this is
what the garbage collector does at the end of its
traversal. It removes each ephemerons in the set of
ephemerons with keys only reachable from ephemerons and adds
it to the finalization queue. And this is the second bird.
Because ephemerons delay scanning isolated cycles until the
end of the garbage collector's traversal, they can be used
to do proper finalization. The key of an ephemeron is
identified as being collectable without having been
collected yet. This is very useful. Compare this with the
way one has to do finalization with weak arrays. Here, the
death of an object is noted by a weak array having the
reference to the collected object set to nil. To have
something not-yet-collected to finalize, the system has to
keep a copy of the object around and finalize the copy;
keeping the copy up-to-date with its twin has complexity and
performance overheads. Ephemerons are much nicer.</div>
<div class="gmail_default" style="font-size:small"><br>
</div>
<div class="gmail_default" style="font-size:small"><br>
</div>
<div class="gmail_default" style="font-size:small">Now Chris
if you still don't understand ephemerons (no shame in that;
this is tricky stuff) please take time to tell me what
doesn't make sense and I'll try again, editing this message
until you do understand it. Then it can serve as a learning
document/help page which explains ephemerons well.</div>
<div class="gmail_default" style="font-size:small"><br>
</div>
<div class="gmail_default" style="font-size:small"><br>
</div>
<div class="gmail_default" style="font-size:small">There's one
more thing to say about ephemerons at the image level. One
can have any number of ephemerons on a single key. You
could have one in DependentsFields, and another in some
dictionary of finalizable objects, and so on. So we can use
ephemerons for more than one thing. And because what marks
an ephemeron as an ephemeron is its format (defined by its
class definition, i.e. Association ephemeronSubclass:
#Ephemeron) we can have one ephemeron class just for
preventing non-isolated cycles, for DependentsFields, and
another for finalization. One class would have a mourn
method that just removes itself from its dictionary, another
would have a morn method that sends #finalize to its key,
etc.</div>
<div><br>
</div>
<span class="gmail_default" style="font-size:small">Finally
(sorry, couldn't resist ;-) )</span> some of you will have
noticed something interesting about the scanning of
ephemerons. This is of great importance to a correct
implementation. <span class="gmail_default"
style="font-size:small">At the end of the garbage
collector's traversal once it has collected the set of
ephemerons with keys only reachable from ephemerons it can
queue these ephemerons for finalization. But then it does
traverse the objects reachable from these ephemerons, and in
doing so *it may encounter more, as-yet-unreached,
ephemerons* and these may have as-yet-unreached keys. So
the queueing and traversing of ephemerons continues until
the set of ephemerons with unreachable keys is empty, and
all such ephemerons have been added to the finalization
queue. At this point, just as in a system without
ephemerons, all reachable objects have been reached, but
those only reachable through ephemerons have been identified
and their guardian ephemerons notified of that fact by
virtue of being queued for finalization.</span></div>
<div><br>
</div>
<div><br>
</div>
<div>
<div class="gmail_default" style="font-size:small">And if this
is enough explanation for someone itching to implement
ephemerons I have a change set with some of the
code already written. Find attached.</div>
<br>
<div dir="ltr" class="gmail_signature">
<div dir="ltr">
<div><span
style="font-size:small;border-collapse:separate">
<div>_,,,^..^,,,_<br>
</div>
<div>best, Eliot</div>
</span></div>
</div>
</div>
</div>
</div>
<br>
<fieldset class="moz-mime-attachment-header"></fieldset>
<pre class="moz-quote-pre" wrap="">
</pre>
</blockquote>
<br>
</body>
</html>