Dear Squeakers,
Whenever I read the class comment of WeakKeyDictionary, I know that I have to take care of something, but I do not know what. Can we improve the comment please to tell the readers more directly what is expected of them?
- Must one ensure that #finalizeValues is sent to one's WeakKeyDictionary instance regularly?
- Should one register the dictionary instance via WeakArray addWeakDependent:?
- Which VM feature is the WeakRegistry class comment referring to; is it the semaphore used on the class side of WeakArray; and is it relevant to the use of WeakKeyDictionary?
- If the answers to the questions above do not give it away, what must one do to use a WeakKeyDictionary safely?
I volunteer to submit the results as comments in WeakKeyDictionary, WeakArray, and WeakRegistry to the Inbox.
The following is just a transcript of the cynical voice in my head while trying to find some answers today. You do not need to read it. If you can answer the questions above, I would be more than happy. ;-)
---
WeakKeyDictionary class comment: "I am a dictionary holding only weakly on my keys. This is a bit dangerous since at any time my keys can go away. Clients are responsible to register my instances by WeakArray such that the appropriate actions can be taken upon loss of any keys.
See WeakRegistry for an example of use."
So I look at WeakArray and WeakRegistry. WeakArray has only three methods, none of which seem to deal with garbage collection, finalization, or elements vanishing. On the class side there is #addWeakDependent:, which deals with class variables that have Finalization in their name. Is this what is meant by "register my instances by WeakArray"...? There is no method comment, so I do not know whether I should call that with my new WeakKeyDictionary or not, or what the effect would be. The class comment of WeakArray also does not tell me anything about finalization or registration.
Looking at WeakRegistry, it associates the registered objects with an executor, and "When the object is garbage collected, the executor can take the appropriate action for any resources associated with the object." Not sure whether I need anything like this for my WeakKeyDictionary instance... It also says that it uses a "new VM feature" (the comment is from 2010), but does not say which one or how it works.

In WeakRegistry, the 'valueDictionary' is a Weak(Identity)KeyDictionary. It is used in 10 methods, which takes some time to read. I can see that the dictionary stores WeakFinalizerItems, each of which holds the executors for an object. In #finalizeValues, the same message is sent to the dictionary and to all objects in the 'executors' inst var, which is then set to nil. In WeakKeyDictionary, #finalizeValues drops elements whose keys have become nil. Aha, so do I need to take care of sending #finalizeValues to my WeakKeyDictionary regularly...?

In #installFinalizer, a finalizer block is attached to the dictionary. This finalizer block adds its argument to the 'executors'. Taking a detour back to WeakKeyDictionary to find out when that block is evaluated... So when the dictionary receives #finalizeValues, fixes hash collisions, or grows, and notices keys that are gone, the values (WeakFinalizerItems) are added to the 'executors', so that back in #finalizeValues of WeakRegistry, #finalizeValues will eventually be sent to all WeakFinalizerItems of garbage-collected elements. But I still do not know what sends #finalizeValues to the WeakRegistry in the first place. The rest of the WeakRegistry methods do not seem to do anything fancy with the dictionary. Collision fixing or growing is unlikely to happen to the WeakKeyDictionary if it is not modified for a while, even though its keys may be garbage collected in the meantime.
For another chance, I read the class-side methods of WeakRegistry. The #new constructor of WeakRegistry uses the previously discovered #addWeakDependent: method of WeakArray, hear hear. Reading further about the finalizationProcess in WeakArray: a FinalizationSemaphore is kept in the SpecialObjectsArray, which means that the VM is supposed to use this semaphore. The finalizationProcess sends #finalizeValues---bingo---to the dependents when the semaphore is signalled.

Combining the clues, I guess that this semaphore is signalled after the garbage collector has run. Is that the mysterious "new VM feature" to which the comment of WeakRegistry---at which I am not looking right now---was referring...? Since it is not spelled out in a comment, and since I am not willing to look at the VM sources now, my guess remains an assumption. Under this assumption, I could register my WeakKeyDictionary as a weak dependent in the WeakArray class, so that after each GC #finalizeValues would be sent to my dictionary, which would clean up stale entries, which is likely the thing I was expected to take care of. But maybe I missed another piece of the puzzle.
Frankly, I do not wish to reverse engineer how the WeakRegistry works before I can use a WeakKeyDictionary. For that, the workings of WeakRegistry are not simple or well-described enough and it apparently needs some VM knowledge.
Kind regards, Jakob
Hi Jakob,
On Fri, 27 May 2022, Jakob Reschke wrote:
Dear Squeakers, Whenever I read the class comment of WeakKeyDictionary, I know that I have to take care of something, but I do not know what. Can we improve the comment please to tell the readers more directly what is expected of them?
- Must one ensure that #finalizeValues is sent to one's WeakKeyDictionary instance regularly?
Only if you are interested in finalization and want to do it your own way. And even then, only if you want to have your finalizer executed as soon as possible when a key of your dictionary has been garbage collected. So, if you just want a dictionary with weak keys, the answer is no.
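For illustration, the "do it your own way" case could look like this (a minimal sketch; the cache and its contents are hypothetical, and how promptly the key is collected depends on the GC):

```smalltalk
"Hypothetical cache with weak keys whose owner sweeps it explicitly.
#finalizeValues removes associations whose keys have been collected."
| cache key |
cache := WeakKeyDictionary new.
key := Object new.
cache at: key put: 'expensive resource'.
key := nil.                "drop the last strong reference to the key"
Smalltalk garbageCollect.
cache finalizeValues.      "explicit sweep instead of relying on the VM"
```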
- Should one register the dictionary instance via WeakArray addWeakDependent:?
Same as above. I'm not even sure it would work with WeakKeyDictionaries. It's intended to work with WeakRegistries, which are the main providers of finalization.
- Which VM feature is the WeakRegistry class comment referring to; is it the semaphore used on the class side of WeakArray; and is it relevant to the use of WeakKeyDictionary?
One[1] that has never really been used in Squeak and has been obsoleted by ephemerons. Even though we do not use ephemerons for finalization yet.
- If the answers to the questions above do not give it away, what must one do to use a WeakKeyDictionary safely?
The difference between a regular Dictionary and a WeakKeyDictionary is that the latter cannot hold nil as a key, and your associations may disappear when their keys are not referenced strongly.
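Spelled out as a sketch (assuming the full GC actually reclaims the copied key; timing is not guaranteed):

```smalltalk
"The user-visible difference: entries vanish once their keys
are no longer strongly referenced."
| d key |
d := WeakKeyDictionary new.
key := 'squeak' copy.         "a copy, so no method literal keeps it alive"
d at: key put: #value.
d includesKey: 'squeak'.      "true while the key is strongly referenced"
key := nil.
Smalltalk garbageCollect.
d includesKey: 'squeak'.      "typically false once the copy is collected"
```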
Levente
[1] https://web.archive.org/web/20210608170146/https://bugs.squeak.org/view.php?...
Thank you Levente!
If the VM feature is not really used and ephemerons are not used yet, what mechanism is currently used? Are we back to the old, slow implementation that is mentioned in the link that you gave?
On Sat, 28 May 2022 at 13:25, Levente Uzonyi <leves@caesar.elte.hu> wrote:
On Fri, 27 May 2022, Jakob Reschke wrote:
- Which VM feature is the WeakRegistry class comment referring to; is it the semaphore used on the class side of WeakArray; and is it relevant to the use of WeakKeyDictionary?
One[1] that has never really been used in Squeak and has been obsoleted by ephemerons. Even though we do not use ephemerons for finalization yet.
[...]
[1] https://web.archive.org/web/20210608170146/https://bugs.squeak.org/view.php?...
Hi Jakob, hi Levente,
Whenever I read the class comment of WeakKeyDictionary, I know that I have to take care of something, but I do not know what. Can we improve the comment please to tell the readers more directly what is expected of them?
- Must one ensure that #finalizeValues is sent to one's WeakKeyDictionary instance regularly?
Yes. Either that, or you need some other way to remove the garbage-collected entries from its internal array.
Otherwise, the WeakKeyDictionary will only ever grow in size internally, with a bunch of nil keys. Here's a short script showing the mechanics and assumptions of this:
| d |
d := {
    'hello' copy -> 'world'.
    'hello' copy -> 'there' } as: WeakIdentityKeyDictionary.
Smalltalk garbageCollect.
self assert: (d includesKey: 'hello') not. "good"
self assert: d notEmpty. "bad"
self assert: d array asSet size > 1. "bad"
d finalizeValues. "required!"
self assert: d array asSet size = 1. "good"
self assert: d isEmpty. "good"
Only if you are interested in finalization and want to do it your own way. And even then, only if you want to have your finalizer executed as soon as possible when a key of your dictionary has been garbage collected. So, if you just want a dictionary with weak keys, the answer is no.
- Should one register the dictionary instance via WeakArray addWeakDependent:?
Same as above. I'm not even sure it would work with WeakKeyDictionaries. It's intended to work with WeakRegistries, which are the main providers of finalization.
I believe it would. Check out WeakArray class>>#finalizationProcess; registering there would perhaps allow those WeakKeyDictionaries to clean themselves up automatically, so you don't have to time sending #finalizeValues yourself. But I don't trust it, because it's hard to tell: #finalizationProcess waits on FinalizationSemaphore, an opaque VM-controlled semaphore, and I have no idea when it's signaled.
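The registration itself would be just the following (an untested sketch; WeakArray class>>#addWeakDependent: does exist, but whether the finalization process does anything useful with a bare WeakKeyDictionary is exactly what is uncertain here):

```smalltalk
"Untested: register the dictionary as a weak dependent, hoping that
the finalizationProcess sends it #finalizeValues after each GC --
assuming the FinalizationSemaphore is signalled as expected."
| d |
d := WeakKeyDictionary new.
WeakArray addWeakDependent: d.
"d would now receive #finalizeValues whenever the process wakes up."
```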
- Which VM feature is the WeakRegistry class comment referring to; is it the semaphore used on the class side of WeakArray; and is it relevant to the use of WeakKeyDictionary?
One[1] that has never really been used in Squeak and has been obsoleted by ephemerons. Even though we do not use ephemerons for finalization yet.
- If the answers to the questions above do not give it away, what must one do to use a WeakKeyDictionary safely?
The difference between a regular Dictionary and a WeakKeyDictionary is that the latter cannot hold nil as a key, and your associations may disappear when their keys are not referenced strongly.
They only disappear from the perspective of using the object API (excluding #isEmpty and #notEmpty as demonstrated by the script above). Internally, the former Association objects don't disappear, and still take up memory.
Igor made some alternative-implementation WeakKeyDictionaries for Magma back in 2007 which are faster than the ones built into Squeak to this day (last I checked, anyway). If interested, load "MaBase" from SqueakMap and then see MaWeakIdentityKeyDictionary.
Regards, Chris
Please replace the script in my prior email with this one. Even though it still illustrates the problem, the previous script has a conceptual bug because I was playing around with Identity vs. regular and forgot to correct it back.
| d |
d := {
    ('hello' copy) -> 'world'.
    'hello2' copy -> 'there' } as: WeakKeyDictionary.
Smalltalk garbageCollect.
self assert: (d includesKey: obj1) not.
self assert: d notEmpty.
self assert: d array asSet size > 1.
d finalizeValues.
self assert: d array asSet size = 1.
self assert: d isEmpty.
self assert: (d reject: [ :each | each isNil ]) isEmpty.
Argh! This one: :)
| d |
d := {
    ('hello' copy) -> 'world'.
    'hello2' copy -> 'there' } as: WeakKeyDictionary.
Smalltalk garbageCollect.
self assert: (d includesKey: 'hello') not. "good"
self assert: d notEmpty. "bad"
self assert: d array asSet size > 1. "bad"
d finalizeValues. "required!"
self assert: d array asSet size = 1. "good"
self assert: d isEmpty. "good"
self assert: (d reject: [ :each | each isNil ]) isEmpty. "alternative to finalizeValues"
Hi Chris,
On Tue, 31 May 2022, Chris Muller wrote:
Hi Jakob, hi Levente,
> Whenever I read the class comment of WeakKeyDictionary, I know that I have to take care of something, but I do not know what. Can we improve the comment please to tell the readers more directly what is expected of them?
>
> - Must one ensure that #finalizeValues is sent to one's WeakKeyDictionary instance regularly?
Yes. Either that, or you need to otherwise somehow remove the garbage collected entries from its internal array.
Otherwise, the WeakKeyDictionary will only ever grow in size internally, with a bunch of nil keys. Here's a short script showing the mechanics and assumptions of this:
No, that is not the case. Associations with nil keys get removed automatically, which is why you cannot have nil keys in WeakKeyDictionaries (and because there was no demand to implement something like what has been done for Sets).
What you mean is that associations do not get removed immediately. Here is a snippet demonstrating the situation by only sending #at:put: to the dictionary and always with a new key:
| w |
w := WeakKeyDictionary new.
1000 timesRepeat: [ w at: Object new put: 0 ].
Transcript open; showln: { w size. w slowSize. w capacity }.
w capacity - w size timesRepeat: [ w at: Object new put: 0 ].
Smalltalk garbageCollectMost.
w at: Object new put: 0.
Smalltalk garbageCollectMost.
Transcript showln: { w size. w slowSize. w capacity }.
The following lines should appear on the Transcript:
#(1000 0 1817)
#(1 0 2)
The dictionary will eventually shrink to the smallest possible capacity, 2, holding one association to be garbage collected and no associations with a valid key.
As with all weak hashed collections, #isEmpty, #size, #notEmpty are not reliable because the entries may vanish any time, so there's no effort to even try to make those correct. If you need accurate information, you must check against #slowSize, which means O(n) runtime. Using ephemerons may help with that.
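For instance (a sketch; the exact numbers depend on when the GC runs relative to each send):

```smalltalk
"Why #size is unreliable while #slowSize is accurate: #size answers
the cached tally, #slowSize scans the internal array and counts only
entries whose keys are still alive, at O(n) cost."
| w |
w := WeakKeyDictionary new.
10 timesRepeat: [ w at: Object new put: 0 ].
Smalltalk garbageCollect.
w size.      "may still report 10: stale tally"
w slowSize.  "0: no key is strongly referenced anymore"
```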
Igor made some alternative-implementation WeakKeyDictionary's for Magma back in 2007 which are faster than the one's built into Squeak to this day (last I checked, anyway). If interested, load "MaBase" from SqueakMap and then see MaWeakIdentityKeyDictionary.
Is it really faster? Here's a basic benchmark:
[ | m |
Smalltalk garbageCollect.
m := MaDictionary new.
{
    #insert -> [ 1 to: 1000000 do: [ :i | m at: i hashMultiply put: i ] ] timeToRun.
    #access -> [ 1 to: 1000000 do: [ :i | m at: i hashMultiply ] ] timeToRun.
    #update -> [ 1 to: 1000000 do: [ :i | m at: i hashMultiply put: i + 1 ] ] timeToRun.
    #positiveLookup -> [ 1 to: 1000000 do: [ :i | m includesKey: i hashMultiply ] ] timeToRun.
    #negativeLookup -> [ 1000001 to: 2000000 do: [ :i | m includesKey: i hashMultiply ] ] timeToRun.
    #remove -> [ 1 to: 1000000 do: [ :i | m removeKey: i hashMultiply ] ] timeToRun } ] value.
{#insert->535 . #access->1062 . #update->250 . #positiveLookup->967 . #negativeLookup->1055 . #remove->2353}
Even with MaDictionary >> #checkForUnderflow: patched to avoid the creation of Fractions.
The same benchmark for Dictionary yields {#insert->469 . #access->945 . #update->198 . #positiveLookup->174 . #negativeLookup->301 . #remove->719} on my machine.
Levente
Hi Levente,
> - Must one ensure that #finalizeValues is sent to one's WeakKeyDictionary instance regularly?
Yes. Either that, or you need to otherwise somehow remove the garbage collected entries from its internal array.
Otherwise, the WeakKeyDictionary will only ever grow in size internally, with a bunch of nil keys. Here's a short script showing the mechanics and assumptions of this:
No, that is not the case. Associations with nil keys get removed automatically, which is why you cannot have nil keys in WeakKeyDictionaries (and because there was no demand to implement something like what has been done for Sets).
What you mean is that associations do not get removed immediately. Here is a snippet demonstrating the situation by only sending #at:put: to the dictionary and always with a new key:
| w |
w := WeakKeyDictionary new.
1000 timesRepeat: [ w at: Object new put: 0 ].
Transcript open; showln: { w size. w slowSize. w capacity }.
w capacity - w size timesRepeat: [ w at: Object new put: 0 ].
Smalltalk garbageCollectMost.
w at: Object new put: 0.
Smalltalk garbageCollectMost.
Transcript showln: { w size. w slowSize. w capacity }.
The following lines should appear on the Transcript:
#(1000 0 1817)
#(1 0 2)
Okay, that must be new. In 5.3, I still get:
#(1000 1000 2423)
#(2424 0 5119)
but, in trunk, yes! That's great, thanks for letting me know, but I'm having trouble identifying _why_ it works like that in trunk -- what change actually enabled that? It's something Igor and I worked on back in 2010 (I incorrectly said 2007 before), but it involved a VM change we couldn't get pushed through.
The dictionary will eventually shrink to the smallest possible capacity, 2,
Wow, even smaller than the default initial capacity of 3; now there's a pleasant surprise. ;) Does that mean if I do a "WeakKeyDictionary new" and never add any entries to it, its capacity will eventually shrink to 2?
holding one association to be garbage collected and no associations with a valid key.
As with all weak hashed collections, #isEmpty, #size, #notEmpty are not reliable because the entries may vanish any time, so there's no effort to even try to make those correct.
Hm, Marcel fixed #isEmpty for Weak collections in 2020. There are contexts where those methods are useful even with the understanding of the potential inaccuracy. Wouldn't you consider #at: and #includesKey: to be affected (i.e., "not reliable") by this logic as well?
If you need accurate information, you must check against #slowSize, which means O(n) runtime. Using ephemerons may help with that.
I hope to one day understand ephemerons and why they're useful, because I sense they provide some kind of magic that could help my stack. But, I never grok'd them so far.
Igor made some alternative-implementation WeakKeyDictionary's for Magma back in 2007 which are faster than the one's built into Squeak to this day (last I checked, anyway). If interested, load "MaBase" from SqueakMap and then see MaWeakIdentityKeyDictionary.
Is it really faster? Here's a basic benchmark:
[ | m |
Smalltalk garbageCollect.
m := MaDictionary new.
{
    #insert -> [ 1 to: 1000000 do: [ :i | m at: i hashMultiply put: i ] ] timeToRun.
    #access -> [ 1 to: 1000000 do: [ :i | m at: i hashMultiply ] ] timeToRun.
    #update -> [ 1 to: 1000000 do: [ :i | m at: i hashMultiply put: i + 1 ] ] timeToRun.
    #positiveLookup -> [ 1 to: 1000000 do: [ :i | m includesKey: i hashMultiply ] ] timeToRun.
    #negativeLookup -> [ 1000001 to: 2000000 do: [ :i | m includesKey: i hashMultiply ] ] timeToRun.
    #remove -> [ 1 to: 1000000 do: [ :i | m removeKey: i hashMultiply ] ] timeToRun } ] value.
{#insert->535 . #access->1062 . #update->250 . #positiveLookup->967 . #negativeLookup->1055 . #remove->2353}
Even with MaDictionary >> #checkForUnderflow: patched to avoid the creation of Fractions.
The same benchmark for Dictionary yields
{#insert->469 . #access->945 . #update->198 . #positiveLookup->174 . #negativeLookup->301 . #remove->719}
on my machine.
Yeah, one Fraction per resize isn't going to be noticeable, but thanks for pointing it out, I patched it.
MaDictionary is actually just an abstract superclass Igor made to contain his work, it wasn't meant to be used. The real purpose of his work was the replacement of Squeak's woeful WeakKeyIdentityDictionary from the days when the identityHash was only 12-bits. Your bench didn't have any collisions, but that's what the Ma variant's optimized for.
But Squeak has come a long way since the 12-bit identityHash and #hashMultiply, so the benefit is not nearly as pronounced, but I enhanced your basic bench to try to more-closely emulate how Magma would be using them for, and shows that it's still a _little_ (albeit, not much) better than WeakKeyIdentityDictionary with #at: and #at:put:...
{ WeakIdentityKeyDictionary. MaWeakIdentityKeyDictionary} collect: [ : cls | [ | m r o testSize | testSize := 1e6. o := Object new. r := (1 to: testSize) collect: [ : e | Object new ]. r := r shuffled. "mix up identityHash's to simulate real-world access, just in case consecutive creation above might artificially help the hash algo" m := cls new. Smalltalk garbageCollect. { m class -> testSize. #insert -> [ r do: [ :i | m at: i put: i ] ] timeToRun. r := r shuffled. #access -> [ r do: [ :i | m at: i ] ] timeToRun. r := r shuffled. #negativeaccess -> [ r do: [ :i | m at: o ifAbsent: [nil] ] ] timeToRun. r := r shuffled. #update -> [ r do: [ :i | m at: i put: 0 ] ] timeToRun. r := r shuffled. #positiveLookup -> [ r do: [ :i | m includesKey: i ] ] timeToRun. r := r shuffled. #negativeLookup -> [ 1000001 to: 2000000 do: [ :i | m includesKey: o ] ] timeToRun. } ] value select: [ : e | e isVariableBinding ] ].
{{WeakIdentityKeyDictionary->1000000 . #insert->1830 . #access->1260 . #negativeaccess->705 . #update->532 . #positiveLookup->524 . #negativeLookup->42} . {MaWeakIdentityKeyDictionary->1000000 . #insert->1286 . #access->1170 . #negativeaccess->739 . #update->435 . #positiveLookup->1131 . #negativeLookup->800}}
I never noticed #includesKey: to be so slow on Ma (especially the negative case, wow!), and I think Magma does use that in one place, but most of the usage is for #at: and #at:put: which get even better as the sizes get beyond 1M elements.
But with 6.0's new auto-cleaning, I will certainly retire these Ma dictionary's!
I always learn useful stuff from you, thanks a lot for your help.
- Chris
Hi Chris,
On Mon, 6 Jun 2022, Chris Muller wrote:
Hi Levente,
> - Must one ensure that #finalizeValues is sent to one's WeakKeyDictionary instance regularly?
Yes. Either that, or you need to otherwise somehow remove the garbage collected entries from its internal array.
Otherwise, the WeakKeyDictionary will only ever grow in size internally, with a bunch of nil keys. Here's a short script showing the mechanics and assumptions of this:
No, that is not the case. Associations with nil keys get removed automatically, which is why you cannot have nil keys in WeakKeyDictionaries (and because there was no demand to implement something like what has been done for Sets).
What you mean is that associations do not get removed immediately. Here's a snippet demonstrating the situation by only sending #at:put: to the dictionary, always with a new key:
| w |
w := WeakKeyDictionary new.
1000 timesRepeat: [ w at: Object new put: 0 ].
Transcript open; showln: { w size. w slowSize. w capacity }.
w capacity - w size timesRepeat: [ w at: Object new put: 0 ].
Smalltalk garbageCollectMost.
w at: Object new put: 0.
Smalltalk garbageCollectMost.
Transcript showln: { w size. w slowSize. w capacity }.
The following lines should appear on the transcript: #(1000 0 1817) #(1 0 2)
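As a cross-language aside (a rough analogy, not a description of Squeak's behaviour): CPython's weakref.WeakKeyDictionary purges dead entries eagerly through per-key weakref callbacks, so the equivalent experiment reports an empty dictionary right after the keys die:

```python
import gc
import weakref

class Key:
    """Plain object; supports weak references (unlike int or str)."""
    pass

d = weakref.WeakKeyDictionary()
keys = [Key() for _ in range(1000)]
for i, k in enumerate(keys):
    d[k] = i

assert len(d) == 1000
keys = k = None   # drop the only strong references to the keys
gc.collect()      # refcounting alone suffices in CPython, but be explicit
print(len(d))     # 0: dead entries were removed by weakref callbacks
```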
Okay, that must be new. In 5.3, I still get:
#(1000 1000 2423) #(2424 0 5119)
but, in trunk, yes! That's great, thanks for letting me know, but I'm having trouble identifying _why_ it works like that in trunk -- what change actually enabled that? It's something Igor and I worked on
Well, it works in 5.3 as well, but the snippet I sent assumes that #capacity returns the real capacity, i.e. the number of elements the collection can hold without growing. For 5.3 and earlier, the corresponding snippet is:
w := WeakKeyDictionary new.
1000 timesRepeat: [ w at: Object new put: 0 ].
Transcript open; showln: { w size. w slowSize. w capacity }.
w capacity * 3 // 4 - w size timesRepeat: [ w at: Object new put: 0 ].
Smalltalk garbageCollectMost.
w at: Object new put: 0.
Smalltalk garbageCollectMost.
Transcript showln: { w size. w slowSize. w capacity }.
back in 2010 (I incorrectly said 2007, before), but it involved a VM change we couldn't get pushed through.
The dictionary will eventually shrink to the smallest possible capacity, 2,
Wow, even smaller than the default initial capacity, 3, now there's a pleasant surprise. ;) Does that mean if I do a "WeakKeyDictionary new" and never add any entries to it, its capacity will eventually shrink to 2?
No, it won't. You need to trigger the automatic resizing of the dictionary, and it's pretty hard without adding new entries to it.
holding one association to be garbage collected and no associations with a valid key.
As with all weak hashed collections, #isEmpty, #size, #notEmpty are not reliable because the entries may vanish any time, so there's no effort to even try to make those correct.
Hm, Marcel fixed #isEmpty for Weak collections in 2020. There are contexts where those methods are useful even with the understanding of the potential inaccuracy. Wouldn't you consider #at: and
Thanks, now I remember that #isEmpty and #notEmpty work as expected.
#includesKey: to also be affected (e.g., "not reliable") by this logic, as well?
No, #includesKey: is exact in the sense that, while the method was executing, the result matched the status of the requested key. Though a GC may happen right after #includesKey: returns, rendering the result incorrect.
If you need accurate information, you must check against #slowSize, which means O(n) runtime. Using ephemerons may help with that.
I hope to one day understand ephemerons and why they're useful, because I sense they provide some kind of magic that could help my stack. But, I never grok'd them so far.
Igor made some alternative-implementation WeakKeyDictionaries for Magma back in 2007 which are faster than the ones built into Squeak to this day (last I checked, anyway). If interested, load "MaBase" from SqueakMap and then see MaWeakIdentityKeyDictionary.
Is it really faster? Here's a basic benchmark:
[ | m |
Smalltalk garbageCollect.
m := MaDictionary new.
{ #insert -> [ 1 to: 1000000 do: [ :i | m at: i hashMultiply put: i ] ] timeToRun.
#access -> [ 1 to: 1000000 do: [ :i | m at: i hashMultiply ] ] timeToRun.
#update -> [ 1 to: 1000000 do: [ :i | m at: i hashMultiply put: i + 1 ] ] timeToRun.
#positiveLookup -> [ 1 to: 1000000 do: [ :i | m includesKey: i hashMultiply ] ] timeToRun.
#negativeLookup -> [ 1000001 to: 2000000 do: [ :i | m includesKey: i hashMultiply ] ] timeToRun.
#remove -> [ 1 to: 1000000 do: [ :i | m removeKey: i hashMultiply ] ] timeToRun } ] value.
"{#insert->535 . #access->1062 . #update->250 . #positiveLookup->967 . #negativeLookup->1055 . #remove->2353}" The same benchmark for Dictionary yields {#insert->469 . #access->945 . #update->198 . #positiveLookup->174 . #negativeLookup->301 . #remove->719} on my machine. Even with MaDictionary >> #checkForUnderflow: patched to avoid the creation of Fractions.
Yeah, one Fraction per resize isn't going to be noticeable, but thanks for pointing it out, I patched it.
I noticed it by running #timeProfile and it showed up with IIRC 20% of the runtime. Creating a fraction is not cheap, but I think the resizing algorithm of MaDictionary is broken.
I created MaDictionary2, which uses a different hash function, a different table size, and a conservative resize-strategy. The performance difference is amazing, give it a try: https://leves.web.elte.hu/squeak/Ma-Collections-ul.169.mcz
I think the difference comes from better cache locality, but it's just a wild guess. I have no idea how it became so fast even though it lacks the typical micro-optimizations one would apply there.
Levente
MaDictionary is actually just an abstract superclass Igor made to contain his work; it wasn't meant to be used. The real purpose of his work was the replacement of Squeak's woeful WeakIdentityKeyDictionary from the days when the identityHash was only 12 bits. Your bench didn't have any collisions, but that's what the Ma variant's optimized for.
But Squeak has come a long way since the 12-bit identityHash and #hashMultiply, so the benefit is not nearly as pronounced. Still, I enhanced your basic bench to more closely emulate how Magma would use them, and it shows that the Ma variant is still a _little_ (albeit, not much) better than WeakIdentityKeyDictionary with #at: and #at:put:...
{ WeakIdentityKeyDictionary. MaWeakIdentityKeyDictionary } collect: [ :cls |
	[ | m r o testSize |
	testSize := 1e6.
	o := Object new.
	r := (1 to: testSize) collect: [ :e | Object new ].
	r := r shuffled. "mix up identityHashes to simulate real-world access, just in case consecutive creation above might artificially help the hash algo"
	m := cls new.
	Smalltalk garbageCollect.
	{ m class -> testSize.
	#insert -> [ r do: [ :i | m at: i put: i ] ] timeToRun.
	r := r shuffled.
	#access -> [ r do: [ :i | m at: i ] ] timeToRun.
	r := r shuffled.
	#negativeaccess -> [ r do: [ :i | m at: o ifAbsent: [nil] ] ] timeToRun.
	r := r shuffled.
	#update -> [ r do: [ :i | m at: i put: 0 ] ] timeToRun.
	r := r shuffled.
	#positiveLookup -> [ r do: [ :i | m includesKey: i ] ] timeToRun.
	r := r shuffled.
	#negativeLookup -> [ 1000001 to: 2000000 do: [ :i | m includesKey: o ] ] timeToRun.
	} ] value select: [ :e | e isVariableBinding ] ].
{{WeakIdentityKeyDictionary->1000000 . #insert->1830 . #access->1260 . #negativeaccess->705 . #update->532 . #positiveLookup->524 . #negativeLookup->42} . {MaWeakIdentityKeyDictionary->1000000 . #insert->1286 . #access->1170 . #negativeaccess->739 . #update->435 . #positiveLookup->1131 . #negativeLookup->800}}
I never noticed #includesKey: to be so slow on Ma (especially the negative case, wow!), and I think Magma does use that in one place, but most of the usage is for #at: and #at:put: which get even better as the sizes get beyond 1M elements.
But with 6.0's new auto-cleaning, I will certainly retire these Ma dictionaries!
I always learn useful stuff from you, thanks a lot for your help.
- Chris
Hi,
So to sum up: if the WeakKeyDictionary gets new elements added frequently, causing it to consider growing every once in a while, it may not be necessary to register it with WeakArray. On the other hand, if the dictionary is seldom modified, one can register it with WeakArray to get stale associations collected "as soon as possible." Alternatively I should consider sending #finalizeValues by myself as regularly as it makes sense for my application. Is that right?
In my latest use case I wanted to use a WeakIdentityKeyDictionary to hold certain "on-demand inst vars" for other kinds of objects (like the DependentsFields dictionary in Object, which adds a list of dependents to any kind of object on demand). So in general I would like each on-demand value to be collected when the object to which it belongs gets collected. But I expect that new keys will be added only in rare circumstances (except during the unit tests), and that the keys will be garbage-collected only in rare circumstances as well. The keys are expected to be PackageInfos from the package organizer, so changes come only when the extra info is accessed for a package for the first time, or when a package gets removed from the system. I tried turning my dictionary from an IdentityDictionary to a WeakIdentityKeyDictionary (with #newFrom:) without registering, and it kept lots of garbage elements, which were previously added during repeated test runs. Soon after registering it with WeakArray, most elements were removed. On the other hand, now most of the invocations to #finalizeValues go to waste because as mentioned, packages do not disappear frequently under normal circumstances. Hmm...
Kind regards, Jakob
Hi Jakob,
On Mon, 6 Jun 2022, Jakob Reschke wrote:
Hi, So to sum up: if the WeakKeyDictionary gets new elements added frequently, causing it to consider growing every once in a while, it may not be necessary to register it with WeakArray. On the other hand, if the dictionary is
They don't need to grow. It's enough if they reach and go over their full capacity. They may even shrink afterwards as my example showed.
seldom modified, one can register it with WeakArray to get stale associations collected "as soon as possible." Alternatively I should consider sending #finalizeValues by myself as regularly as it makes sense for my application. Is that right?
I would still refrain from registering a WeakKeyDictionary to be triggered by the finalization process, because #finalizeValues will be sent by a different process, and Dictionaries are not thread-safe, so race conditions may occur. WeakRegistries are safe to register with the finalization process, because they are thread-safe.
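The same locking discipline can be sketched outside Squeak. A minimal Python analogy (the `register` and `_on_death` helpers are invented for the example): a plain dict plus per-key weakref callbacks, where both the mutating code and the death callbacks take one shared lock:

```python
import gc
import threading
import weakref

class Key:
    pass

lock = threading.Lock()   # plays the role of the Mutex in the Smalltalk snippets
registry = {}             # id(weakref) -> (weakref, value); not thread-safe by itself

def _on_death(ref):
    # Weakref callbacks can fire on whichever thread drops the last
    # reference, so the shared dict must only be touched under the lock.
    with lock:
        registry.pop(id(ref), None)

def register(key, value):
    with lock:
        ref = weakref.ref(key, _on_death)
        registry[id(ref)] = (ref, value)

k = Key()
register(k, "payload")
assert len(registry) == 1
k = None
gc.collect()
print(len(registry))   # 0: the stale entry was removed under the lock
```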
In my latest use case I wanted to use a WeakIdentityKeyDictionary to hold certain "on-demand inst vars" for other kinds of objects (like the DependentsFields dictionary in Object, which adds a list of dependents to any kind of object on demand). So in general I would like each on-demand value to be collected when the object to which it belongs gets collected. But I expect that new keys will be added only in rare circumstances (except during the unit tests), and that the keys will be garbage-collected only in rare circumstances as well. The keys are expected to be PackageInfos from the package organizer, so changes come only when the extra info is accessed for a package for the first time, or when a package gets removed from the system. I tried turning my dictionary from an IdentityDictionary to a WeakIdentityKeyDictionary (with #newFrom:) without registering, and it kept lots of garbage elements, which were previously added during repeated test runs. Soon after registering it with WeakArray, most elements were removed. On the other hand, now most of the invocations to #finalizeValues go to waste because as mentioned, packages do not disappear frequently under normal circumstances. Hmm...
Currently, the right thing to do is to use a WeakIdentityKeyDictionary and register a finalizer for each key (into the global WeakRegistry). Something like:
m := Mutex new.
w := WeakIdentityKeyDictionary new.
1 to: 100 do: [ :onDemandInstVars | | pseudoPackage |
	pseudoPackage := Object new.
	m critical: [ w at: pseudoPackage put: onDemandInstVars ].
	pseudoPackage
		toFinalizeSend: #cull: "Must be cull: to ignore the argument nil"
		to: [ m critical: [ w finalizeValues ] ]
		with: nil ].
before := w array count: #notNil.
Smalltalk garbageCollectMost.
after := w array count: #notNil.
{ before. after }
Levente
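Levente's per-key finalizer pattern above has a rough Python analogue, with weakref.finalize standing in for toFinalizeSend:to:with:, and a plain dict plus a hand-written sweep standing in for the weak-key dictionary (the helper names are invented for the sketch):

```python
import gc
import weakref

class Package:
    """Stand-in for the pseudoPackage objects in the snippet above."""
    pass

w = {}   # weakref-to-key -> value; stale entries linger until swept

def finalize_values():
    # Analogue of WeakKeyDictionary>>finalizeValues: drop entries whose key died.
    for r in [r for r in w if r() is None]:
        del w[r]

def put(key, value):
    w[weakref.ref(key)] = value
    # One finalizer per key, as in the toFinalizeSend:to:with: registration.
    weakref.finalize(key, finalize_values)

pkgs = [Package() for _ in range(100)]
for i, p in enumerate(pkgs):
    put(p, i)

before = len(w)
pkgs = p = None   # drop the keys; each death triggers a sweep
gc.collect()
after = len(w)
print((before, after))   # (100, 0)
```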
Hi Levente,
Currently, the right thing to do is to use a WeakIdentityKeyDictionary and register a finalizer for each key (into the global WeakRegistry). Something like:
m := Mutex new.
w := WeakIdentityKeyDictionary new.
1 to: 100 do: [ :onDemandInstVars | | pseudoPackage |
	pseudoPackage := Object new.
	m critical: [ w at: pseudoPackage put: onDemandInstVars ].
	pseudoPackage
		toFinalizeSend: #cull: "Must be cull: to ignore the argument nil"
		to: [ m critical: [ w finalizeValues ] ]
		with: nil ].
before := w array count: #notNil.
Smalltalk garbageCollectMost.
after := w array count: #notNil.
{ before. after }
Hmm. For learning or for glory, and because you didn't put quotes around this being the right thing to do, I feel the need to challenge it. :)
Unless I'm mistaken, this approach would mean finalizeValues will be called for every single element GC'd, which would be immensely wasteful.
Plus, having to register a finalizer for every key? That seems overly burdensome just to use a WeakKeyDictionary. It should "just work", but if it can't, the next best thing is simply to call #finalizeValues manually.
| w lastClean |
w := WeakIdentityKeyDictionary new.
lastClean := Time millisecondClockValue. "initialize, so the guard below works on its first evaluation"
1 to: 100 do: [ :onDemandInstVars | | pseudoPackage |
	pseudoPackage := Object new.
	w at: pseudoPackage put: onDemandInstVars ].
Done. Then, just put the following line at an appropriate place in the application. Use the Time guard if necessary.
(Time millisecondsSince: lastClean) > 30000 ifTrue: [
	w finalizeValues.
	lastClean := Time millisecondClockValue ].
Simple. Less code, fewer objects, less machine execution, and no need for any Mutex serialization.
This is what I'd feel safe enough to call the "right thing to do", in quotes at least... :)
- Chris
On Sun, Jun 5, 2022 at 10:51 PM Chris Muller ma.chris.m@gmail.com wrote:
[snip]
If you need accurate information, you must check against #slowSize, which means O(n) runtime. Using ephemerons may help with that.
I hope to one day understand ephemerons and why they're useful, because I sense they provide some kind of magic that could help my stack. But, I never grok'd them so far.
Let me see if I can explain them coherently. I'll start top-down.
Ephemerons are Associations that interact with the garbage collector; they kill two birds with one stone. First, they enable the collection of non-isolated cycles. Second, they enable the identification of objects that are finalizable.
Taking the first topic first, what's a non-isolated cycle? Before that let's understand what an isolated cycle is. An isolated cycle is just some cycle of objects that are not reachable from the system roots. The system roots are effectively the Smalltalk dictionary and the current context of the active process. For example, all runnable processes are reachable from Processor, the singleton instance of ProcessorScheduler, and it is reachable from Smalltalk since there is an association in Smalltalk with key #Processor and value the singleton instance of ProcessorScheduler. All classes and global variables are held onto by associations in the Smalltalk dictionary.
So an isolated cycle is some cycle that isn't reachable from Smalltalk or the current process. Let's create a couple.
| cyclicArrays |
cyclicArrays := Array new: 1.
cyclicArrays at: 1 put: (Array with: cyclicArrays).
cyclicArrays := nil
This creates two arrays that point to each other. The first is reachable from the temporary cyclicArrays, until cyclicArrays is assigned nil, at which point the two arrays are no longer reachable from the current process (or the roots) and will be collected soon.
Here's a more interesting example that comes up a lot.
| process |
process := [Semaphore new wait] newProcess.
process resume.
process := nil
This reduces to:
[Semaphore new wait] fork
When the process waits on the Semaphore it blocks, because the semaphore has no excess signals. When the process blocks it is removed from the set of runnable processes and added to the list of processes waiting on the Semaphore (actually the Semaphore *is* that list). So there's now an isolated cycle of the process and the semaphore, and it will get collected soon. It is very similar to the cyclic arrays; just different classes and a few more objects, like the chain of contexts and the block with which the process was started, all wrapped by the process instance.
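The cyclic-arrays example carries over to any tracing collector. A small CPython demonstration (assuming CPython's usual refcount-plus-cycle-collector behaviour), with a weak reference as the witness:

```python
import gc
import weakref

class Node:
    pass

a = Node()
b = Node()
a.other = b          # two objects pointing at each other
b.other = a
witness = weakref.ref(a)

a = b = None         # the cycle is now isolated: unreachable from any root
assert witness() is not None   # reference counting alone cannot reclaim a cycle
gc.collect()                   # the tracing collector finds the isolated cycle
print(witness() is None)       # True: both objects were collected
```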
OK, so what's a non-isolated cycle? The classic example is that of some model object in MVC. In MVC each view adds itself as a dependent of the model. If the model inherits from Model then it will have its own dependents (an instance of Array, in its first instance variable). So if the SystemWindow containing the views is deleted (removed from the world's set of morphs) then we have another isolated cycle between the views which refer directly to the model, and the model's dependents which point back to the views. But if the model does *not* inherit from Model then its dependents are attached to it in the DependentsFields dictionary, a class variable of Object, and Object is in Smalltalk. So the DependentsFields dictionary is effectively a root.
Object methods for dependents:

addDependent: anObject
	"Make the given object one of the receiver's dependents."
	| dependents |
	dependents := self dependents.
	(dependents includes: anObject)
		ifFalse: [self myDependents: (dependents copyWithDependent: anObject)].
	^ anObject

dependents
	"Answer a collection of objects that are 'dependent' on the receiver; that is, all objects that should be notified if the receiver changes."
	^ self myDependents ifNil: [#()]

myDependents
	"Private. Answer a list of all the receiver's dependents."
	^ DependentsFields at: self ifAbsent: []

Model methods for dependents:

myDependents
	^ dependents
So if some object (not inheriting from Model) gets some views attached to it as dependents, and these views refer to the model, we have a cycle that is reachable from the roots via a strong association in DependentsFields. So this cycle will not be collected when the SystemWindow containing those views is deleted, because there is still a reference to the object from the association in DependentsFields; the object is the key of the association, and the array of dependents is its value. Even if we were to make the array of dependents weak, that would allow the views to be collected, but not the model object itself. This is a problem, and it is the reason the release method exists:
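This trap can be reproduced in miniature in any garbage-collected language. A Python sketch (Model and View here are bare stand-ins, not Squeak's classes) showing that a strong-keyed global property dictionary pins the model until an explicit release-style step:

```python
import gc
import weakref

class Model:
    pass

class View:
    def __init__(self, model):
        self.model = model        # view -> model (strong reference)

dependents_fields = {}            # global property dict with strong keys,
                                  # standing in for DependentsFields

model = Model()
dependents_fields[model] = [View(model)]   # model -> its dependents, via a root
witness = weakref.ref(model)

model = None                          # "delete the window": drop direct references
gc.collect()
still_pinned = witness() is not None  # the dict entry keeps everything alive

dependents_fields.clear()             # the manual #release step
gc.collect()
collected = witness() is None         # now the model is unreachable and reclaimed
print((still_pinned, collected))      # (True, True)
```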
Object methods for dependents:

release
	"Remove references to objects that may refer to the receiver. This message should be overridden by subclasses with any cycles, in which case the subclass should also include the expression super release."
	self breakDependents
Now that we understand the difference between isolated and non-isolated cycles, we can understand what an ephemeron is, or at least how it came to be invented. An ephemeron is an association designed to be used with dictionaries such as DependentsFields, which are used to add properties to objects. What we want is not to have to send release. We'd like the system to be able to figure out that the cycle between the object and its views would be isolated were it not for the reference from DependentsFields. Let's make sure we get the point here. When the views exist within a SystemWindow that is on-screen (part of the world's morphs), the model is reachable both from the world's morphs (transitive closure from the world's submorphs through the morphs' model inst var) *and* from DependentsFields. When the SystemWindow is deleted, it is removed from the world's submorphs, so the model and its cycle are *only* reachable from DependentsFields. Other than for DependentsFields the cycle would be isolated and would be collected. The reference from DependentsFields keeps the cycle around (prevents it from being collected).
But an ephemeron is a smart association, known to the garbage collector. When the garbage collector traverses the objects reachable from the roots and finds an ephemeron it doesn't follow the key or the value of the ephemeron unless the key has already been reached. The ephemeron acts like a guard. If its key has already been reached it is reachable from the roots, and there is no possibility of an isolated cycle existing, and in this case an ephemeron is exactly like an association, and does nothing special. But if the key hasn't been reached then the ephemeron may be referencing an isolated cycle.
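This guard is precisely what a plain weak-key dictionary lacks. For instance, CPython's WeakKeyDictionary is not ephemeron-based, and its documentation warns about this case: when a value strongly references its own key, the weak key is defeated and the entry leaks:

```python
import gc
import weakref

class Key:
    pass

d = weakref.WeakKeyDictionary()
k = Key()
d[k] = ("payload", k)        # the value strongly references its own key
witness = weakref.ref(k)

k = None
gc.collect()
# Without ephemeron support the collector sees: d -> value -> key,
# so the key stays reachable and the entry can never be removed.
print(witness() is None)     # False: the entry leaks
```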
Now, because traversing the object graph is done one object at a time, when an ephemeron is reached whose key is as yet unreached, the garbage collector doesn't know whether the key might be reached later in the traversal, following the path from some other root, so it puts the ephemeron to one side, in the set of ephemerons with as-yet-unreachable keys. Once the traversal is finished, it examines that set and discards any ephemerons in the set that do have reachable keys; these were objects reachable from the roots other than through ephemerons. Now the set contains only ephemerons whose keys are *only* reachable from ephemerons reachable from the roots. The garbage collector can now finish its traversal by traversing the ephemerons' keys and values. But it does this *after* identifying the set of ephemerons whose keys are only reachable from ephemerons. These objects would be collectable were they not reachable from ephemerons.
So let's go back up to the system level for a moment. If we use Ephemeron instead of Association as the class of associations in various property dictionaries such as DependentsFields, then the garbage collector can identify these isolated cycles reachable from ephemerons. This needs only one more thing to solve the collection of these isolated cycles reachable from ephemerons, and relieve the programmer of the task of sending release. And that thing is to queue the set of ephemerons with keys reachable only from ephemerons, and signal a semaphore. Then a process in the image waiting on that semaphore can fetch the ephemerons from the queue and send each such ephemeron a message. I like to use the selector #mourn, because the ephemeron's key is about to die, and the ephemeron is mourning the demise of its key. In response to #mourn the ephemeron can remove itself from its dictionary (which will result in the ephemeron getting collected soon) and send #finalize to its key, because other than the reference from the ephemeron the key would have been collected.
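The image-level side can be sketched like this, again in Python with made-up names (Ephemeron, Resource, finalization_process are all illustrative; a real Squeak finalization process would wait on the semaphore the GC signals rather than poll):

```python
# Toy image-level sketch: an ephemeron that, on mourn, removes itself
# from its dictionary and finalizes its key.
import queue

class Ephemeron:
    def __init__(self, key, value, dictionary):
        self.key, self.value, self.dictionary = key, value, dictionary

    def mourn(self):
        # The key is only reachable through us: drop the entry, finalize the key.
        self.dictionary.pop(id(self.key), None)
        self.key.finalize()

class Resource:
    def __init__(self):
        self.finalized = False
    def finalize(self):
        self.finalized = True

finalization_queue = queue.Queue()

def finalization_process():
    # In the real system this loop would block on the GC's semaphore.
    while not finalization_queue.empty():
        finalization_queue.get().mourn()
```

Once the dictionary entry is removed, nothing refers to the ephemeron any more, so it (and, after finalization, its key) can be collected on the next GC cycle.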
So this is what the garbage collector does at the end of its traversal. It takes each ephemeron in the set of ephemerons with keys only reachable from ephemerons and adds it to the finalization queue. And this is the second bird. Because ephemerons delay scanning isolated cycles until the end of the garbage collector's traversal, they can be used to do proper finalization. The key of an ephemeron is identified as being collectable without having been collected yet. This is very useful. Compare this with the way one has to do finalization with weak arrays. There, the death of an object is noted by the weak array having its reference to the collected object set to nil. To have something not-yet-collected to finalize, the system has to keep a copy of the object around and finalize the copy; keeping the copy up-to-date with its twin has complexity and performance overheads. Ephemerons are much nicer.
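The weak-reference limitation is visible in any language with plain weak references. For instance, Python's weakref.ref callback only fires after the referent has been cleared, so by the time you are notified there is nothing left to finalize — which is exactly why an ephemeron, which hands you the still-intact key, is the nicer mechanism:

```python
# With a plain weak reference, by the time you are notified the object
# is already gone -- there is nothing left to send finalize to.
import gc
import weakref

class Thing:
    pass

events = []

t = Thing()
# The callback receives the (now-dead) weak reference, not the object.
r = weakref.ref(t, lambda ref: events.append(ref() is None))

t = None          # drop the only strong reference
gc.collect()      # make collection deterministic across implementations

assert events == [True]   # callback ran, but the referent was already cleared
assert r() is None
```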
Now Chris if you still don't understand ephemerons (no shame in that; this is tricky stuff) please take time to tell me what doesn't make sense and I'll try again, editing this message until you do understand it. Then it can serve as a learning document/help page which explains ephemerons well.
There's one more thing to say about ephemerons at the image level. One can have any number of ephemerons on a single key. You could have one in DependentsFields, and another in some dictionary of finalizable objects, and so on. So we can use ephemerons for more than one thing. And because what marks an ephemeron as an ephemeron is its format (defined by its class definition, i.e. Association ephemeronSubclass: #Ephemeron) we can have one ephemeron class just for preventing non-isolated cycles, for DependentsFields, and another for finalization. One class would have a #mourn method that just removes itself from its dictionary, another would have a #mourn method that sends #finalize to its key, etc.
Finally (sorry, couldn't resist ;-) ) some of you will have noticed something interesting about the scanning of ephemerons. This is of great importance to a correct implementation. At the end of the garbage collector's traversal once it has collected the set of ephemerons with keys only reachable from ephemerons it can queue these ephemerons for finalization. But then it does traverse the objects reachable from these ephemerons, and in doing so *it may encounter more, as-yet-unreached, ephemerons* and these may have as-yet-unreached keys. So the queueing and traversing of ephemerons continues until the set of ephemerons with unreachable keys is empty, and all such ephemerons have been added to the finalization queue. At this point, just as in a system without ephemerons, all reachable objects have been reached, but those only reachable through ephemerons have been identified and their guardian ephemerons notified of that fact by virtue of being queued for finalization.
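The whole marking-with-ephemerons loop, including this iteration until no deferred ephemerons remain, can be written as a self-contained Python toy (illustrative names only; the real algorithm lives in the VM):

```python
# Toy fixpoint: traversing a queued ephemeron's key and value can expose
# further as-yet-unreached ephemerons, so keep going until none remain.

class Obj:
    def __init__(self, *refs):
        self.refs = list(refs)
        self.marked = False

class Ephemeron(Obj):
    def __init__(self, key, value):
        super().__init__()
        self.key, self.value = key, value

def mark_with_ephemerons(roots):
    queued = []     # ephemerons destined for the finalization queue
    deferred = []   # ephemerons whose keys are not yet reached
    stack = list(roots)
    while True:
        while stack:
            o = stack.pop()
            if o.marked:
                continue
            o.marked = True
            if isinstance(o, Ephemeron):
                if o.key.marked:
                    stack += [o.key, o.value]
                else:
                    deferred.append(o)
            else:
                stack += o.refs
        # Keys reached via another path: those ephemerons now act as
        # plain associations, so traverse their keys and values.
        alive = [e for e in deferred if e.key.marked]
        dead = [e for e in deferred if not e.key.marked]
        deferred = []
        for e in alive:
            stack += [e.key, e.value]
        queued += dead                 # notify their guardians later
        for e in dead:                 # traversing these may reveal more
            stack += [e.key, e.value]  # as-yet-unreached ephemerons
        if not stack:
            return queued
```

With a nested setup — an outer ephemeron whose value leads to an inner ephemeron — the inner one is only discovered once the outer one has been queued and traversed, which is precisely the point of the loop.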
And if this is enough explanation for someone itching to implement ephemerons I have a change set with some of the code already written. Find attached.
_,,,^..^,,,_ best, Eliot
Thanks Eliot,
at least to me it read coherently, and the topic is nothing I ever dealt with before. All I knew was that the GC removes objects that are not referenced any more, and that the chain of references has to go back to the system roots.
Cheers,
Herbert
Am 08.06.2022 um 00:46 schrieb Eliot Miranda:
On Sun, Jun 5, 2022 at 10:51 PM Chris Muller ma.chris.m@gmail.com wrote:
[snip]
> If you need accurate information, you must check against #slowSize, which means O(n) runtime. Using ephemerons may help with that.

I hope to one day understand ephemerons and why they're useful, because I sense they provide some kind of magic that could help my stack. But, I never grok'd them so far.
Let me see if I can explain them coherently. I'll start top-down.
Ephemerons are Associations that interact with the garbage collector; they kill two birds with one stone. First, they enable the collection of non-isolated cycles. Second, they enable the identification of objects that are finalizable.
Taking the first topic first, what's a non-isolated cycle? Before that let's understand what an isolated cycle is. An isolated cycle is just some cycle of objects that are not reachable from the system roots. The system roots are effectively the Smalltalk dictionary and the current context of the active process. For example, all runnable processes are reachable from Processor, the singleton instance of ProcessorScheduler, and it is reachable from Smalltalk since there is an association in Smalltalk with key #Processor and value the singleton instance of ProcessorScheduler. All classes and global variables are held onto by associations in the Smalltalk dictionary.
So an isolated cycle is some cycle that isn't reachable from Smalltalk or the current process. Let's create a couple.
| cyclicArrays |
cyclicArrays := Array new: 1.
cyclicArrays at: 1 put: (Array with: cyclicArrays).
cyclicArrays := nil
This creates two arrays that point to each other. The first is reachable from the temporary cyclicArrays until cyclicArrays is assigned nil, at which point the two arrays are no longer reachable from the current process (or the roots) and will be collected soon.
Here's a more interesting example that comes up a lot.
| process |
process := [Semaphore new wait] newProcess.
process resume.
process := nil
This reduces to:
[Semaphore new wait] fork
When the process waits on the Semaphore it blocks, because the semaphore has no excess signals. When the process blocks it is removed from the set of runnable processes and added to the list of processes waiting on the Semaphore (actually the Semaphore *is* that list). So there's now an isolated cycle of the process and the semaphore, and it will get collected soon. It is very similar to the cyclic arrays; just different classes and a few more objects, like the chain of contexts and the block with which the process was started, all wrapped by the process instance.
OK, so what's a non-isolated cycle? The classic example is that of some model object in MVC. In MVC each view adds itself as a dependent of the model. If the model inherits from Model then it will have its own dependents (an instance of Array, in its first instance variable). So if the SystemWindow containing the views is deleted (removed from the world's set of morphs) then we have another isolated cycle between the views which refer directly to the model, and the model's dependents which point back to the views. But if the model does *not* inherit from Model then its dependents are attached to it in the DependentsFields dictionary, a class variable of Object, and Object is in Smalltalk. So the DependentsFields dictionary is effectively a root.
Object methods for dependents

addDependent: anObject
	"Make the given object one of the receiver's dependents."
	| dependents |
	dependents := self dependents.
	(dependents includes: anObject) ifFalse:
		[self myDependents: (dependents copyWithDependent: anObject)].
	^anObject
dependents
	"Answer a collection of objects that are 'dependent' on the receiver;
	 that is, all objects that should be notified if the receiver changes."
	^self myDependents ifNil: [#()]
myDependents
	"Private. Answer a list of all the receiver's dependents."
	^DependentsFields at: self ifAbsent: []
Model methods for dependents

myDependents
	^dependents
So if some object (not inheriting from Model) gets some views attached to it as dependents and these views refer to the model, we have a cycle that is reachable from the roots via a strong association in DependentsFields. So this cycle will not be collected when the SystemWindow containing those views is deleted, because there is still a reference to the object in the association in DependentsFields; the object is the key of the association, and the array of dependents is its value. Even if we were to make the array of dependents weak, that would allow the views to be collected, but not the model object itself. This is a problem, and is the reason the release method exists:
Object methods for dependents

release
	"Remove references to objects that may refer to the receiver.
	 This message should be overridden by subclasses with any cycles,
	 in which case the subclass should also include the expression super release."
	self breakDependents
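The same trap exists in any language with conventional weak collections. For example (a Python illustration, not Squeak's classes): weakref.WeakKeyDictionary holds its keys weakly, but the values are held strongly, so if a value (the dependents list) refers back to its key (the model), the path dictionary → value → key is strong and the key is never collected — exactly the situation an ephemeron is designed to detect:

```python
# A conventional weak-key dictionary does not help when the value refers
# back to the key: dict -> value -> key is a strong path, so the "model"
# below is never collected. Ephemerons exist to break exactly this.
import gc
import weakref

class Model:
    pass

class View:
    def __init__(self, model):
        self.model = model   # the view points back at its model

dependents = weakref.WeakKeyDictionary()

m = Model()
dependents[m] = [View(m)]    # value strongly references the key

probe = weakref.ref(m)
m = None
gc.collect()

assert probe() is not None   # model kept alive by its own entry
assert len(dependents) == 1

# Contrast: with no back-reference the entry disappears as expected.
m2 = Model()
dependents[m2] = ['no back reference']
probe2 = weakref.ref(m2)
m2 = None
gc.collect()

assert probe2() is None
assert len(dependents) == 1  # only the leaked entry remains
```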
[snip]
Wow, thank you so much for that fantastic description, Eliot! I believe I understand it!
It helped immensely that you clarified the *motivation* behind their invention:
> What we want is not to have to send release.

> This needs only one more thing to solve the collection of these isolated cycles reachable from ephemerons, and relieve the programmer of the task of sending release.
This was very helpful. My brain has to believe there's a good reason to learn something before it'll make space for it. :)
Unwanted, non-isolated cycles are not something I want either, but I must admit I haven't done enough ToolBuilder / MVC programming to understand why sending #release is such a big deal. Just due to the UI being made up of so many objects connected to so many model objects, I guess. But I tend to prefer a rather *buttoned down* style of design and programming. For example, I've never used finalization in any of my designs, because I only ever design for explicit release of resources.
- If I open a file, I explicitly close it, rather than use finalization.
- If I fork a Process, I ensure I make a way for it to "fall out the bottom" gracefully, rather than #terminate at some random point.
- If I ever send #addDependent: I ensure I have a corresponding #breakDependents.
- etc.
That doesn't mean I don't respect or wouldn't try a more aggressive style. I do appreciate being able to create objects with impunity and rely on the garbage collector rather than having to delete them like in C++. :) This feels like an extension of that. I can see *some* appeal in not wanting to have to send #release; it creates more code that does nothing for the user, just for the computer's memory management. Still, it seems like it could be written just once and tucked somewhere into the "Close" code of SystemWindow, and be a unified part of the abstraction handling the unregistration of the model objects.
Despite that, something deep suggests to me that there may be certain designs that can't be so buttoned down which these features (ephemerons and/or finalization) could simplify from a point of infeasibility to a point of feasibility. It's good to know this capability is in the toolbox, I've starred this description in my email client, just in case. Thank you so much.
- Chris
On Tue, Jun 7, 2022 at 5:47 PM Eliot Miranda eliot.miranda@gmail.com wrote:
On Sun, Jun 5, 2022 at 10:51 PM Chris Muller ma.chris.m@gmail.com wrote:
[snip]