I have been trying to gradually update trunk VMMaker to better align it with oscog VMMaker (an admittedly slow process, but hopefully still worthwhile). I have moved the interpreter primitives into class InterpreterPrimitives and verified that there are no changes to the generated code. This greatly reduces the clutter in class Interpreter, so I think it is a nice change.
My next step was to update all of the primitives to use the #primitiveFailFor: idiom, in which the successFlag variable is replaced with primFailCode (an integer value: 0 for success; 1, 2, 3... for failure codes). This would get us closer to the point where the standard interpreter and stack/cog could use a common set of primitives. A lot of changes were required for this, and the resulting VM works fine ... except for performance.
On a standard interpreter, use of primFailCode seems to result in a nearly 12% reduction in bytecode performance as measured by tinyBenchmarks:
Standard interpreter (using successFlag):
0 tinyBenchmarks. '439108061 bytecodes/sec; 15264622 sends/sec'
0 tinyBenchmarks. '433164128 bytecodes/sec; 14740358 sends/sec'
0 tinyBenchmarks. '445993031 bytecodes/sec; 15040691 sends/sec'
0 tinyBenchmarks. '440999138 bytecodes/sec; 15052960 sends/sec'
0 tinyBenchmarks. '445993031 bytecodes/sec; 14485815 sends/sec'
After updating the standard interpreter (using primFailCode):
0 tinyBenchmarks. '393241167 bytecodes/sec; 14066256 sends/sec'
0 tinyBenchmarks. '392036753 bytecodes/sec; 15040691 sends/sec'
0 tinyBenchmarks. '393846153 bytecodes/sec; 14272953 sends/sec'
0 tinyBenchmarks. '400625978 bytecodes/sec; 14991818 sends/sec'
0 tinyBenchmarks. '393846153 bytecodes/sec; 15176750 sends/sec'
This is a much larger performance difference than I expected to see. Actually, I expected no measurable difference at all, and I was just testing to verify that. But 12% is a lot, so I have to ask: am I missing something?
The changes to generated code generally take the form of:
Testing success status, original: if (successFlag) { ... }
Testing success status, new: if (foo->primFailCode == 0) { ... }
Setting failure status, original: successFlag = 0;
Setting failure status, new: if (foo->primFailCode == 0) { foo->primFailCode = 1; }
My approach to doing the updates was as follows:
- Replace all occurrences of "successFlag := true" with "self initPrimCall", which initializes primFailCode to 0.
- Replace all "successFlag := false" with "self primitiveFail".
- Replace all "successFlag ifTrue: [] ifFalse: []" with "self successful ifTrue: [] ifFalse: []".
- Update #primitiveFail, #failed and #success: to use primFailCode rather than successFlag.
- Remove the successFlag variable.
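As a rough illustration of what the two idioms cost at the C level, here is a sketch with made-up names (this is not the generated interp.c; the struct and function names are only stand-ins):

```c
#include <assert.h>

/* Illustrative stand-in for the interpreter's global state; the real
 * generated interp.c uses its own struct ("foo" in the quoted snippets). */
struct vmState { int successFlag; int primFailCode; };
struct vmState vm = { 1, 0 };

/* Old idiom: recording a failure is a single unconditional store. */
void failOld(void) { vm.successFlag = 0; }

/* New idiom (primitiveFailFor:): only the first failure code is kept,
 * so each failure site costs a load, a compare, and a conditional store. */
void failNewFor(int reasonCode) {
    if (vm.primFailCode == 0) {
        vm.primFailCode = reasonCode;
    }
}
```

The first-code-wins behavior is what preserves the most specific failure reason, at the price of a read-modify-write at every failure site.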
Obviously I don't want to publish the code on SqS/VMMaker yet, but I can mail an interp.c to anyone who wants to see the gory details (it is too large to post on this mailing list, though).
Any advice appreciated. I suspect I'm missing something basic here.
Thanks, Dave
Thanks A LOT for your effort!!!
Stef
On May 22, 2011, at 5:54 PM, David T. Lewis wrote:
A correction to the code that I quoted below: The generated code before and after the change looks like this (sorry I forgot a "foo"):
Testing success status, original: if (foo->successFlag) { ... }
Testing success status, new: if (foo->primFailCode == 0) { ... }
Setting failure status, original: foo->successFlag = 0;
Setting failure status, new: if (foo->primFailCode == 0) { foo->primFailCode = 1; }
Dave
Hi David,
the difference looks to me to be due to the fact that successFlag is flat and primFailCode is in the VM struct. Try generating a VM where either primFailCode is also flat or, better still, all variables are flat. In my experience the flat form is faster on x86 (and faster with both the Intel and gcc compilers; not tested with llvm yet). BTW, if you use the Cog generator it will generate accesses to variables which might be in the VM struct as GIV(theVariableInQuestion) (where GIV stands for "global interpreter variable"). This allows one to choose whether these variables are kept in a struct or kept as separate variables at compile time instead of generation time, as controlled by the USE_GLOBAL_STRUCT compile-time constant, e.g. gcc -DUSE_GLOBAL_STRUCT=0 gcc3x-interp.c.
HTH Eliot
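The GIV scheme Eliot describes can be sketched roughly like this (the struct, field, and function names here are illustrative guesses at the shape, not copied from gcc3x-interp.c):

```c
#include <assert.h>

#define USE_GLOBAL_STRUCT 1   /* or compile with -DUSE_GLOBAL_STRUCT=0 */

#if USE_GLOBAL_STRUCT
/* Struct form: every access is base-plus-offset through one global. */
struct interpreterGlobals { int primFailCode; int argumentCount; };
struct interpreterGlobals fum = { 0, 0 };
#define GIV(var) (fum.var)
#else
/* Flat form: plain C globals, which Eliot reports is faster on x86. */
int primFailCode = 0;
int argumentCount = 0;
#define GIV(var) (var)
#endif

/* Generated code is written once against GIV() and works either way. */
void initPrimCall(void)  { GIV(primFailCode) = 0; }
void primitiveFail(void) { if (GIV(primFailCode) == 0) GIV(primFailCode) = 1; }
```

The point of the macro is that the struct-vs-flat decision moves from VMMaker generation time to a single C compile-time flag.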
On Mon, May 23, 2011 at 01:44:48PM -0700, Eliot Miranda wrote:
Eliot,
Thanks, and I have to apologize because I quoted the code incorrectly in my original message. The generated code before and after the change actually looks like this (sorry I forgot the "foo"):
Testing success status, original: if (foo->successFlag) { ... }
Testing success status, new: if (foo->primFailCode == 0) { ... }
Setting failure status, original: foo->successFlag = 0;
Setting failure status, new: if (foo->primFailCode == 0) { foo->primFailCode = 1; }
So in each case the global struct is being used, both for successFlag and primFailCode. Sorry for the confusion. In any case, I'm still left scratching my head over the size of the performance difference.
Dave
On Mon, May 23, 2011 at 2:08 PM, David T. Lewis lewis@mail.msen.com wrote:
Eliot,
Thanks, and I have to apologize because I quoted the code incorrectly in my original message. The generated code before and after the change actually looks like this (sorry I forgot the "foo"):
Ah, ok.
So in each case the global struct is being used, both for successFlag and primFailCode. Sorry for the confusion. In any case, I'm still left scratching my head over the size of the performance difference.
One thought, where are successFlag and primFailCode in the struct? Perhaps the size of the offset needed to access them makes a difference?
On Mon, May 23, 2011 at 02:33:52PM -0700, Eliot Miranda wrote:
One thought, where are successFlag and primFailCode in the struct? Perhaps the size of the offset needed to access them makes a difference?
In both cases they are the first element of the struct, so that cannot be it.
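(Layout questions like this can be checked mechanically with offsetof; the struct below only stands in for the generated global struct, which has many more fields:)

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative stand-in for the generated global struct. */
struct foo {
    int primFailCode;   /* first member, so its offset is 0 */
    int stackPointer;
};
```

A first member sits at offset 0, so accessing it needs no displacement from the struct's base address, which is why a first-element placement should rule out offset-size effects.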
I think I had better circle back and redo my tests. Maybe I made a mistake somewhere.
Thanks, Dave
On Mon, May 23, 2011 at 07:30:09PM -0400, David T. Lewis wrote:
No mistake, the performance problem was real.
Good news - I found the cause. Better news - this may be good for a performance boost on StackVM and possibly Cog also.
The performance hit was due almost entirely to InterpreterPrimitives>>failed, and perhaps a little bit to #successful and #success: also.
The issue with #failed is due to "^primFailCode ~= 0", which, for purposes of C translation, can be recoded as "^primFailCode", with an override in the simulator that retains "^primFailCode ~= 0". This produces a significant speed improvement: the result is at least as fast as the original interpreter implementation using successFlag.
I expect that the same change applied to StackInterpreter may give a similar 10% improvement (though I have not tried it). I don't know what to expect with Cog, but it may give a boost there as well.
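In C terms, the change amounts to the following (a sketch; in the real interp.c the test is inlined into the interpreter loop and the names differ):

```c
#include <assert.h>

int primFailCode = 0;

/* Literal translation of "^primFailCode ~= 0": every call pays for a
 * compare to materialize a 0/1 result. */
int failedBefore(void) { return primFailCode != 0; }

/* The fix: return the raw code. C treats any nonzero value as true,
 * so "if (failed()) ..." behaves identically but is just a load. */
int failedAfter(void) { return primFailCode; }
```

The two forms agree in every boolean context, which is why the simulator-only override can keep the explicit "~= 0" comparison for Smalltalk semantics while the translated C drops it.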
Changes attached, also included in VMMaker-dtl.237 on SqueakSource.
Dave
added to http://code.google.com/p/cog/issues/detail?id=45
It is strange that such a small detail could make a lot of difference in speed.
On Tue, May 24, 2011 at 10:46:02AM +0200, Igor Stasenko wrote:
Thanks Igor.
It is strange that such a small detail could make a lot of difference in speed.
Yes, I was very surprised to see it also. It will be interesting to see if it has a similar effect for StackInterpreter. I probably will not have time to check this for a while, so if you try it please let us know what you find.
Dave
On 24 May 2011 14:00, David T. Lewis lewis@mail.msen.com wrote:
What are you using to measure the difference in speed?
On Tue, May 24, 2011 at 02:16:05PM +0200, Igor Stasenko wrote:
On 24 May 2011 14:00, David T. Lewis lewis@mail.msen.com wrote:
On Tue, May 24, 2011 at 10:46:02AM +0200, Igor Stasenko wrote:
No mistake, the performance problem was real.
Good news - I found the cause. Better news - this may be good for a performance boost on StackVM and possibly Cog also.
The performance hit was due almost entirely to InterpreterPrimitives>>failed, and perhaps a little bit to #successful and #success: also.
This issue with #failed is due to "^primFailCode ~= 0" which, for purposes of C translation, can be recoded as "^primFailCode" with an override in the simulator as "^primFailCode ~= 0". This produces a significant speed improvement, at least as fast as for the original interpreter implementation using successFlag.
I expect that the same change applied to StackInterpreter may give a similar 10% improvement (though I have not tried it). I don't know what to expect with Cog, but it may give a boost there as well.
Changes attached, also included in VMMaker-dtl.237 on SqueakSource.
Dave
Thanks Igor.
it is strange that such a small detail could make a lot of difference in speed.
Yes, I was very surprised to see it also. It will be interesting to see if it has a similar effect for StackInterpreter. I probably will not have time to check this for a while, so if you try it please let us know what you find.
What are you using to measure the difference in speed?
I just use tinyBenchmarks as a smoke test to make sure that changes in the Slang do not affect performance. So I am looking at different variants of the code, running each one five times to get an average. Obviously this does not reflect real performance, but it is useful for spotting problems. Examples on my system:
Standard interpreter VM with successFlag 0 tinyBenchmarks '444444444 bytecodes/sec; 14317245 sends/sec' 0 tinyBenchmarks '435374149 bytecodes/sec; 14012854 sends/sec' 0 tinyBenchmarks '437606837 bytecodes/sec; 15277259 sends/sec' 0 tinyBenchmarks '437981180 bytecodes/sec; 15252007 sends/sec' 0 tinyBenchmarks '443674176 bytecodes/sec; 14406658 sends/sec'
Interpreter VM with primFailCode 0 tinyBenchmarks '398133748 bytecodes/sec; 14895019 sends/sec' 0 tinyBenchmarks '393241167 bytecodes/sec; 14228935 sends/sec' 0 tinyBenchmarks '396284829 bytecodes/sec; 14250910 sends/sec' 0 tinyBenchmarks '396591789 bytecodes/sec; 14907050 sends/sec' 0 tinyBenchmarks '401883830 bytecodes/sec; 14520007 sends/sec'
Interpreter VM with primFailCode after optimizing #failed, #success:, and #successful 0 tinyBenchmarks '447161572 bytecodes/sec; 14979650 sends/sec' 0 tinyBenchmarks '442523768 bytecodes/sec; 14955371 sends/sec' 0 tinyBenchmarks '447161572 bytecodes/sec; 14991818 sends/sec' 0 tinyBenchmarks '443290043 bytecodes/sec; 14350644 sends/sec' 0 tinyBenchmarks '445604873 bytecodes/sec; 15114601 sends/sec'
Similar tests showed that the differences were almost entirely associated with #failed.
I have to say that I am still uncomfortable about this, because I cannot really explain why the change has such a large effect. The #failed method is used only in a few places in the interpreter itself. So if you are able to independently verify (or refute) any of these results, that would be great.
Thanks, Dave
On Mon, May 23, 2011 at 8:42 PM, David T. Lewis lewis@mail.msen.com wrote:
On Mon, May 23, 2011 at 07:30:09PM -0400, David T. Lewis wrote:
On Mon, May 23, 2011 at 02:33:52PM -0700, Eliot Miranda wrote:
On Mon, May 23, 2011 at 2:08 PM, David T. Lewis lewis@mail.msen.com wrote:
Testing success status, original: if (foo->successFlag) { ... }
Testing success status, new: if (foo->primFailCode == 0) { ... }
Setting failure status, original: foo->successFlag = 0;
Setting failure status, new: if (foo->primFailCode == 0) { foo->primFailCode = 1; }
So in each case the global struct is being used, both for successFlag and primFailCode. Sorry for the confusion. In any case, I'm still left scratching my head over the size of the performance difference.
One thought: where are successFlag and primFailCode in the struct? Perhaps the size of the offset needed to access them makes a difference?
In both cases they are the first element of the struct, so that cannot be it.
I think I had better circle back and redo my tests. Maybe I made a mistake somewhere.
No mistake, the performance problem was real.
Good news - I found the cause. Better news - this may be good for a performance boost on StackVM and possibly Cog also.
thanks!
The performance hit was due almost entirely to InterpreterPrimitives>>failed, and perhaps a little bit to #successful and #success: also.
This issue with #failed is due to "^primFailCode ~= 0" which, for purposes of C translation, can be recoded as "^primFailCode" with an override in the simulator as "^primFailCode ~= 0". This produces a significant speed improvement, at least as fast as for the original interpreter implementation using successFlag.
Note that with the Cog code generator and for the purposes of the simulator this can read
failed
	<api>
	^self cCode: [primFailCode] inSmalltalk: [primFailCode ~= 0]
The Cog inliner maps self cCode: aCBlock inSmalltalk: anStBlock to aCBlock at TMethod creation time, hence avoiding the inability to inline cCode:inSmalltalk:. See MessageNode>>asTranslatorNode: in the Cog VMMaker. I'll integrate as such in Cog.
I expect that the same change applied to StackInterpreter may give a similar 10% improvement (though I have not tried it). I don't know what to expect with Cog, but it may give a boost there as well.
Changes attached, also included in VMMaker-dtl.237 on SqueakSource.
Dave
On Tue, May 24, 2011 at 09:07:30AM -0700, Eliot Miranda wrote:
On Mon, May 23, 2011 at 8:42 PM, David T. Lewis lewis@mail.msen.com wrote:
The performance hit was due almost entirely to InterpreterPrimitives>>failed, and perhaps a little bit to #successful and #success: also.
This issue with #failed is due to "^primFailCode ~= 0" which, for purposes of C translation, can be recoded as "^primFailCode" with an override in the simulator as "^primFailCode ~= 0". This produces a significant speed improvement, at least as fast as for the original interpreter implementation using successFlag.
Note that with the Cog code generator and for the purposes of the simulator this can read
failed
	<api>
	^self cCode: [primFailCode] inSmalltalk: [primFailCode ~= 0]
The Cog inliner maps self cCode: aCBlock inSmalltalk: anStBlock to aCBlock at TMethod creation time, hence avoiding the inability to inline cCode:inSmalltalk:. See MessageNode>>asTranslatorNode: in the Cog VMMaker. I'll integrate as such in Cog.
Thanks. I had some problems with inlining when I wrote it that way, so I had to back off to just using an override in the simulator. I'll look to pick up the appropriate fixes for this from Cog as merging proceeds.
BTW, I did not actually test the simulator after doing this; I hope I did not break anything for Craig's Spoon work ;)
Dave
...I did not actually test the simulator after doing this; I hope I did not break anything for Craig's Spoon work ;)
Oh, if you did I'm sure I'll let you know about it a few days after you've fixed it. :) Actually, now I'm looking forward to ephemeron conflicts. :)
-C
-- Craig Latta www.netjam.org/resume +31 06 2757 7177 + 1 415 287 3547
On 24 May 2011 20:19, Craig Latta craig@netjam.org wrote:
...I did not actually test the simulator after doing this; I hope I did not break anything for Craig's Spoon work ;)
Oh, if you did I'm sure I'll let you know about it a few days after you've fixed it. :) Actually, now I'm looking forward to ephemeron conflicts. :)
What conflicts?
Please elaborate :)
Hi Igor--
...now I'm looking forward to ephemeron conflicts. :)
What conflicts?
Please elaborate :)
Oh, none yet, I just suspect there will be some with the stuff I wrote to GC stale methods.
-C
-- Craig Latta www.netjam.org/resume +31 06 2757 7177 + 1 415 287 3547
On 25 May 2011 00:51, Craig Latta craig@netjam.org wrote:
Hi Igor--
...now I'm looking forward to ephemeron conflicts. :)
What conflicts?
Please elaborate :)
Oh, none yet, I just suspect there will be some with the stuff I wrote to GC stale methods.
Yes. This could be a problem. Consider the following:
MyClass>>someMethod
	^ #( 'abc' 'def' )
ephemeron := Ephemeron new key: self someMethod first value: somethingElse.
So, we created an ephemeron whose key is an object that came from the method's literals. Even worse, as shown above, it could be not a direct literal but a nested object.
Now, if you GC this stale #someMethod, it will apparently leave the ephemeron's key weakly referenced, and the key will be lost and replaced by nil.
To circumvent that, you have to make sure that all literals kept by the method are still reachable from the roots by other means. Another approach is to detect and do something with such problematic ephemerons, but as the example shows, this could be tricky.
BTW, this will happen with other weak refs as well.
array := WeakArray with: self someMethod first.
do you have a solution for that?
Of course it depends on your intents.
On Tue, May 24, 2011 at 5:34 PM, Igor Stasenko siguctua@gmail.com wrote:
On 25 May 2011 00:51, Craig Latta craig@netjam.org wrote:
Hi Igor--
...now I'm looking forward to ephemeron conflicts. :)
What conflicts?
Please elaborate :)
Oh, none yet, I just suspect there will be some with the stuff I wrote to GC stale methods.
Yes. This could be a problem. Consider the following:
MyClass>>someMethod
	^ #( 'abc' 'def' )
ephemeron := Ephemeron new key: self someMethod first value: somethingElse.
So, we created an ephemeron whose key is an object that came from the method's literals. Even worse, as shown above, it could be not a direct literal but a nested object.
Now, if you GC this stale #someMethod, it will apparently leave the ephemeron's key weakly referenced, and the key will be lost and replaced by nil.
Uh, no. The ephemeron refers to the string 'abc' that happened to be referenced by the method. But that string won't be garbage collected until there are no references to it in the system, including from the ephemeron. i.e. the ephemeron will either need to nil its key or itself be collected before the 'abc' string can be collected. There is no magic here with references to objects from methods. In Smalltalk, methods are just objects. [and in the Cog VM there is a little bit of chicanery to preserve the illusion that there are no machine code methods involved, but that's what it does; hide the machine code].
To circumvent that, you have to make sure that all literals kept by the method are still reachable from the roots by other means. Another approach is to detect and do something with such problematic ephemerons, but as the example shows, this could be tricky.
BTW, this will happen with other weak refs as well.
array := WeakArray with: self someMethod first.
do you have a solution for that?
Of course it depends on your intents.
-- Best regards, Igor Stasenko AKA sig.
On 25 May 2011 02:40, Eliot Miranda eliot.miranda@gmail.com wrote:
On Tue, May 24, 2011 at 5:34 PM, Igor Stasenko siguctua@gmail.com wrote:
On 25 May 2011 00:51, Craig Latta craig@netjam.org wrote:
Hi Igor--
...now I'm looking forward to ephemeron conflicts. :)
What conflicts?
Please elaborate :)
Oh, none yet, I just suspect there will be some with the stuff I wrote to GC stale methods.
Yes. This could be a problem. Consider the following:
MyClass>>someMethod
	^ #( 'abc' 'def' )
ephemeron := Ephemeron new key: self someMethod first value: somethingElse.
So, we created an ephemeron whose key is an object that came from the method's literals. Even worse, as shown above, it could be not a direct literal but a nested object.
Now, if you GC this stale #someMethod, it will apparently leave the ephemeron's key weakly referenced, and the key will be lost and replaced by nil.
Uh, no. The ephemeron refers to the string 'abc' that happened to be referenced by the method. But that string won't be garbage collected until there are no references to it in the system, including from the ephemeron. i.e. the ephemeron will either need to nil its key or itself be collected before the 'abc' string can be collected.
No. The ephemeron's key is held weakly, right? So, if the key points to that 'abc', and the only strong reference to 'abc' is via such a method, then if you remove the method from the system, the ephemeron's key will be replaced by nil ... and the rest of the logic follows, but that's not the interesting part.
Because the same will happen if you just use weak objects:
array := WeakArray with: (self someMethod first) "which answers a literal from that method".
Now, imagine that you want to temporarily unload such a "stale" method from memory, but make sure that the system pretends it is still in memory (if that is intended) and still reachable from the roots; you have to do something to make sure that the weak ref held by the array is not gone.
So, if you can solve this problem for usual weak refs, then you solve it for ephemerons too.
There is no magic here with references to objects from methods. In Smalltalk, methods are just objects. [and in the Cog VM there is a little bit of chicanery to preserve the illusion that there are no machine code methods involved, but that's what it does; hide the machine code].
On Tue, May 24, 2011 at 5:57 PM, Igor Stasenko siguctua@gmail.com wrote:
On 25 May 2011 02:40, Eliot Miranda eliot.miranda@gmail.com wrote:
On Tue, May 24, 2011 at 5:34 PM, Igor Stasenko siguctua@gmail.com wrote:
On 25 May 2011 00:51, Craig Latta craig@netjam.org wrote:
Hi Igor--
...now I'm looking forward to ephemeron conflicts. :)
What conflicts?
Please elaborate :)
Oh, none yet, I just suspect there will be some with the stuff I wrote to GC stale methods.
Yes. This could be a problem. Consider the following:
MyClass>>someMethod
	^ #( 'abc' 'def' )
ephemeron := Ephemeron new key: self someMethod first value: somethingElse.
So, we created an ephemeron whose key is an object that came from the method's literals. Even worse, as shown above, it could be not a direct literal but a nested object.
Now, if you GC this stale #someMethod, it will apparently leave the ephemeron's key weakly referenced, and the key will be lost and replaced by nil.
Uh, no. The ephemeron refers to the string 'abc' that happened to be referenced by the method. But that string won't be garbage collected until there are no references to it in the system, including from the ephemeron. i.e. the ephemeron will either need to nil its key or itself be collected before the 'abc' string can be collected.
No. The ephemeron's key is held weakly, right?
No. An ephemeron refers to its key strongly, but it is notified when the only references to its key are from the keys of ephemerons. The key of an ephemeron won't be collected unless the ephemeron nils its key on finalization, or removes itself from wherever it lived on finalization.
Ephemerons are *not the same* as weak references. They are like guardians. They are triggers that fire when the objects to which their keys refer are only reachable from their keys. But they hold onto their keys (and all their other references) strongly. Essentially they are used to detect when they are the last reference to some object. But by holding onto that object they allow it to be finalized.
Take the classic case of finalization applied to, say, a file. The issue is that we want the file's buffer to be flushed when it is finalized, not just the file descriptor to be closed. If we use post-mortem finalization (where we finalize a surrogate object when we detect that the actual object has been GCed) then we have to duplicate the buffer update in the surrogate (we can share the buffer between the actual file and its surrogate, but we must update e.g. the file pointer into the buffer) and so do more work keeping the surrogate in sync with the actual file. With ephemerons however, there is no need to use a surrogate; we use the actual object. When its ephemeron detects (actually the GC detects, but it's the ephemeron that gets notified) that the file is unreachable, it finalizes its file and removes itself from the ephemeron registry. When the GC next runs the ephemeron is collected, and because the ephemeron is the only reference to the file, the file gets collected too.
This example also shows that for safe use of ephemerons one needs to run a full garbage collection and subsequent finalizations before quitting the system, to ensure that uses like the above get performed.
So, if the key points to that 'abc', and the only strong reference to 'abc' is via such a method, then if you remove the method from the system, the ephemeron's key will be replaced by nil ... and the rest of the logic follows, but that's not the interesting part.
No. That's not correct. See above. Ephemerons are /not/ like weak references. They're related, but they don't behave at all the same.
HTH Eliot
Because the same will happen if you just use weak objects:
array := WeakArray with: (self someMethod first) "which answers a literal from that method".
Now, imagine that you want to temporarily unload such a "stale" method from memory, but make sure that the system pretends it is still in memory (if that is intended) and still reachable from the roots; you have to do something to make sure that the weak ref held by the array is not gone.
So, if you can solve this problem for usual weak refs, then you solve it for ephemerons too.
There is no magic here with references to objects from methods. In Smalltalk, methods are just objects. [and in the Cog VM there is a little bit of chicanery to preserve the illusion that there are no machine code methods involved, but that's what it does; hide the machine code].
-- Best regards, Igor Stasenko AKA sig.
On 25 May 2011 03:12, Eliot Miranda eliot.miranda@gmail.com wrote:
On Tue, May 24, 2011 at 5:57 PM, Igor Stasenko siguctua@gmail.com wrote:
On 25 May 2011 02:40, Eliot Miranda eliot.miranda@gmail.com wrote:
On Tue, May 24, 2011 at 5:34 PM, Igor Stasenko siguctua@gmail.com wrote:
On 25 May 2011 00:51, Craig Latta craig@netjam.org wrote:
Hi Igor--
...now I'm looking forward to ephemeron conflicts. :)
What conflicts?
Please elaborate :)
Oh, none yet, I just suspect there will be some with the stuff I wrote to GC stale methods.
Yes. This could be a problem. Consider the following:
MyClass>>someMethod
	^ #( 'abc' 'def' )
ephemeron := Ephemeron new key: self someMethod first value: somethingElse.
So, we created an ephemeron whose key is an object that came from the method's literals. Even worse, as shown above, it could be not a direct literal but a nested object.
Now, if you GC this stale #someMethod, it will apparently leave the ephemeron's key weakly referenced, and the key will be lost and replaced by nil.
Uh, no. The ephemeron refers to the string 'abc' that happened to be referenced by the method. But that string won't be garbage collected until there are no references to it in the system, including from the ephemeron. i.e. the ephemeron will either need to nil its key or itself be collected before the 'abc' string can be collected.
No. The ephemeron's key is held weakly, right?
No. An ephemeron refers to its key strongly, but it is notified when the only references to its key are from the keys of ephemerons. The key of an ephemeron won't be collected unless the ephemeron nils its key on finalization, or removes itself from wherever it lived on finalization.
Ah. Ok. Then my implementation is not correct, since in it the key is assumed to be held weakly. It is strange that such a detail slipped through my mind unnoticed. I will think about what should be changed to correct that.
Ephemerons are *not the same* as weak references. They are like guardians. They are triggers that fire when the objects to which their keys refer are only reachable from their keys. But they hold onto their keys (and all their other references) strongly. Essentially they are used to detect when they are the last reference to some object. But by holding onto that object they allow it to be finalized.
Take the classic case of finalization applied to, say, a file. The issue is that we want the file's buffer to be flushed when it is finalized, not just the file descriptor to be closed. If we use post-mortem finalization (where we finalize a surrogate object when we detect that the actual object has been GCed) then we have to duplicate the buffer update in the surrogate (we can share the buffer between the actual file and its surrogate, but we must update e.g. the file pointer into the buffer) and so do more work keeping the surrogate in sync with the actual file. With ephemerons however, there is no need to use a surrogate; we use the actual object. When its ephemeron detects (actually the GC detects, but it's the ephemeron that gets notified) that the file is unreachable, it finalizes its file and removes itself from the ephemeron registry. When the GC next runs the ephemeron is collected, and because the ephemeron is the only reference to the file, the file gets collected too.
This example also shows that for safe use of ephemerons one needs to run a full garbage collection and subsequent finalizations before quitting the system, to ensure that uses like the above get performed.
I agree that detecting an "almost collectable" property is useful. However, it certainly could be worked around; as Squeak history shows, it is possible.
So, if the key points to that 'abc', and the only strong reference to 'abc' is via such a method, then if you remove the method from the system, the ephemeron's key will be replaced by nil ... and the rest of the logic follows, but that's not the interesting part.
No. That's not correct. See above. Ephemerons are /not/ like weak references. They're related, but they don't behave at all the same. HTH Eliot
Because the same will happen if you just use weak objects:
array := WeakArray with: (self someMethod first) "which answers a literal from that method".
Now, imagine that you want to temporarily unload such a "stale" method from memory, but make sure that the system pretends it is still in memory (if that is intended) and still reachable from the roots; you have to do something to make sure that the weak ref held by the array is not gone.
So, if you can solve this problem for usual weak refs, then you solve it for ephemerons too.
There is no magic here with references to objects from methods. In Smalltalk, methods are just objects. [and in the Cog VM there is a little bit of chicanery to preserve the illusion that there are no machine code methods involved, but that's what it does; hide the machine code].
-- Best regards, Igor Stasenko AKA sig.