Hi,
I'm mostly ready with this cleanup, only a few methods left in the image that use #== for integer comparison. Some of them must use #==, others may be changed, but I wasn't sure if the code will work if it's changed or not. If you'd like to check these methods, then evaluate the following in a workspace:
| hasMatch visitor | hasMatch := false. visitor := ParseNodeEnumerator ofBlock: [ :node | (node isMessageNode and: [ ((#(== ~~) identityIncludes: node selector key) and: [ node receiver isConstantNumber or: [ node arguments first isConstantNumber ] ]) ]) ifTrue: [ hasMatch := true ] ] select: [ :node | hasMatch not ]. SystemNavigation default browseAllSelect: [ :method | hasMatch := false. method decompile accept: visitor. hasMatch ]
Cheers, Levente
If I may ask, what is the advantage of changing integer comparisons from #== to #=? (I might have missed the discussion leading up to it; if there was one just point me to it)
Cheers, - Andreas
On 11/15/2010 9:00 PM, Levente Uzonyi wrote:
Hi,
I'm mostly ready with this cleanup, only a few methods left in the image that use #== for integer comparison. Some of them must use #==, others may be changed, but I wasn't sure if the code will work if it's changed or not. If you'd like to check these methods, then evaluate the following in a workspace:
| hasMatch visitor | hasMatch := false. visitor := ParseNodeEnumerator ofBlock: [ :node | (node isMessageNode and: [ ((#(== ~~) identityIncludes: node selector key) and: [ node receiver isConstantNumber or: [ node arguments first isConstantNumber ] ]) ]) ifTrue: [ hasMatch := true ] ] select: [ :node | hasMatch not ]. SystemNavigation default browseAllSelect: [ :method | hasMatch := false. method decompile accept: visitor. hasMatch ]
Cheers, Levente
On Mon, 15 Nov 2010, Andreas Raab wrote:
If I may ask, what is the advantage of changing integer comparisons from #== to #=? (I might have missed the discussion leading up to it; if there was one just point me to it)
Originally here: http://lists.squeakfoundation.org/pipermail/squeak-dev/2006-February/100600.... . The mantis issue is here: http://bugs.squeak.org/view.php?id=2788 .
Levente
Cheers,
- Andreas
On 11/15/2010 9:00 PM, Levente Uzonyi wrote:
Hi,
I'm mostly ready with this cleanup, only a few methods left in the image that use #== for integer comparison. Some of them must use #==, others may be changed, but I wasn't sure if the code will work if it's changed or not. If you'd like to check these methods, then evaluate the following in a workspace:
| hasMatch visitor | hasMatch := false. visitor := ParseNodeEnumerator ofBlock: [ :node | (node isMessageNode and: [ ((#(== ~~) identityIncludes: node selector key) and: [ node receiver isConstantNumber or: [ node arguments first isConstantNumber ] ]) ]) ifTrue: [ hasMatch := true ] ] select: [ :node | hasMatch not ]. SystemNavigation default browseAllSelect: [ :method | hasMatch := false. method decompile accept: visitor. hasMatch ]
Cheers, Levente
Hi Levente,
Levente Uzonyi wrote:
Hi,
I'm mostly ready with this cleanup, only a few methods left in the image that use #== for integer comparison. Some of them must use #==, others may be changed, but I wasn't sure if the code will work if it's changed or not. If you'd like to check these methods, then evaluate the following in a workspace:
... code here...
Cheers, Levente
Thanks for the snippet! It is now a method in Cuis. I also checked a bit on trunk. In all senders of #nextObject you can apply the pattern in #allObjectsDo:. Instead of assuming == 0 for last #nextObject, it assumes that Object new will go at the end. If that ever changed, a few places would need fixing...
As an experiment, in #allObjectsDo: I tried to remove the Object new stuff and replace with '[ 0 = object and: [ object isMemberOf: SmallInteger ]] whileFalse: ', but my test became extremely slow. I suspect #= is sending the message even if it has its own bytecode... So, the only method that remains with the '0 ==' pattern in Cuis is #critical:ifLocked: , as I'm not sure if we can clean it.
Cheers, Juan Vuletich
On Tue, 16 Nov 2010, Juan Vuletich wrote:
Hi Levente,
Levente Uzonyi wrote:
Hi,
I'm mostly ready with this cleanup, only a few methods left in the image that use #== for integer comparison. Some of them must use #==, others may be changed, but I wasn't sure if the code will work if it's changed or not. If you'd like to check these methods, then evaluate the following in a workspace:
... code here...
Cheers, Levente
Thanks for the snippet! It is now a method in Cuis. I also checked a bit on trunk. In all senders of #nextObject you can apply the pattern in #allObjectsDo:. Instead of assuming == 0 for last #nextObject, it assumes that Object new will go at the end. If that ever changed, a few places would need fixing...
I'm not sure if it's ok to use that pattern in ImageSegment. Even if it's ok, the code is so messy, that it seems to be hard to apply the pattern without breaking the code.
As an experiment, in #allObjectsDo: I tried to remove the Object new stuff and replace with '[ 0 = object and: [ object isMemberOf: SmallInteger ]] whileFalse: ', but my test became extremely slow. I suspect #= is sending the message even if it has its own bytecode... So, the only method that remains with the '0 ==' pattern in Cuis is #critical:ifLocked: , as I'm not sure if we can clean it.
#= is a special selector, so I'm sure it won't send a message if it doesn't have to. It's possible that #= is a bit slower than #== because it has to do some extra checks, but the difference is probably neglible. Here's a benchmark:
| offset equality identity | Smalltalk garbageCollect. offset := [ 1 to: 1000000 do: [ :i | ] ] timeToRun. equality := [ :x | [ 1 to: 1000000 do: [ :i | 0 = x ] ] timeToRun ]. identity := [ :x | [ 1 to: 1000000 do: [ :i | 0 == x ] ] timeToRun ]. { 0. 1. SmallInteger maxVal. SmallInteger maxVal + 1. nil. Object new. 0.0. Array new. 1/2 } collect: [ :each | each -> ({ equality value: each. identity value: each. equality value: each. identity value: each. equality value: each. identity value: each } - offset) ].
And my results on CogVM:
{ 0->#(3 2 3 1 3 2). 1->#(2 4 3 0 3 4). 1073741823->#(3 3 2 1 3 5). 1073741824->#(221 4 223 1 223 4). nil->#(14 4 14 0 15 4). an Object->#(14 5 13 1 14 4). 0.0->#(622 5 623 2 624 4). #()->#(16 5 14 1 16 4). (1/2)->#(260 4 259 1 258 5) }
So the cause of the slowdown in your #allObjectsDo: implementation is that #= is only fast if both the receiver and the argument are SmallIntegers, otherwise there will be message sends.
Another way to implement #allObjectsDo: without the marker object is to replace the loop condition with: object class == SmallInteger. It's actually the same as your implementation without the #= check. #isMemberOf: is optimized to non-real sends. This is as fast as the current implementation on my pc.
Levente
Cheers, Juan Vuletich
On Tue, Nov 16, 2010 at 3:32 PM, Levente Uzonyi leves@elte.hu wrote:
On Tue, 16 Nov 2010, Juan Vuletich wrote:
Hi Levente,
Levente Uzonyi wrote:
Hi,
I'm mostly ready with this cleanup, only a few methods left in the image that use #== for integer comparison. Some of them must use #==, others may be changed, but I wasn't sure if the code will work if it's changed or not. If you'd like to check these methods, then evaluate the following in a workspace:
... code here...
Cheers, Levente
Thanks for the snippet! It is now a method in Cuis. I also checked a bit on trunk. In all senders of #nextObject you can apply the pattern in #allObjectsDo:. Instead of assuming == 0 for last #nextObject, it assumes that Object new will go at the end. If that ever changed, a few places would need fixing...
I'm not sure if it's ok to use that pattern in ImageSegment. Even if it's ok, the code is so messy, that it seems to be hard to apply the pattern without breaking the code.
As an experiment, in #allObjectsDo: I tried to remove the Object new stuff and replace with '[ 0 = object and: [ object isMemberOf: SmallInteger ]] whileFalse: ', but my test became extremely slow. I suspect #= is sending the message even if it has its own bytecode... So, the only method that remains with the '0 ==' pattern in Cuis is #critical:ifLocked: , as I'm not sure if we can clean it.
#= is a special selector, so I'm sure it won't send a message if it doesn't have to.
Um, on Cog it sends a message if the method containing the #= has been jitted. But the issue is /not/ whether there is a send or not. The issue is what the behaviour of the primitive code is. In the interpreter #= will short-circuit (avoid the send) if both the receiver and the argument are either a SmallInteger or a Float. In the JIT the primitive will not fail if the receiver and argument are both SmallIntegers or Floats or if the receiver is a Float and the argument is an Integer, but will fail if the receiver is a SmallInteger and the argument is a Float.
In general the VM is free to fail for non-matching numeric types (this is historical, derived from the blue book) so I don't feel Cog is at fault here. Certainly it is not answering incorrect results; it is simply taking more sends to produce the result in some circumstances. I think in cases like this (where you're enumerating over all objects) using #== is a wiser choice.
HTH
It's possible that #= is a bit slower than #== because it has to do some
extra checks, but the difference is probably neglible. Here's a benchmark:
| offset equality identity | Smalltalk garbageCollect. offset := [ 1 to: 1000000 do: [ :i | ] ] timeToRun. equality := [ :x | [ 1 to: 1000000 do: [ :i | 0 = x ] ] timeToRun ]. identity := [ :x | [ 1 to: 1000000 do: [ :i | 0 == x ] ] timeToRun ]. { 0. 1. SmallInteger maxVal. SmallInteger maxVal + 1. nil. Object new. 0.0. Array new. 1/2 } collect: [ :each | each -> ({ equality value: each. identity value: each. equality value: each. identity value: each. equality value: each. identity value: each } - offset) ].
And my results on CogVM:
{ 0->#(3 2 3 1 3 2). 1->#(2 4 3 0 3 4). 1073741823->#(3 3 2 1 3 5). 1073741824->#(221 4 223 1 223 4). nil->#(14 4 14 0 15 4). an Object->#(14 5 13 1 14 4). 0.0->#(622 5 623 2 624 4). #()->#(16 5 14 1 16 4). (1/2)->#(260 4 259 1 258 5) }
I'd do many more iterations. I wouldn't put any faith any millisecond numbers that are less than 3 digits (100 ms).
So the cause of the slowdown in your #allObjectsDo: implementation is that #= is only fast if both the receiver and the argument are SmallIntegers, otherwise there will be message sends.
Another way to implement #allObjectsDo: without the marker object is to replace the loop condition with: object class == SmallInteger. It's actually the same as your implementation without the #= check. #isMemberOf: is optimized to non-real sends. This is as fast as the current implementation on my pc.
Levente
Cheers, Juan Vuletich
On Tue, 16 Nov 2010, Eliot Miranda wrote:
Um, on Cog it sends a message if the method containing the #= has been jitted. But the issue is /not/ whether there is a send or not. The issue
Does it mean, that #= can't be used in "atomic" code anymore?
Levente
is what the behaviour of the primitive code is. In the interpreter #= will short-circuit (avoid the send) if both the receiver and the argument are either a SmallInteger or a Float. In the JIT the primitive will not fail if the receiver and argument are both SmallIntegers or Floats or if the receiver is a Float and the argument is an Integer, but will fail if the receiver is a SmallInteger and the argument is a Float.
snip
On Tue, Nov 16, 2010 at 4:21 PM, Levente Uzonyi leves@elte.hu wrote:
On Tue, 16 Nov 2010, Eliot Miranda wrote:
Um, on Cog it sends a message if the method containing the #= has been
jitted. But the issue is /not/ whether there is a send or not. The issue
Does it mean, that #= can't be used in "atomic" code anymore?
But it couldn't before. You could only use it if the classes and ranges were just right. For example, #= isn't "atomic" for large integers, and the #= primitive will fail for large integers > 64 bits. In assuming #= is "atomic" for SmallInteger x Float you're relying on an optimization in the Interpreter. Look at the Blue book and you won't see this behaviour. Read the ANSI spec and you won't see this behaviour specified. So you're juts getting away with it on one particular implementation. (IMO)
Again IMO, you /can/ assume that SmallInteger>#= and Float>#= won't fail if their argument is of the same class as the receiver, but you can't assume that there won't be a send.
best Eliot
Levente
is what the behaviour of the primitive code is. In the interpreter #=
will short-circuit (avoid the send) if both the receiver and the argument are either a SmallInteger or a Float. In the JIT the primitive will not fail if the receiver and argument are both SmallIntegers or Floats or if the receiver is a Float and the argument is an Integer, but will fail if the receiver is a SmallInteger and the argument is a Float.
snip
On 11/16/2010 4:21 PM, Levente Uzonyi wrote:
On Tue, 16 Nov 2010, Eliot Miranda wrote:
Um, on Cog it sends a message if the method containing the #= has been jitted. But the issue is /not/ whether there is a send or not. The issue
Does it mean, that #= can't be used in "atomic" code anymore?
It never was. In the old (pre-closure) days you might have gotten away with using #= only because of the context cache which would avoid creating a new context if a previous one could be recycled. But code like:
obj := self someObject. [0 = obj nextObject] whileFalse:[count := count + 1].
really should never have worked to begin with because: * sending #= will create a context (unless recycled) * the plus in the counter will create a context (unless count is in SmallInt range) * the addition may create a new object (unless count in SmallInt range) * either one of the blocks should create contexts (unless optimized)
There's a long list of reasons why code such as the above is simply broken. You *must* use an end marker, i.e.,
last := Object new. "end marker" obj := self someObject. [last == obj] whileFalse:[ count := count + 1. obj := obj nextObject. ].
This will work because it counts between the beginning of memory and the (arbitrary) end marker. Anything else basically should not be relied on to work, jit or no.
Cheers, - Andreas
On Tue, 16 Nov 2010, Andreas Raab wrote:
On 11/16/2010 4:21 PM, Levente Uzonyi wrote:
On Tue, 16 Nov 2010, Eliot Miranda wrote:
Um, on Cog it sends a message if the method containing the #= has been jitted. But the issue is /not/ whether there is a send or not. The issue
Does it mean, that #= can't be used in "atomic" code anymore?
It never was. In the old (pre-closure) days you might have gotten away with using #= only because of the context cache which would avoid creating a new context if a previous one could be recycled. But code like:
obj := self someObject. [0 = obj nextObject] whileFalse:[count := count + 1].
I wasn't clear when I said atomic code. I expected #= (and #<, #>, etc) to _not_ be a real message send when both the receiver and the argument are SmallIntegers. Otherwise what's the point of having separate bytecodes for them?
Here's some example code which I expected to be atomic:
| x | x := aSmallInteger. "really a SmallInteger" x > 0 ifTrue: [ x := x - 1 ] ifFalse: [ x = 0 ifFalse: [ x := 0 ] ]
really should never have worked to begin with because:
- sending #= will create a context (unless recycled)
- the plus in the counter will create a context (unless count is in SmallInt
range)
- the addition may create a new object (unless count in SmallInt range)
- either one of the blocks should create contexts (unless optimized)
There's a long list of reasons why code such as the above is simply broken. You *must* use an end marker, i.e.,
last := Object new. "end marker" obj := self someObject. [last == obj] whileFalse:[ count := count + 1. obj := obj nextObject. ].
This will work because it counts between the beginning of memory and the (arbitrary) end marker. Anything else basically should not be relied on to work, jit or no.
To make it clear: I didn't want to change this code at all, I was just discussing about the current and previous implenetation with Juan.
Levente
Cheers,
- Andreas
On 11/16/2010 8:05 PM, Levente Uzonyi wrote:
I wasn't clear when I said atomic code. I expected #= (and #<, #>, etc) to _not_ be a real message send when both the receiver and the argument are SmallIntegers. Otherwise what's the point of having separate bytecodes for them?
Space. It makes a big difference for the most common selectors (#at:, #at:put:, #size, #+ etc) to be be encoded as bytecodes. It avoids having to allocate a literal every time you see the selector. Often, the special selector bytecodes look like this:
Interpreter>>bytecodePrimNextPut messageSelector := self specialSelector: 20. argumentCount := 1. self normalSend.
I.e., it just dispatches to normalSend where the regular lookup takes place. Of course, that also means it's a prime place for an optimization that will evaluate eagerly for known receiver types and so (over time) optimizations were added, but many of the optimizations that may make sense in an interpreter have very different tradeoffs in the jit. For a jit to generate the level of optimization makes no sense because the code size simply explodes at no benefit if the inline caches are any good (ours *are* the best Eliot knows how to do and that is a meaningful statement).
On to a finer point. The terminology "real message send" is misleading. Generally, we (the VM hackers) mean by "real" send a send that requires a method activation, i.e., the creation of a context, but *not* the lookup of the method. That excludes for example all (successful) primitives from being "real sends", and as a consequence writing "1 + 2" is not a real send by that measure (with or without the bytecode present) since the primitive will be executed successfully and no "real" send (method activation) has taken place.
To make matters more complicated, when we talk about "real" sends in the context of thread switches, semaphores and critical sections, what we mean is whether there is a suspension point in the send or not. Obviously, some primitives (#suspend, #yield) must have suspension points so not all activation-free methods are also suspension-point-free. I am not entirely sure what the current set of rules for suspension points in Cog is; in the interpreter it was part of the activation sequence so any primitive that isn't process related would not have a suspension point but I don't know if that's still true in Cog.
Cheers, - Andreas
On Tue, 16 Nov 2010, Andreas Raab wrote:
On 11/16/2010 8:05 PM, Levente Uzonyi wrote:
I wasn't clear when I said atomic code. I expected #= (and #<, #>, etc) to _not_ be a real message send when both the receiver and the argument are SmallIntegers. Otherwise what's the point of having separate bytecodes for them?
Space. It makes a big difference for the most common selectors (#at:, #at:put:, #size, #+ etc) to be be encoded as bytecodes. It avoids having to allocate a literal every time you see the selector. Often, the special
I just evaluated this:
| count specialSelectors | count := 0. specialSelectors := Smalltalk specialSelectors select: [ :each | each isSymbol ]. CompiledMethod allInstancesDo: [ :method | | messages | messages := method messages. count := count + (specialSelectors count: [ :selector | messages includes: selector ]) ]. count
The result is 50947 for a slightly modified Trunk image. This means that this technique saves less than 200kB (assuming 32-bit slots in the literal frame), but it uses 32 bytecodes.
200 kB is not much compared to the size of the image (about 1-2%).
Also the 32 most frequently used methods are not the 32 special selectors.
| b mostFrequentSelectors specialSelectors | b := Bag new. CompiledMethod allInstancesDo: [ :method | b addAll: method messages ]. mostFrequentSelectors := (b sortedCounts first: 32) replace: #value. specialSelectors := Smalltalk specialSelectors select: #isSymbol. { specialSelectors difference: mostFrequentSelectors. "Shouldn't be special" mostFrequentSelectors difference: specialSelectors. "Should be special" }.
#( #(#'>=' #'~=' #/ #'\' #bitShift: #'//' #bitAnd: #bitOr: #next #atEnd #blockCopy: #value #value: #x #y) #(#, #assert: #first #name #add: #nextPutAll: #isEmpty #error: #asString #includes: #default #translated #not #on: #collect:))
Another 40kB could be saved by changing these.
Btw, there's a "free" bytecode: 200 - #blockCopy:.
selector bytecodes look like this:
Interpreter>>bytecodePrimNextPut messageSelector := self specialSelector: 20. argumentCount := 1. self normalSend.
I.e., it just dispatches to normalSend where the regular lookup takes place. Of course, that also means it's a prime place for an optimization that will evaluate eagerly for known receiver types and so (over time) optimizations were added, but many of the optimizations that may make sense in an interpreter have very different tradeoffs in the jit. For a jit to generate the level of optimization makes no sense because the code size simply explodes at no benefit if the inline caches are any good (ours *are* the best Eliot knows how to do and that is a meaningful statement).
On to a finer point. The terminology "real message send" is misleading. Generally, we (the VM hackers) mean by "real" send a send that requires a method activation, i.e., the creation of a context, but *not* the lookup of the method. That excludes for example all (successful) primitives from being "real sends", and as a consequence writing "1 + 2" is not a real send by that measure (with or without the bytecode present) since the primitive will be executed successfully and no "real" send (method activation) has taken place.
To make matters more complicated, when we talk about "real" sends in the context of thread switches, semaphores and critical sections, what we mean is whether there is a suspension point in the send or not. Obviously, some primitives (#suspend, #yield) must have suspension points so not all activation-free methods are also suspension-point-free. I am not entirely sure what the current set of rules for suspension points in Cog is; in the interpreter it was part of the activation sequence so any primitive that isn't process related would not have a suspension point but I don't know if that's still true in Cog.
Thanks, this was very informative.
Levente
Cheers,
- Andreas
On 11/16/2010 9:52 PM, Levente Uzonyi wrote:
On Tue, 16 Nov 2010, Andreas Raab wrote:
On 11/16/2010 8:05 PM, Levente Uzonyi wrote:
I wasn't clear when I said atomic code. I expected #= (and #<, #>, etc) to _not_ be a real message send when both the receiver and the argument are SmallIntegers. Otherwise what's the point of having separate bytecodes for them?
Space. It makes a big difference for the most common selectors (#at:, #at:put:, #size, #+ etc) to be be encoded as bytecodes. It avoids having to allocate a literal every time you see the selector. Often, the special
I just evaluated this:
| count specialSelectors | count := 0. specialSelectors := Smalltalk specialSelectors select: [ :each | each isSymbol ]. CompiledMethod allInstancesDo: [ :method | | messages | messages := method messages. count := count + (specialSelectors count: [ :selector | messages includes: selector ]) ]. count
The result is 50947 for a slightly modified Trunk image. This means that this technique saves less than 200kB (assuming 32-bit slots in the literal frame), but it uses 32 bytecodes.
Heh, heh. Yes, I realized that after I sent my messages. My comment applies to the "old days" when memory was at a prime and such optimizations made sense. Nowadays for all practical purposes that arguments holds no water.
200 kB is not much compared to the size of the image (about 1-2%).
Also the 32 most frequently used methods are not the 32 special selectors.
Yup. That changes too over time.
| b mostFrequentSelectors specialSelectors | b := Bag new. CompiledMethod allInstancesDo: [ :method | b addAll: method messages ]. mostFrequentSelectors := (b sortedCounts first: 32) replace: #value. specialSelectors := Smalltalk specialSelectors select: #isSymbol. { specialSelectors difference: mostFrequentSelectors. "Shouldn't be special" mostFrequentSelectors difference: specialSelectors. "Should be special" }.
#( #(#'>=' #'~=' #/ #'\' #bitShift: #'//' #bitAnd: #bitOr: #next #atEnd #blockCopy: #value #value: #x #y) #(#, #assert: #first #name #add: #nextPutAll: #isEmpty #error: #asString #includes: #default #translated #not #on: #collect:))
Another 40kB could be saved by changing these.
Btw, there's a "free" bytecode: 200 - #blockCopy:.
Indeed, it's no longer used for closures (but it is part of the ST80 spec).
Cheers, - Andreas
selector bytecodes look like this:
Interpreter>>bytecodePrimNextPut messageSelector := self specialSelector: 20. argumentCount := 1. self normalSend.
I.e., it just dispatches to normalSend where the regular lookup takes place. Of course, that also means it's a prime place for an optimization that will evaluate eagerly for known receiver types and so (over time) optimizations were added, but many of the optimizations that may make sense in an interpreter have very different tradeoffs in the jit. For a jit to generate the level of optimization makes no sense because the code size simply explodes at no benefit if the inline caches are any good (ours *are* the best Eliot knows how to do and that is a meaningful statement).
On to a finer point. The terminology "real message send" is misleading. Generally, we (the VM hackers) mean by "real" send a send that requires a method activation, i.e., the creation of a context, but *not* the lookup of the method. That excludes for example all (successful) primitives from being "real sends", and as a consequence writing "1 + 2" is not a real send by that measure (with or without the bytecode present) since the primitive will be executed successfully and no "real" send (method activation) has taken place.
To make matters more complicated, when we talk about "real" sends in the context of thread switches, semaphores and critical sections, what we mean is whether there is a suspension point in the send or not. Obviously, some primitives (#suspend, #yield) must have suspension points so not all activation-free methods are also suspension-point-free. I am not entirely sure what the current set of rules for suspension points in Cog is; in the interpreter it was part of the activation sequence so any primitive that isn't process related would not have a suspension point but I don't know if that's still true in Cog.
Thanks, this was very informative.
Levente
Cheers,
- Andreas
On 17.11.2010, at 06:52, Levente Uzonyi wrote:
Also the 32 most frequently used methods are not the 32 special selectors.
| b mostFrequentSelectors specialSelectors | b := Bag new. CompiledMethod allInstancesDo: [ :method | b addAll: method messages ]. mostFrequentSelectors := (b sortedCounts first: 32) replace: #value. specialSelectors := Smalltalk specialSelectors select: #isSymbol. { specialSelectors difference: mostFrequentSelectors. "Shouldn't be special" mostFrequentSelectors difference: specialSelectors. "Should be special" }.
#( #(#'>=' #'~=' #/ #'\' #bitShift: #'//' #bitAnd: #bitOr: #next #atEnd #blockCopy: #value #value: #x #y) #(#, #assert: #first #name #add: #nextPutAll: #isEmpty #error: #asString #includes: #default #translated #not #on: #collect:))
Hehe, that just shows how unconcerned we have become with performance and memory ;)
- Bert -
On Tue, Nov 16, 2010 at 9:52 PM, Levente Uzonyi leves@elte.hu wrote:
On Tue, 16 Nov 2010, Andreas Raab wrote:
On 11/16/2010 8:05 PM, Levente Uzonyi wrote:
I wasn't clear when I said atomic code. I expected #= (and #<, #>, etc) to _not_ be a real message send when both the receiver and the argument are SmallIntegers. Otherwise what's the point of having separate bytecodes for them?
Space. It makes a big difference for the most common selectors (#at:, #at:put:, #size, #+ etc) to be be encoded as bytecodes. It avoids having to allocate a literal every time you see the selector. Often, the special
I just evaluated this:
| count specialSelectors | count := 0. specialSelectors := Smalltalk specialSelectors select: [ :each | each isSymbol ]. CompiledMethod allInstancesDo: [ :method | | messages | messages := method messages. count := count + (specialSelectors count: [ :selector | messages includes: selector ]) ]. count
The result is 50947 for a slightly modified Trunk image. This means that this technique saves less than 200kB (assuming 32-bit slots in the literal frame), but it uses 32 bytecodes.
200 kB is not much compared to the size of the image (about 1-2%).
Also the 32 most frequently used methods are not the 32 special selectors.
| b mostFrequentSelectors specialSelectors | b := Bag new. CompiledMethod allInstancesDo: [ :method | b addAll: method messages ]. mostFrequentSelectors := (b sortedCounts first: 32) replace: #value. specialSelectors := Smalltalk specialSelectors select: #isSymbol. { specialSelectors difference: mostFrequentSelectors. "Shouldn't be special" mostFrequentSelectors difference: specialSelectors. "Should be special" }.
#( #(#'>=' #'~=' #/ #'\' #bitShift: #'//' #bitAnd: #bitOr: #next #atEnd #blockCopy: #value #value: #x #y) #(#, #assert: #first #name #add: #nextPutAll: #isEmpty #error: #asString #includes: #default #translated #not #on: #collect:))
Another 40kB could be saved by changing these.
Btw, there's a "free" bytecode: 200 - #blockCopy:.
I would like to reserve this for a page of new bytecodes, some of which I'd like to use for adaptive optimization/speculative inlining work that Marcus, Colin and I are planning to start in the new year.
best Eliot
selector bytecodes look like this:
Interpreter>>bytecodePrimNextPut messageSelector := self specialSelector: 20. argumentCount := 1. self normalSend.
I.e., it just dispatches to normalSend where the regular lookup takes place. Of course, that also means it's a prime place for an optimization that will evaluate eagerly for known receiver types and so (over time) optimizations were added, but many of the optimizations that may make sense in an interpreter have very different tradeoffs in the jit. For a jit to generate the level of optimization makes no sense because the code size simply explodes at no benefit if the inline caches are any good (ours *are* the best Eliot knows how to do and that is a meaningful statement).
On to a finer point. The terminology "real message send" is misleading. Generally, we (the VM hackers) mean by "real" send a send that requires a method activation, i.e., the creation of a context, but *not* the lookup of the method. That excludes for example all (successful) primitives from being "real sends", and as a consequence writing "1 + 2" is not a real send by that measure (with or without the bytecode present) since the primitive will be executed successfully and no "real" send (method activation) has taken place.
To make matters more complicated, when we talk about "real" sends in the context of thread switches, semaphores and critical sections, what we mean is whether there is a suspension point in the send or not. Obviously, some primitives (#suspend, #yield) must have suspension points so not all activation-free methods are also suspension-point-free. I am not entirely sure what the current set of rules for suspension points in Cog is; in the interpreter it was part of the activation sequence so any primitive that isn't process related would not have a suspension point but I don't know if that's still true in Cog.
Thanks, this was very informative.
Levente
Cheers,
- Andreas
Hi Both,
On Tue, Nov 16, 2010 at 9:00 PM, Andreas Raab andreas.raab@gmx.de wrote:
On 11/16/2010 8:05 PM, Levente Uzonyi wrote:
I wasn't clear when I said atomic code. I expected #= (and #<, #>, etc) to _not_ be a real message send when both the receiver and the argument are SmallIntegers. Otherwise what's the point of having separate bytecodes for them?
Space. It makes a big difference for the most common selectors (#at:, #at:put:, #size, #+ etc) to be be encoded as bytecodes. It avoids having to allocate a literal every time you see the selector. Often, the special selector bytecodes look like this:
Interpreter>>bytecodePrimNextPut messageSelector := self specialSelector: 20. argumentCount := 1. self normalSend.
I.e., it just dispatches to normalSend where the regular lookup takes place. Of course, that also means it's a prime place for an optimization that will evaluate eagerly for known receiver types and so (over time) optimizations were added, but many of the optimizations that may make sense in an interpreter have very different tradeoffs in the jit. For a jit to generate the level of optimization makes no sense because the code size simply explodes at no benefit if the inline caches are any good (ours *are* the best Eliot knows how to do and that is a meaningful statement).
On to a finer point. The terminology "real message send" is misleading. Generally, we (the VM hackers) mean by "real" send a send that requires a method activation, i.e., the creation of a context, but *not* the lookup of the method. That excludes for example all (successful) primitives from being "real sends", and as a consequence writing "1 + 2" is not a real send by that measure (with or without the bytecode present) since the primitive will be executed successfully and no "real" send (method activation) has taken place.\
I want to disagree slightly. For me the special selector bytecodes are both a space optimization and a performance opimization, important enough to have acquired its own term, static type prediction, essentially optimizing by implementing without lookup the highest dynamic frequency type, which is what's done for the arithmetic special selector bytecodes, #+, #-, #< #> et al. What exactly do we mean here? We mean that if the types are of a particular small set then there is an attempt to perform the operation that would be performed by a send without actually performing the send, and falling back to the full send if either an error occurs in the operation or if the types are invalid. e.g. for #+ we can attempt to perform the operation if receiver and argument are both SmallIntegers but the result may overflow. So we send if the types are wrong or if the primitive operation can't be performed, and the machine falls back on a real send.
Interestingly there is a cost to static type prediction which can make it not worth-while. If you look at VisualWorks' HPS VM you'll find that there is no static type prediction for #= and #~= because its dynamic frequency for SmallInteger is too small. There /is/ static type prediction for #+ #- #< #<= et al because these selectors are used so frequently in to:do: loops (unlike #= & ~=).
So I distinguish between the avoidance of the send, the actual send and the operation the send binds to. The special selector bytecode for #+ can avoid a real send if the receiver and argument are both SmallIntegers (or in the interpreter SmallIntegers and/or Floats) /and/ if the result does not overflow. But it then does a real send to whatever the receiver is. However that send might bind to a method that has a primitive (SmallInteger, LargeInteger Float) or not (Fraction, a non numeric object). The primitive may perform the operation and avoid building an activation. For example in the Cog JIT it might take too much space to implement static type prediction for #+ and (SmallInteger | Float) x (SmallInteger | Float) and instead just implement it for SmallInteger x SmallInteger, but the primitive for SmallInteger>#+ might quite happily implement SmallInteger x (SmallInteger | Float). Hence a particular special selector #+ invocation could end up in either avoiding the send, causing a real send that invokes the SmallInteger>#+ primitive that succeeds, or a real send that invokes the SmallInteger>#+ primtiive that fails and builds a frame, or a real send that invokes a method without a primitive that builds a frame. But all the last three are real sends.
To make matters more complicated, when we talk about "real" sends in the context of thread switches, semaphores and critical sections, what we mean is whether there is a suspension point in the send or not. Obviously, some primitives (#suspend, #yield) must have suspension points so not all activation-free methods are also suspension-point-free. I am not entirely sure what the current set of rules for suspension points in Cog is; in the interpreter it was part of the activation sequence so any primitive that isn't process related would not have a suspension point but I don't know if that's still true in Cog.
Right, I agree. Basically the situation in Cog is that the suspension points are frame-building sends /not/ marked with primitive 221 or 222 (BlockClosure>valueNoContextSwitch & BlockClosure>valueNoContextSwitch:), backward branches (i.e. at the end of a while or to:do: loop), and the Process/Semaphore/Mutex primitives (suspend, yield, wait, signal, enterCriticalSection, exitCriticalSection). So at backward branches and at frame build the VM checks for external events which may cause process switches, but certain primitives may cause suspensions directly.
I suppose my real point is that sends which invoke primitives are still real sends, but special selector bytecodes that return a result short-circuit sends.
best Eliot
Cheers,
- Andreas
You *must* use an end marker, i.e.,
last := Object new. "end marker" obj := self someObject. [last == obj] whileFalse:[ count := count + 1. obj := obj nextObject. ].
This will work because it counts between the beginning of memory and the (arbitrary) end marker. Anything else basically should not be relied on to work, jit or no.
What happens if a process with higher priority interrupts the iteration and creates more objects?
On 26.11.2010, at 17:43, Andres Valloud wrote:
You *must* use an end marker, i.e.,
last := Object new. "end marker" obj := self someObject. [last == obj] whileFalse:[ count := count + 1. obj := obj nextObject. ].
This will work because it counts between the beginning of memory and the (arbitrary) end marker. Anything else basically should not be relied on to work, jit or no.
What happens if a process with higher priority interrupts the iteration and creates more objects?
New objects would come after the end marker in memory.
- Bert -
Can you assume that allocation is sequential?
On 11/26/10 8:59 , Bert Freudenberg wrote:
On 26.11.2010, at 17:43, Andres Valloud wrote:
You *must* use an end marker, i.e.,
last := Object new. "end marker" obj := self someObject. [last == obj] whileFalse:[ count := count + 1. obj := obj nextObject. ].
This will work because it counts between the beginning of memory and the (arbitrary) end marker. Anything else basically should not be relied on to work, jit or no.
What happens if a process with higher priority interrupts the iteration and creates more objects?
New objects would come after the end marker in memory.
- Bert -
On 26.11.2010, at 18:09, Andres Valloud wrote:
Can you assume that allocation is sequential?
I can.
- Bert -
On 11/26/10 8:59 , Bert Freudenberg wrote:
On 26.11.2010, at 17:43, Andres Valloud wrote:
You *must* use an end marker, i.e.,
last := Object new. "end marker" obj := self someObject. [last == obj] whileFalse:[ count := count + 1. obj := obj nextObject. ].
This will work because it counts between the beginning of memory and the (arbitrary) end marker. Anything else basically should not be relied on to work, jit or no.
What happens if a process with higher priority interrupts the iteration and creates more objects?
New objects would come after the end marker in memory.
- Bert -
I'm not familiar with Squeak's GC... can you also assume that a GC (e.g.: a "new space" scavenge) won't move the marker so that the enumeration goes past it?
On 11/26/10 9:12 , Bert Freudenberg wrote:
On 26.11.2010, at 18:09, Andres Valloud wrote:
Can you assume that allocation is sequential?
I can.
- Bert -
On 11/26/10 8:59 , Bert Freudenberg wrote:
On 26.11.2010, at 17:43, Andres Valloud wrote:
You *must* use an end marker, i.e.,
last := Object new. "end marker" obj := self someObject. [last == obj] whileFalse:[ count := count + 1. obj := obj nextObject. ].
This will work because it counts between the beginning of memory and the (arbitrary) end marker. Anything else basically should not be relied on to work, jit or no.
What happens if a process with higher priority interrupts the iteration and creates more objects?
New objects would come after the end marker in memory.
- Bert -
On Fri, Nov 26, 2010 at 9:15 AM, Andres Valloud < avalloud@smalltalk.comcastbiz.net> wrote:
I'm not familiar with Squeak's GC... can you also assume that a GC (e.g.: a "new space" scavenge) won't move the marker so that the enumeration goes past it?
IIUC, that's irrelevant. FOr the marker technique to work all that's needed is that the relative order of live objects isn't changed. The only thing that can change that is a become:. In the absence of a become: the end marker, no matter how many GCs occur, stays above/after all previously activated objects and below/before all subsequently allocated objects. The Squeak GC effectively only compacts to squeeze out reclaimed objects, it does not reorder. So this is very different to the VM/HPS GC.
HTH Eliot
On 11/26/10 9:12 , Bert Freudenberg wrote:
On 26.11.2010, at 18:09, Andres Valloud wrote:
Can you assume that allocation is sequential?
I can.
- Bert -
On 11/26/10 8:59 , Bert Freudenberg wrote:
On 26.11.2010, at 17:43, Andres Valloud wrote:
You *must* use an end marker, i.e.,
last := Object new. "end marker" obj := self someObject. [last == obj] whileFalse:[ count := count + 1. obj := obj nextObject. ].
This will work because it counts between the beginning of memory and the (arbitrary) end marker. Anything else basically should not be relied on to work, jit or no.
What happens if a process with higher priority interrupts the iteration and creates more objects?
New objects would come after the end marker in memory.
- Bert -
Yes, from what I've seen, Squeak's GC is different from that in HPS. I found some information scattered here and there in Google, but it's mostly commentary. Ideally, I'd look at the source code and figure it out but I don't quite have the time to look at it in all its detail. I would really appreciate a current, 3-5 sentence summary of Squeak's GC implementation. Does anyone have such a pointer? I hope this request is not deemed ridiculous, I'm thinking it would be just as unreasonable to expect anybody to look at HPS' GC implementation and get the gist of what's going on in 5-10 minutes. Thanks in advance!
On 11/27/10 14:06 , Eliot Miranda wrote:
On Fri, Nov 26, 2010 at 9:15 AM, Andres Valloud <avalloud@smalltalk.comcastbiz.net mailto:avalloud@smalltalk.comcastbiz.net> wrote:
I'm not familiar with Squeak's GC... can you also assume that a GC (e.g.: a "new space" scavenge) won't move the marker so that the enumeration goes past it?
IIUC, that's irrelevant. FOr the marker technique to work all that's needed is that the relative order of live objects isn't changed. The only thing that can change that is a become:. In the absence of a become: the end marker, no matter how many GCs occur, stays above/after all previously activated objects and below/before all subsequently allocated objects. The Squeak GC effectively only compacts to squeeze out reclaimed objects, it does not reorder. So this is very different to the VM/HPS GC.
HTH Eliot
On 11/26/10 9:12 , Bert Freudenberg wrote: On 26.11.2010, at 18:09, Andres Valloud wrote: Can you assume that allocation is sequential? I can. - Bert - On 11/26/10 8:59 , Bert Freudenberg wrote: On 26.11.2010, at 17:43, Andres Valloud wrote: You *must* use an end marker, i.e., last := Object new. "end marker" obj := self someObject. [last == obj] whileFalse:[ count := count + 1. obj := obj nextObject. ]. This will work because it counts between the beginning of memory and the (arbitrary) end marker. Anything else basically should not be relied on to work, jit or no. What happens if a process with higher priority interrupts the iteration and creates more objects? New objects would come after the end marker in memory. - Bert -
http://stephane.ducasse.free.fr/FreeBooks/CollectiveNBlueBook/Rowledge-Final...
On pp. 20 it has an excellent summary about how the Squeak GC works.
Cheers, - Andreas
On 11/30/2010 4:13 PM, Andres Valloud wrote:
Yes, from what I've seen, Squeak's GC is different from that in HPS. I found some information scattered here and there in Google, but it's mostly commentary. Ideally, I'd look at the source code and figure it out but I don't quite have the time to look at it in all its detail. I would really appreciate a current, 3-5 sentence summary of Squeak's GC implementation. Does anyone have such a pointer? I hope this request is not deemed ridiculous, I'm thinking it would be just as unreasonable to expect anybody to look at HPS' GC implementation and get the gist of what's going on in 5-10 minutes. Thanks in advance!
On 11/27/10 14:06 , Eliot Miranda wrote:
On Fri, Nov 26, 2010 at 9:15 AM, Andres Valloud <avalloud@smalltalk.comcastbiz.net mailto:avalloud@smalltalk.comcastbiz.net> wrote:
I'm not familiar with Squeak's GC... can you also assume that a GC (e.g.: a "new space" scavenge) won't move the marker so that the enumeration goes past it?
IIUC, that's irrelevant. FOr the marker technique to work all that's needed is that the relative order of live objects isn't changed. The only thing that can change that is a become:. In the absence of a become: the end marker, no matter how many GCs occur, stays above/after all previously activated objects and below/before all subsequently allocated objects. The Squeak GC effectively only compacts to squeeze out reclaimed objects, it does not reorder. So this is very different to the VM/HPS GC.
HTH Eliot
On 11/26/10 9:12 , Bert Freudenberg wrote:
On 26.11.2010, at 18:09, Andres Valloud wrote:
Can you assume that allocation is sequential?
I can.
- Bert -
On 11/26/10 8:59 , Bert Freudenberg wrote:
On 26.11.2010, at 17:43, Andres Valloud wrote:
You *must* use an end marker, i.e.,
last := Object new. "end marker" obj := self someObject. [last == obj] whileFalse:[ count := count + 1. obj := obj nextObject. ].
This will work because it counts between the beginning of memory and the (arbitrary) end marker. Anything else basically should not be relied on to work, jit or no.
What happens if a process with higher priority interrupts the iteration and creates more objects?
New objects would come after the end marker in memory.
- Bert -
Awesome, thank you! That makes the "marker technique" more clear. In HPS parlance (excuse me using the terminology I am most familiar with), Squeak has an eden plus two survivor semi spaces, and old space for the rest. The incremental GC would be a scavenge of new space. The full GC is a mark / sweep GC of all object data. Does that make sense?
Also, would you mind a question? If the new objects are above the old data, what happens when the old data grows large enough that it runs into the new object space? Is the new object space moved upwards?
On 11/30/10 16:20 , Andreas Raab wrote:
http://stephane.ducasse.free.fr/FreeBooks/CollectiveNBlueBook/Rowledge-Final...
On pp. 20 it has an excellent summary about how the Squeak GC works.
Cheers, - Andreas
On 11/30/2010 4:13 PM, Andres Valloud wrote:
Yes, from what I've seen, Squeak's GC is different from that in HPS. I found some information scattered here and there in Google, but it's mostly commentary. Ideally, I'd look at the source code and figure it out but I don't quite have the time to look at it in all its detail. I would really appreciate a current, 3-5 sentence summary of Squeak's GC implementation. Does anyone have such a pointer? I hope this request is not deemed ridiculous, I'm thinking it would be just as unreasonable to expect anybody to look at HPS' GC implementation and get the gist of what's going on in 5-10 minutes. Thanks in advance!
On 11/27/10 14:06 , Eliot Miranda wrote:
On Fri, Nov 26, 2010 at 9:15 AM, Andres Valloud <avalloud@smalltalk.comcastbiz.net mailto:avalloud@smalltalk.comcastbiz.net> wrote:
I'm not familiar with Squeak's GC... can you also assume that a GC (e.g.: a "new space" scavenge) won't move the marker so that the enumeration goes past it?
IIUC, that's irrelevant. FOr the marker technique to work all that's needed is that the relative order of live objects isn't changed. The only thing that can change that is a become:. In the absence of a become: the end marker, no matter how many GCs occur, stays above/after all previously activated objects and below/before all subsequently allocated objects. The Squeak GC effectively only compacts to squeeze out reclaimed objects, it does not reorder. So this is very different to the VM/HPS GC.
HTH Eliot
On 11/26/10 9:12 , Bert Freudenberg wrote:
On 26.11.2010, at 18:09, Andres Valloud wrote:
Can you assume that allocation is sequential?
I can.
- Bert -
On 11/26/10 8:59 , Bert Freudenberg wrote:
On 26.11.2010, at 17:43, Andres Valloud wrote:
You *must* use an end marker, i.e.,
last := Object new. "end marker" obj := self someObject. [last == obj] whileFalse:[ count := count + 1. obj := obj nextObject. ].
This will work because it counts between the beginning of memory and the (arbitrary) end marker. Anything else basically should not be relied on to work, jit or no.
What happens if a process with higher priority interrupts the iteration and creates more objects?
New objects would come after the end marker in memory.
- Bert -
.
On Tue, Nov 30, 2010 at 6:38 PM, Andres Valloud < avalloud@smalltalk.comcastbiz.net> wrote:
Awesome, thank you! That makes the "marker technique" more clear. In HPS parlance (excuse me using the terminology I am most familiar with), Squeak has an eden plus two survivor semi spaces, and old space for the rest. The incremental GC would be a scavenge of new space. The full GC is a mark / sweep GC of all object data. Does that make sense?
Yes.
Also, would you mind a question? If the new objects are above the old data, what happens when the old data grows large enough that it runs into the new object space? Is the new object space moved upwards?
Think of the heap (including new space) as one region growing towards a limit. Old space is immediately below new space, a single variable pointing to the boundary. Somewhere above that is the allocation pointer. When one does a tenuring incremental collection new space is compacted and the pointer to the top of old space merely set to point to teh allocation pointer, so now everything is old. By the same token, to do a full gc the old space pointer is merely set to the bottom of the heap, so now everything is young, and the roots are the active context and the special objects array.
Unlinke HPS there is no such thing as an old space allocation, because there is no free list of free old space objects, and because old space and new space are contiguous. Hence a large allocation will typically tenure everything.
Make sense?
best Eliot
On 11/30/10 16:20 , Andreas Raab wrote:
http://stephane.ducasse.free.fr/FreeBooks/CollectiveNBlueBook/Rowledge-Final...
On pp. 20 it has an excellent summary about how the Squeak GC works.
Cheers,
- Andreas
On 11/30/2010 4:13 PM, Andres Valloud wrote:
Yes, from what I've seen, Squeak's GC is different from that in HPS. I found some information scattered here and there in Google, but it's mostly commentary. Ideally, I'd look at the source code and figure it out but I don't quite have the time to look at it in all its detail. I would really appreciate a current, 3-5 sentence summary of Squeak's GC implementation. Does anyone have such a pointer? I hope this request is not deemed ridiculous, I'm thinking it would be just as unreasonable to expect anybody to look at HPS' GC implementation and get the gist of what's going on in 5-10 minutes. Thanks in advance!
On 11/27/10 14:06 , Eliot Miranda wrote:
On Fri, Nov 26, 2010 at 9:15 AM, Andres Valloud <avalloud@smalltalk.comcastbiz.net mailto:avalloud@smalltalk.comcastbiz.net> wrote:
I'm not familiar with Squeak's GC... can you also assume that a GC (e.g.: a "new space" scavenge) won't move the marker so that the enumeration goes past it?
IIUC, that's irrelevant. FOr the marker technique to work all that's needed is that the relative order of live objects isn't changed. The only thing that can change that is a become:. In the absence of a become: the end marker, no matter how many GCs occur, stays above/after all previously activated objects and below/before all subsequently allocated objects. The Squeak GC effectively only compacts to squeeze out reclaimed objects, it does not reorder. So this is very different to the VM/HPS GC.
HTH Eliot
On 11/26/10 9:12 , Bert Freudenberg wrote:
On 26.11.2010, at 18:09, Andres Valloud wrote:
Can you assume that allocation is sequential?
I can.
- Bert -
On 11/26/10 8:59 , Bert Freudenberg wrote:
On 26.11.2010, at 17:43, Andres Valloud wrote:
You *must* use an end marker, i.e.,
last := Object new. "end marker" obj := self someObject. [last == obj] whileFalse:[ count := count + 1. obj := obj nextObject. ].
This will work because it counts between the beginning of memory and the (arbitrary) end marker. Anything else basically should not be relied on to work, jit or no.
What happens if a process with higher priority interrupts the iteration and creates more objects?
New objects would come after the end marker in memory.
- Bert -
.
Yes, that does make sense. Thank you :).
On 11/30/10 18:45 , Eliot Miranda wrote:
On Tue, Nov 30, 2010 at 6:38 PM, Andres Valloud <avalloud@smalltalk.comcastbiz.net mailto:avalloud@smalltalk.comcastbiz.net> wrote:
Awesome, thank you! That makes the "marker technique" more clear. In HPS parlance (excuse me using the terminology I am most familiar with), Squeak has an eden plus two survivor semi spaces, and old space for the rest. The incremental GC would be a scavenge of new space. The full GC is a mark / sweep GC of all object data. Does that make sense?
Yes.
Also, would you mind a question? If the new objects are above the old data, what happens when the old data grows large enough that it runs into the new object space? Is the new object space moved upwards?
Think of the heap (including new space) as one region growing towards a limit. Old space is immediately below new space, a single variable pointing to the boundary. Somewhere above that is the allocation pointer. When one does a tenuring incremental collection new space is compacted and the pointer to the top of old space merely set to point to teh allocation pointer, so now everything is old. By the same token, to do a full gc the old space pointer is merely set to the bottom of the heap, so now everything is young, and the roots are the active context and the special objects array. Unlinke HPS there is no such thing as an old space allocation, because there is no free list of free old space objects, and because old space and new space are contiguous. Hence a large allocation will typically tenure everything.
Make sense?
best Eliot
On 11/30/10 16:20 , Andreas Raab wrote: http://stephane.ducasse.free.fr/FreeBooks/CollectiveNBlueBook/Rowledge-Final.pdf On pp. 20 it has an excellent summary about how the Squeak GC works. Cheers, - Andreas On 11/30/2010 4:13 PM, Andres Valloud wrote: Yes, from what I've seen, Squeak's GC is different from that in HPS. I found some information scattered here and there in Google, but it's mostly commentary. Ideally, I'd look at the source code and figure it out but I don't quite have the time to look at it in all its detail. I would really appreciate a current, 3-5 sentence summary of Squeak's GC implementation. Does anyone have such a pointer? I hope this request is not deemed ridiculous, I'm thinking it would be just as unreasonable to expect anybody to look at HPS' GC implementation and get the gist of what's going on in 5-10 minutes. Thanks in advance! On 11/27/10 14:06 , Eliot Miranda wrote: On Fri, Nov 26, 2010 at 9:15 AM, Andres Valloud <avalloud@smalltalk.comcastbiz.net <mailto:avalloud@smalltalk.comcastbiz.net> <mailto:avalloud@smalltalk.comcastbiz.net <mailto:avalloud@smalltalk.comcastbiz.net>>> wrote: I'm not familiar with Squeak's GC... can you also assume that a GC (e.g.: a "new space" scavenge) won't move the marker so that the enumeration goes past it? IIUC, that's irrelevant. FOr the marker technique to work all that's needed is that the relative order of live objects isn't changed. The only thing that can change that is a become:. In the absence of a become: the end marker, no matter how many GCs occur, stays above/after all previously activated objects and below/before all subsequently allocated objects. The Squeak GC effectively only compacts to squeeze out reclaimed objects, it does not reorder. So this is very different to the VM/HPS GC. HTH Eliot On 11/26/10 9:12 , Bert Freudenberg wrote: On 26.11.2010, at 18:09, Andres Valloud wrote: Can you assume that allocation is sequential? I can. - Bert - On 11/26/10 8:59 , Bert Freudenberg wrote: On 26.11.2010, at 17:43, Andres Valloud wrote: You *must* use an end marker, i.e., last := Object new. "end marker" obj := self someObject. [last == obj] whileFalse:[ count := count + 1. obj := obj nextObject. ]. This will work because it counts between the beginning of memory and the (arbitrary) end marker. Anything else basically should not be relied on to work, jit or no. What happens if a process with higher priority interrupts the iteration and creates more objects? New objects would come after the end marker in memory. - Bert - .
On Wed, Nov 17, 2010 at 1:13 AM, Eliot Miranda eliot.miranda@gmail.comwrote:
On Tue, Nov 16, 2010 at 3:32 PM, Levente Uzonyi leves@elte.hu wrote:
On Tue, 16 Nov 2010, Juan Vuletich wrote:
Hi Levente,
Levente Uzonyi wrote:
Hi,
I'm mostly ready with this cleanup, only a few methods left in the image that use #== for integer comparison. Some of them must use #==, others may be changed, but I wasn't sure if the code will work if it's changed or not. If you'd like to check these methods, then evaluate the following in a workspace:
... code here...
Cheers, Levente
Thanks for the snippet! It is now a method in Cuis. I also checked a bit on trunk. In all senders of #nextObject you can apply the pattern in #allObjectsDo:. Instead of assuming == 0 for last #nextObject, it assumes that Object new will go at the end. If that ever changed, a few places would need fixing...
I'm not sure if it's ok to use that pattern in ImageSegment. Even if it's ok, the code is so messy, that it seems to be hard to apply the pattern without breaking the code.
As an experiment, in #allObjectsDo: I tried to remove the Object new stuff and replace with '[ 0 = object and: [ object isMemberOf: SmallInteger ]] whileFalse: ', but my test became extremely slow. I suspect #= is sending the message even if it has its own bytecode... So, the only method that remains with the '0 ==' pattern in Cuis is #critical:ifLocked: , as I'm not sure if we can clean it.
#= is a special selector, so I'm sure it won't send a message if it doesn't have to.
Um, on Cog it sends a message if the method containing the #= has been jitted.
Hi Eliot. I have just read the whole thread, but I wanted to ask you something about this line. Did I understand correct?? If #= has already been jitted, then Gog doesn't use the short-circuit sends (#bytecodePrimEqual) but it does a normal send (a primitive in this case)? So...suppose I change SmallInteger >> #= and I put a halt, then if will be halted (and broke everything, of course) once that method was jitted ? And I guess the same happens with all special selectors.
Thanks
Mariano
But the issue is /not/ whether there is a send or not. The issue is what the behaviour of the primitive code is. In the interpreter #= will short-circuit (avoid the send) if both the receiver and the argument are either a SmallInteger or a Float. In the JIT the primitive will not fail if the receiver and argument are both SmallIntegers or Floats or if the receiver is a Float and the argument is an Integer, but will fail if the receiver is a SmallInteger and the argument is a Float.
In general the VM is free to fail for non-matching numeric types (this is historical, derived from the blue book) so I don't feel Cog is at fault here. Certainly it is not answering incorrect results; it is simply taking more sends to produce the result in some circumstances. I think in cases like this (where you're enumerating over all objects) using #== is a wiser choice.
HTH
It's possible that #= is a bit slower than #== because it has to do some
extra checks, but the difference is probably neglible. Here's a benchmark:
| offset equality identity | Smalltalk garbageCollect. offset := [ 1 to: 1000000 do: [ :i | ] ] timeToRun. equality := [ :x | [ 1 to: 1000000 do: [ :i | 0 = x ] ] timeToRun ]. identity := [ :x | [ 1 to: 1000000 do: [ :i | 0 == x ] ] timeToRun ]. { 0. 1. SmallInteger maxVal. SmallInteger maxVal + 1. nil. Object new. 0.0. Array new. 1/2 } collect: [ :each | each -> ({ equality value: each. identity value: each. equality value: each. identity value: each. equality value: each. identity value: each } - offset) ].
And my results on CogVM:
{ 0->#(3 2 3 1 3 2). 1->#(2 4 3 0 3 4). 1073741823->#(3 3 2 1 3 5). 1073741824->#(221 4 223 1 223 4). nil->#(14 4 14 0 15 4). an Object->#(14 5 13 1 14 4). 0.0->#(622 5 623 2 624 4). #()->#(16 5 14 1 16 4). (1/2)->#(260 4 259 1 258 5) }
I'd do many more iterations. I wouldn't put any faith any millisecond numbers that are less than 3 digits (100 ms).
So the cause of the slowdown in your #allObjectsDo: implementation is that #= is only fast if both the receiver and the argument are SmallIntegers, otherwise there will be message sends.
Another way to implement #allObjectsDo: without the marker object is to replace the loop condition with: object class == SmallInteger. It's actually the same as your implementation without the #= check. #isMemberOf: is optimized to non-real sends. This is as fast as the current implementation on my pc.
Levente
Cheers, Juan Vuletich
On Fri, Nov 19, 2010 at 10:15 AM, Mariano Martinez Peck < marianopeck@gmail.com> wrote:
On Wed, Nov 17, 2010 at 1:13 AM, Eliot Miranda eliot.miranda@gmail.comwrote:
On Tue, Nov 16, 2010 at 3:32 PM, Levente Uzonyi leves@elte.hu wrote:
On Tue, 16 Nov 2010, Juan Vuletich wrote:
Hi Levente,
Levente Uzonyi wrote:
Hi,
I'm mostly ready with this cleanup, only a few methods left in the image that use #== for integer comparison. Some of them must use #==, others may be changed, but I wasn't sure if the code will work if it's changed or not. If you'd like to check these methods, then evaluate the following in a workspace:
... code here...
Cheers, Levente
Thanks for the snippet! It is now a method in Cuis. I also checked a bit on trunk. In all senders of #nextObject you can apply the pattern in #allObjectsDo:. Instead of assuming == 0 for last #nextObject, it assumes that Object new will go at the end. If that ever changed, a few places would need fixing...
I'm not sure if it's ok to use that pattern in ImageSegment. Even if it's ok, the code is so messy, that it seems to be hard to apply the pattern without breaking the code.
As an experiment, in #allObjectsDo: I tried to remove the Object new stuff and replace with '[ 0 = object and: [ object isMemberOf: SmallInteger ]] whileFalse: ', but my test became extremely slow. I suspect #= is sending the message even if it has its own bytecode... So, the only method that remains with the '0 ==' pattern in Cuis is #critical:ifLocked: , as I'm not sure if we can clean it.
#= is a special selector, so I'm sure it won't send a message if it doesn't have to.
Um, on Cog it sends a message if the method containing the #= has been jitted.
Hi Eliot. I have just read the whole thread, but I wanted to ask you something about this line. Did I understand correct?? If #= has already been jitted, then Gog doesn't use the short-circuit sends (#bytecodePrimEqual) but it does a normal send (a primitive in this case)? So...suppose I change SmallInteger >> #= and I put a halt, then if will be halted (and broke everything, of course) once that method was jitted ?
Yes, that's right; since in Cog jitted code #= is always sent if you put a breakpoint in SmallInteger>#= execution will hit the break-point, unlike in the interpreter.
And I guess the same happens with all special selectors.
Thanks
Mariano
But the issue is /not/ whether there is a send or not. The issue is what the behaviour of the primitive code is. In the interpreter #= will short-circuit (avoid the send) if both the receiver and the argument are either a SmallInteger or a Float. In the JIT the primitive will not fail if the receiver and argument are both SmallIntegers or Floats or if the receiver is a Float and the argument is an Integer, but will fail if the receiver is a SmallInteger and the argument is a Float.
In general the VM is free to fail for non-matching numeric types (this is historical, derived from the blue book) so I don't feel Cog is at fault here. Certainly it is not answering incorrect results; it is simply taking more sends to produce the result in some circumstances. I think in cases like this (where you're enumerating over all objects) using #== is a wiser choice.
HTH
It's possible that #= is a bit slower than #== because it has to do some
extra checks, but the difference is probably neglible. Here's a benchmark:
| offset equality identity | Smalltalk garbageCollect. offset := [ 1 to: 1000000 do: [ :i | ] ] timeToRun. equality := [ :x | [ 1 to: 1000000 do: [ :i | 0 = x ] ] timeToRun ]. identity := [ :x | [ 1 to: 1000000 do: [ :i | 0 == x ] ] timeToRun ]. { 0. 1. SmallInteger maxVal. SmallInteger maxVal + 1. nil. Object new. 0.0. Array new. 1/2 } collect: [ :each | each -> ({ equality value: each. identity value: each. equality value: each. identity value: each. equality value: each. identity value: each } - offset) ].
And my results on CogVM:
{ 0->#(3 2 3 1 3 2). 1->#(2 4 3 0 3 4). 1073741823->#(3 3 2 1 3 5). 1073741824->#(221 4 223 1 223 4). nil->#(14 4 14 0 15 4). an Object->#(14 5 13 1 14 4). 0.0->#(622 5 623 2 624 4). #()->#(16 5 14 1 16 4). (1/2)->#(260 4 259 1 258 5) }
I'd do many more iterations. I wouldn't put any faith any millisecond numbers that are less than 3 digits (100 ms).
So the cause of the slowdown in your #allObjectsDo: implementation is that #= is only fast if both the receiver and the argument are SmallIntegers, otherwise there will be message sends.
Another way to implement #allObjectsDo: without the marker object is to replace the loop condition with: object class == SmallInteger. It's actually the same as your implementation without the #= check. #isMemberOf: is optimized to non-real sends. This is as fast as the current implementation on my pc.
Levente
Cheers, Juan Vuletich
squeak-dev@lists.squeakfoundation.org