All non-trivial collections implement #size or inherit a custom implementation, usually just as a return of an inst var. Your #do:-based one is 3x as slow in my tests, so you've now made #isEmpty slower for every collection that implements #size just to benefit ones whose careless authors didn't and so the implementors of lazy collections have one fewer message to override.
Do not do this. People already avoid #ifEmpty: and related messages where performance matters and shouldn't have to avoid #isEmpty too.
Hi Monty,
what happens if you add an isEmpty implement ration based in size to SequenceableCollection?
_,,,^..^,,,_ (phone)
On Oct 21, 2016, at 3:59 PM, monty monty2@programmer.net wrote:
All non-trivial collections implement #size or inherit a custom implementation, usually just as a return of an inst var. Your #do:-based one is 3x as slow in my tests, so you've now made #isEmpty slower for every collection that implements #size just to benefit ones whose careless authors didn't and so the implementors of lazy collections have one fewer message to override.
Do not do this. People already avoid #ifEmpty: and related messages where performance matters and shouldn't have to avoid #isEmpty too.
Putting #size-based implementations in abstract Collection subclasses negates the lazy collection benefit and still penalizes non-abstract direct Collection subclasses like CharacterSet and any user-defined direct Collection subclass implemented using composition with forwarding to a concrete collection. If the default linear complexity of #size and #isEmpty bothers you so much, make #size a subclassResponsiblity like #do: instead of suddenly penalizing collections whose authors actually did the right thing by providing a non-linear #size implementation.
Sent: Friday, October 21, 2016 at 11:26 PM From: "Eliot Miranda" eliot.miranda@gmail.com To: "The general-purpose Squeak developers list" squeak-dev@lists.squeakfoundation.org Subject: Re: [squeak-dev] Re: The defaullt implementation of isEmpty might do too much work
Hi Monty,
what happens if you add an isEmpty implement ration based in size to SequenceableCollection?
_,,,^..^,,,_ (phone)
On Oct 21, 2016, at 3:59 PM, monty monty2@programmer.net wrote:
All non-trivial collections implement #size or inherit a custom implementation, usually just as a return of an inst var. Your #do:-based one is 3x as slow in my tests, so you've now made #isEmpty slower for every collection that implements #size just to benefit ones whose careless authors didn't and so the implementors of lazy collections have one fewer message to override.
Do not do this. People already avoid #ifEmpty: and related messages where performance matters and shouldn't have to avoid #isEmpty too.
Please retry with latest trunk version.
2016-10-22 6:48 GMT+02:00 monty monty2@programmer.net:
Putting #size-based implementations in abstract Collection subclasses negates the lazy collection benefit and still penalizes non-abstract direct Collection subclasses like CharacterSet and any user-defined direct Collection subclass implemented using composition with forwarding to a concrete collection. If the default linear complexity of #size and #isEmpty bothers you so much, make #size a subclassResponsiblity like #do: instead of suddenly penalizing collections whose authors actually did the right thing by providing a non-linear #size implementation.
Sent: Friday, October 21, 2016 at 11:26 PM From: "Eliot Miranda" eliot.miranda@gmail.com To: "The general-purpose Squeak developers list" <squeak-dev@lists.
squeakfoundation.org>
Subject: Re: [squeak-dev] Re: The defaullt implementation of isEmpty
might do too much work
Hi Monty,
what happens if you add an isEmpty implement ration based in size to
SequenceableCollection?
_,,,^..^,,,_ (phone)
On Oct 21, 2016, at 3:59 PM, monty monty2@programmer.net wrote:
All non-trivial collections implement #size or inherit a custom
implementation, usually just as a return of an inst var. Your #do:-based one is 3x as slow in my tests, so you've now made #isEmpty slower for every collection that implements #size just to benefit ones whose careless authors didn't and so the implementors of lazy collections have one fewer message to override.
Do not do this. People already avoid #ifEmpty: and related messages
where performance matters and shouldn't have to avoid #isEmpty too.
The latest trunk has numerous custom #isEmpty implementations, but that doesn't help custom collections outside the image not inheriting one. I maintain cross-platform projects that have such collections, like XPath with XPathFunctionSet or the BitmapCharacterSet package it relies on. Both are Collection subclasses, and now I feel like I should add this: isEmpty ^ self size = 0
to every single collection I maintain. Why not just make #size a subclassResponsibility or add abstract superclasses for lazy or infinite collections that implement #isEmpty using #do: and change #size to shouldNotImplement?
Sent: Saturday, October 22, 2016 at 8:42 AM From: "Nicolas Cellier" nicolas.cellier.aka.nice@gmail.com To: "The general-purpose Squeak developers list" squeak-dev@lists.squeakfoundation.org Subject: Re: [squeak-dev] Re: The defaullt implementation of isEmpty might do too much work
Please retry with latest trunk version.
2016-10-22 6:48 GMT+02:00 monty monty2@programmer.net:Putting #size-based implementations in abstract Collection subclasses negates the lazy collection benefit and still penalizes non-abstract direct Collection subclasses like CharacterSet and any user-defined direct Collection subclass implemented using composition with forwarding to a concrete collection. If the default linear complexity of #size and #isEmpty bothers you so much, make #size a subclassResponsiblity like #do: instead of suddenly penalizing collections whose authors actually did the right thing by providing a non-linear #size implementation.
Sent: Friday, October 21, 2016 at 11:26 PM From: "Eliot Miranda" eliot.miranda@gmail.com To: "The general-purpose Squeak developers list" <squeak-dev@lists.squeakfoundation.org[mailto:squeak-dev@lists.squeakfoundation.org]> Subject: Re: [squeak-dev] Re: The defaullt implementation of isEmpty might do too much work
Hi Monty,
what happens if you add an isEmpty implement ration based in size to SequenceableCollection?
_,,,^..^,,,_ (phone)
On Oct 21, 2016, at 3:59 PM, monty monty2@programmer.net wrote:
All non-trivial collections implement #size or inherit a custom implementation, usually just as a return of an inst var. Your #do:-based one is 3x as slow in my tests, so you've now made #isEmpty slower for every collection that implements #size just to benefit ones whose careless authors didn't and so the implementors of lazy collections have one fewer message to override.
Do not do this. People already avoid #ifEmpty: and related messages where performance matters and shouldn't have to avoid #isEmpty too.
On Monday, 24 October 2016, monty monty2@programmer.net wrote:
Why not just make #size a subclassResponsibility or add abstract superclasses for lazy or infinite collections that implement #isEmpty using #do: and change #size to shouldNotImplement?
Because #do: is the only method required to be overridden by Collection subclasses. All other methods are implemented in terms of #do:.
So yes, if your Collection subclass has an optimized implementation for #isEmpty, then provide it. That is a small price to pay for making a class work optimally across different code bases.
- Bert -
On Mon, Oct 24, 2016 at 1:31 AM, Bert Freudenberg bert@freudenbergs.de wrote:
On Monday, 24 October 2016, monty monty2@programmer.net wrote:
Why not just make #size a subclassResponsibility or add abstract superclasses for lazy or infinite collections that implement #isEmpty using #do: and change #size to shouldNotImplement?
Because #do: is the only method required to be overridden by Collection subclasses. All other methods are implemented in terms of #do:.
+1
So yes, if your Collection subclass has an optimized implementation for #isEmpty, then provide it. That is a small price to pay for making a class work optimally across different code bases.
- Bert -
Normally its the Subclasses where implementation-specific optimizations are done, not in the top abstractions. At the level of Collection, the code should be elegant, expressive, and defining of the responsibilities by the messages it sends. #size is something of Collection, period, there is nothing wrong with sending it, and checking size = 0 is more elegant than a block activation with non-local return.
As a community that respects its user base, we need to consider the *type* of damage that "Change it!" could cause to existing production applications. These applications were built on the expressive implementation, but now they'll be sending #do: to that SQLCollection instead of #size, which will cause the database to retrieve a full ResultSet, send it back to the client, just so it can answer #isEmpty.
Its this kind of "silent" damage that is the worst. Because its invisible, not a an error, not a blow up. Just an degradation in performance (ironic, given our motive here) which, believably, would not be caught in testing.
I agree with Monty. Collection is abstract, there will never be an instance of it. No self-respecting subclass needs this optimized, and trying to optimize it way up there forces too many assumptions about implementation. It makes the code less expressive while silently inflicting potentially performance-killing pain onto legacy applications until they're forced to sprinkle duplicate #isEmpty implementations all about..
On Mon, Oct 24, 2016 at 3:31 AM, Bert Freudenberg bert@freudenbergs.de wrote:
On Monday, 24 October 2016, monty monty2@programmer.net wrote:
Why not just make #size a subclassResponsibility or add abstract superclasses for lazy or infinite collections that implement #isEmpty using #do: and change #size to shouldNotImplement?
Because #do: is the only method required to be overridden by Collection subclasses. All other methods are implemented in terms of #do:.
So yes, if your Collection subclass has an optimized implementation for #isEmpty, then provide it. That is a small price to pay for making a class work optimally across different code bases.
- Bert -
Since Smalltalk-80, sends to #do: have been considered expensive, and sends to #size, inexpensive. And so it has been okay for applications which defined their own Collection subclasses, to add a few milliseconds of cost to something that's already considered expensive (#do:), and not to something which is expected to be fast (#size).
** For 30 years applications were built on these assumptions. Even I did it in Magma. #isEmpty was built on that assumption. Subclass implementors have seemed to agree to these assumptions too, and made their implementations true to them..
May we please reconsider this change?
On Mon, Oct 24, 2016 at 9:07 PM, Chris Muller asqueaker@gmail.com wrote:
Normally its the Subclasses where implementation-specific optimizations are done, not in the top abstractions. At the level of Collection, the code should be elegant, expressive, and defining of the responsibilities by the messages it sends. #size is something of Collection, period, there is nothing wrong with sending it, and checking size = 0 is more elegant than a block activation with non-local return.
As a community that respects its user base, we need to consider the *type* of damage that "Change it!" could cause to existing production applications. These applications were built on the expressive implementation, but now they'll be sending #do: to that SQLCollection instead of #size, which will cause the database to retrieve a full ResultSet, send it back to the client, just so it can answer #isEmpty.
Its this kind of "silent" damage that is the worst. Because its invisible, not a an error, not a blow up. Just an degradation in performance (ironic, given our motive here) which, believably, would not be caught in testing.
I agree with Monty. Collection is abstract, there will never be an instance of it. No self-respecting subclass needs this optimized, and trying to optimize it way up there forces too many assumptions about implementation. It makes the code less expressive while silently inflicting potentially performance-killing pain onto legacy applications until they're forced to sprinkle duplicate #isEmpty implementations all about..
On Mon, Oct 24, 2016 at 3:31 AM, Bert Freudenberg bert@freudenbergs.de wrote:
On Monday, 24 October 2016, monty monty2@programmer.net wrote:
Why not just make #size a subclassResponsibility or add abstract superclasses for lazy or infinite collections that implement #isEmpty using #do: and change #size to shouldNotImplement?
Because #do: is the only method required to be overridden by Collection subclasses. All other methods are implemented in terms of #do:.
So yes, if your Collection subclass has an optimized implementation for #isEmpty, then provide it. That is a small price to pay for making a class work optimally across different code bases.
- Bert -
Except #do: isn't the only method required to be overridden. #add: and #remove:ifAbsent: are also abstract.
Sent: Monday, October 24, 2016 at 4:31 AM From: "Bert Freudenberg" bert@freudenbergs.de To: "The general-purpose Squeak developers list" squeak-dev@lists.squeakfoundation.org Subject: Re: [squeak-dev] The defaullt implementation of isEmpty might do too much work On Monday, 24 October 2016, monty <monty2@programmer.net[mailto:monty2@programmer.net]> wrote:Why not just make #size a subclassResponsibility or add abstract superclasses for lazy or infinite collections that implement #isEmpty using #do: and change #size to shouldNotImplement?
Because #do: is the only method required to be overridden by Collection subclasses. All other methods are implemented in terms of #do:.
So yes, if your Collection subclass has an optimized implementation for #isEmpty, then provide it. That is a small price to pay for making a class work optimally across different code bases.
- Bert -
On Tue, Oct 25, 2016 at 6:33 AM, monty monty2@programmer.net wrote:
Except #do: isn't the only method required to be overridden. #add: and #remove:ifAbsent: are also abstract.
Only for extensible subclasses. These *can not* be implemented in terms of #do: so they have to be implemented in a subclass.
Collection provides a maximum of protocol with a minimum of methods required to be overridden.
This discussion is besides the point though. I was just pointing out that requiring #size or #isEmpty to be a subclass responsibility is not a good idea, because they can and should be implemented in terms of #do:. Whether #isEmpty should be implemented in terms of #size is a wholly different issue.
Chris has some valid points about historical performance trade-offs which are worth discussing.
My take is that if we don't know *anything* about the subclass, we have to assume that #size can be way more expensive than a single iteration of #do: (e.g. a linked list). So using #do: is the sensible default implementation for #isEmpty. If OTOH a subclass has a very expensive #do:, then it's reasonable to assume that it would avoid doing much work if it is in fact empty.
If a Collection subclass did rely on an implementation detail of its superclass performance-wise then this is unfortunate, but easy to fix by implementing #isEmpty in terms of #size. We shouldn't let that get in the way of making our base libraries more elegant.
- Bert -
My take is that if we don't know *anything* about the subclass, we have to assume that #size can be way more expensive than a single iteration of #do: (e.g. a linked list).
I don't understand this. How could #size ever be "way more expensive" than a single iteration of #do:? Based on the implementation in Collection, that's impossible..
So using #do: is the sensible default implementation for #isEmpty. If OTOH a subclass has a very expensive #do:, then it's reasonable to assume that it would avoid doing much work if it is in fact empty.
If a Collection subclass did rely on an implementation detail of its superclass performance-wise then this is unfortunate,
Well that's precisely what this change does! We're trying to fix some hypothetical subclass to "rely" on this implementation detail way up in Collection..
but easy to fix by implementing #isEmpty in terms of #size.
You ignored the "silent damage" argument, that someone wouldn't even KNOW it needs fixing.
Besides, the fix should be made in the subclass which doesn't have a fast #size.
Folks there is NO BASIS at the level of Collection for assuming that do: [ : each | ^ false ] is faster than ^self size=0. In fact, the proper assumption is the opposite -- that #size is the optimized method, not #do:.
We shouldn't let that get in the way of making our base libraries more elegant.
It makes them LESS elegant, because of 1) the violation of the assumption that #size is faster than #do:, plus 2) the associated duplication of code that will be required across multiple subclasses just to recover that original faster implementation, which they used to inherit. Doesn't that matter?
I don't think the burden of proof should be on the legacy method, but on the new implementation which purports some improvement. This change sounded good in the ivory towser, but can anyone identify *one single place* in the real-world where this change is beneficial?
Because the detractors are potentially very real...
On 25.10.2016, at 21:45, Chris Muller asqueaker@gmail.com wrote:
Folks there is NO BASIS at the level of Collection for assuming that do: [ : each | ^ false ] is faster than ^self size=0. In fact, the proper assumption is the opposite -- that #size is the optimized method, not #do:.
I don't understand on what _either_ performance-assumption is grounded. Since I've apparently not been around since the early years of either ST-80 or Squeak, can someone enlighten me here?
Both ways seem somewhat reasonable: `self size = 0` is simple, easy to understand on first sight and said here to be more expensive than `self do: [:ea | ^false]`, which is clever[1], possibly needs a double-take (or comment), but has a certain appeal of wholeness[2].
Best regards -Tobias
[1]: sadly, cleverness in software typically is a drawback. [2]: I mean, the minimalism that is possible by just using/providing #do: somehow conveys closure/completeness
On 25-10-2016, at 1:03 PM, Tobias Pape Das.Linux@gmx.de wrote: [1]: sadly, cleverness in software typically is a drawback.
Important point here; since debugging typically requires more cleverness than writing, making the code as clever as possible almost guarantees it cannot be debugged.
tim -- tim Rowledge; tim@rowledge.org; http://www.rowledge.org/tim Useful random insult:- An 8086 in a StrongARM environment.
Hi Tobias, Hi All,
On Tue, Oct 25, 2016 at 1:03 PM, Tobias Pape Das.Linux@gmx.de wrote:
On 25.10.2016, at 21:45, Chris Muller asqueaker@gmail.com wrote:
Folks there is NO BASIS at the level of Collection for assuming that do: [ : each | ^ false ] is faster than ^self size=0. In fact, the proper assumption is the opposite -- that #size is the optimized method, not #do:.
I don't understand on what _either_ performance-assumption is grounded. Since I've apparently not been around since the early years of either ST-80 or Squeak, can someone enlighten me here?
Both ways seem somewhat reasonable: `self size = 0` is simple, easy to understand on first sight and said here to be more expensive than
`self do: [:ea | ^false]`, which is clever[1], possibly needs a
double-take (or comment), but has a certain appeal of wholeness[2].
self size = 0 doesn't work for infinite collections. The suggested implementation of isEmpty does indeed require a comment (I didn't implement it) but it does work for infinite collections. The suggested implementation os isEmpty is also faster than elf size = 0 for any collection of sufficient size.
For example:
| b n | b := Bag someInstance. n := 100000. { b size. [1 to: n do: [:ign| b isEmptyOId]] timeToRun. [1 to: n do: [:ign| b isEmpty]] timeToRun } #(20402 3293 21)
and the cut off is quite low:
| n | n := 10000. ((1 to: 10) collect: [:iguana| (1 to: SmallInteger maxVal) detect: [:i| b := (1 to: i) asBag. [1 to: n do: [:ign| b isEmptyOId]] timeToRun > [1 to: n do: [:ign| b isEmpty]] timeToRun]]) average asFloat 4.9
So for collections that are 5 elements or more the "clever" implementation is faster.
Best regards
-Tobias
[1]: sadly, cleverness in software typically is a drawback. [2]: I mean, the minimalism that is possible by just using/providing #do: somehow conveys closure/completeness
_,,,^..^,,,_ best, Eliot
Hi, Chris, Monty, Tim, did you see the implementation of Collection>>size?
size "Answer how many elements the receiver contains."
| tally | tally := 0. self do: [:each | tally := tally + 1]. ^ tally
Let's forget a minute about specific subclasses. Does iterating over the whole collection is really the best idea for implementing isEmpty? When we implement a message in Collection, we should think in term of Collection, not Interval, Set nor Array.
We're talking of several things here: - universality of the default implementation: we don't need to know about the size to tell if empty not all collection have a known size - optimization: old implementation is fast for collections having an optimized size, catastrophic for others. new implementation is reasonably fast, and full optimization can be restored with a few 1-liner. - cleverness: IMO this is not more clever than, and quite logically connected to implementation of size
I agree that having several implementations for the sake of optimization is not ideal. That's what would have curbed my enthusiasm. Though I doubt about any noticeable slowdown in anything else than micro-benchmarks.
2016-10-25 22:49 GMT+02:00 Eliot Miranda eliot.miranda@gmail.com:
Hi Tobias, Hi All,
On Tue, Oct 25, 2016 at 1:03 PM, Tobias Pape Das.Linux@gmx.de wrote:
On 25.10.2016, at 21:45, Chris Muller asqueaker@gmail.com wrote:
Folks there is NO BASIS at the level of Collection for assuming that do: [ : each | ^ false ] is faster than ^self size=0. In fact, the proper assumption is the opposite -- that #size is the optimized method, not #do:.
I don't understand on what _either_ performance-assumption is grounded. Since I've apparently not been around since the early years of either ST-80 or Squeak, can someone enlighten me here?
Both ways seem somewhat reasonable: `self size = 0` is simple, easy to understand on first sight and said here to be more expensive than
`self do: [:ea | ^false]`, which is clever[1], possibly needs a
double-take (or comment), but has a certain appeal of wholeness[2].
self size = 0 doesn't work for infinite collections. The suggested implementation of isEmpty does indeed require a comment (I didn't implement it) but it does work for infinite collections. The suggested implementation os isEmpty is also faster than elf size = 0 for any collection of sufficient size.
For example:
| b n | b := Bag someInstance. n := 100000. { b size. [1 to: n do: [:ign| b isEmptyOId]] timeToRun. [1 to: n do: [:ign| b isEmpty]] timeToRun } #(20402 3293 21)
and the cut off is quite low:
| n | n := 10000. ((1 to: 10) collect: [:iguana| (1 to: SmallInteger maxVal) detect: [:i| b := (1 to: i) asBag. [1 to: n do: [:ign| b isEmptyOId]] timeToRun > [1 to: n do: [:ign| b isEmpty]] timeToRun]]) average asFloat 4.9
So for collections that are 5 elements or more the "clever" implementation is faster.
Best regards
-Tobias
[1]: sadly, cleverness in software typically is a drawback. [2]: I mean, the minimalism that is possible by just using/providing #do: somehow conveys closure/completeness
_,,,^..^,,,_ best, Eliot
Hi Nicolas, I understand the angle you are seeing this from, but these we distill these points down, we end up with no real-world value; but some unintended negatives, in fact..
did you see the implementation of Collection>>size?
size "Answer how many elements the receiver contains."
| tally | tally := 0. self do: [:each | tally := tally + 1]. ^ tally
Let's forget a minute about specific subclasses. Does iterating over the whole collection is really the best idea for implementing isEmpty?
Asking this question is already assuming too much. Collection>>#isEmpty should not be optimized in terms of another method two sends away. Collection is the ivory tower which only codifies the elegant definitions of the general behaviors. We should let the subclasses be concerned with the nitty-gritty implementation details needed to go crazy with their (less elegant) optimizations. **We shouldn't try to make performance optimizations in Collection.
When we implement a message in Collection, we should think in term of Collection, not Interval, Set nor Array.
The old implementation already was implemented in terms of Collection.
We're talking of several things here:
- universality of the default implementation:
The old implementation was universal. #size always has and always will be part of Collection.
we don't need to know about the size to tell if empty
Not any more than we need to avoid it. Not any more than we need to initiate a #do: enumeration (which is very expensive for some).
not all collection have a known size
So what will happen if they're sent #size then?
If #shouldNotImplement then they will override #isEmpty.
- optimization: old implementation is fast for collections having an optimized size,
catastrophic for others.
Those are more rare, they should override #isEmpty for their specific needs.
new implementation is reasonably fast, and full optimization can be restored with a few 1-liner.
"Reasonably fast" for which subclass? We can't ignore the subclasses.
- cleverness: IMO this is not more clever than, and quite logically connected to
implementation of size
I agree that having several implementations for the sake of optimization is not ideal.
It told me something was amiss with this change. Too many subclasses disagree.
That's what would have curbed my enthusiasm. Though I doubt about any noticeable slowdown in anything else than micro-benchmarks.
The SQLCollection example would be a large degradation. ----- It all distills down to slightly-negative. We should revert it.
Hi Tobias and Eliot,
Folks there is NO BASIS at the level of Collection for assuming that do: [ : each | ^ false ] is faster than ^self size=0. In fact, the proper assumption is the opposite -- that #size is the optimized method, not #do:.
I don't understand on what _either_ performance-assumption is grounded.
Exactly. The original basis for making this change is groundless.
self size = 0 doesn't work for infinite collections.
What's an infinite collection, and why wouldn't self size = 0 work for its #isEmpty? I assume it will need to override #size, so if it answers Float infinity then #isEmpty is false. What's not working? And, irregardless, custom collections like that define what THEY need to work properly (e.g., #size, #isEmpty, whatever) in their own class, not going changing 30-year old base library code to make themselves work..
The suggested implementation of isEmpty does indeed require a comment (I didn't implement it) but it does work for infinite collections. The suggested implementation os isEmpty is also faster than elf size = 0 for any collection of sufficient size.
For example:
| b n | b := Bag someInstance. n := 100000. { b size. [1 to: n do: [:ign| b isEmptyOId]] timeToRun. [1 to: n do: [:ign| b isEmpty]] timeToRun } #(20402 3293 21)
and the cut off is quite low:
| n | n := 10000. ((1 to: 10) collect: [:iguana| (1 to: SmallInteger maxVal) detect: [:i| b := (1 to: i) asBag. [1 to: n do: [:ign| b isEmptyOId]] timeToRun > [1 to: n do: [:ign| b isEmpty]] timeToRun]]) average asFloat 4.9
So for collections that are 5 elements or more the "clever" implementation is faster.
Then add Bag>>#isEmpty, don't change Collection>>#isEmpty, because it changes a 30-year old assumption about how #isEmpty works for every other kind of Collection.
On 25.10.2016, at 23:54, Chris Muller ma.chris.m@gmail.com wrote:
Hi Tobias and Eliot,
Folks there is NO BASIS at the level of Collection for assuming that do: [ : each | ^ false ] is faster than ^self size=0. In fact, the proper assumption is the opposite -- that #size is the optimized method, not #do:.
I don't understand on what _either_ performance-assumption is grounded.
Exactly. The original basis for making this change is groundless.
No Chris, I said I don't understand _both_ assumptions.
self size = 0 doesn't work for infinite collections.
What's an infinite collection,
For example, a generator… or a right-open interval?
[...]
ts or more the "clever" implementation is faster.
Then add Bag>>#isEmpty, don't change Collection>>#isEmpty, because it changes a 30-year old assumption about how #isEmpty works for every other kind of Collection.
Kent Beck gives this example for 'Simple Delegation':
Object subclass: #Vector instanceVariableNames: ‘elements’
Vector>>do: aBlock elements do: aBlock
It seems the #do:-based implementation is 'more correct' here.
I found no other recommendation in either Beck's or Thomas' style guides. To _me_ this suggests that what you perceive as 30-year-old-assumption is not as universal as '#do: should suffice'.
Best regards -Tobias
On Sat, Oct 22, 2016 at 12:59:48AM +0200, monty wrote:
All non-trivial collections implement #size or inherit a custom implementation, usually just as a return of an inst var. Your #do:-based one is 3x as slow in my tests, so you've now made #isEmpty slower for every collection that implements #size just to benefit ones whose careless authors didn't and so the implementors of lazy collections have one fewer message to override.
Do not do this. People already avoid #ifEmpty: and related messages where performance matters and shouldn't have to avoid #isEmpty too.
I like Eliot's new implementation, in part because it teaches the reader something about how iteration works. It is still simple enough for an inexperienced person to read, but it also challenges the reader to think about the control flow.
That said, I suspect that Monty and Chris are among the people with the most real-world experience in dealing with performance on very large collections.
So, I would like to ask from a strictly practical point of view, does the change that we are discussing here cause real-world problems for Gemstone and Magma applications?
Does the change make the maintenance of large collection classes more difficult for cross-dialect portability?
Are there any collection classes (especially in Gemstone and Magma) that become slower in a real-world measurable sense as a result of this change?
Dave
+1.
Sent from my iPhone
On Oct 25, 2016, at 16:42, David T. Lewis lewis@mail.msen.com wrote:
On Sat, Oct 22, 2016 at 12:59:48AM +0200, monty wrote: All non-trivial collections implement #size or inherit a custom implementation, usually just as a return of an inst var. Your #do:-based one is 3x as slow in my tests, so you've now made #isEmpty slower for every collection that implements #size just to benefit ones whose careless authors didn't and so the implementors of lazy collections have one fewer message to override.
Do not do this. People already avoid #ifEmpty: and related messages where performance matters and shouldn't have to avoid #isEmpty too.
I like Eliot's new implementation, in part because it teaches the reader something about how iteration works. It is still simple enough for an inexperienced person to read, but it also challenges the reader to think about the control flow.
That said, I suspect that Monty and Chris are among the people with the most real-world experience in dealing with performance on very large collections.
So, I would like to ask from a strictly practical point of view, does the change that we are discussing here cause real-world problems for Gemstone and Magma applications?
Does the change make the maintenance of large collection classes more difficult for cross-dialect portability?
Are there any collection classes (especially in Gemstone and Magma) that become slower in a real-world measurable sense as a result of this change?
Dave
I agreed that it looked good in the ivory tower. The thing is, there are these real-world detractors which we should not igore. To save my old fingers, let me just summarize them from the earlier posts. In summary:
- it provides no known material benefit - it could affect unsuspecting legacy application performance materially, and in an insidious way - it's an anomalous usage of the API which works against performance expectations about #size and #do:. - the justification for this change (performance optimization) in Collection is not valid in the first place.
No real-world positives, just negatives or zeros. Please revert.
On Tue, Oct 25, 2016 at 6:42 PM, David T. Lewis lewis@mail.msen.com wrote:
On Sat, Oct 22, 2016 at 12:59:48AM +0200, monty wrote:
All non-trivial collections implement #size or inherit a custom implementation, usually just as a return of an inst var. Your #do:-based one is 3x as slow in my tests, so you've now made #isEmpty slower for every collection that implements #size just to benefit ones whose careless authors didn't and so the implementors of lazy collections have one fewer message to override.
Do not do this. People already avoid #ifEmpty: and related messages where performance matters and shouldn't have to avoid #isEmpty too.
I like Eliot's new implementation, in part because it teaches the reader something about how iteration works. It is still simple enough for an inexperienced person to read, but it also challenges the reader to think about the control flow.
That said, I suspect that Monty and Chris are among the people with the most real-world experience in dealing with performance on very large collections.
So, I would like to ask from a strictly practical point of view, does the change that we are discussing here cause real-world problems for Gemstone and Magma applications?
Does the change make the maintenance of large collection classes more difficult for cross-dialect portability?
Are there any collection classes (especially in Gemstone and Magma) that become slower in a real-world measurable sense as a result of this change?
Dave
On Wed, Oct 26, 2016 at 6:42 AM, Chris Muller asqueaker@gmail.com wrote:
I agreed that it looked good in the ivory tower. The thing is, there are these real-world detractors which we should not igore. To save my old fingers, let me just summarize them from the earlier posts. In summary:
- it provides no known material benefit
Having an O(1) runtime vs O(n) is a very large material benefit. Collection>>size is O(n) so it's not a good base for #isEmpty.
- it could affect unsuspecting legacy application performance
materially, and in an insidious way
If you have performance problems, profile.
- it's an anomalous usage of the API which works against performance
expectations about #size and #do:.
The performance expectation of both #size and #do: in Collection is O(n). And there is nothing "anomalous" in using #do:, quite to the contrary, #do: is the *essence* of a collection of objects.
- the justification for this change (performance optimization) in
Collection is not valid in the first place.
It is very valid optimization given the implementation of Collection>>size.
No real-world positives, just negatives or zeros. Please revert.
Quite the opposite. It's a major improvement in basic API.
Yes, you will have to provide an optimized #isEmpty now, because you relied on an implementation detail of a superclass. But there never was a contract that #isEmpty needs to be implemented in terms of #size. The new implementation makes much more sense for an arbitrary abstract Collection, so I'd like to see it stay.
- Bert -
On 26 October 2016 at 09:35, Bert Freudenberg bert@freudenbergs.de wrote:
On Wed, Oct 26, 2016 at 6:42 AM, Chris Muller asqueaker@gmail.com wrote:
I agreed that it looked good in the ivory tower. The thing is, there are these real-world detractors which we should not igore. To save my old fingers, let me just summarize them from the earlier posts. In summary:
- it provides no known material benefit
Having an O(1) runtime vs O(n) is a very large material benefit. Collection>>size is O(n) so it's not a good base for #isEmpty.
- it could affect unsuspecting legacy application performance
materially, and in an insidious way
If you have performance problems, profile.
- it's an anomalous usage of the API which works against performance
expectations about #size and #do:.
The performance expectation of both #size and #do: in Collection is O(n). And there is nothing "anomalous" in using #do:, quite to the contrary, #do: is the *essence* of a collection of objects.
- the justification for this change (performance optimization) in
Collection is not valid in the first place.
It is very valid optimization given the implementation of Collection>>size.
No real-world positives, just negatives or zeros. Please revert.
Quite the opposite. It's a major improvement in basic API.
Yes, you will have to provide an optimized #isEmpty now, because you relied on an implementation detail of a superclass. But there never was a contract that #isEmpty needs to be implemented in terms of #size. The new implementation makes much more sense for an arbitrary abstract Collection, so I'd like to see it stay.
+1. Especially since we've seen examples of Collections where #size just doesn't make sense: generators, Actor-style mailboxes, Xtreams (and, by extension, all reactive APIs).
Which is exactly why in C# an IEnumerable (a Collection) doesn't even have a Count method. (It's added as an extension, and People Know(tm) to be careful using it on an abstract collection.)
Really, to be properly Ivory Tower, and Elegant, Collection's #size should be removed. And as Eliot correctly points out, the only elegant way to ask a Collection about which you know nothing, whether it has any elements, is to _evaluate its first value_.
I can't give real world Smalltalk experience reports, but I can say, with many burn marks to prove it, that asking an abstract collection of things for its size, is a terrible idea.
frank
- Bert -
I don't wish to belabor, but you completely ignored all of my (and Monty's) arguments. Yes, there *is* a sort-of "contract" that #isEmpty should be implemented in terms of #size, and not #do:, that is now being violated. I explained why earlier and you even said it was a point worth discussing, but never did. The comment that was added to Collection>>#isEmpty advertises the flawed thinking.
The flaws with all of your other statements were addressed in prior posts, too. I don't know why any of the advocates for this change couldn't address or even acknowledge those arguments. Even the comment that was added to Collection>>#isEmpty advertises the flawed thinking. I hope someone will go back and really _read_ the arguments being made in this thread.
Just saw Franks note. Same thing.. :(
On Wed, Oct 26, 2016 at 11:35 AM, Bert Freudenberg bert@freudenbergs.de wrote:
On Wed, Oct 26, 2016 at 6:42 AM, Chris Muller asqueaker@gmail.com wrote:
I agreed that it looked good in the ivory tower. The thing is, there are these real-world detractors which we should not igore. To save my old fingers, let me just summarize them from the earlier posts. In summary:
- it provides no known material benefit
Having an O(1) runtime vs O(n) is a very large material benefit. Collection>>size is O(n) so it's not a good base for #isEmpty.
- it could affect unsuspecting legacy application performance
materially, and in an insidious way
If you have performance problems, profile.
- it's an anomalous usage of the API which works against performance
expectations about #size and #do:.
The performance expectation of both #size and #do: in Collection is O(n). And there is nothing "anomalous" in using #do:, quite to the contrary, #do: is the *essence* of a collection of objects.
- the justification for this change (performance optimization) in
Collection is not valid in the first place.
It is very valid optimization given the implementation of Collection>>size.
No real-world positives, just negatives or zeros. Please revert.
Quite the opposite. It's a major improvement in basic API.
Yes, you will have to provide an optimized #isEmpty now, because you relied on an implementation detail of a superclass. But there never was a contract that #isEmpty needs to be implemented in terms of #size. The new implementation makes much more sense for an arbitrary abstract Collection, so I'd like to see it stay.
- Bert -
On Wed, Oct 26, 2016 at 11:41 AM, Chris Muller asqueaker@gmail.com wrote:
I don't wish to belabor, but you completely ignored all of my (and Monty's) arguments. Yes, there *is* a sort-of "contract" that #isEmpty should be implemented in terms of #size, and not #do:, that is now being violated. I explained why earlier and you even said it was a point worth discussing, but never did. The comment that was added to Collection>>#isEmpty advertises the flawed thinking.
I don't see any such contract. Collection has a contract; to be fully functional as a collection class the subclass must implement #do:. This is stated; #do: is a subclass responsibility of Collection. I see no stated contract that #isEmpty is implemented in terms of #size. Given that in Collection #size is implemented in terms of #do:, we are free to implement #isEmpty et al in terms either of #do: or of #size. The new implementation is better for large collections, works for infinite collections, and is hence to be preferred.
The flaws with all of your other statements were addressed in prior posts, too. I don't know why any of the advocates for this change couldn't address or even acknowledge those arguments. Even the comment that was added to Collection>>#isEmpty advertises the flawed thinking. I hope someone will go back and really _read_ the arguments being made in this thread.
I don't see what's special here. We've given arguments for the change; both Bert and Frank (effectively, in my view) refuted your arguments against the change. This is no different than other performance-related changes that have been made in the past. You have yet to present an example of code that is broken ny this change.
Just saw Franks note. Same thing.. :(
Looked pretty cogent and on point to me.
On Wed, Oct 26, 2016 at 11:35 AM, Bert Freudenberg bert@freudenbergs.de wrote:
On Wed, Oct 26, 2016 at 6:42 AM, Chris Muller asqueaker@gmail.com
wrote:
I agreed that it looked good in the ivory tower. The thing is, there are these real-world detractors which we should not igore. To save my old fingers, let me just summarize them from the earlier posts. In summary:
- it provides no known material benefit
Having an O(1) runtime vs O(n) is a very large material benefit. Collection>>size is O(n) so it's not a good base for #isEmpty.
- it could affect unsuspecting legacy application performance
materially, and in an insidious way
If you have performance problems, profile.
- it's an anomalous usage of the API which works against performance
expectations about #size and #do:.
The performance expectation of both #size and #do: in Collection is O(n). And there is nothing "anomalous" in using #do:, quite to the contrary,
#do:
is the *essence* of a collection of objects.
- the justification for this change (performance optimization) in
Collection is not valid in the first place.
It is very valid optimization given the implementation of
Collection>>size.
No real-world positives, just negatives or zeros. Please revert.
Quite the opposite. It's a major improvement in basic API.
Yes, you will have to provide an optimized #isEmpty now, because you
relied
on an implementation detail of a superclass. But there never was a
contract
that #isEmpty needs to be implemented in terms of #size. The new implementation makes much more sense for an arbitrary abstract
Collection,
so I'd like to see it stay.
- Bert -
Hi Eliot, I put a lot of effort to convey these arguments for the betterment of Squeak, I hope you'll oblige this last effort with an open and critical mind..
I don't see any such contract. Collection has a contract; to be fully functional as a collection class the subclass must implement #do:. This is stated; #do: is a subclass responsibility of Collection. I see no stated contract that #isEmpty is implemented in terms of #size.
I said a "sort-of 'contract'", because the problem with this change is more about going against the universal reality of the Smalltalk environment and ecosystem, not against an explicit API contract. I made an explanation of this issue in my earlier post which began with "Since Smalltalk-80, sends to #do: ..."
Given that in Collection #size is implemented in terms of #do:, we are free to implement #isEmpty et al in terms either of #do: or of #size. The new implementation is better for large collections, works for infinite collections, and is hence to be preferred.
If you understood that earlier post, then you understand why I believe we are NOT free to change Collection>>#isEmpty to operate in terms #do:, because we will have surreptitiously changed legacy applications which incur a much heavier cost with the #do: implementation. I provided a very plausible example of this (the SQLCollection) which should not be ignored.
Now, I guess this change *already* did even slow down #isEmpty for some existing classes in the image(!!), to which Nicolas reponded to by copying-and-pasting the old implementation code into multiple classes in order to recover their original performance. Wow.
In exchange for these blemishes, how many classes actually got benefit from inheriting the new implementation?
Collection allSubclasses count: [ : each | (each lookupSelector: #size) methodClass = Collection] "0"
So we're obviously not talking about benefit to any *in-image* kind of Collection, maybe just potential benefit to some... external application's Collection? But while hurting others..? Have we reached a point of enough diminishing returns yet?
The reality is that #size is not only part of Smalltalk's original concept of a Collection since 30 years ago, but is universally expected to be fast, and Smalltalkers write there code under that assumption. That's why the responsibility should be on those unknown *exceptional* subclasses, the InfiniteCollection's or whatever, to deny their #size and provide their alternate #isEmpty's.
Tallying it all up, we've got (-1) potential legacy impacts, (-1) all-new copy-and-pasted code in our core library, just so (+0) some exceptional-case external class MIGHT inherit this faster-for-them implementaiton rather than override it.
I totally understand and agree with the ivory-tower arguments made in favor of this change, its just that in the practical world, it turns out to be a net negative.
Best, Chris
On Wed, Oct 26, 2016 at 05:51:40PM -0500, Chris Muller wrote:
In exchange for these blemishes, how many classes actually got benefit from inheriting the new implementation?
Collection allSubclasses count: [ : each | (each
lookupSelector: #size) methodClass = Collection] "0"
Fair point. Every subclass of Collection reimplements #size in some way. Collection>>size is never actually used by any class in the image, and possibly not in any classes outside the image. So Collection>>size serves as documentation of the intent, and as a default implementation if someone were to implement a new subclass of Collection directly.
Some subclasses (e.g. Bag) would have similar characteristics and would presumably benefit from the new #isEmpty implementation, but most seem to have #size implementations that do not degrade in performance as size increases.
So Chris is raising a fair question. The new #isEmpty looks like it should be faster, but is it actually so in practice?
Dave
p.s. I personally still prefer the new implementation on aesthetic grounds.
On Thu, Oct 27, 2016 at 12:51 AM, Chris Muller asqueaker@gmail.com wrote:
In exchange for these blemishes, how many classes actually got benefit from inheriting the new implementation?
| b | b := (1 to: 10000) asBag. [b isEmpty] bench
before: : '5,540 per second. 181 microseconds per run.' now: '167,000 per second. 6 microseconds per run.'
In my book that's a speedup worth having.
- Bert -
| b | b := (1 to: 10000) asBag. [b isEmpty] bench
before: : '5,540 per second. 181 microseconds per run.' now: '167,000 per second. 6 microseconds per run.'
In my book that's a speedup worth having.
You're depending on the superclass implementation in Collection for that speedup. What did you say about that yesterday?
Honestly, Bert, it almost feels like you're being intellectually dishonest. You chose the only one that got sped up and ignored all the ones that got slowed down. AND, you also ignored all the other points (again) which render the above point irrelevant. Sad.
I give up. You "win".
Yes, in this case a series of tests across collection classes actually would be a proof. A single point measurement is not a statistically relevant result.
--Hannes
On 10/27/16, Chris Muller ma.chris.m@gmail.com wrote:
| b | b := (1 to: 10000) asBag. [b isEmpty] bench
before: : '5,540 per second. 181 microseconds per run.' now: '167,000 per second. 6 microseconds per run.'
In my book that's a speedup worth having.
You're depending on the superclass implementation in Collection for that speedup. What did you say about that yesterday?
Honestly, Bert, it almost feels like you're being intellectually dishonest. You chose the only one that got sped up and ignored all the ones that got slowed down. AND, you also ignored all the other points (again) which render the above point irrelevant. Sad.
I give up. You "win".
Or to put it diffently: Bert supplied the first contribution in this thread which is about performance actually measuring performance......
More is welcome.
On 10/27/16, H. Hirzel hannes.hirzel@gmail.com wrote:
Yes, in this case a series of tests across collection classes actually would be a proof. A single point measurement is not a statistically relevant result.
--Hannes
On 10/27/16, Chris Muller ma.chris.m@gmail.com wrote:
| b | b := (1 to: 10000) asBag. [b isEmpty] bench
before: : '5,540 per second. 181 microseconds per run.' now: '167,000 per second. 6 microseconds per run.'
In my book that's a speedup worth having.
You're depending on the superclass implementation in Collection for that speedup. What did you say about that yesterday?
Honestly, Bert, it almost feels like you're being intellectually dishonest. You chose the only one that got sped up and ignored all the ones that got slowed down. AND, you also ignored all the other points (again) which render the above point irrelevant. Sad.
I give up. You "win".
Or just make Collection>>#size a subclassResponsibiliity!
This will force implementors to give proper #size implementations that are usually O(1), which almost all do already. Bag could get the #do:-based #isEmpty as an optimization, or we could add a dedicated 'tally' inst var like other collections have and its #size could return that, making the default #size-based #isEmpty OK for it.
This is a compromise that addresses the concerns about the default Collection>>#isEmpty degrading linearly without a constant #size and the concerns about harming the performance of existing libraries and applications with custom collections that assume Collection>>#isEmpty is #size-based.
Please consider it.
From: "Bert Freudenberg" bert@freudenbergs.de To: "Chris Muller" ma.chris.m@gmail.com, "The general-purpose Squeak developers list" squeak-dev@lists.squeakfoundation.org Subject: Re: [squeak-dev] Re: The defaullt implementation of isEmpty might do too much work
On Thu, Oct 27, 2016 at 12:51 AM, Chris Muller <asqueaker@gmail.com[mailto:asqueaker@gmail.com]> wrote: In exchange for these blemishes, how many classes actually got benefit from inheriting the new implementation?
| b | b := (1 to: 10000) asBag. [b isEmpty] bench
before: : '5,540 per second. 181 microseconds per run.' now: '167,000 per second. 6 microseconds per run.' In my book that's a speedup worth having.
there never was a contract that #isEmpty needs to be implemented in terms of #size. The new implementation makes much more sense for an arbitrary abstract Collection, so I'd like to see it stay.
- Bert -
+1
Stef
squeak-dev@lists.squeakfoundation.org