[BUG]Collection>>removeAll:

Wed Aug 28 19:22:22 UTC 2002

On Wednesday, August 28, 2002, at 02:35 AM, Richard A. O'Keefe wrote:

> There are two quite distinct "negative" reactions:
> (A) Don't do anything *yet* until we understand what to do.
> (B) Don't do anything, leave it the way it is.
>
> I've been seeing things that I can only understand as (B).

This is argumentative, and unsupported.  I think David and Ralph, both 
eminent in this arena, have provided substantial arguments in both (A) 
and (B).  David provides a reasonable authority that, I believe, 
harmonizes two views.

*** I am sensible to Richard's suggestion that a system should either 
"do the right thing" or "whine about what it was asked to do."  I think 
this is sound practice and, to the extent practicable, a fine general 
principal, though I suspect I would temper it more than would he.  I do 
not disagree with RAO about this point, and support this view.

I would then balance that principle against the following:

*** The iterand of a #do: on a collection, implicitly or otherwise, is 
undefined whenever the structure and contents of the collection is not 
invariant through the iteration.  This is a bedrock principle of the 
Collection hierarchy's design, and seems to me a sound and reasonable 
basis for proceeding.  I don't require that the collection classes 
babysit me on the point, but in deference to the first principle, I 
think it is a reasonable thing to do decent checking where possible.  I 
would not attempt to do either impossible or impractical type checking, 
however.

To the extent that I have read what Richard has argued, I am not 
convinced that #removeAll: is or should be defined in the case of self. 
  I do agree that the check ought to be made to avoid peculiar results, 
however, but I would not INTENTIONALLY return a result upon which 
programmers would depend -- if only to avoid entirely unnecessary 
compatibility issues.

> HOWEVER, I do not think that a correction has to be held up to
> extremely stringent standards (that no other part of Squeak could meet)
> in order to be useful or acceptable.

There is reasonable disagreement whether a correction is required.  It 
seems to me that David's proposal (catching the easy-to-detect 
faux-iteration violation as an exception) more than adequately 
addresses the problem.

> And it really is very much an exaggeration to say that
> "x removeAll: x copy" is an "accepted way" to empty a collection.
> It is a way.  It is a good way.  But practically nobody knows about it.
> It's what people do when they burn their fingers on "x removeAll: x".

Just about everybody seems to know about it, IMHO, even late-comers to 
the language such as myself.  Ralph Johnson wrote a nice little piece I 
read early on in my Smalltalking life about common Smalltalk bugs, and 
the "copy trick" was clearly pointed out there.

> I have also explained repeatedly not only _that_ the change makes the
> specification simpler and cleaner, but _how_ it makes the specification
> simpler and cleaner.

Richard certainly has SAID that he feels it makes the spec simpler and 
cleaner -- not all of us were satisfied either at the sufficiency or 
correctness of his "explanation."  Merely stating, ipse dixit, that 
something is so certainly does not evidence that it is so.

> Rational/critical debate at least requires a rebuttal of these points,
> not a flat denial that no demonstration was ever made.

Pot. Kettle. Black.

Rational/critical debate at least requires acknowledgment that others 
have made more than a flat denial when, in fact, they have undertaken 
to make arguments on the merits.  I have read and considered carefully 
Richard's arguments, and David and Ralph and other's arguments, found 
both of them to have argued substantively and persuasively, and simply 
found the latter to be more persuasive.  I have not found Ralph and 
David guilty of empty gainsay, though I *DO* find Richard's present 
"claim of victory by default" (but not his earlier arguments) to be 
precisely that.

> But how does this argue against an extremely fast check?
> And how does it argue against a check THAT IS FASTER THAN AN ASSERT?

I don't think it does.  I think the idea of a check as a guard against, 
say, an error message or throwing an exception is as good as or 
identical to an assertion.  The difference is that the latter might be 
automatically caught and specially handled for efficiency purposes when 
a programmer doesn't feel the need for protection so much as the need 
for speed.  I am, personally, indifferent between the two.

I am not, however, of the view that the special check, and return of a 
value (rather than an error or throwing an exception) is anything other 
than a problematic change of semantics.

> Except that programmers who look up what #assert: does discover that it
> is slow (block creation + message send).  (The 3.2 browser makes it
> very easy to check this.  Type an assert: into some method, and use the
> rightmost button to select "bytecodes" instead of "source".)

Fair enough -- but it doesn't have to be.  Indeed, we could fix that 
trivially with a compiler change overnight. (Well, it took me a weekend 
to do the ifNil: special cases a few years back, so maybe not 
overnight).

> Lest I be misunderstood, I would be *delighted* if the compiler made
> a special case of "self assert: -explicit block-" like it does for
> ifTrue:.  Then David Griswold's argument *would* be valid and I *would*
> be reasonably happy with "self assert: [aCollection ~~ self]."

Then let's do that -- this makes sense to me.  Are there objections to 
special handling of assert in this manner?

> 	2) Just add "self assert: [ otherCollection ~~ self ]." to the 
> beginning
> 	of removeAll:.
> 	
> Except that as someone else has argued, it doesn't actually solve the 
> problem.

It does seem to do so, at least for me.

> No, it does not solve the problem.
> The problem is that
>  - #removeAll: fails to conform to the ANSI specification
>  - #removeAll: is hard to *think* about because of an arbitrary 
> restriction
>  - the restriction has nothing to do with the semantics of the 
> operation,
>    it is solely a consequence of a naive implementation.

This has been argued at length.  I see at least two different points of 
view, both reasonable.  I am not certain that I see that Richard is 
correct on any of the points of his syllogism.

> All it does is ensure that in likely cases involving existing 
> collection
> classes the bug will not go unreported.

This is true, if one accepts that this operation is properly defined, 
and false if not.  He would be very right if we accepted his premises, 
and is very wrong if we do not.  So what?  He is simply announcing his 
conclusion in different words.  This is not an argument.

There is, at least, a reasonable dispute over this question.

> Make no mistake, this suggestion would be a very definite improvement
> over the present situation.  Bugs that report themselves are better 
> than
> bugs that quietly give wrong answers.

Agreement on this point.

> But it is nothing more than a bandaid.  The RIGHT answer is to design 
> the
> interface so that it has as few exceptions as reasonably possible and 
> to
> write the code so that it implements a clean specification.

Agreement on this point.  I believe the status quo does that quite 
well, for reasons previously stated, and Richard's "solution" opens a 
can of worms.  He disagrees.  This does not advance the argument, but 
merely restates it.

> This applies throughout the collection classes, which are badly in need
> of an overhaul, not least some good documentation.

And so, the solution is to me quite clear: Richard should use the open 
source system to do what he wants.

Richard, if he is inclined to do so, should write his new overhaul of 
the collection classes, probably using a different namespace.  Then, 
his new and better system can be compared in the marketplace of ideas 
with the status quo.  If he is right, his will be preferred over the 
current system.  If he is wrong, it will perhaps inform improvements on 
the original without any meaningful harm.

In this way, existing Smalltalk programs can proceed without being 
destroyed or modified by novel semantics, and people may rely upon the 
principles on which the existing Collection classes are based.  New 
programs can proceed experimentally along the pink plane and novel 
ideas can flow, critically evaluated as they go.  The experiment will 
succeed or fail, but we can all act as truth-seekers by simply not 
making this a zero-sum game.

Richard can prove his point on the merits by proving it in practice.  
You go, boy!  Write NewCollection and its progeny, and prove that open 
source rocks.

> Speed isn't my issue.  Preserving identity so that the Observer pattern
> continues to work is my issue.

removeAll
	x removeAll: x copy.

seems to work fine to that end.	A number of more efficient solutions 
exist in special cases of the Collection hierarchy.

> And most descendants of Collection do not and cannot support #remove:.
> If that isn't a "code smell", what is?

It is a code smell.  Many single-inheritence hierarchies have similar 
smells.  Please, by all means improve and refactor when you rewrite the 
hierarchy.