[BUG]Collection>>removeAll:

Thu Aug 29 04:49:12 UTC 2002

"David Griswold" <David.Griswold at acm.org> wrote:
	I don't think the ANSI standard says what you think it says.

I have quoted the verbatim text of the 1.9 draft.
I have asked anyone with access to the final version to correct me.
If anyone thinks that the ANSI standard does _not_ say what I said,
could they possibly quote the verbatim text of what it _does_ say?

	Just because it doesn't explicitly say that the receiver isn't
	allowed doesn't mean that the receiver *must* be allowed.

I used to be a regular reader (and poster) in comp.std.c.  I also used
to follow comp.std.fortran, and watched the Ada Interpretation process
for that Ada standard for some time with considerable interest.

People in those communities repeatedly made the point that it is what
a standard actually *says* that counts, not what the standardisers
*meant* to say, or what someone thinks the standard *ought* to mean.
Unless the standard is inconsistent with itself, what the standard
says stands.

	I made a clear argument that all programs in Smalltalk (or in
	any imperative language for that matter) have a universal,
	*implied* constraint that you shouldn't change objects that are
	being used by something that assumes that they remain the same.

Yes, but you are presupposing that #removeAll: *does* assume that.

In traditional logical terms, you are begging the argument.

*IF* removeAll: is allowed to assume that its argument is not going to
change, then at most it needs an assertion.

But that is precisely what has to be established.

There is nothing in the ANSI standard that would lead a careful reader
to conclude that #removeAll: *is* allowed to assume that its argument
is not changed.  It is possible to read the specification as *forbidding*
that assumption, that reading is consistent with the rest of the standard,
and that reading is easily implementable.

	This language-wide constraint clearly excludes (x removeAll: x),

Not at all.  It would only exclude (x removeAll: x) *IF* #removeAll:
were "something that assumes [it] remain[s] the same".

We are agreed that the *implementation* in Squeak *does* make that
assumption.  The whole question, with respect to the standard, is
whether the standard *allows* it to make that assumption, and you
really cannot answer that question by assuming it.

Now, if there were a presumption that method arguments were not changed,
then your argument would be valid.  Is there such a presumption?

In one specific case there _is_ a no-change requirement:
    5.3.1.1 =
    5.3.1.9 hash
    5.3.4.1 =
    5.6.2.7 =
    5.6.7.1 =
    5.8.1.4 =
    5.8.2.6 =
	Note that a collection that uses #= to discriminate
	objects may only reliably store objects whose hash
	values do not change while the objects are stored
	in the collection.

There's something related:
    5.3.1.6 copy
	Any operation that changes the state of the new object
	should not as a side effect change the state or behavior
	of the receiver.  Similarly, any change to the receiver
	should not as a side-effect change the new object.

However, except for noting that MappedCollection>>copy clearly violates
this, it doesn't quite answer our question.

Getting warmer:
    5.7.7.2 add: newElement
	The equivalence of newElement with respect to other objects
	should not be changed while newElement is in the collection,
	as this would violate the invariant under which the element
	was placed within the collection.

However, as I've already noticed in earlier messages, add: is an
excellent example of a message which *doesn't* in general assume
that its argument doesn't change:
    x := OrderedCollection new.
    x add: x.
doesn't depend on #=, and Squeak allows it, and on my reading, doing
this doesn't violate the ANSI specification.  However,
    x := Set new.
    x add: x.
*would* change the equality or otherwise of x with other objects, so
the ANSI specification *does* rule that one out.

I was hoping to demonstrate that
    x := Set new.
    x add: x.
    x includes: x
doesn't work.  I was then going to argue that this one is OK because
the ANSI specification *does* forbid it, unlike x removeAll: x.
Unfortunately, it fails for the wrong reason.  When Set tries to
compute x hash, it goes into an infinite loop.  Similarly,
    x := OrderedCollection new.
    x add: x.
    y := Set with: x
goes into an infinite loop.  (Do NOT try this in an image you care about.)

This raises the interesting question of what #hash should do with possibly
cyclic structures, and if you think that's of academic interest, take a
look at Morph some time.  (The point is that Morphs have a cyclic structure,
so that cyclic structures are not altogether alien to Squeak,
not that #hash doesn't work on Morphs.)  For safety, #hash might be
implemented as
    hash
	^self hashWithDepthBound: 3
or something like that, at at depth bound 0 it could just hash the
class or species of the receiver.  I certainly couldn't find anything in
the ANSI specification warning against using #hash with cyclic structures,
and since the hashWithDepthBound: implementation is not only possible but
actually _used_ in some programming languages, we cannot assume that
everyone will know about it without being told.

Now there is something extremely relevant:

    5.1.2.2 Parameter Specification
        Aliasing information (for example, whether a parameter is stored,
        or whether a returned value is new or returned state) is
        specified to avoid having implementors use defensive
        programming techniques which result in unnecessary object
        creation and copying, incurring a performance penalty.
        We differentiate between incidental aliasing and essential
        aliasing, both for parameters and for return values.
        Essential aliasing forms a critical part of the behavior of
        the interface, and as such it must be specified by the interface
        designer.  Incidental aliasing should not be specified since it
        is a side effect of implementation choices, and is not fundamental
        to the specified functionality of the interface.

	Essential aliasing of parameters is described using a parameter
	aliasing attributes [captured, uncaptured, unspecified].

Oddly enough, one thing that does not seem to be considered here is
cases where aliasing is prohibited.  A literal reading of the standard
(and remember, standards are _supposed_ to be read literally provided you
can do so without inconsistency) would suggest that if aliasing is _not_
indicated for a parameter, then it is "incidental aliasing" and
"is not fundamental to the specified functionality of the interface".

That is, contrary to what David Griswold is saying, there is an EXPLICIT
presumption in the ANSI standard that aliasing which is not mentioned is
NOT FUNDAMENTAL TO THE SPECIFIED FUNCTIONALITY OF THE INTERFACE.

To summarise, what have we found by scanning the ANSI (1.9 draft) standard?

- We have found that it IS clearly spelled out that collection keys
  must not change their hash code or equality relation to other objects
  while they are in the collection.  It is fair enough for collections
  not to worry about the possibility that this might happen, and it is
  fair enough to blame programmers for the consequences who do something
  that the standard says not to do.

- We have NOT found any stated presumption that method parameters are not
  changed by a method.

  For another example that this cannot be true of Smalltalk in general,
  consider

    asSet
        ^self inject: Set new into: [:s :e | s add: e].

  The first argument of #inject:into: is most definitely changed when we
  do this.  I think this is a fine thing to do.  David Griswold says that
  there is a universal presumption that this kind of thing won't happen.
  Do you think this would be such a bad way to implement #asSet that
  every Smalltalk programmer should _know_ not to do it?

- On the contrary, we have found an explicit claim that any aliasing that
  is not explicitly mentioned is "NOT FUNDAMENTAL TO THE SPECIFIED
  FUNCTIONALITY OF THE INTERFACE."

	and the specification for removeAll: doesn't override that.

It doesn't have to override anything.  David Griswold is *RIGHT* that
if something assumes that an object isn't changing, you shouldn't change
it.  However, there is no language-wide constraint that *every* method
assumes that *every* parameter is not changing.  What he needs is to
find something in the specification of #removeAll: that says that
*this* method makes or is entitled to make that assumption, and there
isn't anything like that there.

"The receiver never retains a reference to the parameter, directly
or indirectly, as a result of this message."

So let's look at the aliasing specification for #removeAll:

    Parameters
    oldElements  <collection>  uncaptured

What does "uncaptured" mean?  It means that
    The receiver never retains a reference to the parameter,
    directly or indirectly, as a result of this message.
Does that prohibit (x removeAll: x)?  No:  with any of the working
implementations I've suggested, x doesn't have any references to itself
afterwards that it didn't have before.

	Note that my argument about this implied constraint is not that
	the libraries have been designed that way, or that that's the way I
	think thing were meant etc., I'm saying this implied constraint is
	required in Smalltalk; you just can't get around it.

We AGREE that if a method assumes that some object doesn't change,
then it shouldn't be changed.  That has never been the point at issue.

The point is whether *THIS* method is entitled to make that assumption.

As far as I can tell, nobody would ever have expected this method to
make that assumption were it not for the way it *happens* to be implemented.
When you are arguing about whether the implementation is broken or not,
you really cannot assume its correctness in your argument,
that's the fallacy called question-begging.

	Programmers should read this additional constraint into all ANSI
	specifications, so that removeAll:  has the implied language
	"the argument can be any collection that is invariant during the
	execution of this method".

Why should they do that?
It is possible to implement the specification exactly as written
*without* the code making that assumption.

This method of exegesis is precisely backwards.

I am saying that the standard should be read like this:

    Try to take the text literally.
    If what it describes is consistent with the rest of the standard
    and describes something that can be implemented with known
    technology, do not add any extra assumptions.

In particular, I refuse to insert extra caveats, exceptions, holes,
and quirks into a specification simply on the grounds that a
simplistic implementation might have trouble.

In particular, the way the specification of #removeAll: is structured
is *strongly* suggestive that this assumption is not to be made.
(I am not saying that the ANSI authors intended this.  I am only
commenting on the way the text stands.)

One obvious way to word the specification would have been to say
    "x removeAll: y has the same effect on x that
	y do: [:each | x remove: each]
     would have had."

But that's not what the specification (in fact, the specifications;
there are several copies which I have previously quoted in full) say.
They do NOT talk about iterating over oldElements.  They can very
naturally be read as specifying a two-pass algorithm:
  1.  Determine what elements are to be removed.
  2.  Remove them.
And that reading certainly requires no assumptions whatsoever about
whether oldElements is changed or not.

	I don't *like* that, but it is necessary.

I have read your messages with great care, but I have failed to recognise
an argument for this position.  We AGREE that *IF* #removeAll: assumes
that its parameter doesn't change, then its parameter mustn't be changed.
We agree that for *each* method that assumes that some parameter does not
change, that parameter must not be changed.  We agree that whenever the
text of the ANSI specification is such that it is a reasonable inference
that some parameter should not change, it should not be changed, and the
standard should be so read.

What we do not agree about is whether the specification of *this* method
requires any such inference.  Surely the fact that the specification CAN
be implemented in many difference ways *without* depending on such an
assumption means that the specification of *this* method does *not*
require any such inference.

	Now, it appears you don't agree with this, but you can't say you haven't
	been rebutted.

What I can say is that I have not been offered any rebuttal
that doesn't beg the question.

IF #removeAll: assumes its argument doesn't change, why then the
simplistic implementation that assumes that argument doesn't change
isn't broken.  We agree.  What we disagree about is whether the
hypothesis of that conditional is true.  I have given plenty of
reason to believe it is false, and I am still waiting for an
argument that doesn't just assume its own conclusion.

	I personally agree that #remove:, add:, etc.  don't belong in
	Collection itself.  In Strongtalk I factored them out into the
	separate protocol Extensible, which OrderedCollections, Bags,
	etc.  support.  In Strongtalk, the Collection protocol is the
	true minimal protocol that all collections support.

That sounds wonderful.  Strongtalk sounds better and better.

	But in all other Smalltalks I've seen, they are in Collection,
	so that is a separate discussion.

Not exactly.  It establishes that you are willing to improve the
Collection classes far more drastically than most.  Good on you.
Now we know your resistance to #removeAll: is not a resistance
to change in the Collection classes as such.

	> I'm sorry, just what is being proposed that would break
	> existing classes?

	Changing the meaning of removeAll:. It breaks all existing
	classes that implement the Collection protocol or reimplement
	#removeAll:.

You have merely repeated the assertion.  What I have proposed is that
the implementation of #removeAll: in Squeak should be changed in such
a way that
 - the result is consistent with the ANSI standard
 - NO invocation will change in any observable way EXCEPT when
   the existing code quietly produces nonsensical answers.
 - to rephrase that, ONLY the case where you say there isn't any
   defined result would be changed.
So no working code could break.

I have checked all the existing classes that implement the Collection
protocol and I say it wouldn't break them.  You say it would.  As the
implementor of StrongTalk, you are an extremely clued up person who is
good at writing Smalltalk code.  So show me *how* it would break something.
Provide an example.

Again, if a class reimplements #removeAll:, it wouldn't be calling the
changed code, so its behaviour would not change.  If something's
behaviour is not changed, how exactly is it broken?  (Oh, the reimplemented
code might not conform to the full specification, but then it never did;
that wouldn't be a new breakage.)

	That is exactly why I think the Collection protocol should be minimal,
	especially in a system without mixins (which would allow people to never
	have to reimplement the whole protocol).  I think what you are saying
	actually supports my point that Collection should only be changed rarely
	and reluctantly.

Oh, I agree.  But that doesn't mean I refuse bug fixes!  In fact, it's
precisely _because_ I don't want the specification changed that I _do_
want the bug fix.

	You asked me for examples of classes that would break.  If the classes
	you wrote support remove, *your* code is a perfect example of the kind
	of code that will break.

No, they won't.  With the change, there are no more calls to #remove:
than without (except for the buggy case), and their arguments are
identically the same objects.  In fact, with the change, my code would
work in more cases, the direct opposite of being broken.

There is no change to what is removed from what involved.

	> Again, with the possible exception of code that deliberately causes a
	> DoesNotUnderstand, how would _adding_ a method to Collection
	> "break" any
	> other class?  Can you give an actual example?

	I just gave the example above: any other class that implements
	Collection without inheriting from it, including your own class you
	mentioned above.

That's not an example, it is an assertion that an example exists,
without actually exhibiting.  My own collection classes are most certainly
NOT broken by the change we are interested in, and adding a #removeAll
method of the form
    [self isEmpty] whileFalse: [self remove: self anyOne]
certainly would not break them.  #isEmpty, #anyOne, and #remove: were
part of the collection protocol, so any of my collection classes either
implement all of them, or are never going to implement #removeAll anyway.

	> By and large, MOST of the code in the Collection classes is
	> "The Simplest Thing That Could Possibly Work",
	> except that sometimes it doesn't.

	You may think it is simple-minded; I don't. You may think the
	code in Collection shouldn't place such a premium on simplicity;
	I do.  That doesn't make *me* simple-minded.

Nobody has ever suggested that you are simple-minded.	
I am not opposed to simple methods; I think they are great.
I think "The Simplest Thing That Could Possibly Work" is excellent
advice.  However, I don't believe that "simple" is better than "working",
and I don't believe that "simple to implement" is better than
"simple to use".  (That's the design philosophy that led to C.)

Using the simplest thing that could possibly work is an excellent
strategy right up to the point where you discover that it DOESN'T
work.  Then you do the next simplest thing.

The Collection classes as we find them contain many excellent things,
but are in many ways very obviously the result of high speed development.
By this time, it may not be possible to make any of the more desirable
restructurings (such as the ones I now learn have been done in StrongTalk)
without breaking anything.  However, such cleanups as we _can_ make without
breaking existing code (like fixing the #removeAll: bug) _should_ be made.