[BUG]Collection>>removeAll:

David Griswold David.Griswold at acm.org
Wed Aug 28 05:26:39 UTC 2002


To try to move this issue forward, and keep it in the realm of
constructive discussion, I'm going to try to clarify what is going on,
and put forward a proposal.

I suspect that some (Richard at least), are feeling like myself and
other nay-sayers are simply being obstructionist about a simple issue.
But there is a very good reason for the kind of critical discussion that
has been going on, and it is actually a good thing.  The closer a change
comes to the core of a system, the more it should be challenged, to make
sure it is worthy.  We are not talking about JoeBlowCornerOfTheSystem,
but the *Collection* class and protocol, arguably as close as you can
get to the heart of Smalltalk.  This class and protocol have withstood
an enormous amount of use in the real world, and even a very slight
change needs to be really, really thought about.  I jumped into the
conversation only because it seemed to me that the necessary "due
diligence" wasn't being done.

I've made my points, but to be constructive and try to move forward,
here is how I would reason about this problem and what to do about it.
I'm laying it out in painful detail to show the kind of restraint I
think is needed when dealing with important code like this.

- it is clear that people do try to write (x removeAll: x), and that it
doesn't do what they expect.  This seems to happen frequently enough
that it would be desirable to at least let people know when this
happens, so they can write their code a different way.  The currently
accepted valid ways to empty a collection are to either create a new
collection (usually the fastest way) or say (x removeAll: x copy).

- If we come up with a way to detect this case and generate an error,
then the only remaining rationale for extending removeAll: to support
the special case of the receiver as an argument would be if it was
required by ANSI, or made the library or its specification simpler and
cleaner, or fixed an actual bug.  I believe that none of these has been
demonstrated to be true.

So how would we detect the error?  Richard has argued for just putting
the check in.  I say it belongs in an assertion, even though it is a
very fast check, and here is why:  every time a programmer thinks of
some invariant like this that they want to check, they make a little
'cost' calculation in their head, and if the check is going to have much
of a performance cost, they tend to not do the check.  Using assertions
automatically for all of these kinds of checks eases that difficult cost
decision, which encourages people to sprinkle these checks all over
their code (a Good Thing).  Just because this particular check happens
to be very fast does not make it a special case.

So I would recommend the following fixes:

1) Get the assertion mechanism working effectively and efficiently in
the standard Squeak release.  Make sure its cost in development is
minimal, and its cost in production, zero.

2) Just add "self assert: [ otherCollection ~~ self ]." to the beginning
of removeAll:.

Very, very simple. (a Good Thing).  It prevents this whole issue from
being a problem, with no changing of the basic Collection interface or
implementation required, and with no performance impact in production
(Good Things).  It also gives everyone notice that they can now sprinkle
their code with assertions (a *really* Good Thing).

Now, to the second, separate matter: is a new #removeAll method needed?

I can see only one good reason for a #removeAll message.  That is that a
large, tenured collection might need it as a hook to implement a more
efficient way of being emptied.  Another argument that might be made is
that people need a generally easy way of removing all of a collection's
elements, but I don't buy that because most of the time, they should
just create a new empty collection, which is usually faster, so we don't
want to particularly encourage doing #removeAll anyway, since it is
usually the slowest way to get the job done.

However, the large collection argument is a good one, I think.  The
question then becomes, where does this method belong?  One could argue
that code dealing with such large collections almost always knows a lot
more about the collection than just that it is a collection that
supports remove:, so adding efficient removeAll methods to specific
collection classes could be done without adding it to the Collection
protocol itself.

On the other hand, since removeAll is conceptually built using remove:,
adding a default implementation to Collection doesn't break any subclass
of collection, since it doesn't use any message other than do: and
remove:.  It would, however, break classes that implement the Collection
protocol but which don't inherit from it (ah, for mixins).

So basically, I think this argument comes down to weighing whether the
"large collection" argument is good enough to overcome the drawback of
breaking existing classes that implement Collection without inheriting
from it.  I'm not drawing a conclusion here, just trying to delineate
the crux of the issue.  I personally think removeAll would be a nice
addition, but it would break some code out there, and it's not used that
often (and shouldn't be), so I'm on the fence about it.

If such a method was added, in Collection it needs to be something
really simple.  You can do this safely and generally without copying if
you write
	removeAll
		[ self isEmpty ] whileFalse: [ self remove: self anElement ]

where anElement can be the following private method (it can be made
public, and is in Strongtalk, but that's another issue):
	anElement
		self do: [ :el | ^el ].
		self error: 'Collection empty; no element to return'.

Unfortunately, without an inlining compiler, anElement is not very
efficient in Squeak right now, although with inlining it could be more
efficient than copying.  On a 1000 element Set of consecutive integers,
the above implementation is a bit slower than (self removeAll: self
copy), so maybe the copying version should be used for now instead, if
removeAll is to be added (although anElement is a handy method to have
around, in its own right).  On the other hand, much more efficient
implementations of anElement can easily be written for important
subclasses, and then removeAll can be done efficiently without copying.

So if there is any overall message here, it is: be very careful with
code in classes like this, and show great restraint by making changes as
minimal as possible.  The code in Collection is simple, but not
simple-minded.  Don't underestimate the subtlety of why these
simple-seeming methods are the way they are.

Cheers,
Dave





More information about the Squeak-dev mailing list