[BUG]Collection>>removeAll:

Jesse Welton jwelton at pacific.mps.ohio-state.edu
Fri Aug 30 21:01:45 UTC 2002


goran.hultgren at bluefish.se wrote:
> 
> For anyone just jumping in here this is what I *think* is the current
> "score":
> [...]
> - Richard together with Jan Hylands has done some really interesting
> archeological findings indicating that removeAll: indeed did work
> earlier due to differences in #removeIndex:! And more efficiently too,
> if I am not mistaken.

No, Richard noticed some recent changes and speculated that it might
have worked previously.  Jon found that Squeak's #removeIndex: in fact
worked in the same inefficient way it does now (only without even the
benefit of a primitive call) as far back as 1.1.  It's still the case
that Richard's suggestion could probably make this operation more
efficient in many common cases.

Otherwise, I think your summary is accurate.

[...]
> Personal current opinion:
> 
> 1. Andrew had some nice arguments there for a while about "semantic
> rules" on not changing operands during operation etc but in the end I
> got swayed over back to Richard's side, *especially* after the recent
> archeological findings, but also mainly because the "semantic rules"
> seemingly to me should only apply when talking about iteration and even
> though one way of implementing removeAll: is through a simple iteration
> - that is not the only way to do it and the iteration is *inside*
> removeAll: so it should not be a problem of the caller.

I agree, except that I wouldn't place much weight on the archeological
evidence either way.  It's interesting, but I don't think it informs
us much on what we *should* do.  One of the goals of Squeak is to
evolve a better system; that can't happen if we bind ourselves to past
behavior.  The question we should really focus on is, which solution
is better?  The answer to that depends on one's personal priorities
between simplicity of implementation, simplicity (and generality) of
use, portability between dialects, and efficiency.  Then there are
one's views on semantic consistency.

The difference between the implementation simplicity of most of
Richard's suggested non-error-throwing implementations and the various
error-throwing versions suggested is minimal.  Richard has also
admirably demonstrated that it can be done with no measurable loss to
efficiency, or even with a significant performance boost compared to
the status quo.  (In all fairness, I should note that the same
techniques could be applied to a non-copying implementation to gain an
even greater speed boost.  Still, if the performance of these
operations in the status quo is remotely acceptable, so would be
Richard's copying version.)

Personally, I think it's clear that simplicity and generality of use
is improved by expanding coverage to include the receiver as argument.
There are fewer special cases to worry about.  I think Andrew has
quite convincingly argued that checking this special case is not
sufficient to ensure correctness in all cases if there are, for
example, slicing collections.  I vacillate on the importance of the
fact that there are none in the base image.  However, I think a
copying implemantation (or any other that avoids removing the elements
from the receiver while iterating over the argument) would be fully
general and applicable to all cases without exception.  From a
usability standpoint, how is this anything but ideal?

As for semantic rules and consistency, in particular about not
modifying a collection over which one is iterating, I have to agree
with Richard, Goran, and the others who take the view that #removeAll:
should not be externally viewed as an iteration operation.  Richard
has again done much legwork to support his point here, by questioning
both technically experienced and inexperienced parties about their
expectations.  It seems that people, by and large, do not expect
#removeAll: to behave externally as an iteration construct.
Furthermore, it is not classified by Squeak as an iteration construct:
it is classified under "removing" protocol, not "enumerating".

(Aside: If we go back to a usability/generality point of view, I think
copying semantics for #do: would in fact be a big improvement.
Unfortunately, in this case the performance hit is just too much.)

Portability is also a relevant concern.  I do think there's merit to
writing prtable code, and to helping an author detect deviations from
the standard.  Throwing an error for undefined results can be a fine
way to do this.  However, I don't think this should override the
usability of the system when a sensible result can be provided for
those who wish to use it, and accept the consequent limitations to the
portability of their code.  It's like intentionally returning a
special UndefinedResult object in all those cases where the return
value is undefined.  Rather than cramming the limitations of the
standard (or of every other dialect, where the standard is unclear or
silent), it would be better to have tools to assist, or a behavior
switch, or a strict portability-enforcing module, that could be turned
on when desired, or ignored by those who don't want it.

> 2. Adding removeAll is a good thing IMHO. Can't possibly see why having
> to write "x removeAll: x copy" would be better - it can't be efficiently
> reimplemented in subclasses and it is an obscure way of doing it.

I think this is an obviously good addition.  Anyone maintaining or
using extant parallel implementations of the Collection protocol can
easily add it.

-Jesse



More information about the Squeak-dev mailing list