The standard does *not* support - a removeAll: a - [was: Re: [BUG] Collection>>removeAll:]

Richard A. O'Keefe ok at cs.otago.ac.nz
Mon Sep 9 03:30:54 UTC 2002


"Lex Spoon" <lex at cc.gatech.edu> wrote:
	> 	For example, suppose you looked at the
	> 	following code, which worked:
	> 	
	> 		a removeAll: a
	> 	
	> 	and then changed it to this, which doesn't work:
	> 	
	> 		a do: [ :each | a remove: each ]
	> 	

I replied about the first fragment:
	> Right.  Here is a call to an operation *inside* an encapsulation
	> boundary which can do whatever it wants.
and about the second:
	> I thought it was generally agreed in this and related discussions
	> that good programmers _ought_ to know that it is a bad idea to modify
	> a container while it is running an iterator method.
	
	Note, that the programmer might not know that the two a's are the same:
	
		a do: [ :each | b remove: each ]
	
Yes, but now that is a different example, and there are very different
things to say about it.  Anyone who has attentively read "Smalltalk with
Style", for example, knows that if you do this it is YOUR responsibility
to ensure that 'a' and 'b' are NOT the same.

	An error mesage lets the programmer know as soon as possible that a and
	b are sometimes the same.
	
If you can arrange for Squeak or any other Smalltalk to report an
error for
	a do: [:each | b remove: each]
when b == a, more power to you.  That could be an excellent thing.
	
However, this new example, is, if anything, a strong argument *for*
having a #removeAll: that does "the right thing" whatever its arguments.

Let's take this new example about point out something else that is wrong
with it:

    a do: [:each | b remove: each]

Now the programmer who wrote that may have _thought_ that 'a' had no
repeated elements.  If there _are_ repeated elements, it may quietly
give a surprising answer, or it may signal an error.  In either case,
this is not an argument AGAINST the #removeAllFoundIn: method, which
accurately expresses the programmer's intent, it is an argument FOR
#removeAllFoundIn:.

	> This argument has the form
	>     "Because a working method call can be replaced by code
	>      that is obviously unlikely to work,
	>      therefore the working method should be made to signal an error."
	> 
	> I don't see any logic in that.  To start with, just how likely is it that
	> someone would replace a call to a method in someone else's class with code
	> that, even if it worked, might take O(N**2) time or worse?  
	
	It is *not* obviously unlikely to work -- quite the contrary. 

But yes, it *is* obviously unlikely to work.  Remember, the example
about which I wrote that was

    a do: [:each | a remove: each]

and there is no shortage of Smalltalk books and guides warning people
against modifying a collection they are iterating over.  If you write
an explicit iteration over some collection in Smalltalk where the body
of the iteration changes the collection you are explicitly iterating over,
you have no right to expect it to do anything meaningful, useful, or sane.

	Furthermore, the complexity will usually be the same.
	In Squeak there is but one implementation of removeAll:.
	
But that is one of the things Goran Hultgren and I have proposed fixing.
Both of us have provided code for it.  He and I agree in wanting
a #removeAll method.  We have both shown that it can be O(N) instead
of O(N**2).  (Particularly important for Set, by the way, where removing
elements is surprisingly expensive.)

	I make changes like this all the time, and I believe it is a common
	Smalltalk development style.

Turn method calls into explicit loops that break a well known rule?
I doubt that.

	Smalltalkers *adjust* code more than they
	write it from scratch.

Indeed.  So do I.  But that _kind_ of change?

	And they don't wait for proofs of identical
	behavior before they make the adjustments;
	they try things that are
	likely to work and then they run the program again.
	
I do hope the standard of testing amongst Smalltalk programmers is
higher than _that_.  Beizer, in "Software Testing Techniques",
has this on page 4:

    What's the purpose of testing?
    There's an attitudinal progression characterized by
    the following five phases:
    
    PHASE 0--There's no difference between testing and debugging.
    Other than in support of debugging, testing has no purpose.

    PHASE 1--The purpose of testing is to show that the software works.
	
    PHASE 2--The purpose of testing is to show that the software doesn't work.

    PHASE 3--The purpose of testing is not to prove anything, but to
    reduce the perceived risk of not working to an acceptable value.
    
    PHASE 4--Testing is not an act.  It is a mental discipline that results
    in low-risk software without much testing effort.

Programmers should certainly not "try things that are likely to work".
That's phase 1 thinking.  They should try things that are likely NOT to work,
which is phase 2 thinking.  Phase 4 thinking is what you really need for
extreme programming, and NO, I have not reached that level myself, except
on too-rare occasions.  The tests should be written first, then the code
adjusted, and tested, and the tests should be genuinely probing tests.

There are a number of elementary boundary cases that every programmer
should be familiar with:

    - what if this object is nil?
    - what if this object is the wrong kind of object?
    - what if this object is identical to some other object I'm dealing with?

	>     x := 0. [z = 0] whileFalse: [x := x+y. z := z-1]
	> and if y or z was a floating-point number, the result could be grossly
	> inaccurate.  Is that an argument for making * signal an error when given
	> floating-point numbers, or returning equally inaccurate results?
	
	A difference is that, with floating point, the problematic situation
	can't be avoided.

Well, no.  By using #* instead of writing your own loop, you DO avoid
the problem.  That's my point.

	With collections it can.
	
Yes, by using a method with the right semantics instead of an explicit
loop with no semantics.

	There is more to my argument:
	
		1. It is an extremely special case.

		Instances are likely to be
		modified into instances of the general problem over
		time, and the general problam cannot be solved.
	
I must admit that I am now completely lost?

_What_ is a special case?

The only example we've been given of modifying code is
turning a working "a removeAll: a" into a nonworking
"a do: [:each | a remove: each]" which breaks so badly that
almost any test (other than with an empty collection) will reveal the bug.

_Which_ problem is "the general problem" which cannot be solved?

		2.  Relying on the extended semantics is unportable and
		should not be encouraged.
	
Practically _everything_ in Squeak is nonportable.
Are you saying that the use of graphics and sound should not be
encouraged because they are not portable (and they aren't)?

Let's take something else in Squeak.  Characters.  In Squeak, for any
integer 0 to: 255 there is precisely one corresponding Character object.
In Squeak, we can rely on

    (Character value: i) == (Character value: i)

Where are the people saying that this is "extended semantics",
"unportable", "should not be encouraged"?  Where are the people suggesting
that it should be an error to use a Character as an element of an IdentitySet
or as a key of an IdentityDictionary on the grounds that it is not portable?

Quoting ANSI has gone over like a lead balloon in the past, but let's try
one more time:
    "It is unspecified whether or not each code point is uniquely
     associated with a unique character object."
Why do they say that?  Because there are Smalltalks that do it each way.

Looking at ANSI again, simply to get an idea of what might be portable and
what might not, I note that ANSI characters support #isDigit, but there is
no #digitValue.  Now *I* think #digitValue is useful, and I have no
intention whatsoever of refraining from using it on the grounds that it
is an extension to the semantics of Character and is not portable.

There's an even bigger extension to Character semantics in Squeak:
in Squeak Characters are Magnitudes.  In ANSI Smalltalk they aren't.
(Here as elsewhere, all my comments refer to the 1.9 draft.)  That's
a much bigger semantic extension than making (a removeAll: a) work.
Where are the people saying "don't use #< with Characters"?

Then there are all those methods in String that weren't inherited from
Smalltalk-80.  Does anyone go around saying "String>>crc16 isn't portable,
so you should not use it?"

Let's face it, "not portable => do not encourage its use" is a rule that
is not applied to any other aspect of Squeak.  It appears to be a rule
made up for this occasion.

If this rule were consistently applied, we would never make any changes
to Squeak, because _any_ change affects portability between Squeak
versions at least.  Nobody could change any built in classes; Squeak
would be just like Java there.  Not even Squeak Central would be allowed
to fix bugs, no matter if everyone on the mailing list agreed that they
were bugs, because that would break portability between versions.
Indeed, nobody would even be allowed to make Squeak *more* compatible
with another Smalltalk (which is what the #removeAll: proposal amounted to).

		3. The situation is easily avoided once it's been found.
	
First of all, avoidance is never as easy as prevention.  Second, the problem
with the present code is that it SILENTLY gives wrong answers, so finding it
is not particularly easy in the first place.  In fact, a bug in the core
classes is about the last thing anyone will think of.  Third, how easy it
is to fix depends on the person doing the fixing.  Smalltalk programmers
tend to be fairly bright, so it might not be as hard for them as for, say,
V----- B---- programmers.  But fourth, avoidance doesn't fix the performance
problem, and prevention can.

	I guess what it comes down to is that I don't like do-what-I-meant. 
	DWIM is always dangerous, and in this case we can avoid that danger
	quite easily.
	
I've used DWIM.  DWIM was part of Interlisp.  In practice, it worked
extremely well.  In fact, to this day, Squeak has a DWIM feature.  If
you spell a variable wrong, it will try to guess what you might have meant
and will offer you a short list of corrections.  It's very handy.
DWIM refers to a _compile_time_transformation_ of patently illegal source
code to something you might have meant.	

DWIM is not to be confused with "do-what-you-said-you-would-do",
which is what I and others have advocated here.

Whatever "DWIM" is being used to mean here,
surely "do something really stupid that nobody could have meant"
has got to be worse?  Surely "do-something-weird-instead-of-what-
you-promised-to-do" cannot be a good thing?

Right now I'm fed up with Squeak not doing what I mean.
I've downloaded ComSwiki, fired it up, selected a port, clicked on
"start".  However, neither Amaya nor Netscape can find the web page
it says to visit next.  Try clicking on "stop", then on "start",
and BOOM.  It appears that sometimes Squeak thinks "start server"
means "report an error involving setToEnd".  Unhelpful.  I wish
ComSwiki _would_ do what I meant, and what the button label promises.




More information about the Squeak-dev mailing list