Questions about #do: (was: Re: [BUG]Collection>>removeAll:)

Richard A. O'Keefe ok at cs.otago.ac.nz
Mon Sep 2 06:36:47 UTC 2002


Wim Boot <wim at lombok.demon.nl> wrote:
	If you propose a change here, could you make sure that the
	following code also works?
	
	| c |
	c := #(1 2 3) asOrderedCollection.
	c removeAll: c readStream.
	

With the change I have described to #removeIndex:, this could work,
except for one thing.  The argument of #removeAll: is supposed to be
a Collection.  Oh, I see, #do: _is_ in ANSI 5.9.2.  However,
#removeAll: is _not_ defined to actually use #do: (only to work on
the elements of a collection in the same order) and its argument
must support _all_ the methods of the <collection> protocol, which
ReadStream doesn't seem to.

But there is still a problem, and it's a general problem with streams.
I'm replying to this because it illustrates an important point about
streams that is not at all obvious to beginning Smalltalkers.

If you accept that "when YOU iterate over a collection, YOU shouldn't
change it", which I do, then you also have to accept that "as long as
a stream is open on a collection, you should not change that collection"
because a stream is precisely an iteration device.  (Indeed, C++ calls
them 'iterators', and so does Java these days.)

So as soon as we have 'c readStream' in our hands, we must refrain
from _any_ attempt to modify c by _any_ means until that stream is
dead.
    c := #(1 2 3) asOrderedCollection.
    s := c readStream.
    c remove: 1.
    s next
is also banned; it's not removeAll: as such that is the problem.

The viewpoint I'm adopting here is that as soon as you invoke #readStream,
you have started an _explicit_ iteration, and it is up to you to ensure
that the underlying collection isn't changed until you've finished your
iteration.  This is consistent with my stand on #removeAll:, where I
argue that a programmer should not be bound by the constraints of an
iteration that s/he has not started, cannot see, and might not even exist.

I note that a Java Iterator would notice the change and raise an exception.
Basically, Java collections keep a count of the number of changes they've
had, and Java iterators remember what that count was.  You can make some
changes through an iterator, and the remembered count will be updated.  Any
other changes will be squawked about, loudly.

In fact we _could_ make the #readStream case work.  It wouldn't even be
particularly hard, except that the result would not be an instance of
ReadStream.  I'm not going to go into detail, because I think the case
where there _is_ an explicit iteration in progress is different from the
case where there is not.

This brings me to some questions about #do: which I've been meaning to ask.
When I ask about "general expectations", I am _not_ talking about language
standards, and I am not talking about exceptionless rules; what I'm basically
after is what assumptions do you make unless you think really hard about them?

1.  Is there a general expectation that if you call #do: once, and do not
    _yourself_ change the object you sent it to, you can do it again and
    get the same answers, apart from their order?

    - This is clearly violated by Stream>>do:, but do people find that
    _surprising_, as I do?

2.  Is there a general expectation that if you call #do: once, and do not
    _yourself_ change the object you sent it to, you will get the same
    answers _in the same order_ when you send it again, even for a
    non-sequenceable collection?

    - This is violated by my NewSet class, thanks to its use of the
    move-to-front heuristic, and it may be violated by other self-organising
    containers.

3.  Is there a general expectation that if you call #do:  the receiver
    will send messages only to itself, "structural" objects that it owns
    exclusively, and immutable objects such as numbers?
    
    - This is violated by any class you have put debugging print statements
    in, but that may be harmless.  It is certainly violated by
    MappedCollection.  Is that surprising?  

4.  Is there a general expectation that if you call #do: you may safely
    send 'query' messages to the receiver, message that do not change
    its _abstract_ state?

    - This is violated by NewSet again because of its self-organising
    nature.  You _can_ ask about the element that is currently being
    scanned, you can even _remove_ that element, but if you _ask_ whether
    some other element exists that may confuse the iteration.  Is that OK
    or do I have to put a lock around move-to-front while iterations are
    in progress and use exception handling to make sure the lock is clear
    when all iterations are over?

5.  Is there a general expectation that iterating over N elements using
    #do: is (amortized) O(N)?

    - This is violated by the existing Set class in Squeak.  The cost
    is proportional to the _maximum_ number of elements the Set has ever
    had, not to the number it has right now.  Is that surprising?

6.  Is there a general expectation that sending query messages to the
    receiver during an iteration doesn't change the cost of #do:?

    - Not true in NewSet.  Also not true in Eiffel's SET class, even
    though it isn't self-organising.  (It has a cache, and queries can
    invalidate the cache.)




More information about the Squeak-dev mailing list