The standard does *not* support - a removeAll: a - [was: Re: [BUG] Collection>>removeAll:]

Richard A. O'Keefe ok at cs.otago.ac.nz
Tue Sep 3 02:27:09 UTC 2002


I wrote that
    : oldElements copy do: [:each | self remove: each]
    : implemenents precisely the semantics required by the ANSI standard.

"Andrew C. Greenberg" <werdna at mucow.com> wrote:
	Except on collections where it doesn't: Richard's reasonable code 
	presumes that the collection answers to copy in the manner expected.
	
	This code fails, for example, on a LinkedList.
	
When we are talking about the ANSI standard, it pays to remember that
there is no such thing as a LinkedList class in ANSI.
	
Now #copy _is_ an ANSI method, and 5.3.1.6 says
    Return a new object that must be as similar as possible to the
    receiver in its initial state and behavior.  [Except that]
    Any operation that changes the state of the new object should not
    as a side effect change the state or behavior of the receiver.
    Similarly, any change to the receiver should not as a side-effect
    change the new object.
    If the receiver is an identity object, return the receiver.
and the Rationale warns explicitly that there _isn't_ any promise
that x copy = x.

Now let's look at LinkedList.  In Squeak 3.2 there is no #copy method
defined in the LinkedList class.  (By the way, in the Squeak 3.2
scroll-bars, the "menu" box is in an EXTREMELY inconvenient place.
Whenever I try to scroll up I find myself with a menu instead and have
to spend more time than I care to twitching the mouse around trying to
get the "scroll up" box.  The "menu" button should be _above_ the
"scroll up" box.)  The #copy method is inherited from Object.
LinkedList(Object)>>copy
    ^self shallowCopy
LinkedList(SequenceableCollection>>shallowCopy
    ^self copyFrom: 1 to: self size
LinkedList(SequenceableCOllection)>>copyFrom: start to: stop
    |newSize|
    newSize := stop - start + 1.
    ^(self species new: newSize)
        replaceFrom: 1 to: newSize with: self startingAt: start
(a LinkedList) species is LinkedList.
LinkedList class(Behavior)>>new: bombs out,
    "LinkedList cannot have variable sized instances".

Is this the fault of my suggested implementation of #removeAll:?
No, as I wrote, it fully conforms to the ANSI specification.
The implicit assumption is "given any collections which otherwise
conform to the ANSI specification."  And LinkedList is so far from
doing that, in almost _any_ operation, that it cannot be said to
conform to the ANSI <collection> protocol at all.

However, one _could_ add a new kind of Link.  Here's one that I used
for testing:

    Link subclass: SimpleLink
        instanceVariables: 'value'
        ...

    "method category 'accessing'"
    value             ^value
    value: anObject   value := anObject
    printOn: aStream  aStream nextPutAll: 'a SimpleLink(';
			      print: value; nextPut: #)

The problem is that LinkedList does not have a #copy that conforms
to the ANSI specification.  So let's give it one.

    LinkedList "category 'copying'">>
    copy
	|result|
	result := self species new.
	self do: [:each | result addLast: each copy].
	^result

I was going to present an example using #collect:, but (a) that blows
up for the same reason that #copy used to, (b) it's not even clear that
it makes sense.  The block would have to accept a Link as argument and
return a different Link as result.  Oh well, a collection is allowed to
have type restrictions, so here's #collect: after all.

    LinkedList "category 'enumerating'>>
    collect: aBlock
	|result|
	result := self species new.
	self do: [:each |
	    result addLast: (aBlock value: each)].
	^result

Now we can do
    a := LinkedList new.
    a add: (SimpleLink new value: #x).
    a add: (SimpleLink new value: #y).
    a add: (SimpleLink new value: #z).
    b := a copy.
    b addFirst: (SimpleLink new value: #b).
    a removeFirst.
    {a. b}

The output is
    #(a LinkedList(a SimpleLink(#y) a SimpleLink(#z))
      a LinkedList(a SimpleLink(#b) a SimpleLink(#x)
                   a SimpleLink(#y) a SimpleLink(#z)))

which shows that we now have a LinkedList>>copy that conforms to the
ANSI specification [almost, see last paragraph].

Will this make

    oldElements copy do: [:each | self remove: each]

work?  No, because the "elements" of a copy of a LinkedList are, must be,
different objects from the "elements" of the original.  (Yet another way
in which LinkedLists fail to act like collections.)  Of course, the ANSI
specification of #copy doesn't say that have to be, just that the result
is "as similar as possible", and since a Link _cannot_ belong to two
lists at the same time, this _is_ as close as possible.

The basic problem here is that a key aspect of the design of LinkedList
appears to have been copied from Simula 67.  In the absence of generic
(Ada/Eiffel term; for C++ read 'template') classes, and in the presence
of static type checking, the only way you could have a general purpose
list class in Simula 67 was by having a Link class which you had to
make your own subclasses of.  There was no need for that in Smalltalk,
and the result is that LinkedList is utterly unlike every other Collection.
Squeak has three kinds of collections:
 - general-purpose collections that can hold any kind of element at all
   (execpt, in some cases, nil)
   A non-nil object may belong to any number of general-purpose collections
   at the same time.
 - compact collections that can hold only values that are themselves
   immutable such as numbers and characters, restricted to a particular type.
   A number or character may belong to any number of compact collections
   (provided they accept that kind of number or character) at the same time.
 - LinkedList, which can hold only Links.
   A Link may only belong to a single LinkedList.
It turns out that some "core" collection methods CANNOT be implemented for
LinkedList.

Let's briefly consider all the ANSI <collection> messages:

    allSatisfy: aBlock			works
    anySatisfy: aBlock			works
    asArray				Ka-BOOM!  [1]
    asBag				works
    asByteArray				does not and must not work [2]
    asOrderedCollection			works
    asSet				works
    asSortedCollection			does not and must not work [3]
    asSortedCollection: sortBlock	works
    collect: aBlock			Ka-BOOM!  [4]
    detect: aBlock			works
    detect: aBlock ifNone: exceptionBlock	works
    do: aBlock				works [5]
    do: aBlock separatedBy: sepBlock	Ka-BOOM! [6]
    includes: anElement			Ka-BOOM! [7]
    inject: value into: aBlock		works
    isEmpty				works
    notEmpty				works
    occurrencesOf: anItem		works
    rehash				works [8]
    reject: aBlock			Ka-BOOM! [9]
    select: aBlock			Ka-BOOM! [9]
    size				works

[1] This needs the LinkedList>>at: method I posted a while back.
    Note that ANSI requires asArray to *work* and it easily could.
    Another implementation:
	LinkedList "category 'converting'">>
	asArray
	    |result index|
	    result := Array new: self size.
	    index := 0.
	    self do: [:each | result at: (index := index + 1) put: each].
	    ^result

[2] Because a Link isn't an integer between: 0 and: 255.

[3] Because #<= is not defined on Link.  If we add
	SimpleLink "category 'comparing'">>
	<= anotherValuable
	    ^value <= anotherValuable value
    then it can work the way you would expect.
	    
[4] As noted earlier.  A working implemention can be found above.

[5] With a whopping big caveat.
    The only use of LinkedList I can find in Squeak 3.2 is process queues.
    Process queues may change as a results of events, so #do: is totally
    vulnerable to a LinkedList changing while the iteration is running.
    This means that anyCollection removeAll: someProcessQueue
    never _could_ be relied on to work in Squeak.

[6] You _could_ define #do:separatedBy: once and for all in terms of #do:.
    Collection>>#do:separatedBy: does exactly what you would expect and
    _would_ work perfectly for LinkedList.  Except that
    SequenceableCollection>>do:separatedBy: (last changed at the end of
    2000) assumes not merely that SequenceableCollections can be
    indexed, but that this is the cheapest way to traverse them.
    If you adopt my LinkedList>>#at: then the #do:separatedBy: in
    SequenceableCollection will work, but it will take quadratic time
    when linear time is available.  Simplest fix:  copy the definition
    from Collection down into LinkedList.

[7] #at: again.  SequenceableCollections are collections that have a
    natural order; too many methods in SequenceableCollection make the
    invalid assumption that "has an order" == "is cheap to index".
    This leads to weirdness like Heap having _two_ orders:  the order
    used by #at: and #do:, and the different order used by #removeFirst.
    Proof by demonstration:
	h := Heap new.
	h addAll: #(3 1 4 1 5 9 2 6 5).
	h do: [:each | Transcript space; print: each].
	Transcript endEntry
    ==> 1 1 2 3 5 9 4 6 5
    but	r := OrderedCollection new.
	[h isEmpty] whileFalse: [r addLast: h removeFirst].
	r
    ==> an OrderedCollection(1 1 2 3 4 5 5 6 9)
    Trying to grapple with a collection with two incompatible orders	
    makes my head hurt.

[8] Because it's inherited from ProtoObject, where it's defined
    to do nothing.  Note, however, that this is supposed to be available
    for general use in ANSI; Squeak collections put it in the 'private'
    category.  In ProtoObject it's in 'objects from disk'.

[9] SequenceableCollection>>select: assumes that if a collection has a
    natural order, then you can create a nil-initialised object of that
    type of a given size.  Get past that and you run headlong into #at:.
    I was about to show you how to implement these so that they do work,
    when I realised that #select: and #reject: *CANNOT* be implemented
    in accord with the ANSI specification for LinkedList.

    Why?  Because the result is supposed to contain a selection of the
    elements of the receiver.  If the result were ever non-empty, there
    would have to be some element *identically* present in both the
    receiver and the result.  But that is impossible for a Link.

So here is the conclusion:

(A) #select: and #reject: not only don't work for LinkedList in Squeak 3.2,
    they _cannot_ be made to work in any way that would conform to the
    ANSI specification of #select: and #reject: for collections nor to
    one's common-sense expectations.

(B) Because LinkedList does not implement the <collection> protocol, it is
    not a collection class for the purposes of deciding which other ANSI
    methods should accept it as an argument.

(C) #removeAll: and #addAll:, amongst others, are only specified for
    arguments that support the full <collection> protocol.  Since LinkedList
    does not and cannot do that, #removeAll: and #addAll: are NOT DEFINED
    for a LinkedList argument.

(D) Therefore, if "oldElements copy do: [:each | self remove: each]"
    does not work when oldElements is a LinkedList, that does NOT constitute
    a failure to conform precisely to the ANSI semantics of #removeAll:.

(E) This is not an argument that #removeAll: _shouldn't_ work for a
    LinkedList argument, only that the ANSI specification doesn't require it
    to.  However, oneLinkedList removeAll: anotherLinkedList could change
    things only when the two lists are the same object, which takes us
    right back to an argument I don't want to get back into.

    For the same reason, I am NOT proposing the additions to LinkedList
    shown above as a [FIX].  (Because they take existing methods of core
    collection classes that currently give nonsensical results or none,
    and make them give results I regard as sensible, but the new methods
    are not guaranteed to be compatible with other Smalltalks.)



More information about the Squeak-dev mailing list