I wrote that
: oldElements copy do: [:each | self remove: each]
: implemenents precisely the semantics required by the ANSI standard.
"Andrew C. Greenberg" <werdna(a)mucow.com> wrote:
Except on collections where it doesn't: Richard's reasonable code
presumes that the collection answers to copy in the manner expected.
This code fails, for example, on a LinkedList.
When we are talking about the ANSI standard, it pays to remember that
there is no such thing as a LinkedList class in ANSI.
Now #copy _is_ an ANSI method, and 5.3.1.6 says
Return a new object that must be as similar as possible to the
receiver in its initial state and behavior. [Except that]
Any operation that changes the state of the new object should not
as a side effect change the state or behavior of the receiver.
Similarly, any change to the receiver should not as a side-effect
change the new object.
If the receiver is an identity object, return the receiver.
and the Rationale warns explicitly that there _isn't_ any promise
that x copy = x.
Now let's look at LinkedList. In Squeak 3.2 there is no #copy method
defined in the LinkedList class. (By the way, in the Squeak 3.2
scroll-bars, the "menu" box is in an EXTREMELY inconvenient place.
Whenever I try to scroll up I find myself with a menu instead and have
to spend more time than I care to twitching the mouse around trying to
get the "scroll up" box. The "menu" button should be _above_ the
"scroll up" box.) The #copy method is inherited from Object.
LinkedList(Object)>>copy
^self shallowCopy
LinkedList(SequenceableCollection>>shallowCopy
^self copyFrom: 1 to: self size
LinkedList(SequenceableCOllection)>>copyFrom: start to: stop
|newSize|
newSize := stop - start + 1.
^(self species new: newSize)
replaceFrom: 1 to: newSize with: self startingAt: start
(a LinkedList) species is LinkedList.
LinkedList class(Behavior)>>new: bombs out,
"LinkedList cannot have variable sized instances".
Is this the fault of my suggested implementation of #removeAll:?
No, as I wrote, it fully conforms to the ANSI specification.
The implicit assumption is "given any collections which otherwise
conform to the ANSI specification." And LinkedList is so far from
doing that, in almost _any_ operation, that it cannot be said to
conform to the ANSI <collection> protocol at all.
However, one _could_ add a new kind of Link. Here's one that I used
for testing:
Link subclass: SimpleLink
instanceVariables: 'value'
...
"method category 'accessing'"
value ^value
value: anObject value := anObject
printOn: aStream aStream nextPutAll: 'a SimpleLink(';
print: value; nextPut: #)
The problem is that LinkedList does not have a #copy that conforms
to the ANSI specification. So let's give it one.
LinkedList "category 'copying'">>
copy
|result|
result := self species new.
self do: [:each | result addLast: each copy].
^result
I was going to present an example using #collect:, but (a) that blows
up for the same reason that #copy used to, (b) it's not even clear that
it makes sense. The block would have to accept a Link as argument and
return a different Link as result. Oh well, a collection is allowed to
have type restrictions, so here's #collect: after all.
LinkedList "category 'enumerating'>>
collect: aBlock
|result|
result := self species new.
self do: [:each |
result addLast: (aBlock value: each)].
^result
Now we can do
a := LinkedList new.
a add: (SimpleLink new value: #x).
a add: (SimpleLink new value: #y).
a add: (SimpleLink new value: #z).
b := a copy.
b addFirst: (SimpleLink new value: #b).
a removeFirst.
{a. b}
The output is
#(a LinkedList(a SimpleLink(#y) a SimpleLink(#z))
a LinkedList(a SimpleLink(#b) a SimpleLink(#x)
a SimpleLink(#y) a SimpleLink(#z)))
which shows that we now have a LinkedList>>copy that conforms to the
ANSI specification [almost, see last paragraph].
Will this make
oldElements copy do: [:each | self remove: each]
work? No, because the "elements" of a copy of a LinkedList are, must be,
different objects from the "elements" of the original. (Yet another way
in which LinkedLists fail to act like collections.) Of course, the ANSI
specification of #copy doesn't say that have to be, just that the result
is "as similar as possible", and since a Link _cannot_ belong to two
lists at the same time, this _is_ as close as possible.
The basic problem here is that a key aspect of the design of LinkedList
appears to have been copied from Simula 67. In the absence of generic
(Ada/Eiffel term; for C++ read 'template') classes, and in the presence
of static type checking, the only way you could have a general purpose
list class in Simula 67 was by having a Link class which you had to
make your own subclasses of. There was no need for that in Smalltalk,
and the result is that LinkedList is utterly unlike every other Collection.
Squeak has three kinds of collections:
- general-purpose collections that can hold any kind of element at all
(execpt, in some cases, nil)
A non-nil object may belong to any number of general-purpose collections
at the same time.
- compact collections that can hold only values that are themselves
immutable such as numbers and characters, restricted to a particular type.
A number or character may belong to any number of compact collections
(provided they accept that kind of number or character) at the same time.
- LinkedList, which can hold only Links.
A Link may only belong to a single LinkedList.
It turns out that some "core" collection methods CANNOT be implemented for
LinkedList.
Let's briefly consider all the ANSI <collection> messages:
allSatisfy: aBlock works
anySatisfy: aBlock works
asArray Ka-BOOM! [1]
asBag works
asByteArray does not and must not work [2]
asOrderedCollection works
asSet works
asSortedCollection does not and must not work [3]
asSortedCollection: sortBlock works
collect: aBlock Ka-BOOM! [4]
detect: aBlock works
detect: aBlock ifNone: exceptionBlock works
do: aBlock works [5]
do: aBlock separatedBy: sepBlock Ka-BOOM! [6]
includes: anElement Ka-BOOM! [7]
inject: value into: aBlock works
isEmpty works
notEmpty works
occurrencesOf: anItem works
rehash works [8]
reject: aBlock Ka-BOOM! [9]
select: aBlock Ka-BOOM! [9]
size works
[1] This needs the LinkedList>>at: method I posted a while back.
Note that ANSI requires asArray to *work* and it easily could.
Another implementation:
LinkedList "category 'converting'">>
asArray
|result index|
result := Array new: self size.
index := 0.
self do: [:each | result at: (index := index + 1) put: each].
^result
[2] Because a Link isn't an integer between: 0 and: 255.
[3] Because #<= is not defined on Link. If we add
SimpleLink "category 'comparing'">>
<= anotherValuable
^value <= anotherValuable value
then it can work the way you would expect.
[4] As noted earlier. A working implemention can be found above.
[5] With a whopping big caveat.
The only use of LinkedList I can find in Squeak 3.2 is process queues.
Process queues may change as a results of events, so #do: is totally
vulnerable to a LinkedList changing while the iteration is running.
This means that anyCollection removeAll: someProcessQueue
never _could_ be relied on to work in Squeak.
[6] You _could_ define #do:separatedBy: once and for all in terms of #do:.
Collection>>#do:separatedBy: does exactly what you would expect and
_would_ work perfectly for LinkedList. Except that
SequenceableCollection>>do:separatedBy: (last changed at the end of
2000) assumes not merely that SequenceableCollections can be
indexed, but that this is the cheapest way to traverse them.
If you adopt my LinkedList>>#at: then the #do:separatedBy: in
SequenceableCollection will work, but it will take quadratic time
when linear time is available. Simplest fix: copy the definition
from Collection down into LinkedList.
[7] #at: again. SequenceableCollections are collections that have a
natural order; too many methods in SequenceableCollection make the
invalid assumption that "has an order" == "is cheap to index".
This leads to weirdness like Heap having _two_ orders: the order
used by #at: and #do:, and the different order used by #removeFirst.
Proof by demonstration:
h := Heap new.
h addAll: #(3 1 4 1 5 9 2 6 5).
h do: [:each | Transcript space; print: each].
Transcript endEntry
==> 1 1 2 3 5 9 4 6 5
but r := OrderedCollection new.
[h isEmpty] whileFalse: [r addLast: h removeFirst].
r
==> an OrderedCollection(1 1 2 3 4 5 5 6 9)
Trying to grapple with a collection with two incompatible orders
makes my head hurt.
[8] Because it's inherited from ProtoObject, where it's defined
to do nothing. Note, however, that this is supposed to be available
for general use in ANSI; Squeak collections put it in the 'private'
category. In ProtoObject it's in 'objects from disk'.
[9] SequenceableCollection>>select: assumes that if a collection has a
natural order, then you can create a nil-initialised object of that
type of a given size. Get past that and you run headlong into #at:.
I was about to show you how to implement these so that they do work,
when I realised that #select: and #reject: *CANNOT* be implemented
in accord with the ANSI specification for LinkedList.
Why? Because the result is supposed to contain a selection of the
elements of the receiver. If the result were ever non-empty, there
would have to be some element *identically* present in both the
receiver and the result. But that is impossible for a Link.
So here is the conclusion:
(A) #select: and #reject: not only don't work for LinkedList in Squeak 3.2,
they _cannot_ be made to work in any way that would conform to the
ANSI specification of #select: and #reject: for collections nor to
one's common-sense expectations.
(B) Because LinkedList does not implement the <collection> protocol, it is
not a collection class for the purposes of deciding which other ANSI
methods should accept it as an argument.
(C) #removeAll: and #addAll:, amongst others, are only specified for
arguments that support the full <collection> protocol. Since LinkedList
does not and cannot do that, #removeAll: and #addAll: are NOT DEFINED
for a LinkedList argument.
(D) Therefore, if "oldElements copy do: [:each | self remove: each]"
does not work when oldElements is a LinkedList, that does NOT constitute
a failure to conform precisely to the ANSI semantics of #removeAll:.
(E) This is not an argument that #removeAll: _shouldn't_ work for a
LinkedList argument, only that the ANSI specification doesn't require it
to. However, oneLinkedList removeAll: anotherLinkedList could change
things only when the two lists are the same object, which takes us
right back to an argument I don't want to get back into.
For the same reason, I am NOT proposing the additions to LinkedList
shown above as a [FIX]. (Because they take existing methods of core
collection classes that currently give nonsensical results or none,
and make them give results I regard as sensible, but the new methods
are not guaranteed to be compatible with other Smalltalks.)