[FIX] #union: broken for sets

Richard A. O'Keefe ok at cs.otago.ac.nz
Thu Feb 20 23:36:34 UTC 2003

In Squeak 3.0, 3.2, and 3.4alpha(5108) we find the same implementation
of #union:

    union: aCollection
        ^self asSet addAll: aCollection; yourself

This is the _only_ implementation of #union: in those versions of Squeak.
There are three implementations of #asSet,

	"Answer a Set whose elements are the unique elements
	 of the receiver."
	^Set withAll: self

        ^contents keys


The method comment in Collection says that the answer is a Set,
but for bags it may be a Set or an IdentitySet, and for sets it
may be a Set, an IdentitySet, or a PluggableSet.

Collections other than sets always answer a new object to #asSet;
sets of all kinds never answer a new object.

The problem is that the point of #union: is to return a new object
without mutating any existing objects.  But, irony of ironies, if
the receiver already is some kind of set, it IS mutated.
    x := #(3 1 4 1) asSet<print it>
=>  a Set(1 3 4)
    x union: #(5 9 6)<print it>
=>  a Set(1 3 4 5 6 9)
    x<print it>
=>  a Set(1 3 4 5 6 9)

The first problem to sort out is what the semantics of #asSet
is supposed to be.  #asIdentitySet always returns an IdentitySet,
not some other kind of set.  "as: Set" always returns a new object
that is precisely a Set, not any other kind of set.  The method
comment in Collection suggests that the author thought #asSet
always returned a Set.

The simplest change that could possibly work is "add a colon".

    union: aCollection
        ^(self as: Set) addAll: aCollection; yourself

This would eliminate the unintended mutation bug.
It would also make the result always be a Set, not any other kind of set.
Less happily, it would be less efficient for bags.

What I actually propose is a "bigger" change that makes much less
difference to the observable behaviour.  In particular, the change
I propose
+ does not alter any existing method
+ does not change the costs or results of #union: for anything except sets.
? still allows the kind of set you get to depend on the class of the
  receiver, and in the same way as before.

The correction is to add a new method:

    union: aCollection
        "Answer the set theoretic union of the receiver and aCollection,
         using the receiver's notion of equality and not side effecting
         the receiver at all."

	^self copy addAll: aCollection; yourself

More information about the Squeak-dev mailing list