[BUG] Set>>collect:

Richard A. O'Keefe ok at cs.otago.ac.nz
Mon Feb 17 00:02:58 UTC 2003


Bill Spight <bspight at pacbell.net> wrote:
	Consider the fact that SortedCollection>>collect: yields an
	OrderedCollection. That makes sense, because it is an extra burden on
	collect: to sort the new collection. *And you can always sort the new
	collection if you want to.*
	
The "burden" of sorting has nothing to do with the case, tra la.
It's the *incorrectness* of sorting that matters.
Amongst other things, sorting would break the positional link
between source items and result items (a link that only exists
when you collect: over a sequenceable collection, but DOES exist
for every such collection type).
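
A workspace sketch of that link, assuming (as Bill describes) that
SortedCollection>>collect: answers an OrderedCollection; the numbers are
made up for illustration:

    | source result |
    source := #(3 1 2) asSortedCollection.    "holds 1 2 3"
    result := source collect: [:each | each negated].
    "result holds -1 -2 -3 in that order, so for every index i
     (result at: i) = (source at: i) negated.
     Sorting result would give -3 -2 -1 and pair 1 with -3."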

	By contrast, if sending collect: to IdentitySet produces a Set, it is
	too late to go back and put in those elements that were rejected because
	they were equal, but not identical. It is OK to go from IdentitySet to
	Set, but not the other way.
	
So what?  All that means is "if you have a collection and you want an
IdentitySet result, DON'T USE #collect: TO DO IT."  It's precisely the
problem that #collect:into: is designed to solve.

There is only a problem when you have a set of results which support
both identity and a *distinct* notion of equality AND you happen to
know that this time you want identity.

If you consider transformation blocks that return Numbers or Strings
(something mine quite often do), you'll see why I assume that equality
is a far better default than identity.
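
A workspace sketch of the String case (the words are arbitrary).  Each
copyFrom:to: answers a fresh String, so equal results are not identical:

    | words initial viaEquality viaIdentity |
    words := #('apple' 'avocado' 'banana').
    viaEquality := Set new.
    viaIdentity := IdentitySet new.
    words do: [:each |
        initial := each copyFrom: 1 to: 1.
        viaEquality add: initial.
        viaIdentity add: initial].
    "viaEquality size is 2: one 'a' and one 'b'.
     viaIdentity size is 3: the two 'a' strings are distinct objects."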

If there should be a real case where equality is on offer, is different
from identity, and yet is not preferred, then all that shows is that
no specification for IdentitySet>>collect: is perfect for all problems,
which is hardly surprising.

Possibly the biggest lesson here is that if the class of the result
actually matters much, then the programmer had better *think* very very
carefully about whether to use #collect: at all.  Again, precisely the
problem that #collect:into: was designed to solve.

For me, the perfect illustration of this is Dictionary.
If I wanted Dictionary asOrderedCollection collect:,
that's what I'd write.  What I'd _like_ Dictionary>>collect: to do
is this:

    Dictionary>>
    collect: aBlock
        |result|
        result := self copyEmpty.
        self keysAndValuesDo: [:key :value |
            result at: key put: (aBlock value: value)].
        ^result
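
A workspace sketch of the result that definition is meant to give, written
out longhand so that it does not depend on the override being installed
(the keys and values are invented for illustration):

    | prices doubled |
    prices := Dictionary new.
    prices at: #apple put: 3.
    prices at: #pear put: 5.
    doubled := Dictionary new.
    prices keysAndValuesDo: [:key :value |
        doubled at: key put: value * 2].
    "doubled maps #apple to 6 and #pear to 10:
     same keys, transformed values."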

Until #collect:into: is available, what do you do?
Instead of

    x := anIdentitySet collect: [:each | each computeSomeNumber].

do

    x := IdentitySet new.  "x must not be a Set"
    anIdentitySet do: [:each | x add: each computeSomeNumber].

Is this such a big deal?

	So either you need to take a different approach, as Andrew Black is
	doing, or do something with IdentitySet, Set, and PluggableSet. The
	conservative way is to leave Set alone 

PRECISELY!  The conservative way is to leave Set alone, BUT Set explicitly
overrides #collect: precisely so that a 'self species' result will NOT be
returned.

	But *something* needs fixing.
	
I think the thing that needs fixing is the attitude that #collect:
*can* always return the right result.  No matter what the collection class,
no matter what the result type, there will _always_ be cases where the
result needed is not the result on offer.

One of my frequent mistakes in Common Lisp is to write
    (map #'+ Xs Ys)
when I should write
    (map 'list #'+ Xs Ys)
Why does Common Lisp require me to write the result type explicitly
instead of (say) picking it up from the first collection argument?
Because the Common Lisp designers were well aware that they had several
kinds of sequences, and that no perfect default could be found.

This, and the map-into function which turned up in CLtL2, inspired
#collect:into:.


Now, how do we implement #collect:into:?
One of my little design patterns is "make the result do the work".
My preferred way of copying things is "self clone finishCopy" so that
all of the manipulation of the copy's instance variables can be done
entirely within the encapsulation boundary.

In the case of #collect:into:, it goes like this.

    Collection>>
    collect: aBlock
        ^self collect: aBlock into: (self collectionSpecies new: self size)

    collect: aBlock into: anotherCollection
        anotherCollection fillWith: self collecting: aBlock.
        ^anotherCollection

    collectionSpecies
        ^self species

We need some special cases for collectionSpecies:

    Bag>>collectionSpecies
        ^Bag
    Set>>collectionSpecies
        ^Set
    Dictionary>>collectionSpecies
        ^OrderedCollection
    ArrayedCollection>>collectionSpecies
        ^Array
    SortedCollection>>collectionSpecies
        ^OrderedCollection

An extensible collection does this:

    fillWith: aCollection collecting: aBlock
        "Add the elements of (aCollection collect: aBlock) last.
         The result is unspecified."

        aCollection do: [:each |
            self add: (aBlock value: each)].

An indexed but not extensible collection does this:

    fillWith: aCollection collecting: aBlock
        "Fill myself with (aCollection collect: aBlock).
         The result is unspecified."
        |index|

        index := 0.
        aCollection do: [:each |
            self at: (index := index+1) put: (aBlock value: each)].

SortedCollection does this:

    fillWith: aCollection collecting: aBlock
        "Fill myself with (aCollection collect: aBlock) and
         then become sorted again.  The result is unspecified."
        aCollection do: [:each |
            "Use unsafe addition."
            self addLast: (aBlock value: each)].
        self reSort.

Note that this approach does not require any "private" or
"invariant-destroying" messages to be sent from the outside, and does
not require unindexed collections to be treated as if indexed.
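
Assuming the methods sketched above were installed (they are the proposal
in this message, not part of the current image), the earlier IdentitySet
example would shrink to one expression, and a sorted result could be
requested the same way.  The sample array and sort block are made up:

    x := anIdentitySet
        collect: [:each | each computeSomeNumber]
        into: IdentitySet new.

    y := #(3 1 2)
        collect: [:each | each * each]
        into: (SortedCollection sortBlock: [:a :b | a >= b]).
    "Under the proposed SortedCollection>>fillWith:collecting:,
     y would hold 9 4 1."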

This isn't being posted as an [ENH] or [FIX] yet because there are still
some design questions.  (What, exactly, should AEDesc>>collect: do?
Since MacExternalData can be viewed as a sequence of bytes, shorts, or
fourBytes, how should #collect: iterate over it?  And so on.)


