[BUG] Set>>collect:

Mon Feb 17 02:11:35 UTC 2003

Bill Spight <bspight at pacbell.net> wrote:
	I was unclear, and it seems that we are talking about different things.
	I intended to talk about the pragmatics of language, i.e., about
	interpretation, not about what a programmer might want to do. And in
	that regard, Squeak is very much the issue, since it is the Squeak that
	the programmer must interpret.

I find this even more confusing than what you wrote last time.

My perspective here is that I have read quite a few Smalltalk textbooks
and several commercial Smalltalk manuals and the ANSI standard.  I'm
looking at it as someone saying "here is everything I have been told
about #collect: in these books that seem to be some kind of authorities;
here is how I expect #collect: to be useful to me."

Nowhere in any of these documents is there the SLIGHTEST suggestion
that there is ANY restriction on what a transformation block may return.

Anyone familiar with a range of languages would expect #collect: to be
similar to the mapping functions in Lisp (where there is a restriction on
the resulting elements ONLY if you ask for a special result type) or in
functional languages (where there is no restriction).

For a Smalltalk programmer, the "contract" of #collect:
goes roughly like this:

    aCollection collect: transformationBlock
	"You supply a transformationBlock.
	 It must be a block that accepts one argument.
	 It should work with all the elements of aCollection.
	 It is up to YOU to make this so.

	 I send each element of aCollection to transformationBlock
	 and gather the results, putting them in a collection that
	 is somewhat similar to aCollection.  If aCollection is
	 sequenceable, so will the result be.
	 It is up to ME to cope with whatever your transformationBlock
	 returns."

The idea that IdentitySet>>collect: should return an IdentitySet
can be fairly and honestly characterised as "a foolish consistency";
foolish because it INCREASES the burden on the programmer.

	> So we see that the reason that there _is_ a Set>>collect: in the first place
	> appears to be precisely in order to AVOID using 'self species new'.  It's a
	> bug avoidance measure.  I have explained what the bug is.
	> 

	I take it you are referring to the behavior of PluggableSet, in your
	note of a few days ago.

And to the behaviour of IdentitySet, as mentioned also.
The essential point is that

    because the elements of the result
    are usually NOT like the elements of the receiver
    the equality test for the receiver
    is often NOT appropriate for the result.
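
A minimal sketch of this, built by hand so that it does not depend on
what any particular Set>>collect: happens to return.  The symbols and
variable names are made up for illustration:

    | symbols name asSet asIdentitySet |
    symbols := IdentitySet withAll: #(#apple #Apple #APPLE).
    asSet := Set new.
    asIdentitySet := IdentitySet new.
    symbols do: [:each |
        "asString followed by asLowercase yields a fresh String each time"
        name := each asString asLowercase.
        asSet add: name.
        asIdentitySet add: name].
    asSet size.          "1: the three 'apple' strings are equal"
    asIdentitySet size.  "3: they are three distinct objects"

The receiver needed identity to keep the symbols apart, but the results
are strings, for which equality is the sensible test.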

	> The point is that having IdentitySet>>collect: return an IdentitySet
	> would be simply WRONG in many, if not most, cases.

	In many cases, certainly, but it would be right in many cases, as well.

The thing is that Set>>collect: has *NO* way to tell which case is which.
It has to do the thing which is most likely to work, which is to return a Set.

There is, as I keep saying, *NO* patch which can possibly make
Set>>collect: return the perfect result in all cases.
The only way for #collect: to know what the right kind of result
should be is to be told.  Hence #collect:into:.

So the right thing to do is to leave Set the way it is and introduce
a general-purpose method which _can_ give the programmer the right result
in every case, because it _can_ be told what kind of result to return.
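
One possible shape for such a method, as a sketch only.  The selector
#collect:into: is the one proposed here; the body, and the decision to
answer the target collection, are assumptions, and the target is assumed
to understand #add:.

    Collection>>collect: aBlock into: aCollection
        "Hypothetical sketch, not an existing Squeak method.
         Add the result of aBlock for each of my elements to
         aCollection, and answer aCollection."
        self do: [:each | aCollection add: (aBlock value: each)].
        ^aCollection

The caller then says exactly which kind of result is wanted (nodes,
#name and #parent are hypothetical names):

    nodes collect: [:each | each name] into: Set new.
    nodes collect: [:each | each parent] into: IdentitySet new.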

	> Set out rather unclearly, yes.  But you must on no account press that
	> little non-technical word "like" too hard.  Anything more than "ordered
	> if the receiver is ordered, unordered if the receiver is unordered" would
	> be more than it could bear.

	I disagree, on the basis of pragmatics.

But it is precisely pragmatics which is the grounds of my argument.

I consider it highly significant that I have posted several examples
showing that to always return an instance of 'self species' is clearly
wrong, whereas nobody has posted even one example showing the contrary.
To be precise, nobody has yet posted an example where

    - there is an IdentitySet
    - there is a transformationBlock
    - the results of the transformationBlock have equality
      that differs from identity
    - the result of mapping the transformation block over the
      identity set has the wrong elements
    - the example came up in actual use.

	The key point, I think, is that if performing collect: on an IdentitySet
	produces a Set, you may lose elements that are equal but not
	identical, and recovery is not easy.

There are infinitely many stupid things you can do in Smalltalk
from which recovery is not easy.

This is one of them.

(Once again, it is a *hypothetical* example, not a concrete case that
has come up in practical use.)

Call me a pragmatic old fuddy-duddy, but I just don't see any point in
removing a bug fix and thereby breaking PluggableSet just so that
IdentitySet *might* do the right thing for an unknown number of as yet
entirely imaginary cases.

What we need is an intention-revealing message send which makes it
absolutely clear what kind of result is wanted.

	OTOH, if it includes elements
	that are identical but not equal, you can easily convert the resultant
	IdentitySet to a Set. (There may be better ways to do it, OC.)

I don't know how to say strongly enough just how weird this argument seems
to me.

If you *KNOW* you need a Set, or you *KNOW* you need an IdentitySet,
there isn't any problem right now:

    aCollection inject: Set new into: [:set :each | set add: each; yourself]

or

    aCollection inject: IdentitySet new into: [:set :each | set add: each; yourself]

If you DON'T know that you need a Set, then you WON'T have any code to
do the conversion, and the IdentitySet result will be broken.

So the argument compares apples (where it matters that you need an
IdentitySet but you DON'T know that) with oranges (where it matters that
you need a Set and you DO know that).  When you compare genuinely similar
cases where you DO know what result type you need then you find that 
there isn't any problem right now (because you can express exactly what
you want with #inject:into: although #collect:into: would definitely be
clearer).  When you compare genuinely similar cases where you DON'T know
what result type you need, then you find that there are at least as many
cases where returning a Set is better (pretty much any transformation that
returns numbers or strings) as there are where returning an IdentitySet
is better (no actual concrete examples available yet).

	Yes, but the difference, if you are using an IdentitySet in the first
	place, is crucial. You are using an IdentitySet because you do *not*
	want to test for equality.

But because the results of the transformation block are quite likely to
be of different types, IdentitySet>>collect: has *NO* reason to believe
that you wish to avoid equality tests on *them*.

This is the absolutely central point.  It doesn't matter whether we
start from Set, IdentitySet, or PluggableSet.  No matter what variety
of Set we start from, we have *NO* reason to believe that the version
of equality used for its elements makes sense when applied to the
results of the transformation block.

Just as IdentitySet>>collect: should *almost* always return a Set,
but perhaps occasionally an IdentitySet, so also Set>>collect: should
sometimes return an IdentitySet.  There is nothing unique about IdentitySet
here.  ALL types of sets have exactly the same problem.

Look, when I wrote my new collection classes, I spent *days* arguing with
myself about this one.  I started to compose bug reports for this list
more than once, and every time came back to the same conclusion:  the way
it is now is the best definition possible for Set>>collect:, AND a more
general interface is needed.

	As for the new collection, who knows, but if
	avoiding equality testing was important in the first place, it is
	probably still important.

Nope.  Almost certainly not, in my experience.   Mind you, my identity
sets have been symbols or nodes in graphs, and my transformations have
returned numbers, strings, and Sets (not IdentitySets) for which it was
important that the result NOT be an IdentitySet.

	We may not even want the new collection to be a Set, but if we
	do, it seems to me that it's a good bet that we want it to be
	another IdentitySet.

Well, "seeming" isn't a good reason.  In my experience, an IdentitySet
is almost never a good result.

What this means is that my experience and your seeming cancel each other
out, and _neither_ of us is entitled to use 'probability' language about
the alternatives.

Remember, the only cases that are really in dispute are cases where
equality differs from identity.  Those are the only cases where information
can be lost by returning a Set instead of an IdentitySet.  And the commonest
objects with equality differing from identity are Numbers and Strings, where
equality is the most intuitive result.

	Besides the question of probability is
	that of prominence, and the idea of avoiding equality testing is
	still prominent.

It's prominent in the *INPUT* of the operation.
There is no reason why it should be prominent in the *OUTPUT*.
Let's face it, the most prominent feature of Dictionaries is having
hashed keys, yet the result of Dictionary>>collect: does not have hashed keys.
If 'prominence' were a good criterion, then sauce for the IdentitySet goose
should be sauce for the Dictionary gander.

The really important thing here is that once we have #collect:into:
we won't _care_ that much what IdentitySet>>collect: returns.