[BUG] Set>>collect:

Richard A. O'Keefe ok at cs.otago.ac.nz
Fri Feb 14 01:12:12 UTC 2003


Bill Spight <bspight at pacbell.net> wrote:
	This afternoon I watched a Set appear where I wanted an IdentitySet. The
	culprit seems to be here:
	
	[Set>>collect: has "Set new" not "self species new"]

This is one I consider reporting a couple of years ago,
and agonised over a bit.  It finally dawned on me that the code
as it stands is hard to improve.

The reason it is hard to improve is PluggableSets.

Suppose I have a PluggableSet of Strings, and the equality and
hash blocks are such that alphabetic case is ignored.  Call this
"cis".

Suppose further that "Set new" is changed to "self species new".

What's "cis species"?
It's PluggableSet.

Now suppose I do "cis collect: [:each | each , each]".
I get a PluggableSet, but it has a nil hashBlock and a nil equalBlock.
Thanks to special case code in PluggableSet>>scanFor: that acts just
like a Set.  What I get is something that ACTS like a Set, but LOOKS
like a PluggableSet.

What we'd really need to make that example work the way I want is
"self copyEmpty" not "self species new".

But if it were fixed _that_ way, I'd be stuck when I did
"cis collect: [:each | each length]", because now the hash and equal
blocks would be trying to ignore alphabetic case in the characters of
integers, which don't have characters to ask about.

The essential point about #collect: is that the transformation block
does NOT in general return objects of the same type that it is passed.

Just because identity is the right individuation criterion for items
x y in a set, that does not mean that it is the right individuation
criterion for (someBlock value: x) and (someBlock value: y).

In short, the code that is there in Set is just about the best
compromise possible.

Suppose instead that the code were factored like this:

    Set>>
    collect: aBlock
	^self collect: aBlock into: Set new

    collect: aBlock into: aCollection
	array do: [:each |
	    each ifNotNil: [
	        aCollection add: (aBlock value: each)]].
	^aCollection

Then we could have
    aSet collect: aBlock into: IdentitySet new
    anIdentitySet collect: aBlock into:
        (PluggableSet new hashBlock: foo; equalBlock: bar; yourself)
and a host of other combinations.

#collect:into: makes a lot of sense for most collections (of course
the last argument must be an extensible collection), e.g.
    aString collect: [:each | each asInteger] into: IdentitySet new.
That could of course be done using
    IdentitySet new withAll: (aString collect: [:each | each asInteger])

except that String>>collect: wants to make a String, and so this
perfectly sensible operation doesn't work.  FEH!  The general rule
is that a #collect: method should return a container with the same
*shape* as the receiver, but not with the same *element restrictions*.

    Object>>
    isCharacter	^false

    Character>>
    isCharacter ^true

    String>>
    collect: aBlock
	"Answer a String or an Array depending on whether all of the
	 values yielded by aBlock are Characters (String) or not (Array)."
	|result item flag t|

	result := self species new: self size.
	flag := false.
	1 to: self size do: [:index |
	    item := aBlock value: (self at: index).
	    (flag or: [item isCharacter])
	        ifFalse: [t := Array new: self size.
			  1 to: index - 1 do: [:i |
			      t at: i put: (result at: i)].
			  result := t.
			  flag := true].
	    result at: index put: item].
	^result

Without some such change, String>>collect: is essentially useless.
Anyway, I think I have made my case for #collect:into:



More information about the Squeak-dev mailing list