On Fri, Nov 28, 2014 at 7:42 PM, Ben Coman <btc@openinworld.com> wrote:
Levente Uzonyi wrote:
The problem with changing #species is that it's overused. There are at least three different things it's used for in the Collection hierarchy:
- the class of the object returned by #collect: (Set's subclasses are the exception since 2000 or so, except for WeakSet, but that's really strange, and should revisited)
- the class of the object returned by #select: and friends
- the class used for comparison of collections

So should there be #collectSpecies ?

No.  That's what species was invented for in the first place.
 
Which defaults to "^self species" and is overidden as required.

In the case of Sets, you start with an "unordered collection without duplicates", but since #collect can introduce duplicates, you end up with "an unordered collection with duplicates" --> Bag (per Dave's suggestion)

So I think its reasonable that the following two snippets should produce the same result.

oc := OrderedCollection newFrom: #( 1 2 3 4 5 ).
(oc collect: [ :e | e even ]) occurrencesOf:  true. "--> 2"

s := Set newFrom: #( 1 2 3 4 5 ).
(s collect: [ :e | e even ]) occurrencesOf:  true.  "--> 2"


which you get with the following change...

Set>>collectSpecies
        ^Bag

Set>>collect: aBlock
        | result |
        result := self collectSpecies new: self size.
        array do: [:each | each ifNotNil: [result add: (aBlock value: each enclosedSetElement)]].
        ^ result




HashedCollection >> #select: uses #species. Changing IdentitySet
#species to return a Set would break #select:, because a ~~ b doesn't
imply a ~= b. IdentitySet is the optimal class for the return value of IdentitySet >> #select:, because it's guaranteed to be able to hold all selected values. I think the same applies to all other kind of sets (and collections, because #select: is simply reducing the number of elements).

This is not the first time this discussion comes up. There was one in 2003[1], but the thread is hard to follow. And there was one in 2009[2].
I think the conclusion was that it's better to leave the method as it is, because there's no better way to do it.
Sure, we could use #species in #collect:, but it would break quite a lot of stuff. For example:

| k |
k := KeyedSet keyBlock: [ :each | each first ].
k add: #[1]; add: #[2].
k collect: [ :each | each size ]


That works with the above change to produce --> a Bag(1 1)


Currently this snippet returns "a Set(1)". Using #species in #collect: (via #copyEmpty) one would get an error, because SmallInteger does not understand #first. Changing the #species of KeyedSet would break #select:.

So IMHO when you #collect: from a set, you should always use #collect:as:. Here is a (partial) list of methods which send #collect: to a Set:

Categorizer >> #changeFromCategorySpecs:
ClassFactoryForTestCase >> #createdClassNames
Dictionary >> #unreferencedKeys
MCDependencySorterTest >> #assertItems:orderAs:withRequired:toLoad:extraProvisions:
MessageNode >> #analyseTempsWithin:rootNode:assignmentPools:
MethodNode >> #referencedValuesWithinBlockExtent:
SetTest >> #testCollect
SetWithNilTest >> #runSetWithNilTestOf:

There are probably many more, but we should fix all of them. A static analyzer could help a lot.

Levente

[1] http://lists.squeakfoundation.org/pipermail/squeak-dev/2003-February/052659.html
[2] http://lists.squeakfoundation.org/pipermail/squeak-dev/2009-November/141016.html



On Wed, 26 Nov 2014, David T. Lewis wrote:

It seems to me that Object>>species is intended to handle this sort of issue.

For IdentitySet, it answers what Eliot was expecting:

  IdentitySet new species ==> IdentitySet
  Set new species ==> Set

However, IdentitySet>>collect: does not make use of this, and it answers
a Set for the reasons that Levente explained.

Answering an Array or and OrderedCollection would not really make sense,
because sets are unordered collections (but Bag might be better).

Bag is better.


Shouldn't we have an implementation of IdentitySet>>species that answers
Set (or Bag), with a method comment explaining why this is the case, and
with all of the collection methods using #species to answer the right
kind of result?

I note that IdentitySet>>collect: answers a Set, but IdentitySet>select:
sends #species and therefore answers an IdentitySet.

So I think that if we want the #species of an IdentitySet to be a Set,
then we should make it so, and give it a method comment to explain the
rationale. And the #collect: and #select: methods should both answer a
result of that #species.

I think the species of #collect: and #select: are intrinsically different. #collect: is a transformation.


Dave


On Thu, Nov 27, 2014 at 01:14:30AM +0100, Levente Uzonyi wrote:
Your example hides the problem of ordering - what Tobias is asking about -
so here's another:

(IdentitySet withAll: #(1 1.0)) collect: [ :each | each class ]

If IdentitySet >> #collect: were returning an Array, then what would be the
answer?

{ SmallInteger. Float } or { Float. SmallInteger } ?

-->  a Bag(SmallInteger Float)


If you really want to have the resulting collection have the same size,
but avoid the problem with ordering, then what you really need is a Bag.

To me that makes the most sense.


On Thu, 27 Nov 2014, Frank Lesser wrote:

Hi Tobias,
agree, a problem of "OrderedCollection"
not to break a lot of other things we could return an Array.
but for me collecting has priority.
Frank

-----Urspr?ngliche Nachricht-----
Von: squeak-dev-bounces@lists.squeakfoundation.org
[mailto:squeak-dev-bounces@lists.squeakfoundation.org] Im Auftrag von
Tobias
Pape
Gesendet: Donnerstag, 27. November 2014 00:48
An: The general-purpose Squeak developers list
Betreff: Re: [squeak-dev] IdentitySet>>collect:


On 27.11.2014, at 00:34, Frank Lesser <frank-lesser@lesser-software.com>
wrote:

hmm, not convinced

(IdentitySet withAll: #(1 1.0)) collect: [:e| e asInteger ]
OrderedCollection(1 1 )

in LSWVST ( one-to-one ), you collect results of evaluating a block on
objects.

Frank
maybe I am wrong ...

Where would the order come from for that _Ordered_Collection?

An unordered OrderedCollection --> Bag

cheers -ben




--
best,
Eliot