2014-11-29 3:05 GMT+01:00 Levente Uzonyi <leves@elte.hu>:
The problem with changing #species is that it's overused. There are at least three different things it's used for in the Collection hierarchy:
- the class of the object returned by #collect: (Set's subclasses are the exception since 2000 or so, except for WeakSet, but that's really strange, and should revisited)
- the class of the object returned by #select: and friends
- the class used for comparison of collections

HashedCollection >> #select: uses #species. Changing IdentitySet
#species to return a Set would break #select:, because a ~~ b doesn't
imply a ~= b. IdentitySet is the optimal class for the return value of IdentitySet >> #select:, because it's guaranteed to be able to hold all selected values. I think the same applies to all other kind of sets (and collections, because #select: is simply reducing the number of elements).

This is not the first time this discussion comes up. There was one in 2003[1], but the thread is hard to follow. And there was one in 2009[2].
I think the conclusion was that it's better to leave the method as it is, because there's no better way to do it.
Sure, we could use #species in #collect:, but it would break quite a lot of stuff. For example:

| k |
k := KeyedSet keyBlock: [ :each | each first ].
k add: #[1]; add: #[2].
k collect: [ :each | each size ]

Currently this snippet returns "a Set(1)". Using #species in #collect: (via #copyEmpty) one would get an error, because SmallInteger does not understand #first. Changing the #species of KeyedSet would break #select:.


IMO select: should NOT use species, because most of the time we can select: in same class (but maybe in case of Interval).
 
So IMHO when you #collect: from a set, you should always use #collect:as:. Here is a (partial) list of methods which send #collect: to a Set:

Categorizer >> #changeFromCategorySpecs:
ClassFactoryForTestCase >> #createdClassNames
Dictionary >> #unreferencedKeys
MCDependencySorterTest >> #assertItems:orderAs:withRequired:toLoad:extraProvisions:
MessageNode >> #analyseTempsWithin:rootNode:assignmentPools:
MethodNode >> #referencedValuesWithinBlockExtent:
SetTest >> #testCollect
SetWithNilTest >> #runSetWithNilTestOf:

There are probably many more, but we should fix all of them. A static analyzer could help a lot.

Levente

[1] http://lists.squeakfoundation.org/pipermail/squeak-dev/2003-February/052659.html
[2] http://lists.squeakfoundation.org/pipermail/squeak-dev/2009-November/141016.html




On Wed, 26 Nov 2014, David T. Lewis wrote:

It seems to me that Object>>species is intended to handle this sort of issue.

For IdentitySet, it answers what Eliot was expecting:

  IdentitySet new species ==> IdentitySet
  Set new species ==> Set

However, IdentitySet>>collect: does not make use of this, and it answers
a Set for the reasons that Levente explained.

Answering an Array or and OrderedCollection would not really make sense,
because sets are unordered collections (but Bag might be better).

Shouldn't we have an implementation of IdentitySet>>species that answers
Set (or Bag), with a method comment explaining why this is the case, and
with all of the collection methods using #species to answer the right
kind of result?

I note that IdentitySet>>collect: answers a Set, but IdentitySet>select:
sends #species and therefore answers an IdentitySet.

So I think that if we want the #species of an IdentitySet to be a Set,
then we should make it so, and give it a method comment to explain the
rationale. And the #collect: and #select: methods should both answer a
result of that #species.

Dave


On Thu, Nov 27, 2014 at 01:14:30AM +0100, Levente Uzonyi wrote:
Your example hides the problem of ordering - what Tobias is asking about -
so here's another:

(IdentitySet withAll: #(1 1.0)) collect: [ :each | each class ]

If IdentitySet >> #collect: were returning an Array, then what would be the
answer?

{ SmallInteger. Float } or { Float. SmallInteger } ?

If you really want to have the resulting collection have the same size,
but avoid the problem with ordering, then what you really need is a Bag.
Luckily you can have whatever you want (as long as it makes sense):

(IdentitySet withAll: #(1 1.0)) collect: [ :each | each asInteger ] as: Set.
"==> a Set(1)"
(IdentitySet withAll: #(1 1.0)) collect: [ :each | each asInteger ] as:
IdentitySet.
"==> an IdentitySet(1)"
(IdentitySet withAll: #(1 1.0)) collect: [ :each | each asInteger ] as:
Array.
"==> #(1 1)"
(IdentitySet withAll: #(1 1.0)) collect: [ :each | each asInteger ] as:
OrderedCollection.
"==> an OrderedCollection(1 1)."
(IdentitySet withAll: #(1 1.0)) collect: [ :each | each asInteger ] as: Bag.
"==> a Bag(1 1)"
(IdentitySet withAll: #(1 1.0)) collect: [ :each | each asInteger ] as:
IdentityBag.
"==> an IdentityBag(1 1)"
(IdentitySet withAll: #(1 1.0)) collect: [ :each | each asInteger ] as:
Heap.
"==> a Heap(1 1)"

Levente

On Thu, 27 Nov 2014, Frank Lesser wrote:

Hi Tobias,
agree, a problem of "OrderedCollection"
not to break a lot of other things we could return an Array.
but for me collecting has priority.
Frank

-----Urspr?ngliche Nachricht-----
Von: squeak-dev-bounces@lists.squeakfoundation.org
[mailto:squeak-dev-bounces@lists.squeakfoundation.org] Im Auftrag von
Tobias
Pape
Gesendet: Donnerstag, 27. November 2014 00:48
An: The general-purpose Squeak developers list
Betreff: Re: [squeak-dev] IdentitySet>>collect:


On 27.11.2014, at 00:34, Frank Lesser <frank-lesser@lesser-software.com>
wrote:

hmm, not convinced

(IdentitySet withAll: #(1 1.0)) collect: [:e| e asInteger ]
OrderedCollection(1 1 )

in LSWVST ( one-to-one ), you collect results of evaluating a block on
objects.

Frank
maybe I am wrong ...

Where would the order come from for that _Ordered_Collection?

Best
        -Tobias