AW: [squeak-dev] IdentitySet>>collect:

Eliot Miranda eliot.miranda at gmail.com
Sun Nov 30 17:39:25 UTC 2014


Hi Levente,

On Nov 28, 2014, at 6:05 PM, Levente Uzonyi <leves at elte.hu> wrote:

> The problem with changing #species is that it's overused. There are at least three different things it's used for in the Collection hierarchy:
> - the class of the object returned by #collect: (Set's subclasses are the exception since 2000 or so, except for WeakSet, but that's really strange, and should revisited)
> - the class of the object returned by #select: and friends
> - the class used for comparison of collections

The first two are conceptually the same, and are the rationale for species.  I base that assessment on my experience with Smalltalk-80's collection library and derivatives.  species served to answer a mutable collection to hold the results of collect:, select: and reject:.  For example Interval, being a synthetic collection, cannot hold arbitrary results.  So it's species must be different (Array).

I don't see that collect:, select: and reject: are different use cases.

As far as comparison, that's johnny come lately.  In Smalltalk-80 for example Sets were incomparable, #= & hash being inherited from Object (I.e. #== & identityHash).  So species uses based on collect:, select: and reject: should have priority.


> 
> HashedCollection >> #select: uses #species. Changing IdentitySet 
>>> #species to return a Set would break #select:, because a ~~ b doesn't
> imply a ~= b. IdentitySet is the optimal class for the return value of IdentitySet >> #select:, because it's guaranteed to be able to hold all selected values. I think the same applies to all other kind of sets (and collections, because #select: is simply reducing the number of elements).
> 
> This is not the first time this discussion comes up. There was one in 2003[1], but the thread is hard to follow. And there was one in 2009[2].
> I think the conclusion was that it's better to leave the method as it is, because there's no better way to do it.
> Sure, we could use #species in #collect:, but it would break quite a lot of stuff. For example:
> 
> | k |
> k := KeyedSet keyBlock: [ :each | each first ].
> k add: #[1]; add: #[2].
> k collect: [ :each | each size ]
> 
> Currently this snippet returns "a Set(1)". Using #species in #collect: (via #copyEmpty) one would get an error, because SmallInteger does not understand #first. Changing the #species of KeyedSet would break #select:.
> 
> So IMHO when you #collect: from a set, you should always use #collect:as:. Here is a (partial) list of methods which send #collect: to a Set:
> 
> Categorizer >> #changeFromCategorySpecs:
> ClassFactoryForTestCase >> #createdClassNames
> Dictionary >> #unreferencedKeys
> MCDependencySorterTest >> #assertItems:orderAs:withRequired:toLoad:extraProvisions:
> MessageNode >> #analyseTempsWithin:rootNode:assignmentPools:
> MethodNode >> #referencedValuesWithinBlockExtent:
> SetTest >> #testCollect
> SetWithNilTest >> #runSetWithNilTestOf:
> 
> There are probably many more, but we should fix all of them. A static analyzer could help a lot.
> 
> Levente
> 
> [1] http://lists.squeakfoundation.org/pipermail/squeak-dev/2003-February/052659.html
> [2] http://lists.squeakfoundation.org/pipermail/squeak-dev/2009-November/141016.html
> 
> 
> 
> On Wed, 26 Nov 2014, David T. Lewis wrote:
> 
>> It seems to me that Object>>species is intended to handle this sort of issue.
>> 
>> For IdentitySet, it answers what Eliot was expecting:
>> 
>>  IdentitySet new species ==> IdentitySet
>>  Set new species ==> Set
>> 
>> However, IdentitySet>>collect: does not make use of this, and it answers
>> a Set for the reasons that Levente explained.
>> 
>> Answering an Array or and OrderedCollection would not really make sense,
>> because sets are unordered collections (but Bag might be better).
>> 
>> Shouldn't we have an implementation of IdentitySet>>species that answers
>> Set (or Bag), with a method comment explaining why this is the case, and
>> with all of the collection methods using #species to answer the right
>> kind of result?
>> 
>> I note that IdentitySet>>collect: answers a Set, but IdentitySet>select:
>> sends #species and therefore answers an IdentitySet.
>> 
>> So I think that if we want the #species of an IdentitySet to be a Set,
>> then we should make it so, and give it a method comment to explain the
>> rationale. And the #collect: and #select: methods should both answer a
>> result of that #species.
>> 
>> Dave
>> 
>> 
>> On Thu, Nov 27, 2014 at 01:14:30AM +0100, Levente Uzonyi wrote:
>>> Your example hides the problem of ordering - what Tobias is asking about -
>>> so here's another:
>>> 
>>> (IdentitySet withAll: #(1 1.0)) collect: [ :each | each class ]
>>> 
>>> If IdentitySet >> #collect: were returning an Array, then what would be the
>>> answer?
>>> 
>>> { SmallInteger. Float } or { Float. SmallInteger } ?
>>> 
>>> If you really want to have the resulting collection have the same size,
>>> but avoid the problem with ordering, then what you really need is a Bag.
>>> Luckily you can have whatever you want (as long as it makes sense):
>>> 
>>> (IdentitySet withAll: #(1 1.0)) collect: [ :each | each asInteger ] as: Set.
>>> "==> a Set(1)"
>>> (IdentitySet withAll: #(1 1.0)) collect: [ :each | each asInteger ] as:
>>> IdentitySet.
>>> "==> an IdentitySet(1)"
>>> (IdentitySet withAll: #(1 1.0)) collect: [ :each | each asInteger ] as:
>>> Array.
>>> "==> #(1 1)"
>>> (IdentitySet withAll: #(1 1.0)) collect: [ :each | each asInteger ] as:
>>> OrderedCollection.
>>> "==> an OrderedCollection(1 1)."
>>> (IdentitySet withAll: #(1 1.0)) collect: [ :each | each asInteger ] as: Bag.
>>> "==> a Bag(1 1)"
>>> (IdentitySet withAll: #(1 1.0)) collect: [ :each | each asInteger ] as:
>>> IdentityBag.
>>> "==> an IdentityBag(1 1)"
>>> (IdentitySet withAll: #(1 1.0)) collect: [ :each | each asInteger ] as:
>>> Heap.
>>> "==> a Heap(1 1)"
>>> 
>>> Levente
>>> 
>>> On Thu, 27 Nov 2014, Frank Lesser wrote:
>>> 
>>>> Hi Tobias,
>>>> agree, a problem of "OrderedCollection"
>>>> not to break a lot of other things we could return an Array.
>>>> but for me collecting has priority.
>>>> Frank
>>>> 
>>>> -----Urspr?ngliche Nachricht-----
>>>> Von: squeak-dev-bounces at lists.squeakfoundation.org
>>>> [mailto:squeak-dev-bounces at lists.squeakfoundation.org] Im Auftrag von
>>>> Tobias
>>>> Pape
>>>> Gesendet: Donnerstag, 27. November 2014 00:48
>>>> An: The general-purpose Squeak developers list
>>>> Betreff: Re: [squeak-dev] IdentitySet>>collect:
>>>> 
>>>> 
>>>> On 27.11.2014, at 00:34, Frank Lesser <frank-lesser at lesser-software.com>
>>>> wrote:
>>>> 
>>>>> hmm, not convinced
>>>>> 
>>>>> (IdentitySet withAll: #(1 1.0)) collect: [:e| e asInteger ]
>>>>> OrderedCollection(1 1 )
>>>>> 
>>>>> in LSWVST ( one-to-one ), you collect results of evaluating a block on
>>>>> objects.
>>>>> 
>>>>> Frank
>>>>> maybe I am wrong ...
>>>> 
>>>> Where would the order come from for that _Ordered_Collection?
>>>> 
>>>> Best
>>>>    -Tobias
> 


More information about the Squeak-dev mailing list