AW: [squeak-dev] IdentitySet>>collect:

Ben Coman btc at openInWorld.com
Sat Nov 29 03:42:00 UTC 2014


Levente Uzonyi wrote:
> The problem with changing #species is that it's overused. There are at 
> least three different things it's used for in the Collection hierarchy:
> - the class of the object returned by #collect: (Set's subclasses are 
> the exception since 2000 or so, except for WeakSet, but that's really 
> strange, and should revisited)
> - the class of the object returned by #select: and friends
> - the class used for comparison of collections

So should there be #collectSpecies ?
Which defaults to "^self species" and is overidden as required.

In the case of Sets, you start with an "unordered collection without 
duplicates", but since #collect can introduce duplicates, you end up 
with "an unordered collection with duplicates" --> Bag (per Dave's 
suggestion)

So I think its reasonable that the following two snippets should produce 
the same result.

oc := OrderedCollection newFrom: #( 1 2 3 4 5 ).
(oc collect: [ :e | e even ]) occurrencesOf:  true. "--> 2"

s := Set newFrom: #( 1 2 3 4 5 ).
(s collect: [ :e | e even ]) occurrencesOf:  true.  "--> 2"


which you get with the following change...

Set>>collectSpecies
	^Bag

Set>>collect: aBlock
	| result |
	result := self collectSpecies new: self size.
	array do: [:each | each ifNotNil: [result add: (aBlock value: each 
enclosedSetElement)]].
	^ result



> 
> HashedCollection >> #select: uses #species. Changing IdentitySet
>>> #species to return a Set would break #select:, because a ~~ b doesn't 
> imply a ~= b. IdentitySet is the optimal class for the return value of 
> IdentitySet >> #select:, because it's guaranteed to be able to hold all 
> selected values. I think the same applies to all other kind of sets (and 
> collections, because #select: is simply reducing the number of elements).
> 
> This is not the first time this discussion comes up. There was one in 
> 2003[1], but the thread is hard to follow. And there was one in 2009[2].
> I think the conclusion was that it's better to leave the method as it 
> is, because there's no better way to do it.
> Sure, we could use #species in #collect:, but it would break quite a lot 
> of stuff. For example:
> 
> | k |
> k := KeyedSet keyBlock: [ :each | each first ].
> k add: #[1]; add: #[2].
> k collect: [ :each | each size ]


That works with the above change to produce --> a Bag(1 1)

> 
> Currently this snippet returns "a Set(1)". Using #species in #collect: 
> (via #copyEmpty) one would get an error, because SmallInteger does not 
> understand #first. Changing the #species of KeyedSet would break #select:.
> 
> So IMHO when you #collect: from a set, you should always use 
> #collect:as:. Here is a (partial) list of methods which send #collect: 
> to a Set:
> 
> Categorizer >> #changeFromCategorySpecs:
> ClassFactoryForTestCase >> #createdClassNames
> Dictionary >> #unreferencedKeys
> MCDependencySorterTest >> 
> #assertItems:orderAs:withRequired:toLoad:extraProvisions:
> MessageNode >> #analyseTempsWithin:rootNode:assignmentPools:
> MethodNode >> #referencedValuesWithinBlockExtent:
> SetTest >> #testCollect
> SetWithNilTest >> #runSetWithNilTestOf:
> 
> There are probably many more, but we should fix all of them. A static 
> analyzer could help a lot.
> 
> Levente
> 
> [1] 
> http://lists.squeakfoundation.org/pipermail/squeak-dev/2003-February/052659.html 
> 
> [2] 
> http://lists.squeakfoundation.org/pipermail/squeak-dev/2009-November/141016.html 
> 
> 
> 
> 
> On Wed, 26 Nov 2014, David T. Lewis wrote:
> 
>> It seems to me that Object>>species is intended to handle this sort of 
>> issue.
>>
>> For IdentitySet, it answers what Eliot was expecting:
>>
>>   IdentitySet new species ==> IdentitySet
>>   Set new species ==> Set
>>
>> However, IdentitySet>>collect: does not make use of this, and it answers
>> a Set for the reasons that Levente explained.
>>
>> Answering an Array or and OrderedCollection would not really make sense,
>> because sets are unordered collections (but Bag might be better).

Bag is better.

>>
>> Shouldn't we have an implementation of IdentitySet>>species that answers
>> Set (or Bag), with a method comment explaining why this is the case, and
>> with all of the collection methods using #species to answer the right
>> kind of result?
>>
>> I note that IdentitySet>>collect: answers a Set, but IdentitySet>select:
>> sends #species and therefore answers an IdentitySet.
>>
>> So I think that if we want the #species of an IdentitySet to be a Set,
>> then we should make it so, and give it a method comment to explain the
>> rationale. And the #collect: and #select: methods should both answer a
>> result of that #species.

I think the species of #collect: and #select: are intrinsically 
different. #collect: is a transformation.

>>
>> Dave
>>
>>
>> On Thu, Nov 27, 2014 at 01:14:30AM +0100, Levente Uzonyi wrote:
>>> Your example hides the problem of ordering - what Tobias is asking 
>>> about -
>>> so here's another:
>>>
>>> (IdentitySet withAll: #(1 1.0)) collect: [ :each | each class ]
>>>
>>> If IdentitySet >> #collect: were returning an Array, then what would 
>>> be the
>>> answer?
>>>
>>> { SmallInteger. Float } or { Float. SmallInteger } ?

-->  a Bag(SmallInteger Float)

>>>
>>> If you really want to have the resulting collection have the same size,
>>> but avoid the problem with ordering, then what you really need is a Bag.

To me that makes the most sense.

>>>
>>> On Thu, 27 Nov 2014, Frank Lesser wrote:
>>>
>>>> Hi Tobias,
>>>> agree, a problem of "OrderedCollection"
>>>> not to break a lot of other things we could return an Array.
>>>> but for me collecting has priority.
>>>> Frank
>>>>
>>>> -----Urspr?ngliche Nachricht-----
>>>> Von: squeak-dev-bounces at lists.squeakfoundation.org
>>>> [mailto:squeak-dev-bounces at lists.squeakfoundation.org] Im Auftrag von
>>>> Tobias
>>>> Pape
>>>> Gesendet: Donnerstag, 27. November 2014 00:48
>>>> An: The general-purpose Squeak developers list
>>>> Betreff: Re: [squeak-dev] IdentitySet>>collect:
>>>>
>>>>
>>>> On 27.11.2014, at 00:34, Frank Lesser 
>>>> <frank-lesser at lesser-software.com>
>>>> wrote:
>>>>
>>>>> hmm, not convinced
>>>>>
>>>>> (IdentitySet withAll: #(1 1.0)) collect: [:e| e asInteger ]
>>>>> OrderedCollection(1 1 )
>>>>>
>>>>> in LSWVST ( one-to-one ), you collect results of evaluating a block on
>>>>> objects.
>>>>>
>>>>> Frank
>>>>> maybe I am wrong ...
>>>>
>>>> Where would the order come from for that _Ordered_Collection?

An unordered OrderedCollection --> Bag

cheers -ben


More information about the Squeak-dev mailing list