AW: [squeak-dev] IdentitySet>>collect:

Eliot Miranda eliot.miranda at gmail.com
Mon Dec 1 16:26:02 UTC 2014


On Fri, Nov 28, 2014 at 7:42 PM, Ben Coman <btc at openinworld.com> wrote:

> Levente Uzonyi wrote:
>
>> The problem with changing #species is that it's overused. There are at
>> least three different things it's used for in the Collection hierarchy:
>> - the class of the object returned by #collect: (Set's subclasses are the
>> exception since 2000 or so, except for WeakSet, but that's really strange,
>> and should revisited)
>> - the class of the object returned by #select: and friends
>> - the class used for comparison of collections
>>
>
> So should there be #collectSpecies ?
>

No.  That's what species was invented for in the first place.


> Which defaults to "^self species" and is overidden as required.
>
> In the case of Sets, you start with an "unordered collection without
> duplicates", but since #collect can introduce duplicates, you end up with
> "an unordered collection with duplicates" --> Bag (per Dave's suggestion)
>
> So I think its reasonable that the following two snippets should produce
> the same result.
>
> oc := OrderedCollection newFrom: #( 1 2 3 4 5 ).
> (oc collect: [ :e | e even ]) occurrencesOf:  true. "--> 2"
>
> s := Set newFrom: #( 1 2 3 4 5 ).
> (s collect: [ :e | e even ]) occurrencesOf:  true.  "--> 2"
>
>
> which you get with the following change...
>
> Set>>collectSpecies
>         ^Bag
>
> Set>>collect: aBlock
>         | result |
>         result := self collectSpecies new: self size.
>         array do: [:each | each ifNotNil: [result add: (aBlock value: each
> enclosedSetElement)]].
>         ^ result
>
>
>
>
>> HashedCollection >> #select: uses #species. Changing IdentitySet
>>
>>> #species to return a Set would break #select:, because a ~~ b doesn't
>>>>
>>> imply a ~= b. IdentitySet is the optimal class for the return value of
>> IdentitySet >> #select:, because it's guaranteed to be able to hold all
>> selected values. I think the same applies to all other kind of sets (and
>> collections, because #select: is simply reducing the number of elements).
>>
>> This is not the first time this discussion comes up. There was one in
>> 2003[1], but the thread is hard to follow. And there was one in 2009[2].
>> I think the conclusion was that it's better to leave the method as it is,
>> because there's no better way to do it.
>> Sure, we could use #species in #collect:, but it would break quite a lot
>> of stuff. For example:
>>
>> | k |
>> k := KeyedSet keyBlock: [ :each | each first ].
>> k add: #[1]; add: #[2].
>> k collect: [ :each | each size ]
>>
>
>
> That works with the above change to produce --> a Bag(1 1)
>
>
>> Currently this snippet returns "a Set(1)". Using #species in #collect:
>> (via #copyEmpty) one would get an error, because SmallInteger does not
>> understand #first. Changing the #species of KeyedSet would break #select:.
>>
>> So IMHO when you #collect: from a set, you should always use
>> #collect:as:. Here is a (partial) list of methods which send #collect: to a
>> Set:
>>
>> Categorizer >> #changeFromCategorySpecs:
>> ClassFactoryForTestCase >> #createdClassNames
>> Dictionary >> #unreferencedKeys
>> MCDependencySorterTest >> #assertItems:orderAs:withRequired:toLoad:
>> extraProvisions:
>> MessageNode >> #analyseTempsWithin:rootNode:assignmentPools:
>> MethodNode >> #referencedValuesWithinBlockExtent:
>> SetTest >> #testCollect
>> SetWithNilTest >> #runSetWithNilTestOf:
>>
>> There are probably many more, but we should fix all of them. A static
>> analyzer could help a lot.
>>
>> Levente
>>
>> [1] http://lists.squeakfoundation.org/pipermail/squeak-dev/2003-
>> February/052659.html
>> [2] http://lists.squeakfoundation.org/pipermail/squeak-dev/2009-
>> November/141016.html
>>
>>
>>
>> On Wed, 26 Nov 2014, David T. Lewis wrote:
>>
>>  It seems to me that Object>>species is intended to handle this sort of
>>> issue.
>>>
>>> For IdentitySet, it answers what Eliot was expecting:
>>>
>>>   IdentitySet new species ==> IdentitySet
>>>   Set new species ==> Set
>>>
>>> However, IdentitySet>>collect: does not make use of this, and it answers
>>> a Set for the reasons that Levente explained.
>>>
>>> Answering an Array or and OrderedCollection would not really make sense,
>>> because sets are unordered collections (but Bag might be better).
>>>
>>
> Bag is better.
>
>
>>> Shouldn't we have an implementation of IdentitySet>>species that answers
>>> Set (or Bag), with a method comment explaining why this is the case, and
>>> with all of the collection methods using #species to answer the right
>>> kind of result?
>>>
>>> I note that IdentitySet>>collect: answers a Set, but IdentitySet>select:
>>> sends #species and therefore answers an IdentitySet.
>>>
>>> So I think that if we want the #species of an IdentitySet to be a Set,
>>> then we should make it so, and give it a method comment to explain the
>>> rationale. And the #collect: and #select: methods should both answer a
>>> result of that #species.
>>>
>>
> I think the species of #collect: and #select: are intrinsically different.
> #collect: is a transformation.
>
>
>>> Dave
>>>
>>>
>>> On Thu, Nov 27, 2014 at 01:14:30AM +0100, Levente Uzonyi wrote:
>>>
>>>> Your example hides the problem of ordering - what Tobias is asking
>>>> about -
>>>> so here's another:
>>>>
>>>> (IdentitySet withAll: #(1 1.0)) collect: [ :each | each class ]
>>>>
>>>> If IdentitySet >> #collect: were returning an Array, then what would be
>>>> the
>>>> answer?
>>>>
>>>> { SmallInteger. Float } or { Float. SmallInteger } ?
>>>>
>>>
> -->  a Bag(SmallInteger Float)
>
>
>>>> If you really want to have the resulting collection have the same size,
>>>> but avoid the problem with ordering, then what you really need is a Bag.
>>>>
>>>
> To me that makes the most sense.
>
>
>>>> On Thu, 27 Nov 2014, Frank Lesser wrote:
>>>>
>>>>  Hi Tobias,
>>>>> agree, a problem of "OrderedCollection"
>>>>> not to break a lot of other things we could return an Array.
>>>>> but for me collecting has priority.
>>>>> Frank
>>>>>
>>>>> -----Urspr?ngliche Nachricht-----
>>>>> Von: squeak-dev-bounces at lists.squeakfoundation.org
>>>>> [mailto:squeak-dev-bounces at lists.squeakfoundation.org] Im Auftrag von
>>>>> Tobias
>>>>> Pape
>>>>> Gesendet: Donnerstag, 27. November 2014 00:48
>>>>> An: The general-purpose Squeak developers list
>>>>> Betreff: Re: [squeak-dev] IdentitySet>>collect:
>>>>>
>>>>>
>>>>> On 27.11.2014, at 00:34, Frank Lesser <frank-lesser at lesser-software.
>>>>> com>
>>>>> wrote:
>>>>>
>>>>>  hmm, not convinced
>>>>>>
>>>>>> (IdentitySet withAll: #(1 1.0)) collect: [:e| e asInteger ]
>>>>>> OrderedCollection(1 1 )
>>>>>>
>>>>>> in LSWVST ( one-to-one ), you collect results of evaluating a block on
>>>>>> objects.
>>>>>>
>>>>>> Frank
>>>>>> maybe I am wrong ...
>>>>>>
>>>>>
>>>>> Where would the order come from for that _Ordered_Collection?
>>>>>
>>>>
> An unordered OrderedCollection --> Bag
>
> cheers -ben
>
>


-- 
best,
Eliot
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.squeakfoundation.org/pipermail/squeak-dev/attachments/20141201/71a12812/attachment.htm


More information about the Squeak-dev mailing list