[squeak-dev] Re: Why is Heap>>#species => Array?

Andrew P. Black black at cs.pdx.edu
Fri Feb 22 22:25:54 UTC 2008

On 21 Feb 2008, at 23:49, Klaus D. Witzel wrote:

> On Fri, 22 Feb 2008 06:22:32 +0100, Paolo Bonzini wrote:
>> Klaus D. Witzel wrote:
>>> Subject line says it all, check yourself,
>>>  (Heap withAll: 'array') reject: [:x | x = $r]
>>>  What's the rationale (there's no doc, no comment)? Archive shows  
>>> that #species was changed to fix another (anonymous) bug but,  
>>> this way the senders of #species can impossibly do what Smalltalk  
>>> users expect from the collection hierarchy (and there is  
>>> #asArray ...)

Klaus asked if we had any insights on this during the reengineering  
of the collection hierarchy.  The answer is yes, although my  
recollection of them may be imperfect.

We noticed that the use of #species was inconsistent between  
different classes.   We endeavored to fix these inconsistencies, to  
promote reuse of methods, and eliminate the multiplicity of slightly  
different versions.

We came to the conclusion that the #species method was intended for  
use in equality comparisons.  Two collections were equal if they had  
the same species and if they had the same elements.  Whether order  
matters depends on the species, so it matters if the species is  
Array, and not if it is Set.
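
The equality rule above can be modelled outside Smalltalk. What follows is a minimal Python sketch, not Squeak code: the class names Coll, ArrayLike, SetLike and IntervalLike are hypothetical, and IntervalLike only imitates the way Interval answers Array as its species.

```python
class Coll:
    """Hypothetical base class: equality = same species and same elements."""
    def __init__(self, items):
        self.items = list(items)

    def species(self):
        # By default a collection's species is its own class.
        return type(self)

    def same_elements(self, other):
        raise NotImplementedError

    def __eq__(self, other):
        return (isinstance(other, Coll)
                and self.species() is other.species()
                and self.same_elements(other))


class ArrayLike(Coll):
    def same_elements(self, other):
        return self.items == other.items            # order matters


class SetLike(Coll):
    def same_elements(self, other):
        return set(self.items) == set(other.items)  # order is irrelevant


class IntervalLike(ArrayLike):
    """A different class that still compares as an ArrayLike."""
    def species(self):
        return ArrayLike
```

With these definitions, ArrayLike([1, 2, 3]) and SetLike([1, 2, 3]) are never equal, two SetLike objects compare equal regardless of element order, and IntervalLike([1, 2, 3]) equals ArrayLike([1, 2, 3]) because both answer the same species.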

However, some code used #species to answer a very different question:
what class should I use to make a new collection like this one in a
#collect: or a #select:?  Sometimes this was OK, but sometimes the
right class was different from the one that #species answered.
We decided to uniformly use two methods, emptyCopyOfSize: and
emptyCopyOfSameSize, to generate the new collections.
emptyCopyOfSameSize was implemented as

	^ self  emptyCopyOfSize: self size

in TCollBasicImpl.

Using this, instead of

Set>>collect: aBlock
	"Evaluate aBlock with each of the receiver's elements as the argument.
	Collect the resulting values into a collection like the receiver.
	Answer the new collection."

	| newSet |
	newSet _ Set new: self size.
	array do: [:each | each ifNotNil: [newSet add: (aBlock value: each)]].
	^ newSet

we get

TCollExtensibleUnsequence>>collect: aBlock
	"Evaluate aBlock with each of the receiver's elements as the argument.
	Collect the resulting values into a collection like the receiver.
	Answer the new collection."

	| newCollection |
	newCollection _ self emptyCopyOfSameSize.
	self withIndexDo: [:each :index |
		newCollection unsafeAdd: (aBlock value: each) possiblyAt: index].
	^ newCollection makeSafe.

These methods also show another use of polymorphism: the method
#unsafeAdd:possiblyAt:.  The problem that this addresses is that some
collections understand #at:put: (e.g., Array) and others understand
#add: (e.g., Set).  We made _all_ collections understand
unsafeAdd:possiblyAt:.  The second argument to this message is a
_suggestion_ of an index at which to add the new element; the target
can ignore this suggestion if it wants, as a Set does.  The "unsafe"
terminology has to do with internal invariants; a sorted collection
can implement unsafeAdd:possiblyAt: to insert the new element at an
arbitrary position, without sorting.  The rule is that eventually
#makeSafe will be sent, and only _after_ that message is the user
entitled to assume that the collection invariants once again hold.
So a SortedCollection can use #makeSafe to sort; the default
implementation does nothing and answers self.
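
To make the protocol concrete, here is a minimal Python model of it (not Squeak code, and not the trait implementation itself). The method names mirror emptyCopyOfSameSize, unsafeAdd:possiblyAt: and makeSafe; the classes ArrayLike, SetLike and SortedLike are hypothetical, and the choice that a sorted collection's empty copy is again sorted is an assumption made only for this sketch.

```python
class Coll:
    """Hypothetical base: collect() is written once, against the protocol."""
    def empty_copy_of_size(self, n):
        raise NotImplementedError

    def empty_copy_of_same_size(self):
        return self.empty_copy_of_size(len(self.items))

    def unsafe_add(self, value, possibly_at):
        raise NotImplementedError

    def make_safe(self):
        return self                        # default: nothing to repair

    def collect(self, fn):
        new = self.empty_copy_of_same_size()
        for index, each in enumerate(self.items):
            new.unsafe_add(fn(each), possibly_at=index)
        return new.make_safe()


class ArrayLike(Coll):
    def __init__(self, items=()):
        self.items = list(items)

    def empty_copy_of_size(self, n):
        return ArrayLike([None] * n)

    def unsafe_add(self, value, possibly_at):
        self.items[possibly_at] = value    # honours the suggested index


class SetLike(Coll):
    def __init__(self, items=()):
        self.items = set(items)

    def empty_copy_of_size(self, n):
        return SetLike()

    def unsafe_add(self, value, possibly_at):
        self.items.add(value)              # ignores the suggestion


class SortedLike(Coll):
    def __init__(self, items=()):
        self.items = sorted(items)

    def empty_copy_of_size(self, n):
        return SortedLike()

    def unsafe_add(self, value, possibly_at):
        self.items.append(value)           # invariant may be broken for now

    def make_safe(self):
        self.items.sort()                  # restore the sorted invariant
        return self
```

For example, SortedLike([1, 2, 3]).collect(lambda x: -x) appends -1, -2, -3 in that order and only becomes sorted again when make_safe runs at the end of collect.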

I think that after we were done, we came to the conclusion that the
kind of collection that results from a collect: ought to be a
parameter to collect:.  For example, if I do a collect: over an
IdentitySet, should the result be another IdentitySet, a Set, or an
Array?  Well, it depends on the function that I'm applying: there is
no one right answer.  Similarly, with collect: on a
SortedCollection, should the result be an OrderedCollection, an
Array, or a new SortedCollection?  If the latter, with what sort
block?  One way to do this is to have a #collect:into: method where
the second argument is a collection into which the new elements will
be added.  It's not even necessary for it to be empty!  Another
possibility would be to provide as argument a block that does the
adding ... this starts to look very much like #do:.  So, we never
implemented those variants.
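
Since those variants were never implemented, the following is only an illustration of the idea, written in Python rather than Smalltalk. The function names collect_into and collect_with are hypothetical; collect_into assumes its target understands an add method (as a Python set does), and collect_with takes the "block that does the adding" as a callable.

```python
def collect_into(source, fn, target):
    """Add fn(each) for every element of source to target; answer target.

    The caller chooses the result class by choosing target, which
    need not be empty.
    """
    for each in source:
        target.add(fn(each))
    return target


def collect_with(source, fn, adder):
    """Variant taking the adding action itself; essentially a do: loop."""
    for each in source:
        adder(fn(each))
```

Calling collect_into({-1, 1, 2}, abs, {5}) answers the set {1, 2, 5}, and passing a list's append method to collect_with accumulates results in order, which shows how close the second variant is to a plain do:.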

