collect: vs. to:do:

Fri Oct 20 19:28:02 UTC 2000

> -----Original Message-----
> From: Dan Winkler [mailto:fendidog at YAHOO.COM]
> Sent: Friday, October 20, 2000 2:09 PM
> To: squeak at cs.uiuc.edu
> Subject: Re: collect: vs. to:do:
> 
> 
> Thank you.
> 
> That's interesting that collect: returns different classes of collections
> based on the class of the source collection.  When would you want that
> behavior?

Interesting question.  Let's turn it around and ask, "When would you *not*
want that behavior?", or perhaps "What's the alternative to that behavior?".
If #collect: doesn't answer a collection of the same class as the collection
to which you sent the message, then what class should be answered?  Doing it
otherwise seems to imply that the #collect: message should declare, by fiat,
that some collection class is The Right Kind of class to answer.  This
presupposes that all types of collections are known, and that all types of
collections are compatible with the RightKindOfCollection class.  It would
seem that such a class would have to offer the uniqueness of a Set, the
lookup capabilities of a Dictionary, the rapid indexed access of an Array,
the dynamic expansion of an OrderedCollection, the sorting capabilities of a
SortedCollection, etc.  That's a whole lot to ask of any one class.
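
To make the behavior concrete, here is a minimal workspace sketch, assuming the stock Array, OrderedCollection and Set classes found in Squeak. The variable names are only illustrative.

```smalltalk
| fromArray fromOrdered fromSet |
fromArray := #(1 2 3) collect: [:each | each * 2].
fromOrdered := (OrderedCollection withAll: #(1 2 3)) collect: [:each | each * 2].
fromSet := (Set withAll: #(1 2 3 4)) collect: [:each | each \\ 2].

fromArray class.    "Array"
fromOrdered class.  "OrderedCollection"
fromSet class.      "Set"
fromSet size.       "2, since the Set keeps only the distinct results 0 and 1"
```

Each receiver keeps its own character: the Array stays indexable and fixed-size, the OrderedCollection can still grow, and the Set still discards duplicates.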

> I see in this particular method, the next line allocates an Array
> explicitly.  Is there a reason we want to let collect: decide the class of
> collection in the first line but use an Array in particular in the second
> line?
> 
> 	allText _ pages collect: [:pg | OrderedCollection new].
> 	allTextUrls _ Array new: pages size.

Well, it looks like the two collections are being used for different
purposes.  In the first line it appears the method wants to allocate a
collection of OrderedCollections to hold the text from each of the pages.
In the second line it looks like an Array the same size as the pages
collection is being allocated to hold URLs.

> Would you say we should change the second line to:
> 
>         allTextUrls _ pages collect: [:pg | nil].

"Should"?  That's tough to say.  You *could* do that but from the fragment
posted there's no way to determine how the allTextUrls collection is used,
and usage can be important.  If there's a lot of access by index (e.g.
"someVar := allTextUrls at: index") then an Array is probably a very good
class to use, and the class of the pages collection may not work as well.
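
As a hedged illustration of why the class of pages matters, suppose (purely hypothetically, since the fragment doesn't say) that pages were a Set.  The sketch below uses 0 rather than nil as the placeholder, because whether a Set accepts nil varies between Smalltalk dialects and versions; the page names and URL are made up.

```smalltalk
| pages viaCollect viaArray |
pages := Set withAll: #('page1' 'page2' 'page3').  "hypothetical: pages is a Set"
viaCollect := pages collect: [:pg | 0].            "0 stands in for the nil placeholder"
viaArray := Array new: pages size.

viaCollect size.  "1, since the three identical placeholders collapse into one element"
viaArray size.    "3, one slot per page"

viaArray at: 2 put: 'http://example.com/page2'.
viaArray at: 2.   "the URL just stored in slot 2"
```

With an Array the result is guaranteed to have one indexable slot per page no matter what kind of collection pages turns out to be.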

> As an old C programmer I'd be tempted to go the other way and force them
> both to be Arrays for compactness since I know they're not going to grow.

You could do something like

	allText := Array new: pages size.
	1 to: allText size do: [ :i | allText at: i put: OrderedCollection new ].

or, equivalently,

	allText := (Array new: pages size) collect: [ :pg | OrderedCollection new ].

While both of these will produce equivalent results (an Array holding one
empty OrderedCollection per page), they're more complex and less clear than
the original.  Either could be justified, I suppose, if you found a
performance bottleneck with the use of the allText collection and needed
faster access into the collection by index, but to write something like that
if you don't need to seems to be a case of premature optimization.  I'd say
stick with the simplest thing that could possibly work, which is what the
code here has already done.
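
One point in favor of the block-based form, sketched below under the assumption that pages is a small Array of made-up page names: the block runs once per element, so every slot gets its own OrderedCollection.  A tempting shortcut such as new:withAll: (not used in the original method; shown here only for contrast) puts the same object in every slot.

```smalltalk
| pages allText shared |
pages := #('page1' 'page2' 'page3').   "hypothetical stand-in for the real pages"

allText := pages collect: [:pg | OrderedCollection new].
(allText at: 1) add: 'some text'.
(allText at: 2) isEmpty.               "true, since each slot has its own collection"

shared := Array new: pages size withAll: OrderedCollection new.
(shared at: 1) add: 'some text'.
(shared at: 2) isEmpty.                "false, since every slot is the same object"
```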

And as another old C programmer I'd say

	Use the Force...err, I mean, the Collection protocol...Dan!

:-)

Bob Jarvis
Compuware @ Timken