Dictionary>>collect: vs. Dictionary>>select: : why are theydifferent?

Peter van Rooijen squeak at vanrooijen.com
Sat Nov 15 15:00:11 UTC 2003


From: "Ned Konz" <ned at bike-nomad.com>
> going through the Dictionary protocol I ran
> into something I'd never noticed before:
>
> Dictionary>>select: returns a Dictionary
>
> while
>
> Dictionary>>collect: returns an OrderedCollection (!) of result values.
> It does seem peculiar that this should be an OrderedCollection, since the
> associations are visited by hash order.

Ned,

I agree that it seems peculiar. Maybe it has to do with (not) having a
collection that does not care about order nor about duplicates (heap,
multi-set, I don't know a term for it that really evokes its properties).

My personal opinion/taste about this issue is to regard keyed collections as
collections of their elements only. So, membership and iteration would
regard only the elements. I feel this is the simplest, most flexible, and
easiest to remember approach.

Although this approach would not make it incorrect for #select: on a
dictionary to return another dictionary, it would make simpler answers be
correct as well.

For iteration, we would have the basic ones like #do:, #select:, #collect:
mean the same as #valuesDo:, #valuesSelect:, #valuesCollect:. And then we
would have #keysDo:, #associationsDo:, #keysAndValuesDo: et cetera.

It works usefully and without surprises, with all keyed collections, of
course Dictionaries and LookupTables, but also Arrays, OrderedCollections,
Strings.

[I've been wondering if it is useful to regard all collections as keyed.
This would make (((Set with: 1 with: 2 with: 3) at: 2 put: 4 ; yourself)
asSortedCollection asArray) = #(1 3 4).

There are a few more aspects to such a rethinking of collections, of course.
It might make sense to identify the surprise generators, missing pieces,
even clear inconsistencies in the current collection libraries before
embarking on a rethink that leads to new construction.

To my mind the most important high-level criteria for a good collection
library:

A No missing important functionality (avoid coding around the limitations)
B No surprises (exchangability of collections, avoid having to remember
idiosyncrasies)
C Efficiency (avoid not being able to use because of bad performance)

I find this fascinating to think about. I'm implementing a comprehensive
no-surprises collection library for my own language, but implementing one
for Smalltalk, getting the opportunity to improve, after 25 years, on an
already insanely great design, would surely be a lot of fun, too.

Cheers,

Peter van Rooijen
Amsterdam

> d _ IdentityDictionary new.
> 1 to: 10 do: [ :i | d at: i put: i * 2 ].
> d collect: [ :ea | ea ]
> an OrderedCollection(2 4 6 8 10 12 14 16 18 20)
> d select: [ :ea | true ]
> an IdentityDictionary(1->2 2->4 3->6 4->8 5->10 6->12 7->14 8->16 9->18
> 10->20 )
> d reject: [ :ea | false ]
> an IdentityDictionary(1->2 2->4 3->6 4->8 5->10 6->12 7->14 8->16 9->18
> 10->20 )
>
> Can anyone say whether this is the desired behavior?
>
> Also, it looks like I can gain about 20% in speed simply, and also reduce
the
> creation of extra objects (we don't *really* need Associations in the
> dictionary, do we?):
>
> d1 _ IdentityDictionary new.
> d2 _ IdentityDictionary2 new.
> 1 to: 5 do: [ :i | d1 at: i put: i ].
> 1 to: 5 do: [ :i | d2 at: i put: i ].
>
> Smalltalk garbageCollect.
> Time millisecondsToRun: [ 100000 timesRepeat: [ d1 at: 2 ]]   165
> Smalltalk garbageCollect.
> Time millisecondsToRun: [ 100000 timesRepeat: [ d2 at: 2 ]]   136
> Smalltalk garbageCollect.
> Time millisecondsToRun: [ 1 to: 100000 do: [ :i | d1 at: i put: i ]]  852
> Smalltalk garbageCollect.
> Time millisecondsToRun: [ 1 to: 100000 do: [ :i | d2 at: i put: i ]] 678
>
>
> -- 
> Ned Konz
> http://bike-nomad.com
> GPG key ID: BEEA7EFE
>
>
>




More information about the Squeak-dev mailing list