Arrays as Sets (Was: Behavior methods)

Travis Griggs tgriggs at keyww.com
Wed Jun 17 02:32:27 UTC 1998



Vassili Bykov wrote:

> At 12:03 AM 6/17/98 +0300, Boris G. Chr. Shingarov wrote:
> >(This is not exactly what is in IBM Smalltalk where allInstances
> >returns an OrderedCollection, contrast to Set in Squeak.  I think
> >it should be Set there - but maybe someone knows any specific
> >reason for OC??)
>
> The motivation behind the choice might be the array vs. hash table
> implementation rather than ordered vs. unordered property of the
> collection.  An arrayed collection is cheaper storage-wise (at least if you
> use it properly), and adding objects to it is faster--there is no hash code
> calculation, no search for a free slot in a hash bucket, and no rehashing
> after growing.  So if a collection is created primarily for the sake of
> iterating over it later, rather than testing for membership using
> #includes:, an OC may be a better choice than a Set.  (Actually, #includes:
> on a small OC may be faster than on a Set with same elements).

I did a bunch of testing on this a couple of months ago when I was playing the
VariableHashSet (the one that used blocks for equivalence and hashing ala
SortedCollections). I benched OrderedCollections vs. Sets with random
SmallIntegers. The break even point was about 7... that is on the average,
inclusion tests were generally faster from an array of seven or less elements vs.
a set of same size. It made me wonder if in addition to using blocks to indicate
the hash/equivalence behavior, you couldn't also do some optimization for sets
that were smaller than a certain thresholds. If I remember right, the total
percentage of Sets/Dictionarys in the system that were less than the threshold was
about 90% in a clean image.

--
Travis Griggs
Key Technology
tgriggs at keyww.com
To Smalltalk! - and Beyond!





More information about the Squeak-dev mailing list