comparing Collections. Can we do better!?

Boris Gaertner Boris.Gaertner at gmx.net
Sun Dec 14 12:46:22 UTC 2003


From: Ralph Boland <rboland at porchlight.ca>

> I tried to figure out how Set did this.  Set appears to inherit method
>   hash from Collection.  Collection does a self species hash.
> Set answers that its species is itself so set hash must be called
> which of course is inherited from Collection.

Not exactly, things are slightly more compplicated:
An instance of Set answers that its species is the class Set.
So  the message  hash must be sent to the *class* Set,
not to an instance. A method hash is not implemented
on the class side of Set, so we have to go up the
inheritance chain to find one.
The inheritance chain on the class side is:

ProtoObject #()
 Object #()
  Behavior #('superclass' 'methodDict' 'format')
   ClassDescription #('instanceVariables' 'organization')
    Class #('subclasses' 'name' 'classPool' 'sharedPools' 'environment'
'category')
     ProtoObject class #()
      Object class #()
       Collection class #()
        Set class #()

Note that you change from class protocol to instance protocol
when you move from  'ProtoObject class' to  'Class'.

Going up from  Set class, the first superclass that implements hash
is Object and this is the method that is evaluated.

So  the  'self species' moves you from instance protocol to
class protocol and search for an inherited method brings you
back to the instance protocol, but serach in the instance protocol
begins with Class. The instance protocol of Collection is
not visitted andd hence wedo not have an infinite loop.

> Though a Set hash is 4 time slower than
> a HashArray hash it isn't bad for an infinite loop!!!  (explainations
> here are welcome and don't put me down too too much for being stupid;
> I just increased the performance of a method by, say, 40,000 times).
>
As you see, you take your way over the class protocol.

> Anyway, after completing the infinite loop, Collection tests if the
> (Set) has 10 or fewer elements.
> If it does then it computes the bitXor of its
> elements and combines this with the hash value.  As a result,
> for example, for 8 element sets,  HashArray hash is about 27 times
>   faster than Set hash.  Thus hashing small sets is slower than hashing
> large sets where small is 10 or fewer and large is 11 or larger.
>
> I assume that Set hash does not store a hash value and thus in my
>   application a  aSetObject = aSetObject would be expected to be slower
> than aHashSetObject = aHashSetObject, possibly by a large amount.
>

I do not fully understand this argument. Do you assume that evaluation
of the method #= involved the evaluation of hash ? This assumption is
not correct. Set operation require the evaluation of both methods.
When you put your object into a Set and ask
  (mySet includes: myObject)
a faster hash will speed up your program, especially when you look
for instances that are not in 'mySet'. (The wonderful property
of Sets is that with *good luck* the computation of a hash
value is sufficient to say: 'no this object is not included.' This
is the situation where your improvements will pay off most.)

You will certainly receive other reactions. I think that
interest in fast hash function is great and a lot of subscribers
to our list will eagerly experiment with your code - I too.

Greetings, Boris




More information about the Squeak-dev mailing list