towards faster set operations

Ralph Boland ralph.boland at acadiau.ca
Thu Mar 11 18:55:08 UTC 2004


I am implementating a finite state machine and parsing package in Squeak
and need set operations to be fast.  In particular I need to avoid as 
many set element
comparisons during an "add:"  operation as possible since the
"elem1 = elem2" operations in my application are expensive.
     I have made a number of changes that perhaps could be made
permanent changes to Squeak .
(actually I only created my own set of parallel methods).

1)  (improvement: 2-15% for the running time of my application)
When adding an element to a set causes the set to resize the current version
of class Set  results in many set element comparisons being  performed.  
In my
version (called "addQuick:") no comparisions
of the elements of the set are done during the resize step.
Since each pair of elements in the set should return "false" if compared,
in principle there should be no functional change in how sets work by 
this change.

There can be changes though if an element in a set is altered in such a way
that it responds differently to comparisions.  For example, if a set 
resizes then
 the number of elements in the set (for the current version of Squeak)
could shrink as a result. 
Thus my proposed change could break code that relies upon this behavior.
However, I think the opinions of programmers who do this should be
ignored anyway (I am being kind and legal here and not stating what I 
really think).
Is there any chance that method "add:"  in Squeak could be modified to 
work the way
method "addQuick:" works?


2)  In order to improve performance I sometimes only partially construct 
an object before
adding it to the set.  In this circumstance a partially constructed 
object and its fully
constructed counterpart compare to equal. 
        If I find that the partially constructed object does not compare 
to equal
with any element in the set I complete the construction of the
object and then add it to the set.  If it does compare to equal to some 
element  "e" in the set
then I retrieve  "e" from the set and use it for some computations (set 
pointers to it).
        The problem here is that the method "add: obj"  for sets returns 
"obj". 
But if  "obj" compared to equal to some object  "obj2"  then I need 
access to
"obj2" not "obj".   If I need further access to "obj"  then I could write

"var1 := aSet  add:  (var2 := obj1).".

I fixed this by making method "addQuick" return "obj2" instead of "obj".
Modifying Squeak to support this behavior   is impossible I'm sure because
of backward compatibility issues (damn!). 
However a new method  "addAlt:"  could be supported.

3)  In the the above scenario, after fully constructing "obj"  I have to 
add it to
the set a second time (actually in my application  "obj" and "obj2" are 
elements
of different classes) causing a second search to occur in the set.

I fixed this by creating a method   "includes:  object  ifPresentAdd:  
addObject".
This method adds object  "addObject" to the set if  "object" is not in 
the set at the
location that  "object would be located.  Before adding  "addObject"  
the method
executes:

      "object = addObject  ifFalse:  self error:  'object ~= addObject'."

to ensure the integrity of the set.

Admittedly method  "includes:ifPresentAdd:"  is kind of weird so, 
though  it is
quite useful in my application I expect that others might object to 
making it
part of standard Squeak.

So who do I propose these changes to Squeak and what are my chances of 
getting
at least changes 1) and 3) accepted?

P.S.  If change 1)  is implemented it is  necessary to modify method
"noCheckAdd:"  in class Dictionary as well.

Ralph Boland





More information about the Squeak-dev mailing list