Equality of Recursive Structures [Was: Closure Compiler in 3.9 ?]

Andrew P. Black black at cs.pdx.edu
Thu May 10 06:46:53 UTC 2007


This started as a story about what I realized in the shower this  
morning, and ended up as a story about what I don't know about Squeak  
(again :-( )

On 8 May 2007, at 17:12, Andreas Raab wrote:

> A while ago Association>>= was changed to compare the values in  
> addition to its keys which has bitten me many, many times since it  
> was introduced. Most likely it is the culprit here, too.

No, Andreas, this was a different problem but an interesting one.   
This is the third time in my life that i has bitten me.  It's worth  
writing a pattern about: comparing recursive structures.

Suppose that you have two potentially recursive data structures, and  
you need to compare them — perhaps for equality, or perhaps for  
dominance of one over the other.  (I met this problem first when  
figuring out the type conformance algorithm for Emerald in about 1984.)

To be concrete, suppose that the structures are dictionaries, which  
contain simple data as keys (like symbols), and more complex data as  
values — including, perhaps, other dictionaries.  The obvious thing  
is to walk both structures in parallel, checking that they have the  
same keys, and that the values for each occurrence of a key are also  
the same.   It's possible that those values are also dictionaries, in  
which case you execute the same = code that you were in the middle of  
running ...

This is not a problem, unless: the dictionary stored in the  
dictionary is the very same dictionary that you started with.  In  
which case, the naive recursive search will go on for ever.

I first met this in Squeak when I executed Smalltalk = Smalltalk.  Of  
course, Smalltalk, as the dictionary of all globals, includes  
Smalltalk as a value, and the equality computation never terminated.   
I filed a bug fix for that one, and it's in the image now.  My fix  
was to simply compare the two objects for identity first, and only  
then do the recursive equality check.  This, or course, answers true  
ot Smalltalk = Smalltalk quite quickly.

The situation is the same with my recursive block example.

	b := [(i:= i-1) > 0 ifTrue: [j := j+1 . b value.] ifFalse:[j]] .

It's compiled as a method, which has (presumably, I haven't looked) a  
literal that points to itself ... and the equality comparision goes  
on forever.  Why Squeak was doing an equality comparision when all I  
asked for was to "DoIt" is another question.

I'm deducing all this from Mathieu's fix.   The identity test isn't a  
complete fix, however.  If you have two isomorphic recursive  
structures that don't share any elements, the test will still go on  
for ever.

Here is a small example: two isomorphic one-element dictionaries.   
Type this in a workspace (BUT DON"T TRY THIS IN AN IMAGE THAT YOU  
WANT TO KEEP USING!)

a := Dictionary new.
a at: #foo put: a.
b := Dictionary new.
b at: #foo put: b.

I expected that if I then did a printIt on a==b, I would get an  
immediate "false", but if I did a printit on a=b, the system would go  
off into never-land.

But actually, it's more interesting than that.  The system goes off  
into never-land almost regardless of what you do!  Debug it a == b  
will do it.  So will a simple printIt of b or inspectIt of b.  A doIt  
of "self halt. a == b" goes into a loop /before/ executing the self  
halt.  If you interrupt the computation, you can look at b in the  
debugger, and even launch an inspector from there:  
printStringLimtedTo: does its job, as it's supposed to.  The system  
is trying to evaluate hash on the cyclic dictionary; hash is also  
broken for the same reason.  Asking for a fullStack in the debugger  
to see _why_ it's trying to compute hash puts the system into another  
recursive loop!

This explains why I saw the same behavior with my recursive block,  
but not the root cause.  Anyone have an explanation?

By the way, the real solution to the recursive equality comparison is  
to maintain a cache.
A pair of elements goes into the cache when you start to comapre them  
for equality.  If, in the process of doing the comparision, you find  
that you are asking the same equaility question again, then the  
second time around, you can assume that the answer is true.  This is  
OK, evne though you haven't finished doing the comparison yet.  If it  
turns out that there is indeed a difference, the "outer" equality  
will eventually find it, and will answer false.   The same trick can  
be employed for computing the hash.  Assume that the hash of the  
structure that you are computing the has over is some constant, say  
0.  Start computing its hash, but if you find in the process that you  
need to compute the has of the same structure again, then just assume  
that it is 0.  The "real" hash might be thought of as the fixed-point  
of this process, but since hashes are just approximations anyway,  
once around the loop is enough.

The queation remains: why does the mere existence of such recursive  
structures throw Squeak into a tizzy?

	Andrew





More information about the Squeak-dev mailing list