Equality of Recursive Structures [Was: Closure Compiler in 3.9 ?]
Andrew P. Black
black at cs.pdx.edu
Thu May 10 06:46:53 UTC 2007
This started as a story about what I realized in the shower this
morning, and ended up as a story about what I don't know about Squeak
(again :-( )
On 8 May 2007, at 17:12, Andreas Raab wrote:
> A while ago Association>>= was changed to compare the values in
> addition to its keys which has bitten me many, many times since it
> was introduced. Most likely it is the culprit here, too.
No, Andreas, this was a different problem but an interesting one.
This is the third time in my life that i has bitten me. It's worth
writing a pattern about: comparing recursive structures.
Suppose that you have two potentially recursive data structures, and
you need to compare them — perhaps for equality, or perhaps for
dominance of one over the other. (I met this problem first when
figuring out the type conformance algorithm for Emerald in about 1984.)
To be concrete, suppose that the structures are dictionaries, which
contain simple data as keys (like symbols), and more complex data as
values — including, perhaps, other dictionaries. The obvious thing
is to walk both structures in parallel, checking that they have the
same keys, and that the values for each occurrence of a key are also
the same. It's possible that those values are also dictionaries, in
which case you execute the same = code that you were in the middle of
running ...
This is not a problem, unless: the dictionary stored in the
dictionary is the very same dictionary that you started with. In
which case, the naive recursive search will go on for ever.
I first met this in Squeak when I executed Smalltalk = Smalltalk. Of
course, Smalltalk, as the dictionary of all globals, includes
Smalltalk as a value, and the equality computation never terminated.
I filed a bug fix for that one, and it's in the image now. My fix
was to simply compare the two objects for identity first, and only
then do the recursive equality check. This, or course, answers true
ot Smalltalk = Smalltalk quite quickly.
The situation is the same with my recursive block example.
b := [(i:= i-1) > 0 ifTrue: [j := j+1 . b value.] ifFalse:[j]] .
It's compiled as a method, which has (presumably, I haven't looked) a
literal that points to itself ... and the equality comparision goes
on forever. Why Squeak was doing an equality comparision when all I
asked for was to "DoIt" is another question.
I'm deducing all this from Mathieu's fix. The identity test isn't a
complete fix, however. If you have two isomorphic recursive
structures that don't share any elements, the test will still go on
for ever.
Here is a small example: two isomorphic one-element dictionaries.
Type this in a workspace (BUT DON"T TRY THIS IN AN IMAGE THAT YOU
WANT TO KEEP USING!)
a := Dictionary new.
a at: #foo put: a.
b := Dictionary new.
b at: #foo put: b.
I expected that if I then did a printIt on a==b, I would get an
immediate "false", but if I did a printit on a=b, the system would go
off into never-land.
But actually, it's more interesting than that. The system goes off
into never-land almost regardless of what you do! Debug it a == b
will do it. So will a simple printIt of b or inspectIt of b. A doIt
of "self halt. a == b" goes into a loop /before/ executing the self
halt. If you interrupt the computation, you can look at b in the
debugger, and even launch an inspector from there:
printStringLimtedTo: does its job, as it's supposed to. The system
is trying to evaluate hash on the cyclic dictionary; hash is also
broken for the same reason. Asking for a fullStack in the debugger
to see _why_ it's trying to compute hash puts the system into another
recursive loop!
This explains why I saw the same behavior with my recursive block,
but not the root cause. Anyone have an explanation?
By the way, the real solution to the recursive equality comparison is
to maintain a cache.
A pair of elements goes into the cache when you start to comapre them
for equality. If, in the process of doing the comparision, you find
that you are asking the same equaility question again, then the
second time around, you can assume that the answer is true. This is
OK, evne though you haven't finished doing the comparison yet. If it
turns out that there is indeed a difference, the "outer" equality
will eventually find it, and will answer false. The same trick can
be employed for computing the hash. Assume that the hash of the
structure that you are computing the has over is some constant, say
0. Start computing its hash, but if you find in the process that you
need to compute the has of the same structure again, then just assume
that it is 0. The "real" hash might be thought of as the fixed-point
of this process, but since hashes are just approximations anyway,
once around the loop is enough.
The queation remains: why does the mere existence of such recursive
structures throw Squeak into a tizzy?
Andrew
More information about the Squeak-dev
mailing list
|