Shouldn't 2 hash = 2.0 hash? [LONG]

R. A. Harmon harmonra at webname.com
Tue Nov 30 23:07:12 UTC 1999


At 04:26 PM 11/29/99 -0500, David N. Smith wrote:
>At 17:33 -0600 11/28/99, R. A. Harmon wrote:
>>I found the following:
>>
>>         2 = 2.0 -> true
>>         2 hash = 2.0 hash ->false
>>
>>I think this is a bug.  Is it?
>
>Good question. The blue book says:
[snip]
>(I cannot find any reference in the blue book to another hashing 
>rule: The hash value of an object must be constant over time. There 

The ANSI doc in <Object> hash says:

        "The hash value of an object need not be temporally invariant. Two
independent invocations of #hash with the same receiver may not always yield
the same results. Note that collections that use #= to discriminate."


>So, by the Blue Book definition, what you observe is a bug.
[snip]
>Should it be? Does it really mean that two instances of wildly 
>differing classes must answer equal hashes when they compare equal? 
[snip]
>All of the discussion in the Blue Book about hashing is in the 
>context of looking up values in a hashed collection. Did they intend 
>that someone might put a float 2.0 into a hashed collection and then 
>look it up with an integer 2? Do people do it?

I was also concerned with the ANSI definition, as I'm trying to produce all
the ANSI-conforming messages for Squeak, and I thought there might be the
unpleasant surprise in collections that you mentioned.  I would think there
are good reasons why folks might do it.
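
A minimal sketch of the surprise, for evaluation in a workspace.  It assumes
a dialect where 2 = 2.0 answers true but the two hashes differ; the variable
name is only for illustration:

```smalltalk
| lookup |
lookup := Set new.
lookup add: 2.0.
"The Set probes the slot chosen by (2 hash), which need not be
 the slot where 2.0 was stored, so this may answer false."
lookup includes: 2
```

Where the two hashes agree, the same lookup should find the element.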


>Maybe there is some other reason for this rule, but I think it might 
>just be incompletely stated. Consider these cases:
[snip]
>So, maybe the rule should state something about closely related 
>objects answering the same hash, but I don't see a good and simple 
>way to say it.

I like the ANSI standard way as it relieves me of having to be creative, and
I get to rely on the committee's greater experience.

I also wondered if there was a specific reason Squeak does it this way, and
I thought I'd leverage off this list's greater experience.

>Are subclasses of Number closely related enough that they should 
>follow the rule?
>Most Smalltalk systems do what you expect.

                               Squeak  Dolphin VWNC
        2 = 2.0                true    true    true
        2 hash = 2.0 hash      false   true    true

How about VAST?


[Good analysis snip]
>There are similar problems with scaled decimal values.

I can't recall how I handled this in my Scaled Decimal implementation for
Squeak.

                               Squeak  Dolphin VWNC
        2 = 2.0s2              ?       true    true
        2 hash = 2.0s2 hash    ?       false   true

        2.0 = 2.0s2            ?       true    true
        2.0 hash = 2.0s2 hash  ?       false   true

This may not be Dolphin's fault, as I contributed my Scaled Decimal
implementation, which they then modified (I haven't yet checked how much
changed).  It may be that my error slipped by them.

How about VAST?


>So, should the rule be followed even if it's more expensive to do so? 
[snip]
>In my view, and assuming I'm not missing something obvious, the answer is no.
>
>I'd argue that having (2.0 = 2) answer false is better than forcing 
>the hashes to be equal. Besides, comparing floats is a sin one should 
>not encourage. I'd rather see #= answer false and that some other 
>method be used for 'has equivalent value'.

That is the wisdom I've always heard, and my first impulse was that (2.0 =
2) should answer false.

As a general rule, I use the ANSI standard messages; where the standard
doesn't specify, I defer to other dialects where they agree; otherwise I have
to think (not a happy occurrence).  I follow this rule where possible so my
programs are at least nominally portable.

After checking what Dolphin and VWNC do, I would vote for (2.0 = 2) to
answer true and (2 hash = 2.0 hash) to answer true in Squeak.  I haven't dug
into how this is achieved.
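
One way it could be done, as a sketch only: a Float with no fractional part
answers the hash of the Integer it equals.  This is not taken from Dolphin or
VWNC source, and #bitPatternHash is a made-up name standing in for whatever
a dialect uses for the remaining Floats:

```smalltalk
"Method sketch for class Float.  Hypothetical, not any dialect's source."
hash
    "Answer the hash of the equal Integer when the receiver is integral,
     so that (2 = 2.0) implies (2 hash = 2.0 hash)."
    (self isNaN or: [self isInfinite]) ifFalse: [
        self fractionPart = 0.0 ifTrue: [^self truncated hash]].
    "#bitPatternHash is an assumed helper for all other receivers."
    ^self bitPatternHash
```

This covers only whole-number values.  A Fraction that compares equal to a
Float (1/2 and 0.5, in a dialect where that comparison answers true) would
need the same treatment.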



At 05:41 PM 11/29/99 -0500, agree at carltonfields.com wrote:
>This is an excellent discussion.  It suggests to me that the blue book
> notion of "="/hash and "==" do not properly contemplate the more general
> notion of equality used in computing 2 = 2.0 under the present system.
>  The latter computation is much more than the notion of "represents the
> same number," which is the Smalltalk/Blue Book notion of "=," but rather
> is the more general notion of "after coercing both arguments to the same
>level of generality in a specified hierarchy, represents the same number."
[Good analysis snip]

The ANSI doc in <Object> = says:

        "The meaning of "equivalent" cannot be precisely defined but the
intent is that two objects are considered equivalent if they can be used
interchangeably. Conforming protocols may choose to more precisely define
the meaning of "equivalent".

The value of (receiver = comparand) is true if and only if the value of
(comparand = receiver) would also be true. If the value of (receiver =
comparand) is true then the receiver and comparand must have equivalent hash
values. Or more formally:

        receiver = comparand    =>
        receiver hash = comparand hash

The equivalence of objects need not be temporally invariant. Two independent
invocations of #= with the same receiver and operand objects may not always
yield the same results. Note that a collection that uses #= to discriminate
objects may only reliably store objects whose hash values do not change
while the objects are contained in the collection."
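
The last sentence of that note can be sketched in a workspace.  This assumes
a dialect in which an OrderedCollection computes its hash from its elements;
the variable names are only for illustration:

```smalltalk
| key holder |
key := OrderedCollection with: 1.
holder := Set new.
holder add: key.
"Adding an element changes what key is equal to and, under the
 assumption above, its hash as well."
key add: 2.
"The Set filed key under the old hash, so this may answer false."
holder includes: key
```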

I find this a bit more slippery than I can get my mind around.  First I
think 2 and 2.0 are interchangeable and then I think they are not.  Arrrrrrgh!


Thanks to all others for good points.

--
Richard A. Harmon          "The only good zombie is a dead zombie"
harmonra at webname.com           E. G. McCarthy
Spencer, Iowa




