Shouldn't 2 hash = 2.0 hash? [LONG]
David N. Smith (IBM)
dnsmith at watson.ibm.com
Mon Nov 29 21:26:34 UTC 1999
At 17:33 -0600 11/28/99, R. A. Harmon wrote:
>I found the following:
>
> 2 = 2.0 -> true
> 2 hash = 2.0 hash ->false
>
>I think this is a bug. Is it?
Good question. The blue book says:
"... Any two objects that are equal must return the same value for
#hash. Unequal values may or may not return equal values for hash.
Typically, this integer is used as an index to locate the object in
an indexed collection..."
[Pg 96].
(I cannot find any reference in the blue book to another hashing
rule: The hash value of an object must be constant over time. There
is yet another hashing rule that is more practical: Hash values
should be small integers.)
So, by the Blue Book definition, what you observe is a bug.
Should it be? Does it really mean that two instances of wildly
differing classes must answer equal hashes when they compare equal?
Why does the Blue Book make this restriction?
All of the discussion in the Blue Book about hashing is in the
context of looking up values in a hashed collection. Did they intend
that someone might put a float 2.0 into a hashed collection and then
look it up with an integer 2? Do people do it?
Maybe there is some other reason for this rule, but I think it might
just be incompletely stated. Consider these cases:
* An employee object compares equal using the employee number. Should
(anEmployee = 899538) answer true or purposely hash the same? I think
the hashes can be safely different. The two classes are quite
separate in intent and use, and the fact they compare equal is an
accident.
* In another implementation, employee objects are represented as
instances of classes in a hierarchy. Should aRetiredEmployee compare
equal to anActiveEmployee if, by accident, they have the same
employee number? Sure! Should they have the same hash value? Of
course. The classes here are closely related and should follow the
rule. In fact, making a new employee and then trying to find it in a
dictionary of existing employees would find a nasty bug.
So, maybe the rule should state something about closely related
objects answering the same hash, but I don't see a good and simple
way to say it.
Are subclasses of Number closely related enough that they should
follow the rule?
Most Smalltalk systems do what you expect.
In order to keep hash fast, IBM Smalltalk simply truncates all floats
to integers and use the hash of that. It's an ugly solution, IMHO.
That means, for example, that a dictionary which uses floats as keys
may have just a few unique hash values for a large number of entries;
it can end up doing a sequential search instead of a simple probe
with the hash.
One could make the hash operation more complex, so that any non-zero
fractional part is used in the hash, but then the hash operation
slows down. Many systems do that and have a slower floating point
hash, harmful or not.
There are similar problems with scaled decimal values.
So, should the rule be followed even if it's more expensive to do so?
Are subclasses of Number closely related enough that they should
follow the rule regardless? Are the 'same' values represented as
different classes ever used as equivalent keys in a hashed collection?
In my view, and assuming I'm not missing something obvious, the answer is no.
I'd argue that having (2.0 = 2) answer false is better than forcing
the hashes to be equal. Besides, comparing floats is a sin one should
not encourage. I'd rather see #= answer false and that some other
method be used for 'has equivalent value'.
While I did write that floating-point hash method, Squeak didn't
follow the rule before I wrote it, which is why I didn't make the new
version follow it either.
But you did ask a good question, and I didn't really answer it.
Anyone have a better answer?
Dave
_______________________________
David N. Smith
IBM T J Watson Research Center
Hawthorne, NY
_______________________________
Any opinions or recommendations
herein are those of the author
and not of his employer.
More information about the Squeak-dev
mailing list
|