Shouldn't 2 hash = 2.0 hash? [LONG]

agree at carltonfields.com agree at carltonfields.com
Mon Nov 29 22:41:20 UTC 1999


This is an excellent discussion.  It suggests to me that the blue book notion of "="/hash and "==" do not properly contemplate the more general notion of equality used in computing 2 = 2.0 under the present system.  The latter computation is much more than the notion of "represents the same number," which is the Smalltalk/Blue Book notion of "=," but rather is the more general notion of "after coercing both arguments to the same level of generality in a specified hierarchy, represents the same number."

I am inclined to argue that David is correct, and that either the "bug" should be tolerated (but hopefully explained), or perhaps less elegantly, that "=" should return false for 2 compared to 2.0, and replaced with a keyword comparison method, say, "numericallyEqual:" which returns true if after proper coercions to a number of appropriate generality, the results are equal.

There are even yuckier examples of this, for example, consider:

	'2' = 2

	'2.0' = 2

both of which I can conceive of an application wanting something like "numericallyEqual:' or perhaps "#=" to return true, but for which I would be willing to compromise nothing in determining hash values.

The difficulty arises, of course, when you have a class such as Numbers, where most operations are likely to involve coercion of one kind or another to a more general operation.  (For this reason, I think Large___Integer hashing must hash consistently with SmallInteger to the extent that overlapping is permitted (or at least that there should be a normalization check in the hash operator).  

Just out of curiousity, is there any real reason why Floats with integer truncations shouldn't hash to the same value as their integer counterparts?  How much slower would that really be?

I think that we have at least three layers of equality, and the Blue Book intention is more or less as follows:

	"=="			is the same object (equal oops)
	"="			represents the same object for purposes of indexing (equal hashes)
	"purposeEqual:"	for other equivalence relationships in an application domain.

We are just too well trained to believe that "=" means mathematical equality for numbers, I suppose, but we can get over it in the long run.  When faced with a reflexive, transitive and associative relationship, we should take care to consider whether we really want to 





> -----Original Message-----
> From: MIME :dnsmith at watson.ibm.com > Sent: Monday, November 29, 1999 4:34 PM
> To: squeak at cs.uiuc.edu
> Cc: squeak at cs.uiuc.edu
> Subject: Re: Shouldn't 2 hash = 2.0 hash? [LONG]
> > > At 17:33 -0600 11/28/99, R. A. Harmon wrote:
> >I found the following:
> >
> >         2 = 2.0 -> true
> >         2 hash = 2.0 hash ->false
> >
> >I think this is a bug.  Is it?
> > Good question. The blue book says:
> > "... Any two objects that are equal must return the same > value for #hash. Unequal values may or may not return equal > values for hash. Typically, this integer is used as an index > to locate the object in an indexed collection..."
> [Pg 96].
> > (I cannot find any reference in the blue book to another > hashing rule: The hash value of an object must be constant > over time. There is yet another hashing rule that is more > practical: Hash values should be small integers.)
> > So, by the Blue Book definition, what you observe is a bug.
> > > Should it be? Does it really mean that two instances of > wildly differing classes must answer equal hashes when they > compare equal? Why does the Blue Book make this restriction?
> > All of the discussion in the Blue Book about hashing is in > the context of looking up values in a hashed collection. Did > they intend that someone might put a float 2.0 into a hashed > collection and then look it up with an integer 2? Do people do it?
> > Maybe there is some other reason for this rule, but I think > it might just be incompletely stated. Consider these cases:
> > * An employee object compares equal using the employee > number. Should (anEmployee = 899538) answer true or purposely > hash the same? I think the hashes can be safely different. > The two classes are quite separate in intent and use, and the > fact they compare equal is an accident.
> > * In another implementation, employee objects are represented > as instances of classes in a hierarchy. Should > aRetiredEmployee compare equal to anActiveEmployee if, by > accident, they have the same employee number? Sure! Should > they have the same hash value? Of course. The classes here > are closely related and should follow the rule. In fact, > making a new employee and then trying to find it in a > dictionary of existing employees would find a nasty bug.
> > So, maybe the rule should state something about closely > related objects answering the same hash, but I don't see a > good and simple way to say it.
> > Are subclasses of Number closely related enough that they > should follow the rule?
> > > Most Smalltalk systems do what you expect.
> > In order to keep hash fast, IBM Smalltalk simply truncates > all floats to integers and use the hash of that. It's an ugly > solution, IMHO. That means, for example, that a dictionary > which uses floats as keys may have just a few unique hash > values for a large number of entries; it can end up doing a > sequential search instead of a simple probe with the hash.
> > One could make the hash operation more complex, so that any > non-zero fractional part is used in the hash, but then the > hash operation slows down. Many systems do that and have a > slower floating point hash, harmful or not.
> > There are similar problems with scaled decimal values.
> > So, should the rule be followed even if it's more expensive > to do so? Are subclasses of Number closely related enough > that they should follow the rule regardless? Are the 'same' > values represented as different classes ever used as > equivalent keys in a hashed collection?
> > > In my view, and assuming I'm not missing something obvious, > the answer is no.
> > I'd argue that having (2.0 = 2) answer false is better than > forcing the hashes to be equal. Besides, comparing floats is > a sin one should not encourage. I'd rather see #= answer > false and that some other method be used for 'has equivalent value'.
> > While I did write that floating-point hash method, Squeak > didn't follow the rule before I wrote it, which is why I > didn't make the new version follow it either.
> > > But you did ask a good question, and I didn't really answer > it. Anyone have a better answer?
> > > Dave
> > _______________________________
> David N. Smith
> IBM T J Watson Research Center
> Hawthorne, NY
> _______________________________
> Any opinions or recommendations
> herein are those of the author
> and not of his employer.
> > > > 





More information about the Squeak-dev mailing list