hash and equal problem

nicolas cellier ncellier at ifrance.com
Mon Mar 27 19:44:42 UTC 2006


Hi all,

i do not think that this behavior is correct
  (2 = 2.0) = true.
  (2 hash = 2.0 hash) = false.

i saw a discussion about it in an old old thread (1999), but did not 
understand the rationale for such a behavior.

i can exhibit a lot of examples of code completely broken by this behavior.

Try this one:

  | aSet |
  aSet := Set new.
  aSet add: #( 1.0 2 3 4 5 6).
  aSet add: #( 1 2.0 3 4 5 6).
  aSet add: #( 1 2 3.0 4 5 6).
  aSet add: #( 1 2 3 4.0 5 6).
  aSet add: #( 1 2 3 4 5.0 6).
  aSet add: #( 1 2 3 4 5 6.0).
  aSet size

Then, try this slight modification:

  | aSet |
  aSet := Set new: 12.
  aSet add: #( 1.0 2 3 4 5 6).
  aSet add: #( 1 2.0 3 4 5 6).
  aSet add: #( 1 2 3.0 4 5 6).
  aSet add: #( 1 2 3 4.0 5 6).
  aSet add: #( 1 2 3 4 5.0 6).
  aSet add: #( 1 2 3 4 5 6.0).
  aSet size

You will get 6 or 5 depending on initial Set capacity...

Secondly, Array hash, Matrix hash, WideSstring hash are hashing each and every 
element... Sawing this, I imagine one never store 100x100 Matrices in a 
Dictionary, unless prepared to spend a lot of time at coffee machine...

In other words, finding a good balance between long to run hash algorithm and 
too many identical hash values is difficult. I am not sure that the 
exhaustive way is a good balance.

The bet that string will be short is maybe not the worst one (unless you store 
whole sentence like i saw in translation), but betting that every array in 
Smalltalk will be short is nonsense.

Try this one if you are not convinced:
  (SequenceableCollection allSubInstances collect: [:e | e size]) asBag
     valuesAndCounts associations asSortedCollection: [:a :b | a key > b key]

In a 3.9a image, i have an average of 74 and a maximum over the millon.

So i raise the question again. Should we really let above behavior exist in 
squeak ?
Shouldn't we look for more efficient hashing ?
I'm asking for a good rationale (compatibility issues included).




More information about the Squeak-dev mailing list