On Mon, May 28, 2012 at 11:13 PM, Frank Shearar <frank.shearar@gmail.com> wrote:
On 28 May 2012 17:43, Colin Putney <colin@wiresong.com> wrote:
>
> On 2012-05-28, at 7:59 AM, Frank Shearar wrote:
>
>>> That would be a nice property, but hash functions are not injective. If they
>>> were, then either the codomain would be too large (not SmallInteger in our
>>> case, which makes hashing impractical) or there were no need for the use of
>>> hashing at all, since there were no collisions.
>>
>> Sure, collisions mean that you can have a ~= b and yet a hash = b
>> hash. Nevertheless, CompiledMethod define: #hash as: [^ 1] satisfies
>> the above test, so on its own it's insufficient. I wouldn't ask for
>> testing CM x CM  - {CompiledMethods whose hashes collide} !
>
> So, essentially, you want a test that ensures that we have a high-quality hash function. That would be nice, but I'm not sure it fits into the structure of a unit test. Porting Andres' hash function tools to Squeak would probably be the best way to do that. Short of that, I'd suggest a simple smoke test - say, asserting that there are few collisions between the methods of TestCase or something like that.

I just want a test demonstrating that, even if it's just for carefully
constructed CompiledMethods, sometime cmA hash ~= cmB hash when cmA ~=
cmB. At the moment the test thoroughly demonstrates half of the hash's
behaviour - when things are =, their hashes are =.

I agree that demonstrating the collision rate of the hash function is
beyond the scope of a unit test.

How about adding a count that e.g. demands that the number of distinct hashes is better than half the number of distinct CompiledMethods.  e.g.

testHash
| ai |
ai := CompiledMethod allInstances.
ai do:
[:a|
ai do:
[:b|
a = b ifTrue: [self assert: a hash = b hash]]].
self assert: (ai collect: [:cm| cm hash]) asSet size * 2 >= ai asSet size
--
best,
Eliot