Incongruent Hash and all that: Reprise

Travis or Kerrin Griggs tkc at bmi.net
Fri Feb 27 09:28:00 UTC 1998


After rumeniscing (sp?) about the hash debate (which I think kind of
flowed out of the "why is/isn't Point a magnitude?" debate), I decided
to take a new spin on this whole hashing thing. For a while we ranted
about whether or not Point should be a subclass of magnitude, and if so
how it should uniformly implement < (less than). What I (and others)
seemed to come up with, is that sorting Points is domain specific. IOW,
depending on your domain, you will choose one function or another that
given two points will answer which is "less" than the other. And indeed,
SortedCollection seems to take this into mind, by using a sortBlock to
model this function. Therefore, each instance of SortedCollection can be
paramaterized for the specific Domain and use of said instance.

Then comes all this talk about how Floats hash. The ONLY place I'm aware
of that hash is important is when using objects in the Set heirarchy. If
we didn't have Sets (and it's various) subclasses, we'd never have to
say, "if thou implementest =, thou mustest implement hash!" So why not
take the same approach with the hashing function as with the sorting
function? So I did just that. The attached change set is a subclass of
Set called VariableHashSet, which adds an ivar for a hash block, and an
ivar for an equivalence block. In this way, you don't have to decide how
to implement hash in String, or Float, or Fraction, to give everyone and
anyone the best performance (etc) in every condition. Different
domains/uses can change this. This seems better to me, as it moves the
function of hashing closer to the objects that need/use it.

One of the things I was curious about, was what the performance hit
would be for adding the extra level of passing through the blocks, even
if all they did was the default hash and = behavior. There are "test"
methods on the class side which will spew results to the Transcript. The
performance hit is there (see the int case), allthough as shown with the
Float test, you can actually offset this by coming up with better
optimized algorithms. The performance hit in Squeak for using the blocks
in Squeak (Jitter) is quite a bit more significant (15-30%) than that
for doing the same in VW (about 2-3% there).

Travis Griggs
To Smalltalk! - and Beyond!



'From Squeak 1.3 of Jan 16, 1998 on 27 February 1998 at 12:54:05 am'!
Set subclass: #VariableHashSet
	instanceVariableNames: 'hashFunction equivalenceFunction '
	classVariableNames: ''
	poolDictionaries: ''
	category: 'Variable Hashing'!

!VariableHashSet commentStamp: 'TAG 2/27/98 00:54' prior: 0!
VariableHashSet extends the concept of Sets by parameterizing per instance the hash and equivalence functions used to determine placement/inclusion in the reciever.

Important note: There is an invariant found concerning the hash and equality functions, and that is that any two objects that evaluate true to the equivalence function MUST answer the same hash values when run through the hash function.

See performance methods on class side for evaluating the performance hit taken by adding the block layers. The performance hit is MUCH less significant in VW, I assume because BlockClosures (esp. closed ones) are handled much more efficiently there.

Instance Variables:

	hashFunction	<BlockContext> a single argument block that should return a hash value for the argument, see note above, defaults to [:elem | elem hash] to mimic normal sets

	equivalenceFunction	<BlockContext> a two argument block that takes two arguments and answers true/false whether they are equivalent, see note above, defaults to: [:a :b | a = b]!

!VariableHashSet methodsFor: 'private' stamp: 'TAG 2/27/98 00:07'!
equivalenceFunction: aBlock 
	equivalenceFunction _ aBlock!
]style[(28 2 19 9)f1b,f1,f1b,f1! !

!VariableHashSet methodsFor: 'private' stamp: 'TAG 2/26/98 23:08'!
hashFunction: aBlock
	hashFunction _ aBlock! !

!VariableHashSet methodsFor: 'private' stamp: 'TAG 2/24/98 19:09'!
init: aSize
	self initFunctions.
	^super init: aSize! !

!VariableHashSet methodsFor: 'private' stamp: 'TAG 2/24/98 19:10'!
initFunctions
	"setup the good old hash and equality functions"
	
	hashFunction _ [:elem | elem hash].
	equivalenceFunction _ [:a :b | a = b]! !

!VariableHashSet methodsFor: 'private' stamp: 'TAG 2/24/98 19:14'!
scanFor: anObject
	"Scan the key array for the first slot containing either a nil (indicating an empty slot) or an element that matches anObject. Answer the index of that slot or zero if no slot is found. This method will be overridden in various subclasses that have different interpretations for matching elements."
	| element start finish |
	start _ ((hashFunction value: anObject) \\ array size) + 1.
	finish _ array size.

	"Search from (hash mod size) to the end."
	start to: finish do:
		[:index | ((element _ array at: index) == nil or: [equivalenceFunction value: element value: anObject])
			ifTrue: [^ index ]].

	"Search from 1 to where we started."
	1 to: start-1 do:
		[:index | ((element _ array at: index) == nil or: [element = anObject])
			ifTrue: [^ index ]].

	^ 0  "No match AND no empty slot"! !


!VariableHashSet class methodsFor: 'performance' stamp: 'TAG 2/27/98 00:51'!
testFloatPerformance
	"VariableHashSet testFloatPerformance"
	"run N loops, each time creating a random set of Floats [0 - 1] (use siz to vary the size of the test set). For each loop, create a normal Set and VariableHashSet. Run one test where we test speed to find a known included value x times, and the same with a value that is known not to be in the set, for both set types. For the float case, make the hashFunction be multiply by some power of 10 (vary tens to play with)" 

| rnd siz set vset probe times bench vary |
rnd _ Random new.
rnd next.
siz _ 10.
3 timesRepeat: [
	set _ Set new.
	vset _ self new.
	vset hashFunction: [:elem | (elem * 50.0) truncated]. 
	[set size < siz] whileTrue: [vset add: (set add: rnd next)].
	probe _ nil.
	[probe isNil] whileTrue: [set do: [:each | (probe isNil and: [rnd next < (1.0 / siz)]) ifTrue: [probe _ each]]]. "get a random element of the test set"
	Transcript show: 'Test: '; print: set asSortedCollection asArray; cr.
	times _ 3000000 // (Time millisecondsToRun: [1000 timesRepeat: [vset includes: probe]]).
	bench _ (Time millisecondsToRun: [times timesRepeat: [set includes: probe]]).
	Transcript show: 'Inclusion of: ';print: probe; show: ' Set: '; print: bench.
	vary _ (Time millisecondsToRun: [times timesRepeat: [vset includes: probe]]).
	Transcript show: ' VSet: '; print: vary; show: ' ratio: '; print: vary asFloat / bench; cr.
	probe _ set anyOne.
	[set includes: probe] whileTrue: [probe _ rnd next].
	bench _ (Time millisecondsToRun: [times timesRepeat: [set includes: probe]]).
	Transcript show: 'Inclusion of: ';print: probe; show: ' Set: '; print: bench.
	vary _ (Time millisecondsToRun: [times timesRepeat: [vset includes: probe]]).
	Transcript show: ' VSet: '; print: vary; show: ' ratio: '; print: vary asFloat / bench; show:'
'.]! !

!VariableHashSet class methodsFor: 'performance' stamp: 'TAG 2/27/98 00:28'!
testIntPerformance
	"VariableHashSet testIntPerformance"
	"run N loops, each time creating a random set of of integers (use siz and wid to vary the size of the test set and variance of the elements). For each loop, create a normal Set and VariableHashSet. Run one test where we test speed to find a known included value x times, and the same with a value that is known not to be in the set, for both set types" 

| rnd siz wid set vset probe times bench vary |
rnd _ Random new.
rnd next.
siz _ 25.
wid _ siz * 50.
3 timesRepeat: [
	set _ Set new.
	vset _ self new.
	vset hashFunction: [:elem | elem]. "by a little speed by not sending the hash method"
	[set size < siz] whileTrue: [vset add: (set add: (rnd next * wid) truncated)].
probe _ -1.
	[set includes: probe] whileFalse: [probe _ (rnd next * wid) truncated].
	Transcript show: 'Test: '; print: set asSortedCollection asArray; cr.
	times _ 3000000 // (Time millisecondsToRun: [1000 timesRepeat: [set includes: probe]]).
	bench _ (Time millisecondsToRun: [times timesRepeat: [set includes: probe]]).
	Transcript show: 'Inclusion of: ';print: probe; show: ' Set: '; print: bench.
	vary _ (Time millisecondsToRun: [times timesRepeat: [vset includes: probe]]).
	Transcript show: ' VSet: '; print: vary; show: ' ratio: '; print: vary asFloat / bench; cr.
	probe _ set anyOne.
	[set includes: probe] whileTrue: [probe _ (rnd next * wid) truncated + 1].
	bench _ (Time millisecondsToRun: [times timesRepeat: [set includes: probe]]).
	Transcript show: 'Inclusion of: ';print: probe; show: ' Set: '; print: bench.
	vary _ (Time millisecondsToRun: [times timesRepeat: [vset includes: probe]]).
	Transcript show: ' VSet: '; print: vary; show: ' ratio: '; print: vary asFloat / bench; show:'
'.]! !





More information about the Squeak-dev mailing list