[FIX] identityHash and fixing collections. (repost)

Scott A Crosby crosby at qwes.math.cmu.edu
Sun Dec 9 13:36:41 UTC 2001


Hello. I started a thread a few months ago about horrible unscalability in
the collections. Specifically, Set, Dictionary, etc would become extremely
inefficient in nasty ways if you attempted to store more than about 3000
objects in them. I've fixed this.

This unscalability doesn't happen in the typical image, but applications
that do try to store a lot (>3000) of items in sets/dictionaries may
encounter this problem. For dictionaries/sets with >3000 items this patch
will speed them up by 100-1000x.

The trick is to refactor identityHash into a hashBits function and a
identityHash, then rebuild all of the sets.

I've done that.

Note, IdentityHash4.1.cs may be UNSAFE to apply if you have any
identitySet or identityDictionary using an array with more than 4000
elements. In the normal image, this is not a problem. (This is because
Identity*>>scansFor: has special case code for larger those larger
arrays.)

The other nit, is that if you attempt to build a set/dictionary with over
12 million items, the same sort of unscalability may occur again. I don't
see this being a problem at the current time. :)

Anyways, file these in, IN ORDER, and wait for the rehashing (which takes
about 10 minutes). Also, this code does do deep things to the underlying
infrastructure of your image.... So use a duplicate of your image.

After this is filed in, there's quite a bit of refactoring that can be
done with {"",Weak,Identity,Pluggable}{Set,Dictionary}.

Scott


--
No DVD movie will ever enter the public domain, nor will any CD. The last CD
and the last DVD will have moldered away decades before they leave copyright.
This is not encouraging the creation of knowledge in the public domain.


-------------- next part --------------
'From Squeak3.1alpha of 7 March 2001 [latest update: #4164] on 9 October 2001 at 8:41:08 pm'!
"Postscript:
Leave the line above, and replace the rest of this comment by a useful one.
Executable statements should follow this comment, and should
be separated by periods, with no exclamation points (!!).
Be sure to put any further comments in double-quotes, like this one."

Smalltalk at: #Switchover put: false.
!

-------------- next part --------------
'From Squeak3.1alpha of 7 March 2001 [latest update: #4164] on 9 October 2001 at 8:18:19 pm'!
"Change Set:		PrefixChanges
Date:			9 October 2001
Author:			Scott Crosby

<your descriptive text goes here>"!


!ProtoObject methodsFor: 'comparing' stamp: 'sac 10/9/2001 19:42'!
hashBits
	"Answer a SmallInteger whose value is related to the receiver's identity.
	This method must not be overridden, except by SmallInteger.
	Primitive. Fails if the receiver is a SmallInteger. Essential.
	See Object documentation whatIsAPrimitive.

	Do not override."

	<primitive: 75>
	self primitiveFailed! !

!ProtoObject methodsFor: 'comparing' stamp: 'sac 10/9/2001 19:48'!
identityHash
	Switchover ifTrue:
		[^self hashBits * 4097.]
		ifFalse:
		[^self hashBits].! !


!MethodDictionary methodsFor: 'private' stamp: 'sac 10/9/2001 19:42'!
scanFor: anObject
	"Scan the key array for the first slot containing either a nil (indicating an empty slot) or an element that matches anObject. Answer the index of that slot or zero if no slot is found. This method will be overridden in various subclasses that have different interpretations for matching elements."
	| element start finish |
	start _ (anObject hashBits \\ array size) + 1.
	finish _ array size.

	"Search from (hash mod size) to the end."
	start to: finish do:
		[:index | ((element _ self basicAt: index) == nil or: [element == anObject])
			ifTrue: [^ index ]].

	"Search from 1 to where we started."
	1 to: start-1 do:
		[:index | ((element _ self basicAt: index) == nil or: [element == anObject])
			ifTrue: [^ index ]].

	^ 0  "No match AND no empty slot"! !


!SystemTracer methodsFor: 'mapping oops' stamp: 'sac 10/9/2001 19:43'!
mapAt: obj
	"Return the new oop for this object"
	| bucket |
	bucket _ oopMap at: obj hashBits+1.
	1 to: bucket size by: 3 do: 
		[:i | obj == (bucket at: i)
			ifTrue: ["Promote this entry for rapid access"
					i > 1 ifTrue: [1 to: 3 do: [:j | bucket swap: j with: i-1+j]].
					^ bucket at: 2]].
	^ UnassignedOop! !

!SystemTracer methodsFor: 'mapping oops' stamp: 'sac 10/9/2001 19:43'!
mapAt: obj put: oop with: hash
	"Assign the new oop for this object"
	| bucket |
	bucket _ oopMap at: obj hashBits +1.

	"Check for multiple writes (debug only)"
"	1 to: bucket size by: 3 do: 
		[:i | obj == (bucket at: i) ifTrue: [self halt]].
"
	oopMap at: obj identityHash+1 put: (Array with: obj with: oop with: hash) , bucket! !

!SystemTracer methodsFor: 'mapping oops' stamp: 'sac 10/9/2001 20:17'!
mapHashAt: obj
	"Return the new hash for this object"
	| bucket |
	bucket _ oopMap at: obj hashBits + 1.
	1 to: bucket size by: 3 do: 
		[:i | obj == (bucket at: i) ifTrue: [^ bucket at: i+2]].
	self halt! !

Switchover _ true. Set rehashAllSets.
!

-------------- next part --------------
'From Squeak3.1alpha of 7 March 2001 [latest update: #4164] on 9 October 2001 at 8:52:40 pm'!

!ProtoObject methodsFor: 'comparing' stamp: 'sac 10/9/2001 20:25'!
identityHash
	^self hashBits * 4097.
! !

Smalltalk removeKey: #Switchover!

-------------- next part --------------
'From Squeak3.2alpha of 3 October 2001 [latest update: #4411] on 9 December 2001 at 7:11:44 am'!

!IdentityDictionary methodsFor: 'private' stamp: 'sac 12/9/2001 07:07'!
scanFor: anObject 
	"Scan the key array for the first slot containing either a nil (indicating 
	an empty slot) or an element that matches anObject. Answer the index 
	of that slot or zero if no slot is found. This method will be overridden in 
	various subclasses that have different interpretations for matching 
	elements. "
	| finish hash start element |
	finish _ array size.
hash _ anObject identityHash.
	start _ hash \\ array size + 1.
	"Search from (hash mod size) to the end."
	start
		to: finish
		do: [:index | ((element _ array at: index) == nil
					or: [element key == anObject])
				ifTrue: [^ index]].
	"Search from 1 to where we started."
	1
		to: start - 1
		do: [:index | ((element _ array at: index) == nil
					or: [element key == anObject])
				ifTrue: [^ index]].
	^ 0"No match AND no empty slot"! !


!IdentitySet methodsFor: 'private' stamp: 'sac 12/9/2001 07:07'!
scanFor: anObject 
	"Scan the key array for the first slot containing either a nil (indicating 
	an empty slot) or an element that matches anObject. Answer the index 
	of that slot or zero if no slot is found. This method will be overridden in 
	various subclasses that have different interpretations for matching 
	elements. "
	| finish hash start element |
	finish _ array size.

		hash _ anObject identityHash.
	start _ hash \\ array size + 1.
	"Search from (hash mod size) to the end."
	start
		to: finish
		do: [:index | ((element _ array at: index) == nil
					or: [element == anObject])
				ifTrue: [^ index]].
	"Search from 1 to where we started."
	1
		to: start - 1
		do: [:index | ((element _ array at: index) == nil
					or: [element == anObject])
				ifTrue: [^ index]].
	^ 0"No match AND no empty slot"! !



More information about the Squeak-dev mailing list