New method cache, 30% faster macrobenchmarks and ineffeciencies.

Stephan Rudlof sr at evolgo.de
Mon Dec 10 15:47:49 UTC 2001


Scott A Crosby wrote:
> 
> On Mon, 10 Dec 2001, Stephan Rudlof wrote:
> 
> > Scott A Crosby wrote:
> > >
> > > On Sun, 9 Dec 2001, Stephan Rudlof wrote:
...
> > > Its probably not relevant. My hashed collection stuff avoids a nasty big
> > > performance degradation for when you get to >4000 elements (where
> > > performace will fall by a factor of 1000 as the algorithm devolves into a
> > > linear search).. If someone needs to store >100k objects in a hash table,
> > > they'll probably be far better served by writing a custom 'hash' function
> > > and store a 30-bit hash value in a slot.
> >
> > I think I've taken the point: spreading a 12 bit hash to *much* larger
> > values doesn't help for very large collections: it leads to big leaks in the
> > collections just reachable by linear search...
> >
> 
> Not quite. In theory, if you use it right, a hash-table where the hashes
> can collect (say) 2^12 distinct values should be about 2^12 times faster
> than a linear search.
> 
> The problem (see my earlier posts) is that large (say) 40000 element Set
> SHOULD be about 4000 times faster than a linear search on 40000 elements.
> (You can think of it as the hash value breaks things up into 4000 chains
> of about 10 elements each.).. Squeak Set's and dictionaries do NOT have
> this behaivor, ANY set over 4000 elements will always devolve into a
> linear search, when it should be about 4000 times faster than a linear
> search. The *why* is a subtle design flaw. See my earlier post.

I know. My mistake was not to say clearly by formatting, that 'I think I've
taken the point:...' was related only to the *last* sentence of the previous
paragraph. Sorry!

...

> If you want wisdom, read and understand the subtle flaw in Set. :) Its
> far more interesting than primality of hash-multipliers.
> 
> Scott

I already have this wisdom ;-)

This problem in
Set>>scanFor: anObject
	"Scan the key array for the first slot containing either a nil (indicating
an empty slot) or an element that matches anObject. Answer the index of that
slot or zero if no slot is found. This method will be overridden in various
subclasses that have different interpretations for matching elements."
	| element start finish |
	start _ (anObject hash \\ array size) + 1.
	finish _ array size.
	...

is repaired in
IdentitySet>>scanFor: anObject
	| finish hash start element |
	finish _ array size.
	finish > 4096
		ifTrue: [hash _ anObject identityHash * (finish // 4096)]
		ifFalse: [hash _ anObject identityHash].
	start _ (hash \\ array size) + 1.
	...

Greetings,

Stephan
-- 
Stephan Rudlof (sr at evolgo.de)
   "Genius doesn't work on an assembly line basis.
    You can't simply say, 'Today I will be brilliant.'"
    -- Kirk, "The Ultimate Computer", stardate 4731.3




More information about the Squeak-dev mailing list