[SqF]Report of VI4 Project for Feb '02

Martin McClure martin at hand2mouse.com
Sat Feb 2 03:36:54 UTC 2002


At 9:42 PM -0500 2/1/02, Scott A Crosby wrote:
>  >
>>  One more possible change: It was brought up a few months back that
>>  the number of object header bits devoted to identity hash (10 or 11
>>  bits, IIRC) wasn't enough to allow efficient hashed identity
>>  collections that were very large. It would be nice to have more bits
>
>No, those bits are misued. In essence, large collections ignore them
>entirely. If they are used, the Identity** collections can scale to tens
>of thousands or more elements without a severe degradation. (Basically,
>using Identity* will be 4000x faster than a linear scan.)
>
>Performance degradation will continue to occur, but I push the limit from
>10,000 to 500,000-2,000,000.

Maybe that's big enough for Squeak, I don't know. I guess I'm used to 
GemStone, where we didn't consider collections scalable unless they 
performed well into the hundreds of millions of objects. :-)

And more hash bits in the header would still improve performance. A 
collection of 2M objects in a system where there are only 1K unique 
hash values means that, on average, there are 2K objects with the 
same hash value as the one you're trying to access. This means a lot 
of linear probing to find the one you want. I do realize that your 
changes make this a few orders of magnitude better than it was, but 
it's still not great.

The cost of putting more hash bits in the header must also be 
considered, of course, which is why I'm not unconditionally 
advocating we make the header bigger to do this.

>
>>  Also, as was discussed back then, it would improve things to make the
>>  identity hash reported by the VM have a larger range than 0-1023,
>>  even if it can only have 1024 different values. I think I proposed
>>  multiplying the current value by 65537, a prime number of roughly the
>>  right magnitude. This wouldn't require an image format change as
>>  such, but it would require rehashing all existing hashed identity
>
>Correct, my patch does that. There's a small potential robustness problem
>in a particular edge case, but it works. New primitives should be made
>though to do the multiplication in software.
>
>>  collections, including method dictionaries. (I haven't rechecked my
>>  facts, this is all from memory, so please correct me if I've got any
>>  of the details wrong.)
>
>No, I preserve the old (flawed) behavior for method dictionaries, rather
>than rebuilding a new image. True, this means that MethodDictionaries
>should not be scaled beyond, say, 2000 or so methods. I don't consider
>this a real problem.

No, that's not a real problem, but it seemed to me that if the image 
format is changing that might give us an opportunity to use the 
cleaner solution of doing it the same way everywhere. This I *am* 
currently proposing we do.

-Martin



More information about the Squeak-dev mailing list