[Q] Dictionary delays

Scott A Crosby crosby at qwes.math.cmu.edu
Sat Apr 13 03:28:00 UTC 2002


On Fri, 12 Apr 2002, Harry Chesley wrote:

> In the project I'm working on, I use several very large
> IdentityDictionary's (on the order of a half million entries each). The
> processing steps that fill them up start out taking about 1ms (timed
> 1000 at a time with Time millisecondsToRun:) but as things fill, the
> time increases to about 10ms in the end. I thought I could speed things
> up by allocating large dictionaries to start with (IdentityDictionary
> new: 500000 rather than just IdentityDictionary new). But instead, it
> slowed things down, with processing steps now starting out taking 4ms,
> and still increasing over time.

I dealt with these issues a few months ago...

Try running your code under the profiler; you'll probably see most of the
time being spent in IdentityDictionary>>scanFor:.
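
For example (MessageTally is the standard Squeak profiler; the workspace
doit below is just a stand-in for your real fill loop):

    | dict |
    dict := IdentityDictionary new.
    MessageTally spyOn:
        [1 to: 500000 do: [:i | dict at: Object new put: i]].

When the block finishes, spyOn: opens a window with the time breakdown.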

Focus especially on the number (and total time) of incrGCs and fullGCs.
You may be hitting GC issues like the ones below.
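
You can read those counters straight from the image. The GC statistics
live in the VM parameter array; indices 7-10 are the conventional
assignments, but check the comment in SystemDictionary>>vmParameterAt:
in your image before trusting them:

    Transcript
        show: 'fullGCs: ', (Smalltalk vmParameterAt: 7) printString; cr;
        show: 'fullGC ms: ', (Smalltalk vmParameterAt: 8) printString; cr;
        show: 'incrGCs: ', (Smalltalk vmParameterAt: 9) printString; cr;
        show: 'incrGC ms: ', (Smalltalk vmParameterAt: 10) printString; cr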


>
> This seems to imply that the dictionary is getting copied. I had thought
> that dictionaries only got copied when they needed to expand, which
> shouldn't be happening with the larger initial size. Or during the
> garbage collection process, which I didn't think it would be a recurring
> or at least frequent thing.

Dictionaries are only copied when they expand.

Your dictionary's storage array is set as a GC root, which means that on
every incrGC (every 4000 allocations, by default) the collector scans the
entire array for references into newspace. That gets expensive when you
have several 1M-slot arrays. Try increasing the number of allocations
between incrGCs; see #vmParameterAt: and its friends and senders.
(Given what you say, I suspect that this is the cause.)
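
A minimal sketch, assuming the conventional numbering in which index 5 is
"allocations between incremental GCs" (again, verify against the
vmParameterAt:put: comment in your image):

    "incrGC every 40000 allocations instead of the default 4000,
     so the big root arrays get rescanned a tenth as often"
    Smalltalk vmParameterAt: 5 put: 40000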

There may also be a root-table-overflow problem (which shows up either as
an excessive number of fullGCs or in the appropriate vmParameter). There
is a patch for this that may go in some day.

>
> Does anyone understand why I'm getting the behavior I am? And more
> importantly, if there's any way to speed things up? Besides the question

Add an explicit hash slot to your object, initialize it with a random
number when the object is created, have #hash return that number, and use
a normal Dictionary.
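
A minimal sketch (the class and selector names are made up; note that
Object class>>new does not call #initialize for you on these images,
hence the class-side override):

    Object subclass: #FastKey
        instanceVariableNames: 'myHash'
        classVariableNames: ''
        poolDictionaries: ''
        category: 'Example'

    FastKey class >> new
        ^super new setHash

    FastKey >> setHash
        "give each instance far more hash bits than the 12-bit identity hash"
        myHash := SmallInteger maxVal atRandom

    FastKey >> hash
        ^myHash

Since the default #= is identity, a per-instance random hash stays
consistent with equality; if you override #= as well, derive #hash from
the same state instead.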

Keep in mind that, the way identity hashes are defined, you only have 12
hash bits, so the maximum gain is 4096x. A dictionary with a million
elements that uses only those bits will be only about 4096 times faster
than a linked list, not a million times faster (unless you add your own
hash bits artificially by making your own slot, as above). With half a
million entries spread over 4096 hash values, each lookup ends up
scanning a chain of roughly 500000 / 4096, about 122 slots, which is why
your steps keep slowing down as the tables fill.
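
You can see the 12-bit ceiling for yourself; on images of this vintage
#identityHash answers the raw header field, so a quick workspace check
like this never finds more than 4096 distinct values:

    | counts |
    counts := Bag new.
    100000 timesRepeat: [counts add: Object new identityHash].
    counts asSet size    "print it: at most 4096"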

> of the dictionary size increase not helping, I'm not sure I understand
> why my processing steps slow down over time. I would expect some slow
> down as the dictionary becomes denser, but I'm not sure a factor of ten
> is reasonable, though some of it could be due to my program, not the
> dictionary.

Run it under the profiler; that's the most important thing. Without that,
everything I say is conjecture.

Scott




