New method cache, 30% faster macroBenchmarks, and GC inefficiencies.

Sun Dec 9 16:51:55 UTC 2001

The benchmarks...

It's hard for me to believe now, but I never actually benchmarked my
new method cache on macroBenchmarks. Oops!

Now that I have, I'm very impressed: I never realized I was getting a
30% gain across the board. (Maybe even more: I should probably retest
with different cache sizes and see whether there's an even bigger improvement.)

These numbers are from a stock 3.1a-4164 image, with only the minimal
changes needed to run it on my interpreter binaries:

Stock VM:                                          373 seconds.
VM with my method cache[1]:                        265 seconds.
VM with my method cache and collection changes[2]: 256 seconds.
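For reference, here is the arithmetic behind the "30%": it is the reduction in total run time relative to the stock VM (equivalently, the stock VM takes about 1.4x as long as the VM with the new method cache).

```latex
\[
\frac{373 - 265}{373} \approx 29\%, \qquad
\frac{373 - 256}{373} \approx 31\%, \qquad
\frac{373}{265} \approx 1.41
\]
```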


My collection changes were discounted on the list and may be obsolete, so
I'll probably leave them out when I rebuild my next image.

Anyway, go check out the method cache; I posted it a few emails ago.

[1] This VM also contains some unrelated changes to String.
[2] These include my small changes to SortedCollection; they need cleanup
by someone who knows 'the Smalltalk way'(tm) better than I do. :) These
changes only help macroBenchmark1 and don't affect the others much.

--

Earlier, I reposted a changeset by Andreas that makes root table
overflows a lot cheaper. One place where that should help is building
a new VM.

Building a new VM takes me about 3 minutes, and 30 seconds of that is
spent in 61 fullGCs. Ten of those fullGCs are caused by root table
overflows, which that patch would make a lot cheaper; the remaining 50 or
so are presumably caused by openFile.[*] Removing these GCs, or making
them cheaper, would presumably make Squeak about 15% faster at building
interpreters on stock images, and more than that on larger-than-normal
images (where each fullGC costs more), if that's what most people do VM
development on.
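Rough arithmetic behind the 15% estimate, treating "about 3 minutes" as 180 seconds and assuming all 61 fullGCs cost about the same: removing the roughly 60 fullGCs attributed to root table overflows and openFile would save about 16% of the build time.

```latex
\[
\frac{30\ \text{s}}{61} \approx 0.49\ \text{s per fullGC}, \qquad
\frac{60}{61} \times 30\ \text{s} \approx 29.5\ \text{s}, \qquad
\frac{29.5\ \text{s}}{180\ \text{s}} \approx 16\%
\]
```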

BTW, while I'm talking about GC: I have an... interesting program that
runs into GC problems.

The program builds a queue, then creates new objects and puts them on the
queue; once popped off, they become garbage. The queue is long-lived,
essentially permanent, so it gets tenured into old space fairly quickly,
and each time a new (young) item is put on it, that uses up one slot in
the root table. So every 2500 enqueues (the root table holds 2500
entries), I pay for a fullGC (or, with Andreas's patch, an incrGC plus a
tenure). The current VM makes this sort of program extremely expensive.
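A minimal workspace sketch of this pattern, for anyone who wants to watch the counters. It is hypothetical, not the program itself: instead of a real queue it uses a long-lived ring of one-slot cells (cells) that a fullGC tenures up front, and each "enqueue" stores a new object into the next cell, so the object it replaces becomes garbage. It assumes that storing a young object into an old object that isn't already remembered takes one root table slot until the next incrGC, and that vmParameterAt: 7 and 22 report full GCs and root table overflows since startup, as the comment in Smalltalk>>vmParameterAt: describes in later images; check your own image.

```smalltalk
"Hypothetical sketch: many stores of young objects into distinct old objects.
 Assumed parameter indices: 7 = full GCs since startup,
 22 = root table overflows since startup."
| cells fullGCsBefore overflowsBefore |
"Long-lived ring of one-slot cells; the full GC below tenures them."
cells := (1 to: 10000) collect: [:k | Array new: 1].
Smalltalk garbageCollect.
fullGCsBefore := Smalltalk vmParameterAt: 7.
overflowsBefore := Smalltalk vmParameterAt: 22.
1 to: 100000 do: [:n |
	"Enqueue: put a new (young) object into the next old cell.
	 The object it replaces becomes garbage, as if it had been popped."
	(cells at: n - 1 \\ cells size + 1) at: 1 put: Object new].
Transcript
	show: 'full GCs: ', ((Smalltalk vmParameterAt: 7) - fullGCsBefore) printString; cr;
	show: 'root table overflows: ', ((Smalltalk vmParameterAt: 22) - overflowsBefore) printString; cr
```

No expected counts are given here; they depend on the VM, its GC settings, and whether Andreas's patch is installed.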

In this case, using Andreas's patch and increasing RootTableSize 13x
would mean I'd still pay, but it wouldn't be *that* bad: the amortized
cost would be under a microsecond per enqueue.
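A back-of-the-envelope check of these numbers. It assumes 4-byte root table entries (so 2500 entries is the 10kb mentioned below), that a stock fullGC costs about what the VM-build fullGCs above do (30 s / 61, on a stock image), and that an overflow with Andreas's patch costs roughly one incrGC of about 20ms (the laptop figure given below); all three are assumptions, not measurements of this program.

```latex
\[
2500 \times 4\ \text{bytes} = 10\,000\ \text{bytes} \approx 10\ \text{kb}, \qquad
13 \times 2500 = 32\,500\ \text{entries} \approx 130\,000\ \text{bytes} \approx 128\ \text{kb}
\]
\[
\text{stock VM:}\quad \frac{0.49\ \text{s}}{2500} \approx 200\ \mu\text{s per enqueue}
\]
\[
\text{patch, } 13\times \text{ table:}\quad \frac{20\ \text{ms}}{32\,500} \approx 0.62\ \mu\text{s per enqueue}
\]
```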

When I look over the parameters the VM uses, they strike me as having
been chosen for 4-year-old computers. Why have a root table of only
10kb? Given the cost of overflowing it, why not make it 128kb?

For instance, on my P3-500 laptop an incrGC takes under 20ms even when it
runs only every 100000 allocations. By default, the VM does an incrGC
every 4000 allocations. Why not every 40000? That gives short-lived
objects many more chances to die before being tenured, and postpones the
fullGC as long as possible. (When building a new VM, 100k allocs/incrGC
tenures about 1/5 as many objects as 4k allocs/incrGC; for the other
macroBenchmarks, it ranges from 1/6 to 1/2 as many.)

Smalltalk vmParameterAt: 5 put: 100000.	"allocations between incrGCs"
Smalltalk vmParameterAt: 6 put: 8000.	"survivor count tenuring threshold"
  " and "
Smalltalk vmParameterAt: 5 put: 4000.	"the stock values"
Smalltalk vmParameterAt: 6 put: 2000.

These two settings differ by about 1.5% in their macroBenchmarks score.
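A hypothetical workspace helper for comparing settings like these: it runs a workload under the trial values, reports incremental-GC activity, and restores the previous values even if the workload fails. The parameter meanings (9 = incremental GCs, 10 = ms in incremental GCs, 11 = tenures, all since startup) are assumed from the comment in Smalltalk>>vmParameterAt:, and the workload block is only a stand-in to replace with the real benchmark.

```smalltalk
"Hypothetical helper: try incrGC settings on a workload, then restore them.
 Assumed indices: 5 = allocations between GCs, 6 = tenuring threshold,
 9 = incremental GCs, 10 = ms in incremental GCs, 11 = tenures."
| oldAllocs oldThreshold before workload |
workload := [1 to: 1000000 do: [:n | Array new: 4]].	"stand-in workload"
oldAllocs := Smalltalk vmParameterAt: 5.
oldThreshold := Smalltalk vmParameterAt: 6.
[Smalltalk vmParameterAt: 5 put: 100000.
 Smalltalk vmParameterAt: 6 put: 8000.
 before := #(9 10 11) collect: [:k | Smalltalk vmParameterAt: k].
 workload value.
 Transcript
	show: 'incremental GCs: ', ((Smalltalk vmParameterAt: 9) - (before at: 1)) printString; cr;
	show: 'ms in incremental GCs: ', ((Smalltalk vmParameterAt: 10) - (before at: 2)) printString; cr;
	show: 'tenures: ', ((Smalltalk vmParameterAt: 11) - (before at: 3)) printString; cr]
	ensure: [
		"Put the previous settings back no matter how the workload ends."
		Smalltalk vmParameterAt: 5 put: oldAllocs.
		Smalltalk vmParameterAt: 6 put: oldThreshold]
```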

[*] I remember reading a post recently from someone who was reworking
openFile so that it would no longer GC afterwards. When building an
interpreter, Squeak creates 50 files, and we have 50 fullGCs unaccounted
for.

Scott