CPU running Smalltalk bytecode

Lex Spoon lex at cc.gatech.edu
Mon Feb 11 18:36:52 UTC 2002


> > My take on things is that a possible and practical change in hardware
> > that would benefit us (and many programs) would be an instruction cache
> > that was precisely controllable by the programmer. A 2-4Mb i-cache that
> > one could actually load the core vm into and _lock_ it in would be nice.
> 
> Would it? AFAIK, the bigger the cache, the slower (in terms of bandwidth
> and/or latency) it is. Which is why we have layered caches L1, L2, L3,
> RAM, HD. Also, programmers are notoriously bad at predicting where
> problems actually are. I'd rather profile and focus my efforts where
> they do the most good.

Once you've discovered the hotspots, wouldn't you like to be able to
load the relevant code into the i-cache?

This has worked well in the past at the RAM<->HD level: if you are
really short on RAM, then managing it manually produces better results
than LRU-based virtual memory.  (Also at the RAM<->HD level, it's pretty
clear that the best of all is to have enough RAM, and that having enough
RAM is more important than having a fast CPU.  Do a web search on the
performance of Ultima 9, for example....)



> Better yet, I'd much rather have a dynamic system that learns what is
> most important and puts that into the cache than do it manually. For
> something the size of Squeak, it'll be more accurate and faster.

Experience shows otherwise: On modern architectures, you can do better
if you play friendly with the cache architecture.  The compiler or the
programmer can figure out things that the CPU won't be able to guess at
runtime.  For example, consider these two commands on bitmaps:

	A := (B + C) / 2.
	B := B / 2.

If A, B, and C are large, then I think it's clear that a simple LRU
cache could end up loading B from main memory twice.  A good compiler
or programmer could interleave the two loops and end up loading B only
once.  An LRU-based cache can't clean up this kind of problem
automatically.
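
To make the interleaving concrete, here's a rough C-level sketch of
what I mean.  (The function and variable names are made up for
illustration; this isn't Squeak's actual BitBlt or VM code.)

	#include <stddef.h>

	/* Illustrative sketch only -- hypothetical names, not actual
	   Squeak VM code. */

	/* Two separate passes, as in the example above.  If the bitmaps
	   are much larger than the cache, B gets streamed in from main
	   memory twice. */
	void separatePasses(unsigned char *a, unsigned char *b,
	                    const unsigned char *c, size_t n)
	{
	    for (size_t i = 0; i < n; i++)      /* A := (B + C) / 2. */
	        a[i] = (unsigned char)((b[i] + c[i]) / 2);
	    for (size_t i = 0; i < n; i++)      /* B := B / 2. */
	        b[i] = b[i] / 2;
	}

	/* Interleaved version: each element of B is loaded once, and
	   both results are computed while it is still in a register. */
	void interleavedPass(unsigned char *a, unsigned char *b,
	                     const unsigned char *c, size_t n)
	{
	    for (size_t i = 0; i < n; i++) {
	        unsigned char bi = b[i];
	        a[i] = (unsigned char)((bi + c[i]) / 2);
	        b[i] = bi / 2;
	    }
	}

On large bitmaps the interleaved version pulls B through the cache half
as often, and that's exactly the kind of transformation an LRU cache
can't discover on its own.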



> And the gains can be significant. Look at what I've done over the last few
> months... And I *know* there's another 10-15% more in this VM, over my
> methodcache, roottableoverflow, and BC. [*]

Fair, but the question is whether a specialized CPU will help *more*
than smarter code on the same CPUs.



> > An improvement on that might be to go back to the writable control store
> > idiom, putting the vm 'above the bus'. A controllable d-cache might be
> > useful in letting us make sure that recent contexts and important
> > globals stay cached, stuff like that.
> 
> With an LRU cache discipline, this is almost assured. If it's used a lot,
> it'll be in the cache.

You are assuming that the working set is smaller than the cache.


Lex Spoon


