CPU running Smalltalk bytecode

Mon Feb 11 00:50:20 UTC 2002

Jecel Assumpcao Jr <jecel at merlintec.com> is claimed by the authorities to have written:

> Tim might be right about a 600 MHz ARM 10, but I am going to see for 
> myself.
Good for you; for anyone that doesn't know, Jecel is one of the rather
small number of people that could be said to have a chance of doing
this.  I'm certainly willing to imagine that an FPGA attack could do
something very interesting; the dang things have become very fast and
very big.

Of course, there are still little problems like dealing with garbage
collection and primitive stuff; making a hardware interpreter does
nothing for these important areas.

My take on things is that a possible and practical change in hardware
that would benefit us (and many programs) would be an instruction cache
that was precisely controllable by the programmer. A 2-4Mb i-cache that
one could actually load the core vm into and _lock_ it in would be nice.
An improvement on that might be to go back to the writable control store
idiom, putting the vm 'above the bus'. A controllable d-cache might be
useful in letting us make sure that recent contexts and important
globals stay cached, stuff like that.

However as I've said again and again (redundantly even), it's bandwidth,
bandwidth and bandwidth.

You can increase 'real' bandwidth by making the machine faster - memory
bus mainly. That is the 'purest' approach in a sense. I'm gut-level sure
that a really simple 600MHz cpu with 600MHz memory would outperform a
multi-GHz cpu with 133MHz memory and caches and burstmode and and and,
plus be much simpler (no caches to worry about, maybe no registers
even). Sadly we can't buy such memory at Fry's. Yet :-) Watch for MRAM.
Why can't we have an ARM core and 128Mbytes on a single chip?

You can increase 'apparent' bandwidth with caches, burst read memory,
writeback buffers, bigger register sets, whatever. This makes the
machine appear to be faster much of the time but introduces all sorts of
uncertainties and complications - a cache miss can be very expensive if
you're unlucky or incompetent. I suppose we could throw in parallel
processing here, though it could also go above.
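
The 'real' versus 'apparent' bandwidth trade-off can be put into rough
numbers. What follows is only a back-of-envelope sketch, not a
measurement: the 600MHz and 133MHz clocks come from the text above,
while the 2GHz core, the single-cycle cache hit, the one-bus-cycle miss
penalty and the function names are all assumptions chosen for
illustration.

```python
def avg_access_ns(hit_rate, cache_ns, memory_ns):
    """Average time per memory reference behind a single-level cache.

    A hit costs cache_ns; a miss costs cache_ns plus memory_ns.
    """
    return hit_rate * cache_ns + (1.0 - hit_rate) * (cache_ns + memory_ns)


def break_even_hit_rate(flat_ns, cache_ns, memory_ns):
    """Hit rate at which the cached machine merely matches a flat,
    cacheless memory that always answers in flat_ns."""
    return 1.0 - (flat_ns - cache_ns) / memory_ns


# Clock figures from the post; everything else is an assumption.
SIMPLE_NS = 1e3 / 600        # 600MHz memory, no cache, one cycle per access
FAST_CACHE_NS = 1e3 / 2000   # assumed 2GHz core with a one-cycle cache hit
SLOW_MEMORY_NS = 1e3 / 133   # 133MHz memory, optimistically one bus cycle per miss

if __name__ == "__main__":
    h = break_even_hit_rate(SIMPLE_NS, FAST_CACHE_NS, SLOW_MEMORY_NS)
    print("break-even hit rate: %.3f" % h)
    for rate in (0.80, 0.90, 0.99):
        t = avg_access_ns(rate, FAST_CACHE_NS, SLOW_MEMORY_NS)
        print("hit rate %.2f: %.2f ns cached vs %.2f ns flat" % (rate, t, SIMPLE_NS))
```

With these assumed numbers the break-even hit rate comes out at roughly
0.84. A real miss usually costs several bus cycles rather than one, and
raising memory_ns to reflect that pushes the break-even point higher,
which is where the "very expensive if you're unlucky" part bites.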

You can reduce 'needed' bandwidth with software trickery to make
better use of what you have. For us, a dynamic translator or possibly
full native compilation would serve well. Instead of popping and pushing
(sounds like drug dealing...) we optimize to storing things in registers
most of the time, caching decoded oops (and making sure to cope with a
gc!), squidging converse actions together, all that good stuff.
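
As a minimal sketch of the "squidging" idea, assuming a toy bytecode
set invented for this example (push, pop and a fused move; these are
not the real Smalltalk bytecodes, and the names squidge and run are
hypothetical), a push that is immediately undone by a pop can be
replaced by a single variable-to-variable move that never touches the
stack:

```python
def squidge(bytecodes):
    """Fuse each ('push', src) directly followed by ('pop', dst)
    into one ('move', src, dst) that bypasses the stack."""
    out = []
    for op in bytecodes:
        if op[0] == "pop" and out and out[-1][0] == "push":
            _, src = out.pop()
            out.append(("move", src, op[1]))
        else:
            out.append(op)
    return out


def run(bytecodes, variables):
    """Interpret the toy bytecodes.

    Returns the final variables and the number of stack accesses,
    a stand-in for memory traffic.
    """
    env = dict(variables)
    stack = []
    stack_accesses = 0
    for op in bytecodes:
        if op[0] == "push":
            stack.append(env[op[1]])
            stack_accesses += 1
        elif op[0] == "pop":
            env[op[1]] = stack.pop()
            stack_accesses += 1
        elif op[0] == "move":
            env[op[2]] = env[op[1]]
        else:
            raise ValueError("unknown bytecode: %r" % (op,))
    return env, stack_accesses


if __name__ == "__main__":
    program = [("push", "a"), ("pop", "b"), ("push", "b"), ("pop", "c")]
    start = {"a": 1, "b": 0, "c": 0}
    print(run(program, start))
    print(run(squidge(program), start))
```

Both runs end with the same variable values, but the fused version
makes no stack accesses at all. A real dynamic translator would map the
moves onto machine registers and would also have to keep any cached
oops valid across a gc, which this sketch ignores.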

tim
-- 
Tim Rowledge, tim at sumeru.stanford.edu, http://sumeru.stanford.edu/tim
Brain fried; core dumped.