Rekursiv (was: Interval Smalltalk redux)

Fri Oct 4 21:11:32 UTC 2002

On Friday 04 October 2002 01:44, Ian Piumarta wrote:
> On Thu, 3 Oct 2002, Tim Rowledge wrote:
> > SUN tried at least a couple of times for Java and seem to have
> > failed. SOAR was not very effective, Sword/32 failed, iAPX432 was a
> > dog, etc etc.

I thought SOAR was a great success. It was just never cleaned up and 
finished, as is typical of student projects.

What iAPX432? It has been eliminated from all processor histories I can 
find. It seems that the 386 is now officially Intel's very first 32-bit 
processor. Of course, for those of us who really miss the old 432 they 
are doing IA64 ;-)

> I worked for a while on the REKURSIV project (a microcoded chipset
> for oo languages) that did GC, binding, etc. -- the whole kaboodle --
> in h/w. (Even the microcode talked about objects.)  It was hosted by
> a Sun3 (it plugged directly into the backplane).  It had its own
> language called Lingo.  While waiting for the hardware somebody
> implemented an interpreter in C.  By the time Lingo ran on the
> hardware, the interpreter (running on a Sun3) was faster than Lingo
> running on the hardware.

"It will not do to leave a live dragon out of your plans if you live 
near one." J.R.R. Tolkien, The Hobbit

In this case the dragon is Moore's law being ridden by the main cpu 
makers. Add to that the fact that software implementors can easily come 
up with new and brilliant ideas to make things faster. Remember: 
hardware can know the past but only the compiler can look into the 
future!

That said, the Rekursiv was interesting and worth studying:

  http://www.brouhaha.com/~eric/retrocomputing/rekursiv/

It did suffer from bad timing - the late 1980s were not very 
kind to microcoded machines. It was an era when RISCs ruled the earth. 
I think that this became clear to the Mushroom people (Ian, do you have 
any comments about your participation in that one?):

  http://www.wolczko.com/mushroom/index.html

To answer Tommy Thorn's question about what would be nice to have in a 
Smalltalk cpu - the Mushroom used virtually addressed caches. That is, 
you passed to the cache the pair {ObjectID, offset} and it would return 
a word. You only have to convert the virtual address to a physical 
memory address when you have a cache miss. So you can have all kinds of 
indirections (object tables, for example) and it doesn't cost you too 
much.
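A toy Python model may make the virtually addressed cache idea more concrete. This is only a sketch under stated assumptions, not Mushroom's actual design: the `ObjectCache` class, the dict-based object table and the flat `memory` list are all hypothetical, and the cache is unbounded with no eviction. What it shows is that the object table is consulted only on a miss.

```python
class ObjectCache:
    """Toy cache indexed by (object id, offset) instead of physical address.

    Hypothetical model: unbounded, no eviction, and the object table is a
    plain dict mapping an object id to its base address in `memory`.
    """

    def __init__(self, object_table, memory):
        self.object_table = object_table  # object id -> physical base address
        self.memory = memory              # flat list standing in for DRAM
        self.lines = {}                   # (object id, offset) -> cached word
        self.translations = 0             # object-table lookups performed

    def read(self, oid, offset):
        key = (oid, offset)
        if key in self.lines:             # hit: no address translation at all
            return self.lines[key]
        self.translations += 1            # miss: only now go through the table
        word = self.memory[self.object_table[oid] + offset]
        self.lines[key] = word
        return word


memory = [0] * 16
memory[8:11] = [10, 20, 30]               # object 42 lives at address 8
cache = ObjectCache(object_table={42: 8}, memory=memory)
for _ in range(100):
    cache.read(42, 1)
print(cache.translations)  # 1
```

A hundred reads of the same field cost a single object-table lookup, so the extra level of indirection is paid for only on misses.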

Another nice feature was that their generational garbage collector used 
the cache exclusively in the youngest generation. So an object could be 
created and later collected without causing a single external memory 
reference if you got really lucky. As Tim explained, the three most 
important things are memory bandwidth, memory bandwidth and memory 
bandwidth. This was a neat way to get it.
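The cache-resident nursery can be sketched the same way. Again this is a hypothetical model, not how Mushroom's collector was actually built: `CacheNursery` keeps the youngest generation in a fixed number of cache slots, treats `roots` as the set of object ids still live at collection time (no tracing, no remembered set), and counts a main-memory write only when a survivor is promoted.

```python
class CacheNursery:
    """Toy model of a youngest generation that lives only in the cache.

    Hypothetical: `roots` is simply the set of object ids still live at
    collection time (no tracing), and `external_refs` counts writes to
    main memory.
    """

    def __init__(self, capacity):
        self.capacity = capacity   # number of cache slots given to the nursery
        self.nursery = {}          # object id -> payload, held in the cache
        self.old_space = {}        # object id -> payload, in main memory
        self.external_refs = 0     # main-memory writes caused by promotion
        self.next_id = 0

    def allocate(self, payload, roots):
        if len(self.nursery) >= self.capacity:
            self.minor_gc(roots)   # nursery full: collect before allocating
        oid = self.next_id
        self.next_id += 1
        self.nursery[oid] = payload
        return oid

    def minor_gc(self, roots):
        for oid, payload in self.nursery.items():
            if oid in roots:       # survivor: copy it out to main memory
                self.old_space[oid] = payload
                self.external_refs += 1
        self.nursery.clear()       # dead objects are dropped, never written out


heap = CacheNursery(capacity=64)
for i in range(1000):
    heap.allocate(("temp", i), roots=set())   # every temporary dies at once
print(heap.external_refs)  # 0
```

A thousand short-lived objects are created and collected without a single write to main memory; only objects that survive a minor collection ever cost external bandwidth.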

One thing that is annoying if you try to build a cpu that directly executes 
bytecodes is that you have two sources of information in the 
instructions: the bytecodes themselves and the literals. Most 
processors have their literals mixed in the instruction stream, so as 
you gobble it up they are right there when you need them.
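To illustrate the two-streams problem, here is a toy stack interpreter run over two hypothetical encodings of the same expression. In the first, as in Smalltalk, a push-literal instruction carries only an index into a separate literal frame (the exact index encoding here is made up); in the second, the value sits inline in the instruction stream, the way most conventional processors handle immediates. The opcodes are invented for illustration.

```python
PUSH_LIT, PUSH_IMM, ADD, RET = 0, 1, 2, 3   # hypothetical opcodes


def run(code, literals=()):
    """Interpret a toy stack bytecode and count fetches from each stream."""
    stack = []
    fetches = {"code": 0, "literals": 0}
    pc = 0

    def next_byte():
        nonlocal pc
        fetches["code"] += 1
        pc += 1
        return code[pc - 1]

    while True:
        op = next_byte()
        if op == PUSH_LIT:        # operand is an index into a separate literal frame
            index = next_byte()
            fetches["literals"] += 1
            stack.append(literals[index])
        elif op == PUSH_IMM:      # operand is the value itself, inline in the stream
            stack.append(next_byte())
        elif op == ADD:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == RET:
            return stack.pop(), fetches
        else:
            raise ValueError(f"bad opcode {op}")


# 3 + 4, once with a literal frame and once with inline immediates
split = run([PUSH_LIT, 0, PUSH_LIT, 1, ADD, RET], literals=[3, 4])
inline = run([PUSH_IMM, 3, PUSH_IMM, 4, ADD, RET])
print(split)   # (7, {'code': 6, 'literals': 2})
print(inline)  # (7, {'code': 6, 'literals': 0})
```

Both runs read six bytes of code, but the first also has to reach into a second memory area for every literal, so a bytecode-executing cpu needs a path for both. Real Smalltalk literals can be arbitrary objects such as symbols and selectors, which is one reason they live in a literal frame rather than inline.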

Another problem with direct bytecode execution is that "you do as you 
are told" and execute every single message send implied in the source 
code. Modern compilation technology eliminates most sends (Self, 
HotSpot Java and Strongtalk). In my smallest design (I am currently 
working on three different architectures and have a fourth one that is 
paused due to lack of time) I have a one-clock message send and a 
two-clock return (assuming the cache hits), yet the Self compiler can often 
get negative message send times! How can you compete with that? The 
negative times were obtained by inlining a message send and then using 
the new context information to optimize the inlined code so it was far 
smaller and faster than the original.
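The effect behind the "negative" send times can be shown with a toy optimizer. This is a sketch of generic inlining plus constant folding over hypothetical expression trees, not the Self compiler's actual algorithm: `METHODS`, `optimize` and the tuple encoding are invented for illustration, and receivers are assumed to be side-effect free so that substituting them for `self` is safe.

```python
# Hypothetical method bodies, written as expression trees over "self".
METHODS = {
    "squared": ("send", ("self",), "*", ("self",)),
    "double":  ("send", ("self",), "+", ("self",)),
}
PRIMITIVES = {"+": lambda a, b: a + b, "*": lambda a, b: a * b}


def substitute(expr, receiver):
    """Replace every ("self",) in a method body with the actual receiver."""
    if expr == ("self",):
        return receiver
    if expr[0] == "send":
        _, recv, sel, arg = expr
        return ("send", substitute(recv, receiver), sel,
                None if arg is None else substitute(arg, receiver))
    return expr


def optimize(expr):
    """Inline sends to known methods, then fold primitives on constants."""
    if expr[0] != "send":
        return expr
    _, recv, sel, arg = expr
    recv = optimize(recv)
    arg = None if arg is None else optimize(arg)
    if sel in METHODS:                       # inline the callee's body
        return optimize(substitute(METHODS[sel], recv))
    if (sel in PRIMITIVES and arg is not None
            and recv[0] == "const" and arg[0] == "const"):
        return ("const", PRIMITIVES[sel](recv[1], arg[1]))  # send folded away
    return ("send", recv, sel, arg)


def count_sends(expr):
    """Count the send nodes left in an expression tree."""
    if expr[0] != "send":
        return 0
    _, recv, _, arg = expr
    return 1 + count_sends(recv) + (0 if arg is None else count_sends(arg))


expr = ("send", ("send", ("const", 3), "double", None), "squared", None)
print(count_sends(expr), optimize(expr))   # 2 ('const', 36)
```

Run naively, `(3 double) squared` performs four sends (the two shown plus the `+` and `*` inside the method bodies); with the receiver known, inlining lets every one of them fold into the constant 36. With an unknown receiver `("var", "x")`, the `double` send still disappears and only the primitive `+` remains.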

About FPGAs - these didn't improve much through most of the 1990s but 
now are advancing at an impressive pace. If you can do something 
interesting with an FPGA you get to ride Moore's law with the big boys. 
An FPGA with MicroBlaze, Nios or XSOC/xr16 is very competitive in both 
price and performance with "normal" embedded processors (at least 
between the smallest PICs and the StrongARM). And if you can tweak it 
to run something like Smalltalk instead of yet another gcc-driven MIPS 
clone it really becomes worth it. My smallest machine costs $15 in 
single quantities. I'll have some tiny benchmark numbers soon to put 
that into the proper perspective.

BTW, what I wrote about hardware/software above supposes that the 
hardware can't inline and optimize code, eliminate message sends, etc. 
But I figured out how to do exactly that. Unfortunately this email is 
too long already.... ;-)

-- Jecel