Rekursiv (was: Interval Smalltalk redux)

Alan Kay Alan.Kay at squeakland.org
Fri Oct 4 22:58:31 UTC 2002


Jecel --

Nice email. I won't enumerate any of the many PARC schemes for 
dealing with fast execution of late bound languages (but consider the 
"giant hash table" ...).
      But I would like to mention the work of Phil Abrams and others 
at Stanford and elsewhere in the late 60s and early 70s for on the 
fly APL optimizations. They used algebraic manipulations of various 
kinds in a limited HW window to create pretty optimal execution 
streams (e.g. (A*B)[1 to 5] was manipulated into A[1 to 5] * B[1 to 
5] before execution). There are some very nice generalizations of APL 
ideas and algebras to polymorphic objects, many of which have not 
been exploited as well as they could and should be.
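The slicing-before-computing rewrite mentioned above can be sketched in miniature (this is my own illustration, not Abrams' actual system; the function names are hypothetical):

```python
# A minimal sketch of pushing a selection inside an elementwise operation
# so that only the needed elements are ever computed.

def mul(a, b):
    """Elementwise product of two lists, computing every element."""
    return [x * y for x, y in zip(a, b)]

def take_then_mul(a, b, lo, hi):
    """(A*B)[lo:hi] rewritten as A[lo:hi] * B[lo:hi]: the selection is
    applied before the multiply, so the work is O(hi-lo) instead of
    O(len(a))."""
    return mul(a[lo:hi], b[lo:hi])

a = list(range(1, 1001))
b = list(range(1001, 2001))
# Both forms give the same answer; the rewritten one touches 5 elements.
assert mul(a, b)[0:5] == take_then_mul(a, b, 0, 5)
```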

Cheers,

Alan

-------

At 6:11 PM -0300 10/4/02, Jecel Assumpcao Jr wrote:
>On Friday 04 October 2002 01:44, Ian Piumarta wrote:
>>  On Thu, 3 Oct 2002, Tim Rowledge wrote:
>>  > Sun tried at least a couple of times for Java and seems to have
>>  > failed. SOAR was not very effective, Sword/32 failed, the iAPX432
>>  > was a dog, etc.
>
>I thought SOAR was a great success. It was just never cleaned up and
>finished, as is typical of student projects.
>
>What iAPX432? It has been eliminated from all processor histories I can
>find. It seems that the 386 is now officially Intel's very first 32 bit
>processor. Of course, for those of us who really miss the old 432 they
>are doing IA64 ;-)
>
>>  I worked for a while on the REKURSIV project (a microcoded chipset
>>  for oo languages) that did GC, binding, etc. -- the whole kaboodle --
>>  in h/w. (Even the microcode talked about objects.)  It was hosted by
>>  a Sun3 (it plugged directly into the backplane).  It had its own
>>  language called Lingo.  While waiting for the hardware somebody
>>  implemented an interpreter in C.  By the time Lingo ran on the
>>  hardware, the interpreter (running on a Sun3) was faster than Lingo
>>  running on the hardware.
>
>"It will not do to leave a live dragon out of your plans if you live
>near one." - J.R.R. Tolkien in The Hobbit
>
>In this case the dragon is Moore's law being ridden by the main cpu
>makers. Add to that the fact that software implementors can easily come
>up with new and brilliant ideas to make things faster. Remember:
>hardware can know the past but only the compiler can look into the
>future!
>
>That said, the Rekursiv was interesting and worth studying:
>
>   http://www.brouhaha.com/~eric/retrocomputing/rekursiv/
>
>It had some bad timing against it - the late 1980s were not very
>kind to microcoded machines. It was an era when RISCs ruled the earth.
>I think that this became clear to the Mushroom people (Ian, do you have
>any comments about your participation in that one?):
>
>   http://www.wolczko.com/mushroom/index.html
>
>To answer Tommy Thorn's question about what would be nice to have in a
>Smalltalk cpu - the Mushroom used virtually addressed caches. That is,
>you passed to the cache the pair {ObjectID, offset} and it would return
>a word. You only have to convert the virtual address to a physical
>memory address when you have a cache miss. So you can have all kinds of
>indirections (object tables, for example) and it doesn't cost you too
>much.
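The {ObjectID, offset} lookup described above can be sketched as follows (the class and field names are my own illustration, not the actual Mushroom design; the point is that the object-table indirection is paid only on a miss):

```python
# Sketch of a virtually addressed cache keyed on (object ID, offset).
# On a hit, no address translation happens at all; the object table is
# consulted only when the cache misses.

class ObjectCache:
    def __init__(self, object_table, memory):
        self.object_table = object_table  # object ID -> base physical address
        self.memory = memory              # flat physical memory (a list)
        self.lines = {}                   # (oid, offset) -> cached word
        self.misses = 0

    def read(self, oid, offset):
        key = (oid, offset)
        if key not in self.lines:         # miss: do the indirection now
            self.misses += 1
            base = self.object_table[oid]
            self.lines[key] = self.memory[base + offset]
        return self.lines[key]            # hit: no translation needed

memory = [0] * 16
memory[4:7] = [10, 20, 30]                # object 7 lives at physical 4
cache = ObjectCache({7: 4}, memory)
assert cache.read(7, 1) == 20             # miss: one table lookup
assert cache.read(7, 1) == 20             # hit: table not touched
assert cache.misses == 1
```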
>
>Another nice feature was that their generational garbage collector used
>the cache exclusively in the youngest generation. So an object could be
>created and later collected without causing a single external memory
>reference if you got really lucky. As Tim explained the three most
>important things are memory bandwidth, memory bandwidth and memory
>bandwidth. This was a neat way to get it.
>
>One thing that is annoying if you try to do a cpu to directly execute
>bytecodes is that you have two sources of information in the
>instructions: the bytecodes themselves and the literals. Most
>processors have their literals mixed in the instruction stream, so as
>you gobble it up they are right there when you need them.
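The two-source problem above can be illustrated with a toy interpreter (my own sketch, not any real Smalltalk VM): the bytecode stream carries only an index, and the literal itself must be fetched from a separate literal frame, unlike machine code with immediates mixed into the instructions.

```python
# Sketch of the two instruction-stream sources: bytecodes index a separate
# literal frame, so executing "push literal" needs a second fetch that an
# inline immediate would not.

PUSH_LIT, ADD, HALT = 0, 1, 2

def run(bytecodes, literals):
    stack, pc = [], 0
    while True:
        op = bytecodes[pc]; pc += 1
        if op == PUSH_LIT:
            index = bytecodes[pc]; pc += 1
            stack.append(literals[index])  # second fetch, from the literal frame
        elif op == ADD:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == HALT:
            return stack.pop()

# 3 + 4, with the literals held outside the bytecode stream.
assert run([PUSH_LIT, 0, PUSH_LIT, 1, ADD, HALT], [3, 4]) == 7
```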
>
>Another problem with direct bytecode execution is that "you do as you
>are told" and execute every single message send implied in the source
>code. Modern compilation technology eliminates most sends (Self,
>HotSpot Java, and Strongtalk). In my smallest design (I am currently
>working on three different architectures and have a fourth one that is
>paused due to lack of time) I have a one-clock message send and a
>two-clock return (assuming the cache hits), yet the Self compiler can often
>get negative message send times! How can you compete with that? The
>negative times were obtained by inlining a message send and then using
>the new context information to optimize the inlined code so it was far
>smaller and faster than the original.
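The "negative send time" effect can be sketched in miniature (my own illustration of the idea, not Self compiler output; the names are hypothetical): once the send is inlined with the receiver's type known, the body folds down to less code than the call sequence itself would have cost.

```python
# Before: "do as you are told" - a real dynamic dispatch on every call.
def is_empty_send(receiver):
    return receiver.__class__.is_empty(receiver)

class Point:
    def is_empty(self):
        return False   # points are never empty

# After inlining with the receiver known to be a Point, the compiler can
# replace the entire send with the constant the body folds to: the send
# costs less than zero instructions relative to the original.
def is_empty_inlined_for_point(receiver):
    return False       # send eliminated; no dispatch work remains

p = Point()
assert is_empty_send(p) == is_empty_inlined_for_point(p) == False
```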
>
>About FPGAs - these didn't improve much through most of the 1990s but
>now are advancing at an impressive pace. If you can do something
>interesting with an FPGA you get to ride Moore's law with the big boys.
>An FPGA with MicroBlaze, Nios, or XSOC/xr16 is very competitive in both
>price and performance with "normal" embedded processors (at least
>between the smallest PICs and the StrongARM). And if you can tweak it
>to run something like Smalltalk instead of yet another gcc-driven MIPS
>clone, it really becomes worth it. My smallest machine costs $15 in
>single quantities. I'll have some tiny benchmark numbers soon to put
>that into the proper perspective.
>
>BTW, what I wrote about hardware/software above supposes that the
>hardware can't inline and optimize code, eliminate message sends, etc.
>But I figured out how to do exactly that. Unfortunately this email is
>too long already.... ;-)
>
>-- Jecel

