[Hardware] display registers (was: RISC42 history)

Tue Oct 9 22:40:15 UTC 2007

Klaus D. Witzel wrote:
> > ... With aggressive inlining, however, highly optimized
> > methods will need to access more registers.
> 
> I've thought over this for [howmany?] years now, time again. It is perhaps  
> possible to introduce inlineable blocks which don't need much more  
> register space. Access to outer scope could be done similar to what the
> B5xxx did with her display registers. And it is well known that that lad  
> runs perfectly with just 4 display registers (and only very large sisters  
> of her had surplus, $$ expensive display registers). The concept at work  
> here is D[ll] with ll=the current scope level (in our case the inlined  
> material's data). D[ll] points to D[ll-1], etc, (in our case to whom it  
> was inlined), from which data registers can be loaded. What do you think?

I suppose that by now you won't be exactly shocked if I say that I have
already done that in one of my projects? The 16-bit processor named
"Oliver" started out as a small variation on Chuck Moore's MISC Forth
processor. The instruction set was essentially the same (simple stack
machine with five bit opcodes) but it had an extra register named
"SELF". All addresses, including the PC, were relative to this register
and besides the normal call instruction there was a version that would
pop the top of the data stack into SELF. Calls saved the previous value
of SELF along with PC into the return stack, and returns naturally
restored both registers.
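
As a rough illustration of the call/return behaviour just described, here
is a toy model in Python. It is only a sketch: the class and method names
(Machine, call, call_with_self, ret) are hypothetical, and details such as
which PC value gets saved are assumptions, not Oliver's actual
instruction set.

```python
# Toy model of SELF-relative calls. All names here are hypothetical.

class Machine:
    def __init__(self):
        self.pc = 0             # offset relative to SELF
        self.self_reg = 0       # the extra "SELF" base register
        self.data_stack = []
        self.return_stack = []

    def effective(self, offset):
        """Every address, including the PC, is an offset from SELF."""
        return self.self_reg + offset

    def call(self, target):
        """Normal call: save SELF and PC, keep the current SELF."""
        self.return_stack.append((self.self_reg, self.pc))
        self.pc = target

    def call_with_self(self, target):
        """Call variant: the new SELF is popped from the data stack."""
        self.return_stack.append((self.self_reg, self.pc))
        self.self_reg = self.data_stack.pop()
        self.pc = target

    def ret(self):
        """Return restores both registers from the return stack."""
        self.self_reg, self.pc = self.return_stack.pop()
```

Pushing an object's base address and then using call_with_self makes the
callee's offsets resolve inside that object, and ret brings back the
caller's SELF and PC together.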

The details are not important, but in 2002 my clients observed that
Forth seemed a bit hard to learn while the Smalltalk I was using for my
main project could be learned by children. So they asked if I couldn't
do their project in Smalltalk too. Initially I said it wasn't possible:
the FPGA I was using for their machine only had 15 thousand gates to
keep the product really cheap (under $20) while I planned to use a 300
thousand gate FPGA for the children's computer (in contrast, my current
design uses an FPGA with 1.5 million gates!).

But after thinking about this for a month or so I saw that the only
thing missing for the OO Forth processor to run Smalltalk was some kind
of support for blocks. So I added two groups of 8 registers each which
could be efficiently saved to/reloaded from memory. The first few
registers in a group would be used exactly as display registers. So for
a block three levels deep loaded into group A, you would have the local
variables in A2 to A7 while A0 pointed to the group with the next
lexical level and A1 to the group in the level beyond that (normally the
home method). The group B registers are a scratch pad for non-local
variables: to access something in an outer lexical scope you load it
into B, unless it is already there (the compiler keeps track), and then
just use it. The load is one instruction, and it first saves the previous
contents of B if they were dirty. I know this is more explicit than the
display registers in a Burroughs machine, but it is a good fit for a
MISC.
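
The two-group scheme can be sketched in Python as follows. This is a
minimal model under stated assumptions: each scope's registers live in
memory as an 8-word frame, the names (RegisterGroups, load_a, load_b,
read_outer, write_outer) are hypothetical, and the "already in B" check is
done at run time here, where the real design leaves it to the compiler.

```python
GROUP_SIZE = 8  # registers per group, as in the description above

class RegisterGroups:
    def __init__(self, memory):
        self.memory = memory    # frame address -> list of 8 words
        self.a = [0] * GROUP_SIZE
        self.b = [0] * GROUP_SIZE
        self.b_addr = None      # frame currently held in group B
        self.b_dirty = False

    def load_a(self, addr):
        """Load the running block's own frame into group A."""
        self.a = list(self.memory[addr])

    def load_b(self, addr):
        """Load an outer frame into B, saving dirty contents first."""
        if self.b_addr == addr:
            return              # a compiler would simply omit this load
        if self.b_dirty:
            self.memory[self.b_addr] = list(self.b)
        self.b = list(self.memory[addr])
        self.b_addr = addr
        self.b_dirty = False

    def read_outer(self, display_index, slot):
        """Read a variable through display register A[display_index]."""
        self.load_b(self.a[display_index])
        return self.b[slot]

    def write_outer(self, display_index, slot, value):
        """Write a non-local variable; B becomes dirty."""
        self.load_b(self.a[display_index])
        self.b[slot] = value
        self.b_dirty = True
```

For a block three levels deep, A0 and A1 would hold the addresses of the
two enclosing frames, so read_outer(1, n) reaches the home method's
variables. A write stays in B until another frame is loaded over it.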

For the stack processor in Plurion I could address as many registers as
I liked using the prefix instructions to increase the operand field of
any other instruction. So the above scheme was replaced by mapping
higher lexical levels at 64 register intervals. Registers 0 to 63
(normally only 0 to 7 actually exist - you need to explicitly allocate
more in groups of 8 if you want them) are the local variables, 64 to 127
(64 to 71, most likely) are the variables in the next lexical level, 128
to 191 the second level and so on.
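
The register numbering can be illustrated with a little arithmetic. The
sketch below assumes that lexical level k occupies registers 64*k to
64*k + 63, which is what a 64-register interval implies. The function
names are hypothetical and this is not Plurion's actual decoding logic.

```python
LEVEL_STRIDE = 64  # registers reserved per lexical level

def decode_register(n):
    """Split a register number into (lexical level, offset in level)."""
    if n < 0:
        raise ValueError("register numbers are non-negative")
    return divmod(n, LEVEL_STRIDE)

def encode_register(level, offset):
    """Register number for a variable `offset` in an outer `level`."""
    if level < 0 or not 0 <= offset < LEVEL_STRIDE:
        raise ValueError("level or offset out of range")
    return level * LEVEL_STRIDE + offset
```

Under that assumption, encode_register(1, 7) gives register 71, the last
of the eight registers that normally exist for the next lexical level,
and decode_register(128) gives level 2, offset 0.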

-- Jecel