Among the many interesting results in the first-year report from Viewpoints (http://vpri.org/pdf/steps_TR-2007-008.pdf) is a block diagram of a RISC processor by Chuck Thacker (Xerox Alto, MS TabletPC and others). It is very simple: 100 lines of Verilog is mentioned. Some things that are not explained are easy to figure out, such as bit 24 selecting between load-constant and regular instructions. The register set seems to be a flat 128-register file, like Knuth's MMIX or the Philips Trimedia. Having tried to develop a compiler for the latter, I must confess that I haven't yet figured out how to make good use of such a resource.
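To make the bit 24 remark concrete, here is a minimal Python sketch of that decode step. Only the role of bit 24 comes from the block diagram as I read it. The 32-bit word size, the constant sitting in bits 23..0, and the names decode, LC_BIT and CONST_MASK are my own assumptions for illustration, not taken from the report.

```python
# Hypothetical decode of one instruction word. Only the meaning of bit 24
# comes from the text; the word size and field layout are assumptions.
LC_BIT = 24                      # bit 24: 1 = load constant, 0 = regular
CONST_MASK = (1 << LC_BIT) - 1   # assumed: the constant occupies bits 23..0


def decode(word: int) -> dict:
    """Classify an (assumed 32-bit) instruction word by its bit 24."""
    if not 0 <= word < (1 << 32):
        raise ValueError("expected a 32-bit instruction word")
    if (word >> LC_BIT) & 1:
        # Load constant: the low bits are treated as an immediate value.
        return {"kind": "load_constant", "constant": word & CONST_MASK}
    # Regular instruction: the low bits hold operand/function fields,
    # which are left undecoded here because their layout is not given.
    return {"kind": "regular", "fields": word & CONST_MASK}
```

For example, decode(0x01000005) is classified as a load constant with the value 5, while decode(0x00000005) is classified as a regular instruction.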
One of the reasons why this design is so simple is that it isn't pipelined in the traditional RISC style. Yet it does several things independently (in parallel) and so can execute one instruction per clock cycle. That clock won't be as fast as would be possible in a pipelined design, but given the fast memories available in modern FPGAs the difference won't be too bad.
There was an interesting series of articles by Jan Gray in "Circuit Cellar" (http://www.fpgacpu.org/xsoc/cc.html) which I really recommend to anyone who is interested in CPUs and FPGAs. He designed a 16-bit pipelined RISC and adapted LCC to generate code for it. The FPGAs he used in this project were older ones that didn't have any internal memory, but when newer models became available he designed a very similar CPU that was not pipelined: http://www.fpgacpu.org/gr/index.html
None of my stack processors were pipelined (the older Tachyon design was, but it hid the complexities by multiplexing four or more threads in what is known as the "barrel execution model": http://en.wikipedia.org/wiki/Barrel_processor). It seemed natural, however, that RISC42 should have a five-stage pipeline. In fact, I took advantage of the normally invisible bypass circuit in pipelined designs to create the "cascade" instructions with register K and so reduce the pain of a two-address instruction set.
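For anyone who hasn't met the barrel model, here is a minimal Python sketch of the scheduling idea. It is not the Tachyon design: the function names, the instruction labels and the choice of four threads are made up for illustration. The point is that with strict round-robin issue, two consecutive instructions of the same thread are always as many cycles apart as there are threads, so a pipeline no deeper than that never sees a dependency between instructions in flight.

```python
def barrel_schedule(programs, cycles):
    """Issue one instruction per cycle, rotating through threads in fixed order.

    programs: one list of instruction labels per thread (hypothetical labels).
    Returns a list of (cycle, thread_id, instruction) tuples.
    """
    if not programs:
        raise ValueError("need at least one thread")
    pcs = [0] * len(programs)            # one program counter per thread
    trace = []
    for cycle in range(cycles):
        tid = cycle % len(programs)      # strict round-robin thread select
        prog = programs[tid]
        if pcs[tid] < len(prog):
            trace.append((cycle, tid, prog[pcs[tid]]))
            pcs[tid] += 1
        # A thread that has run out of instructions leaves its slot empty.
    return trace


def min_issue_gap(trace, tid):
    """Smallest cycle distance between consecutive instructions of one thread."""
    issued = [c for c, t, _ in trace if t == tid]
    return min(b - a for a, b in zip(issued, issued[1:]))


# Four threads of two instructions each, run for eight cycles.
programs = [["a0", "a1"], ["b0", "b1"], ["c0", "c1"], ["d0", "d1"]]
trace = barrel_schedule(programs, 8)
```

Here thread 0 issues a0 in cycle 0 and a1 in cycle 4, so min_issue_gap(trace, 0) is 4. In a pipeline of four stages or fewer, a0 has already left the pipeline when a1 enters, which is why no bypass or interlock logic is needed.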
I have not been very happy with the complexity that results from combining these various features. RISC42 is supposed to be educational and not just functional and fast, so it would be better for it to be a much simpler single clock design. That eliminates the K register, but R15 can fill that role pretty well (even better, since it allows task switching between cascaded instructions). There are no programmer-visible changes to the design except for the slower clock. FPGAs happen to be very good for pipelined designs since they have a flip-flop for every logic block, so this simplification won't save much space.
-- Jecel

P.S.: Thanks to a tip from Reinout Heeck, I seem to have fixed the problem with my swiki (it was an "intrusion detection system" in the ADSL modem).
hardware@lists.squeakfoundation.org