Hi,
I'm just a post-silicon test engineer for a company that is developing an object-oriented, medium-grain parallel processor on a chip. It's quite a bit like an FPGA, but the chip implements ALU state machines and table-lookup logic. It has multiply/accumulate (dot product) units, datapath muxes and lots of memory. The kicker is that it's a tagged architecture, with 5 bits riding shotgun on every 16 bits of data. That's 8 bits of tag and 2 ready bits for a real-world 32-bit object. I've heard talk in the office that a Harvard/Princeton machine demo hack implementation would be nice, but that seems really 1950s. My first thought was to build a Forth machine, or a bunch of Forth machines, but Squeak seems really attractive, and it goes well with the object-oriented mindset of the company and the chip.
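To make the tag arithmetic concrete, here is a rough Python sketch of how a 32-bit object could be split across two tagged 21-bit words. The bit positions (4 tag bits plus 1 ready bit above each 16-bit half) and all the function names are assumptions made up for illustration, not the chip's actual format.

```python
# Hypothetical layout of one 21-bit word: [ready:1][tag:4][data:16].
# The real chip's bit ordering is not stated; this is only an illustration.
DATA_BITS = 16
TAG_BITS = 4    # assumed: the 8 tag bits split evenly over two halves
READY_BITS = 1  # assumed: one ready bit per 16-bit half

def pack_word(data, tag, ready):
    """Pack 16 data bits, 4 tag bits and 1 ready bit into a 21-bit word."""
    if not 0 <= data < (1 << DATA_BITS):
        raise ValueError("data must fit in 16 bits")
    if not 0 <= tag < (1 << TAG_BITS):
        raise ValueError("tag must fit in 4 bits")
    if ready not in (0, 1):
        raise ValueError("ready must be 0 or 1")
    return (ready << (DATA_BITS + TAG_BITS)) | (tag << DATA_BITS) | data

def unpack_word(word):
    """Return (data, tag, ready) from a 21-bit word."""
    data = word & ((1 << DATA_BITS) - 1)
    tag = (word >> DATA_BITS) & ((1 << TAG_BITS) - 1)
    ready = (word >> (DATA_BITS + TAG_BITS)) & 1
    return data, tag, ready

def pack_object(value32, tag8, ready2):
    """Split a 32-bit value, 8-bit tag and 2 ready bits into (hi, lo) words."""
    if not 0 <= value32 < (1 << 32):
        raise ValueError("value must fit in 32 bits")
    if not 0 <= tag8 < (1 << 8):
        raise ValueError("tag must fit in 8 bits")
    if not 0 <= ready2 < (1 << 2):
        raise ValueError("ready must fit in 2 bits")
    lo = pack_word(value32 & 0xFFFF, tag8 & 0xF, ready2 & 1)
    hi = pack_word(value32 >> 16, tag8 >> 4, ready2 >> 1)
    return hi, lo

def unpack_object(hi, lo):
    """Rebuild (value32, tag8, ready2) from the two tagged words."""
    lo_data, lo_tag, lo_ready = unpack_word(lo)
    hi_data, hi_tag, hi_ready = unpack_word(hi)
    value32 = (hi_data << 16) | lo_data
    tag8 = (hi_tag << 4) | lo_tag
    ready2 = (hi_ready << 1) | lo_ready
    return value32, tag8, ready2
```

For example, `pack_object(0xDEADBEEF, 0xA5, 0b10)` gives two words that `unpack_object` turns back into the same value, tag and ready bits. A Squeak object pointer or SmallInteger could ride in the 32 data bits, with the 8 tag bits free for class or type information.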
So, do you think I should burn a couple hundred hours of my own time trying to cook this up?
BTW: this thing runs at 1 GHz and has 256 ALUs, 64 MACs, tons of registers and scratch, and 2 x 266 MHz 40-bit DDR2 DRAM interfaces with 256 Mwords of store.
AKA: Mr Jones.
Just how do I implement a parallel or massively pipelined Squeak machine?