[Vm-dev] stack vm questions

Thu May 14 17:43:48 UTC 2009

2009/5/14 Jecel Assumpcao Jr <jecel at merlintec.com>:
>
> Thanks Eliot and Igor for your comments!
>
> Eliot is right that what Igor wrote is a good way to do a JIT but the
> problem is that my hardware is essentially an interpreter and must deal
> at runtime with expressions that the JIT would optimize away.
>
Yes, I proposed it for the JIT only. For a bytecode interpreter, such a
separation really could be much less effective.

The bad part is that testing the idea (how much more, or less, effective
it is compared to a single stack) requires a huge implementation effort.
So it's a bit risky to spend many hours changing the code generator &
data formats only to discover that, in practice, it gives much smaller
benefits than expected :)

> My design is described in:
>
> http://www.squeakphone.com:8203/seaside/pier/siliconsqueak/details
>
> What is missing from that description is what the control registers (x0
> to x31) do, and that depends on the stack organization. Certainly it is
> possible to have three registers as Eliot suggested. A separate control
> stack is not only used on Forth processors but was also popular in Lisp
> Machine designs.
>
> Registers t0 to t29 are really implemented through a small amount of
> hardware as a range of words in the stack cache. Having to split this
> set differently for each method would make the hardware larger and
> slower. With a separate control stack the mapping is really simple.
>
> About the JIT having to use call which pushes the PC, that is not true
> on RISC processors. But I suppose the focus for Cog is making Squeak
> fast on the x86.
>
>> There's a tension between implementing what the current compiler
>> produces and implementing what the instruction set defines.  For
>> example should one assume arguments are never written to?  I lean
>> on the side of implementing the instruction set.
>
> That is especially a good idea if the same VM ends up being used for
> other languages like Newspeak. Certainly the Java VM runs many languages
> that the original designers never expected.
>
>> Yes.  In the JIT an interpreted frame needs an extra field to hold
>> the saved bytecode instruction pointer when an interpreted frame
>> calls a machine code frame because the return address is the "return
>> to interpreter trampoline" pc.  There is no flag word in a machine
>> code frame.  So machine code frames save one word w.r.t. the
>> stack vm and interpreted frames gain a word.  But most frames
>> are machine code ones so most of the time one is saving space.
>
> Ok, so the JIT VM will still have an interpreter. Self originally didn't
> have one but most of the effort that David Ungar put into the project
> when it was restarted was making it more interpreter friendly. The
> bytecodes became very similar to the ones in Little Smalltalk, for
> example.
>
> Will images be compatible between the JIT VM and the Stack VM? Or do you
> expect the latter to not be used anymore once the JIT is available? I
> had originally understood that the Stack VM would be compatible with
> older images (since you divorce all frames on save and remarry them on
> reload) but I had missed the detail of the different bytecodes for
> variable instance access in the case of Context objects.
>
>> I guess that in hardware you can create an instruction that will
>> load a descriptor register as part of the return sequence in parallel
>> with restoring the frame pointer and method so one would never
>> indirect through the frame pointer to fetch the flags word; instead
>> it would be part of the register state.  But that's an extremely
>> uneducated guess :)
>
> Well, I am trying to avoid having a flags word even though in hardware
> it is so easy to have any size fields that you might want. I can check
> if context == nil very efficiently. For methods, t0 is the same value as
> the "self" register (x4, for example) while for blocks it is different.
> And with three pointers (fp, sp and control pointer) I shouldn't need to
> keep track of the number of arguments.
>
>> Jecel can also design the machine to avoid taking interrupts on
>> the operand stack and provide a separate interrupt stack.
>
> Hmmm... it has been a while since I designed hardware with interrupts,
> but have normally used Alto style coroutines instead. The stack cache is
> divided up into 32 word blocks and can hold parts of stacks from several
> threads at once. Checking for overflow/underflow only needs to happen
> when the stack pointer moves from one block to a different one (and even
> then, only in certain cases which aren't too common). An interesting
> feature of this scheme is that only 5 bit adders are needed (which are
> much faster than 16 or 32 bit adders, for example. Wide adders could
> reduce the clock speed or make the operation take an extra clock).
> Another detail is that having operand or control frames split among two
> stack pages is no problem at all.
>
> address of tN in the stack cache:
>
>  raddr := fp[4:0] + N.
>  scaddr := (raddr[5] ? tHigh : tLow) , raddr[4:0].
>
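
If I read the two lines above correctly, they amount to the following.
This is only a sketch in Python: the function name and the sample values
are mine, not part of Jecel's design, and I assume fp is a word index and
N stays in the range 0 to 29.

```python
BLOCK_BITS = 5                       # 32 words per stack cache block
BLOCK_MASK = (1 << BLOCK_BITS) - 1   # selects bits [4:0]

def stack_cache_address(fp, n, t_low, t_high):
    """Word address of tN in the stack cache (hypothetical helper)."""
    # raddr := fp[4:0] + N   (a 5 bit add producing a 6 bit result)
    raddr = (fp & BLOCK_MASK) + n
    # raddr[5] says whether tN spilled over into the next block
    block = t_high if (raddr >> BLOCK_BITS) & 1 else t_low
    # scaddr := block number concatenated with raddr[4:0]
    return (block << BLOCK_BITS) | (raddr & BLOCK_MASK)

# t3 with the frame pointer near the end of block 2; block 7 is tHigh
print(stack_cache_address(fp=30, n=3, t_low=2, t_high=7))
# t3 with the frame pointer near the start of block 2
print(stack_cache_address(fp=4, n=3, t_low=2, t_high=7))
```

The only arithmetic is the 5 bit add; choosing between tLow and tHigh is
a multiplexer driven by the carry.
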
> When fp[5] changes value, then tLow := tHigh and tHigh := head of free
> block list (if fp was going up). If there are no free blocks, then some
> have to be flushed to their stack pages in main memory. When going down,
> tLow is loaded from a linked list, which might have to be extended by
> loading blocks from stack pages. With a 4KB stack cache, for example,
> there are 32 blocks with 32 words each and so block 0 can dedicate a
> word for each of the other 31 blocks. The bottom 7 bits of that word
> (only 5 actually needed, but it is nice to have a little room to grow)
> can form the "previous" linked list (tHigh and tLow would also be 5 bits
> wide) while the remaining bits can hold the block's address in main
> memory.
>
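
The per-block word kept in block 0 could be packed roughly like this.
Again just a Python sketch with names I made up. It assumes 4 byte words
(4KB / 32 blocks / 32 words) and that blocks in the stack pages are
aligned to their 128 byte size, so the bottom 7 bits of the main memory
address are free to hold the link. That alignment is my guess, it is not
stated above.

```python
LINK_BITS = 7                        # bottom 7 bits: "previous" link
LINK_MASK = (1 << LINK_BITS) - 1

def pack_block_word(mem_addr, prev_block):
    """Combine a block's main memory address with its 'previous' link."""
    if mem_addr & LINK_MASK:
        raise ValueError("address must be 128 byte aligned (assumption)")
    if not 0 <= prev_block <= LINK_MASK:
        raise ValueError("link does not fit in 7 bits")
    return mem_addr | prev_block

def unpack_block_word(word):
    """Return (main memory address, previous block number)."""
    return word & ~LINK_MASK, word & LINK_MASK

word = pack_block_word(0x10000080, 5)
addr, prev = unpack_block_word(word)
print(hex(word), hex(addr), prev)
```
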
> This might seem far more complicated than a split arg/temp frame and it
> certainly would be if implemented in software. In hardware, it is mostly
> wires, a multiplexer and a small adder.
>
> -- Jecel
>
>

-- 
Best regards,
Igor Stasenko AKA sig.