[Vm-dev] stack vm questions
siguctua at gmail.com
Thu May 14 17:43:48 UTC 2009
2009/5/14 Jecel Assumpcao Jr <jecel at merlintec.com>:
> Thanks Eliot and Igor for your comments!
> Eliot is right that what Igor wrote is a good way to do a JIT but the
> problem is that my hardware is essentially an interpreter and must deal
> at runtime with expressions that the JIT would optimize away.
Yes, I proposed it for a JIT only. For a bytecode interpreter such a
separation really could be much less effective.
The bad part is that testing the idea (how much more, or less, effective
it is compared to a single stack) requires a huge implementation effort.
So it's a bit risky to spend many hours changing the code generator and
data formats only to discover that in practice it gives much smaller
benefits than expected :)
> My design is described in:
> What is missing from that description is what the control registers (x0
> to x31) do, and that depends on the stack organization. Certainly it is
> possible to have three registers as Eliot suggested. A separate control
> stack is not only used on Forth processors but was also popular in Lisp
> Machine designs.
> Registers t0 to t29 are really implemented through a small amount of
> hardware as a range of words in the stack cache. Having to split this
> set differently for each method would make the hardware larger and
> slower. With a separate control stack the mapping is really simple.
> About the JIT having to use call which pushes the PC, that is not true
> on RISC processors. But I suppose the focus for Cog is making Squeak
> fast on the x86.
>> There's a tension between implementing what the current compiler
>> produces and implementing what the instruction set defines. For
>> example should one assume arguments are never written to? I lean
>> on the side of implementing the instruction set.
> That is an especially good idea if the same VM ends up being used for
> other languages like Newspeak. Certainly the Java VM runs many languages
> that the original designers never expected.
>> Yes. In the JIT an interpreted frame needs an extra field to hold
>> the saved bytecode instruction pointer when an interpreted frame
>> calls a machine code frame because the return address is the "return
>> to interpreter trampoline" pc. There is no flag word in a machine
>> code frame. So machine code frames save one word w.r.t. the
>> stack vm and interpreted frames gain a word. But most frames
>> are machine code ones so most of the time one is saving space.
> Ok, so the JIT VM will still have an interpreter. Self originally didn't
> have one but most of the effort that David Ungar put into the project
> when it was restarted was making it more interpreter friendly. The
> bytecodes became very similar to the ones in Little Smalltalk, for
> example.
> Will images be compatible between the JIT VM and the Stack VM? Or do you
> expect the latter to not be used anymore once the JIT is available? I
> had originally understood that the Stack VM would be compatible with
> older images (since you divorce all frames on save and remarry them on
> reload) but I had missed the detail of the different bytecodes for
> instance variable access in the case of Context objects.
>> I guess that in hardware you can create an instruction that will
>> load a descriptor register as part of the return sequence in parallel
>> with restoring the frame pointer and method so one would never
>> indirect through the frame pointer to fetch the flags word; instead
>> it would be part of the register state. But that's an extremely
>> uneducated guess :)
> Well, I am trying to avoid having a flags word even though in hardware
> it is so easy to have any size fields that you might want. I can check
> if context == nil very efficiently. For methods, t0 is the same value as
> the "self" register (x4, for example) while for blocks it is different.
> And with three pointers (fp, sp and control pointer) I shouldn't need to
> keep track of the number of arguments.
>> Jecel can also design the machine to avoid taking interrupts on
>> the operand stack and provide a separate interrupt stack.
> Hmmm... it has been a while since I designed hardware with interrupts,
> but I have normally used Alto style coroutines instead. The stack cache is
> divided up into 32 word blocks and can hold parts of stacks from several
> threads at once. Checking for overflow/underflow only needs to happen
> when the stack pointer moves from one block to a different one (and even
> then, only in certain cases which aren't too common). An interesting
> feature of this scheme is that only 5 bit adders are needed (which are
> much faster than 16 or 32 bit adders, for example. Wide adders could
> reduce the clock speed or make the operation take an extra clock).
> Another detail is that having operand or control frames split among two
> stack pages is no problem at all.
> address of tN in the stack cache:
> raddr := fp[4:0] + N.
> scaddr := (raddr[5] ? tHigh : tLow) , raddr[4:0].
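
A minimal Python sketch of that address computation, to make the bit slicing concrete. It assumes the block select is the carry out of the 5-bit add (bit 5 of raddr), that N stays in the t0..t29 range, and that tLow and tHigh are 5-bit block indices. The function and parameter names are made up for illustration and are not part of Jecel's design.

```python
BLOCK_BITS = 5                    # 32 words per stack cache block
OFFSET_MASK = (1 << BLOCK_BITS) - 1

def stack_cache_address(fp, n, t_low, t_high):
    """Return the stack cache word address of temporary tN.

    fp      -- frame pointer; only its low 5 bits are used
    n       -- temporary index, assumed to be in 0..29
    t_low   -- cache block holding the words at and above fp
    t_high  -- cache block used once the offset wraps past 31
    """
    raddr = (fp & OFFSET_MASK) + n        # 6-bit result of a 5-bit add
    carry = raddr >> BLOCK_BITS           # 1 when the add crossed a block
    block = t_high if carry else t_low
    # Concatenate the block index with the low 5 bits of the sum.
    return (block << BLOCK_BITS) | (raddr & OFFSET_MASK)

def crosses_block(old_ptr, new_ptr):
    """True when a pointer update moves into a different 32-word block,
    which is the only time an overflow/underflow check is needed."""
    return (old_ptr >> BLOCK_BITS) != (new_ptr >> BLOCK_BITS)
```

With fp[4:0] = 30, tLow = 3 and tHigh = 7, t1 lands on the last word of block 3 and t2 wraps to the first word of block 7.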
> When fp changes value, then tLow := tHigh and tHigh := head of free
> block list (if fp was going up). If there are no free blocks, then some
> have to be flushed to their stack pages in main memory. When going down,
> tLow is loaded from a linked list, which might have to be extended by
> loading blocks from stack pages. With a 4KB stack cache, for example,
> there are 32 blocks with 32 words each and so block 0 can dedicate a
> word for each of the other 31 blocks. The bottom 7 bits of that word
> (only 5 actually needed, but it is nice to have a little room to grow)
> can form the "previous" linked list (tHigh and tLow would also be 5 bits
> wide) while the remaining bits can hold the block's address in main
> memory.
> This might seem far more complicated than a split arg/temp frame and it
> certainly would be if implemented in software. In hardware, it is mostly
> wires, a multiplexer and a small adder.
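
The per-block word kept in block 0 can be sketched the same way. This is only an illustration under assumptions: the 7-bit link field follows the description above, but the helper names, the dictionary standing in for block 0, and the use of index 0 as an end-of-list marker (block 0 holds the table, so it never appears on the list) are hypothetical.

```python
LINK_BITS = 7                     # 5 needed for 32 blocks, 2 spare
LINK_MASK = (1 << LINK_BITS) - 1

def pack_block_word(prev_block, memory_address):
    """Combine the 'previous' link and the block's main memory address."""
    if not 0 <= prev_block <= LINK_MASK:
        raise ValueError("previous block index does not fit in the link field")
    return (memory_address << LINK_BITS) | prev_block

def unpack_block_word(word):
    """Return (prev_block, memory_address) from a packed word."""
    return word & LINK_MASK, word >> LINK_BITS

def walk_previous(table, start, end_marker=0):
    """Follow the 'previous' links from block `start` until the marker.

    `table` maps a block index to its packed word, standing in for the
    words that block 0 dedicates to the other blocks.
    """
    chain = []
    block = start
    while block != end_marker:
        chain.append(block)
        block, _ = unpack_block_word(table[block])
    return chain
```

Going down the stack would then load tLow from the link field of the current block's word, and a flush would use the address field to find the block's stack page.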
> -- Jecel
Igor Stasenko AKA sig.