[Vm-dev] goto instruction with Cog VM
roniesalg at gmail.com
Mon Nov 3 06:15:37 UTC 2014
Is this a request for a jump table? (switch statement in C)
2014-11-03 2:48 GMT-03:00 Ralph Boland <rpboland at gmail.com>:
> >> I am working on a parser generator tool (a replacement for SmaCC) and
> >> one of the things I am interested in is the direct translation of
> >> language specifications into the virtual machine code (SmaCC and my
> >> current version of it use Squeak as the target language).
> > First, a different approach than compiling to Smalltalk is to compile to
> > at Cadence, translating an AST into a Squeak compiler parse tree for code
> > generation. Targeting a parse tree gives you much more freedom; you can
> > express things that aren't expressible in Smalltalk. And if you target
> > bytecodes you can do even more.
> I never considered using a parse tree as the target. An interesting idea
> which in many instances may be the best approach. But for my regular
> example I would still want to generate byte codes. In any case I wouldn't
> want to restrict users of my parser generator tool to any one of the three
> options (Smalltalk code, parse tree, byte code). It is my responsibility to
> make all three options as easy and efficient as reasonably possible for
> users of the parser generator tool. Haven't put much thought into this yet
> though. So far Smalltalk (Squeak) is the only
> >> One of the problems I have is that, for some languages, the natural
> >> translation into VM code uses computed gotos.
> >> There are two scenarios here:
> >> 1) goto X where X is a variable.
> >> 2) goto (coll at: y) where coll is a Collection.
> > There are several ways of implementing this without computed bytecodes in
> > the instruction set, but there is also the possibility of implementing it
> > directly in the instruction set.
> > Off the top of my head one could
> > - map to perform: using some mangled selector. Yes this is problematic
> > because one has to split the scope between the two methods, so in general
> > it's not a solution
> Doesn't appeal to me.
> > - map to a case statement, which is what Squeak does. Just map it to a
> > sequence of compare and branches. Or better still, to a binary tree.
> > Coincidentally this is used by the JIT to implement block dispatch in
> > methods that contain more than one block. I know of other VM
> > implementations using it for PIC dispatch with really good performance.
> I don't know what you mean by Squeak mapping to a case statement since
> there is no case statement in Squeak/Smalltalk and I can't offhand think
> where one is needed (some Squeak users might feel they need one but that
> is a different matter). The use of compare and branches might be OK in some
> cases but a mess for the finite state machines generated from regular
> expressions. Actually, even with computed gotos FSMs are somewhat messy but
> without them it's worse. I don't know what 'PIC dispatch' is.
> To use a binary tree don't I need some kind of computed goto for when I
> reach a leaf of the tree?
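On the binary-tree question: in a compare-and-branch tree each leaf is an ordinary jump to one fixed target, so no computed goto is involved. A minimal Python sketch of the idea follows. The names build_tree and dispatch, the tuple layout, and the "stateN" targets are all hypothetical and only illustrate the shape of the dispatch; this is not how Squeak or the Cog JIT actually emits it.

```python
# Hypothetical sketch: dispatch over integer case keys using a balanced tree
# of compare-and-branch nodes. Nothing here is taken from Squeak or Cog.

def build_tree(cases):
    """cases: list of (key, target) pairs sorted by key. Returns a nested tuple."""
    if len(cases) == 1:
        key, target = cases[0]
        # A leaf is a plain equality test followed by a jump to ONE fixed target.
        return ("leaf", key, target)
    mid = len(cases) // 2
    pivot = cases[mid][0]
    # Internal node: "if value < pivot, go left, otherwise go right".
    return ("node", pivot, build_tree(cases[:mid]), build_tree(cases[mid:]))


def dispatch(tree, value, default=None):
    """Walk the tree. Returns (target, number_of_comparisons)."""
    comparisons = 0
    while tree[0] == "node":
        _, pivot, left, right = tree
        comparisons += 1
        tree = left if value < pivot else right
    _, key, target = tree
    comparisons += 1  # the final equality test at the leaf
    return (target if value == key else default), comparisons


cases = [(k, "state%d" % k) for k in (1, 3, 5, 8, 13, 21, 34, 55)]
tree = build_tree(cases)
```

With 8 cases every lookup costs 4 comparisons (3 tree levels plus the leaf test), against up to 8 for a linear chain of compares. A jump table would make it a single indexed jump, which is the trade-off under discussion.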
> > - use thisContext pc: value.
> This would be a possibility for me to experiment with for now. When I
> have a working parser generator tool I could campaign for my computed goto
> instructions to be added to the VM.
> > This /should/ be fine in the stack VM, but
> > slooooow in the JIT because internally mapping bytecode pcs to machine
> > pcs is slow, and currently slower still because the frame will be
> > converted to a pure context and then converted back into a frame on the
> > return from
> > pc:. But this solution isn't to be rejected out-of-hand. It can be
> > optimized to avoid the frame conversion and the JIT might be able to
> > optimize it.
> I assume that if computed gotos were used the translation to machine code
> would require a direct mapping of (virtually labeled) bytecode locations to
> machine code locations. I think this can be done in a reasonable amount of
> time but others such as yourself clearly understand the issues far better
> than I do. The dirty solution to start would be to simply not JIT the code
> that uses computed gotos.
> > The main problem is the compiler has no support for labels so
> > there would be work here.
> I don't mind doing the work but to my way of thinking "goto X" is pretty
> and is thus best handled at the VM/byte code level. Anything else is doing
> in a complicated way something that is fairly simple. Of course changing
> the VM/byte codes by even a single byte code is a major deal unless done
> when the VM/byte codes are initially created. Alas I must deal with what
> already exists. Even so, my preference is to work with the VM if at all
> possible.
> >> For example, one such language is that of regular expressions, which I
> >> wish to translate into finite state machines implemented in VM code.
> >> In this case I need case 2) gotos where coll is a collection of
> >> associations, possibly a Dictionary. I also plan to write a debugger
> >> for this (and other
> >> but that is another story.
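To make the case 2) scenario concrete, here is a small Python sketch of a finite state machine whose "goto (coll at: y)" is a dictionary lookup from the current input character to the next state. The regular expression a(b|c)*d, the state names, and the function matches are assumptions chosen for illustration. A generator emitting bytecode would replace the lookup loop with jumps to labeled code locations.

```python
# Hand-built DFA for the regular expression a(b|c)*d (an illustrative choice).
# Each state maps an input character to the next state; the lookup plays the
# role of "goto (coll at: y)" with coll being a Dictionary.
TRANSITIONS = {
    "start": {"a": "loop"},
    "loop": {"b": "loop", "c": "loop", "d": "accept"},
    "accept": {},
}


def matches(text):
    """Return True if the whole of text is accepted by the DFA."""
    state = "start"
    for ch in text:
        state = TRANSITIONS[state].get(ch)
        if state is None:
            # No transition for this character: the machine rejects.
            return False
    return state == "accept"
```

For example, matches("abcbd") and matches("ad") are True, while matches("abc") and matches("adx") are False.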
> >> I realize that the Cog VM is being built for Smalltalk (Squeak? Pharo?)
> >> for which the goto instructions are not needed and thus I assume
> >> unavailable. But there is something to viewing a virtual machine as
> >> general purpose and thus the target of multiple languages as is the case
> >> for the Java virtual machine.
> >> If the Cog VM is viewed this way then I argue there is a need for my
> >> instructions because some languages have need for them.
> >> For example, many languages have case statements. (I am all for object
> >> oriented but I would be willing to accept a case statement in Smalltalk
> >> too; the Squeak code implemented one in Squeak doesn't cut it).
> > I've occasionally thought about this for many years. A computed jump
> > would be nice. Eg index an Array literal of pcs with the integer on top of
> > the stack, falling through on bad type or out of range.
> This is the way I am thinking. If there are other reasons for a computed
> jump as well all the better.
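The computed jump described above (index an Array literal of pcs with the integer on top of the stack, fall through on bad type or out of range) can be sketched with a toy interpreter in Python. Everything here is an assumption for illustration: the opcode names jumpTable, emit, jump and halt, the 1-based indexing (chosen to mirror Smalltalk Arrays), and the decision to pop the index even when falling through. None of it is an existing Cog or Sista bytecode.

```python
# Toy stack interpreter with a hypothetical "jumpTable" instruction.
# Instructions are (opcode, argument) pairs; pcs are indices into the list.

def run(code, literals, stack):
    """Execute code and return the list of values produced by 'emit'."""
    out = []
    pc = 0
    while pc < len(code):
        op, arg = code[pc]
        pc += 1
        if op == "jumpTable":
            table = literals[arg]      # literal Array of target pcs
            index = stack.pop()        # assumption: index is always popped
            is_int = isinstance(index, int) and not isinstance(index, bool)
            if is_int and 1 <= index <= len(table):
                pc = table[index - 1]  # 1-based, as in a Smalltalk Array
            # Bad type or out of range: fall through to the next instruction.
        elif op == "emit":
            out.append(arg)
        elif op == "jump":
            pc = arg
        elif op == "halt":
            break
        else:
            raise ValueError("unknown opcode: %r" % (op,))
    return out


CODE = [
    ("jumpTable", 0),     # pc 0: dispatch through literal 0
    ("emit", "default"),  # pc 1: fall-through case
    ("jump", 7),          # pc 2
    ("emit", "one"),      # pc 3: target for index 1
    ("jump", 7),          # pc 4
    ("emit", "two"),      # pc 5: target for index 2
    ("jump", 7),          # pc 6
    ("halt", None),       # pc 7
]
LITERALS = [[3, 5]]       # literal 0: pcs for indices 1 and 2
```

Calling run(CODE, LITERALS, [2]) takes the second table entry, while run(CODE, LITERALS, [9]) or run(CODE, LITERALS, ["x"]) falls through to the default case. Taking the table from a literal keeps the instruction to a single argument byte, in contrast with a variant that expects the target address itself on the stack.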
> >> Anyway, I am not arguing to change Squeak or Smalltalk but I am arguing
> >> to have my goto instructions in Cog VM. Is there any chance of this?????
> > There's no chance of me spending time implementing this any time soon. I
> > have too many high-priority tasks to tackle this. But I want to encourage
> > you or others to have a go implementing it. It's fun!
> I understand and am willing to be the one to add one or more computed jump
> instructions, including working on the JIT code generator if needed.
> As you say it should be fun (and also educational). But
> 1) I am pretty busy now too and probably won't get to this for a year.
> 2) If I am to do this it would be great if someone can write a
>    specification as to what is to be done. If someone can write this now
>    that would be great but if they write it when I post that I am ready to
>    do the work that would also be fine.
> 3) I don't want to just have my own private VM/byte codes. I want users of
>    my parser generator tool to be able to load it into a standard version
>    of Squeak and run it there including the possible generation of
>    compilers for compiling their domain specific language programs into
>    byte codes if desired.
> >> I don't know the Squeak VM or the Cog VM either but I assume these
> >> instructions don't exist because I see no need of them when the source
> >> language is Squeak or any version of Smalltalk for that matter. I also
> >> assume that there is already a full list of 256 instructions in the Cog
> >> VM and thus no room for my instructions unless some instructions are
> >> removed.
> >> Are there Cog VM instructions that are so rarely used that they could be
> >> removed without unreasonably slowing down the Cog VM interpretation of
> >> byte codes generated from Squeak source code?????
> > The current set has 3 unused bytecodes, one of which Spur uses, so
> > effectively there are two unused bytecodes.
> Levente Uzonyi in his posting pointed out that only one instruction is
> I don't like having to push the address to jump to onto the stack,
> preferring a byte code with an argument, but I could live with his solution
> if that is what is decided.
> In the case of goto coll at: X the address is likely to end up on top of
> the stack anyway so Levente's jumpToTop instruction looks good in any case.
> > The Cog VMs support multiple bytecode sets. If you look at the
> > BytecodeSets package on VMMaker you can read the class comments of the
> > BytecodeEncoder subclasses such as EncoderForSistaV1. These bytecode
> > sets have a few more unused bytecodes. This multiple bytecode set support is
> > better implemented in Spur where there is only one compiled method header
> > format and support for 64k literals. So let me encourage you to move to
> > Spur and to look at the Sista set. The class comment of each encoder
> > specifies the instruction set it targets.
> I am prepared to work with Spur and the Sista set. I am looking for
> someone to say that if I do this work that incorporating the work into
> Spur will be
> Ralph Boland