[Vm-dev] goto instruction with Cog VM

Ronie Salgado roniesalg at gmail.com
Mon Nov 3 06:15:37 UTC 2014


Is this a request for a jump table? (switch statement in C)
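
To make the question concrete, here is a minimal sketch in C of what a switch-driven state machine looks like. It is only an illustration: the regular expression ab*c, the state names, and the function name matches_ab_star_c are all made up for this example and do not come from the parser generator discussed below. A C compiler is free to lower a dense switch like this to a jump table, which is the kind of dispatch being asked about.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical example: a hand-written FSM for the regular expression ab*c.
   The state names and the function name are illustrative only. */
enum state { S_START, S_AFTER_A, S_ACCEPT, S_REJECT };

bool matches_ab_star_c(const char *input)
{
    assert(input != NULL);
    enum state s = S_START;

    for (size_t i = 0; input[i] != '\0'; i++) {
        char ch = input[i];

        /* One case per state; each case picks the next state from the
           current character.  A compiler may turn this into a jump table. */
        switch (s) {
        case S_START:
            s = (ch == 'a') ? S_AFTER_A : S_REJECT;
            break;
        case S_AFTER_A:
            if (ch == 'b')
                s = S_AFTER_A;      /* stay here while consuming b's */
            else if (ch == 'c')
                s = S_ACCEPT;
            else
                s = S_REJECT;
            break;
        case S_ACCEPT:              /* any input after the final c is a mismatch */
        case S_REJECT:
            s = S_REJECT;
            break;
        }

        if (s == S_REJECT)
            return false;
    }
    return s == S_ACCEPT;
}
```

Calling matches_ab_star_c("abbbc") walks S_START, S_AFTER_A (three times), then S_ACCEPT. Each transition is a single indexed dispatch on the state rather than a chain of comparisons.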


2014-11-03 2:48 GMT-03:00 Ralph Boland <rpboland at gmail.com>:

>
> >>
> >> I am working on a parser generator tool (a replacement for SmaCC) and
> >> one of the things I am interested in is the direct translation of
> >> language specifications into the virtual machine code (SmaCC and my
> >> current version of it use Squeak as the target language).
> >>
>
> > First, a different approach than compiling to Smalltalk is to compile to a
> > parse tree.  We do this in the pseudo JavaScript compiler we've implemented
> > at Cadence, translating an AST into a Squeak compiler parse tree for code
> > generation.  Targeting a parse tree gives you much more freedom; you can
> > express things that aren't expressible in Smalltalk.  And if you target
> > bytecodes you can do even more.
>
>
> I never considered using a parse tree as the target.  An interesting idea
> which in many instances may be the best approach.  But for my regular
> expression example I would still want to generate byte codes.  In any case
> I wouldn't want to restrict users of my parser generator tool to any one of
> the three options (Smalltalk code, parse tree, byte code).  It is my
> responsibility to make all three options as easy and efficient as
> reasonably possible for users of the parser generator tool.  Haven't put
> much thought into this yet though.  So far Smalltalk (Squeak) is the only
> option.
>
>
> >> One of the problems I have is that, for some languages, the natural
> >> translation into VM code uses computed gotos.
> >> There are two scenarios here:
> >>
> >>      1) goto X  where X is a variable.
> >>      2) goto  (coll at: y)  where coll is a Collection.
> >>
>
> > There are several ways of implementing this without computed bytecodes in
> > the instruction set, but there is also the possibility of implementing it
> > directly in the instruction set.
>
> > Off the top of my head one could
>
> > - map to perform: using some mangled selector.  Yes this is problematic
> > because one has to split the scope between the two methods, so in general
> > it's not a solution
>
> Doesn't appeal to me.
>
> > - map to a case statement, which is what Squeak does. Just map it to a
> > sequence of compare and branches.  Or better still, to a binary tree.
> > Coincidentally this is used by the JIT to implement block dispatch in
> > methods that contain more than one block.  I know of other VM
> > implementations using it for PIC dispatch with really good performance.
>
> I don't know what you mean by Squeak mapping to a case statement since
> there is no case statement in Squeak/Smalltalk and I can't offhand think of
> where one is needed (some Squeak users might feel they need one but that is
> a different matter).  The use of compare and branches might be OK in some
> cases but a mess for the finite state machines generated from regular
> expressions.  Actually, even with computed gotos FSMs are somewhat messy
> but without them it's worse.  I don't know what 'PIC dispatch' is.
> To use a binary tree don't I need some kind of computed goto for when I
> reach a leaf of the tree?
>
> > - use thisContext pc: value.
>
> This would be a possibility for me to experiment with for now.  When I have
> a working parser generator tool I could campaign for my computed goto
> instructions to be added to the VM.
>
> > This /should/ be fine in the stack VM, but
> > slooooow in the JIT because internally mapping bytecode pcs to machine
> > code pcs is slow, and currently slower still because the frame will be
> > converted to a pure context and then converted back into a frame on the
> > return from pc:.  But this solution isn't to be rejected out-of-hand.  It
> > can be optimized to avoid the frame conversion and the JIT might be able
> > to optimize it.
>
> I assume that if computed gotos were used the translation to machine code
> would require a direct mapping of (virtually labeled) bytecode locations to
> machine code locations.  I think this can be done in a reasonable amount of
> time but others such as yourself clearly understand the issues far better
> than I do.  The dirty solution to start would be to simply not JIT the code
> that uses computed gotos.
>
> > The main problem is the compiler has no support for labels so
> > there would be work here.
>
> I don't mind doing the work but to my way of thinking "goto X" is pretty
> basic and is thus best handled at the VM/byte code level.  Anything else is
> doing in a complicated way something that is fairly simple.  Of course
> changing the VM/byte codes by even a single byte code is a major deal unless
> done when the VM/byte codes are initially created.  Alas I must deal with
> what already exists.  Even so, my preference is to work with the VM if at
> all possible.
>
>
> >> For example, one such language is that of regular expressions, which I
> >> wish to translate into finite state machines implemented in VM code.
> >> In this case I need case 2) gotos where coll is a collection of
> >> associations, possibly a Dictionary. I also plan to write a debugger for
> >> this (and other languages) but that is another story.
> >>
> >> I realize that the Cog VM is being built for Smalltalk (Squeak? Pharo?)
> >> for which the goto instructions are not needed and thus I assume
> >> unavailable. But there is something to viewing a virtual machine as
> >> general purpose and thus the target of multiple languages as is the case
> >> for the Java virtual machine.
> >> If the Cog VM is viewed this way then I argue there is a need for my goto
> >> instructions because some languages have need for them.
> >> For example, many languages have case statements.  (I am all for object
> >> oriented but I would be willing to accept a case statement in Smalltalk
> >> too; the case statement implemented in Squeak code doesn't cut it).
> >
>
> > I've occasionally thought about this for many years.  A computed jump
> > might be nice.  Eg index an Array literal of pcs with the integer on top of
> > stack, falling through on bad type or out of range.
>
> This is the way I am thinking.  If there are other reasons for a computed
> jumpTo as well all the better.
>
> >> Anyway, I am not arguing to change Squeak or Smalltalk but I am arguing
> >> to have my goto instructions in Cog VM. Is there any chance of this?
> >>
>
> > There's no chance of me spending time implementing this any time soon.  I
> > have too many high-priority tasks to tackle this.  But I want to encourage
> > you or others to have a go implementing it.  It's fun!
>
> I understand and am willing to be the one to add one or more computed jump
> instructions, including working on the JIT code generator if needed.
> As you say it should be fun (and also educational).  But
>    1)  I am pretty busy now too and probably won't get to this for a year.
>    2)  If I am to do this it would be great if someone can write a
>        specification as to what is to be done.  If someone can write this
>        now that would be great but if they write it when I post that I am
>        ready to do the work that would also be fine.
>    3)  I don't want to just have my own private VM/byte codes.  I want
>        users of my parser generator tool to be able to load it into a
>        standard version of Squeak and run it there including the possible
>        generation of compilers for compiling their domain specific language
>        programs into byte codes if desired.
>
> >> I don't know the Squeak VM or the Cog VM either but I assume these
> >> instructions don't exist because I see no need of them when the source
> >> language is Squeak or any version of Smalltalk for that matter. I also
> >> assume that there is already a full list of 256 instructions in the Cog
> >> VM and thus no room for my goto instructions unless some instructions
> >> are removed.
> >>
> >> Are there Cog VM instructions that are so rarely used that they could be
> >> removed without unreasonably slowing down the Cog VM interpretation of
> >> byte codes generated from Squeak source code?
> >>
>
> > The current set has 3 unused bytecodes, one of which Spur uses, so
> > effectively there are two unused bytecodes.
>
> Levente Uzonyi in his posting pointed out that only one instruction is
> needed.  I don't like having to push the address to jump to onto the stack,
> preferring a byte code with an argument, but I could live with his solution
> if that is what is decided.  In the case of goto coll at: X the address is
> likely to end up on top of the stack anyway so Levente's jumpToTop
> instruction looks good in any case.
>
> > The Cog VMs support multiple bytecode sets.  If you look at the
> > BytecodeSets package on VMMaker you can read the class comments of the
> > BytecodeEncoder subclasses such as EncoderForSistaV1.  These bytecode
> > sets have a few more unused bytecodes.  This multiple bytecode set support
> > is better implemented in Spur where there is only one compiled method
> > header format and support for 64k literals.  So let me encourage you to
> > move to Spur and to look at the Sista set.  The class comment of each
> > encoder class specifies the instruction set it targets.
>
> I am prepared to work with Spur and the Sista set.  I am looking for
> someone to say that if I do this work that incorporating the work into Spur
> will be seriously considered.
>
> Ralph Boland
>
>
>
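
For reference, the computed jump suggested in the quoted text (index an Array literal of pcs with the integer on top of stack, falling through on bad type or out of range) might be sketched in C roughly as follows. This is an analogy under stated assumptions, not Cog VM code: struct value, the target_* functions, jump_table, FELL_THROUGH, and computed_jump are all hypothetical names, function pointers stand in for bytecode pcs, and the index is 0-based as in C rather than 1-based as in Smalltalk.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tagged value standing in for the object on top of the stack. */
struct value {
    bool is_integer;
    long integer;
};

typedef int (*target_fn)(void);

/* Hypothetical jump targets; each returns a distinct marker value. */
static int target_zero(void) { return 100; }
static int target_one(void)  { return 200; }
static int target_two(void)  { return 300; }

/* Stand-in for the "Array literal of pcs". */
static const target_fn jump_table[] = { target_zero, target_one, target_two };

/* Returned when the jump is not taken and execution falls through. */
#define FELL_THROUGH (-1)

int computed_jump(struct value top)
{
    size_t count = sizeof jump_table / sizeof jump_table[0];

    /* Fall through on a non-integer operand or an out-of-range index. */
    if (!top.is_integer || top.integer < 0 || (size_t)top.integer >= count)
        return FELL_THROUGH;

    /* Otherwise dispatch through the table in a single indexed step. */
    return jump_table[top.integer]();
}
```

The point of the sketch is the shape of the check: one type test, one range test, then one indexed transfer, with the fall-through path left for the code that follows the instruction.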