New subject: goto instruction with Cog VM

3 Nov 2014

      ...
...
I am working on a parser generator tool (a replacement for SmaCC) and
one of the things I am interested in is the direct translation of
language specifications into the virtual machine code (SmaCC and my
current version of it use Squeak as the target language).
...
First, a different approach than compiling to Smalltalk is to compile to a
parse tree.  We do this in the pseudo JavaScript compiler we've implemented
at Cadence, translating an AST into a Squeak compiler parse tree for code
generation.  Targeting a parse tree gives you much more freedom; you can
express things that aren't expressible in Smalltalk.  And if you target
bytecodes you can do even more.
I never considered using a parse tree as the target.  An interesting idea
which in many instances may be the best approach.  But for my regular
expression example I would still want to generate byte codes.  In any case
I wouldn't want to restrict users of my parser generator tool to any one of
the three options (Smalltalk code, parse tree, byte code).  It is my
responsibility to make all three options as easy and efficient as
reasonably possible for users of the parser generator tool.  Haven't put
much thought into this yet though.  So far Smalltalk (Squeak) is the only
option.
...
...
One of the problems I have is that, for some languages, the natural
translation into VM code uses computed gotos.
There are two scenarios here:
 1) goto X  where X is a variable.
 2) goto  (coll at: y)  where coll is a Collection.

...
There are several ways of implementing this without computed bytecodes in
the instruction set, but there is also the possibility of implementing it
directly in the instruction set.
...
Off the top of my head one could
...

map to perform: using some mangled selector.  Yes this is problematic
because one has to split the scope between the two methods, so in general
it's not a solution.
Doesn't appeal to me.
...

map to a case statement, which is what Squeak does.  Just map it to a
sequence of compare and branches.  Or better still, to a binary tree.
Coincidentally this is used by the JIT to implement block dispatch in
methods that contain more than one block.  I know of other VM
implementations using it for PIC dispatch with really good performance.
I don't know what you mean by Squeak mapping to a case statement since
there is no case statement in Squeak/Smalltalk and I can't offhand think of
where one is needed (some Squeak users might feel they need one but that is
a different matter).  The use of compare and branches might be OK in some
cases but is a mess for the finite state machines generated from regular
expressions.  Actually, even with computed gotos FSMs are somewhat messy
but without them it's worse.  I don't know what 'PIC dispatch' is.
To use a binary tree don't I need some kind of computed goto for when I
reach a leaf of the tree????
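
A minimal sketch of the binary-tree idea, written in Python rather than
bytecodes and not taken from Squeak or the JIT.  It assumes the set of case
keys is known when the code is generated, so every leaf stands for exactly
one key and its branch target is a fixed label.  Under that assumption a
leaf needs only an ordinary compare and jump, not a computed one.  The
names Leaf, Node, build and dispatch are hypothetical.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    key: int
    target: str  # fixed label, known when the tree is generated


@dataclass
class Node:
    pivot: int
    low: "Tree"   # subtree for keys < pivot
    high: "Tree"  # subtree for keys >= pivot


Tree = Union[Leaf, Node]


def build(cases):
    """Build a balanced comparison tree from a sorted list of (key, target)."""
    if not cases:
        raise ValueError("at least one case is required")
    if len(cases) == 1:
        key, target = cases[0]
        return Leaf(key, target)
    mid = len(cases) // 2
    return Node(cases[mid][0], build(cases[:mid]), build(cases[mid:]))


def dispatch(tree, value, default="fallthrough"):
    """Return (target, number_of_compares) for value."""
    compares = 0
    while isinstance(tree, Node):
        compares += 1
        tree = tree.low if value < tree.pivot else tree.high
    compares += 1  # final equality test at the leaf
    target = tree.target if value == tree.key else default
    return target, compares


cases = [(i, f"state{i}") for i in range(8)]
tree = build(cases)
print(dispatch(tree, 5))
print(dispatch(tree, 99))
```

With eight cases this costs four compares per dispatch where a linear
chain could need up to eight.  That is still a cost that grows with the
number of cases, which a true computed goto would avoid.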
...

use thisContext pc: value.

This would be a possibility for me to experiment with for now.  When I have
a working parser generator tool I could campaign for my computed goto
instructions to be added to the VM.
...
This /should/ be fine in the stack VM, but slooooow in the JIT because
internally mapping bytecode pcs to machine code pcs is slow, and currently
slower still because the frame will be converted to a pure context and then
converted back into a frame on the return from pc:.  But this solution
isn't to be rejected out-of-hand.  It can be optimized to avoid the frame
conversion and the JIT might be able to optimize it.
I assume that if computed gotos were used the translation to machine code
would require a direct mapping of (virtually labeled) bytecode locations to
machine code locations.  I think this can be done in a reasonable amount of
time but others such as yourself clearly understand the issues far better
than I do.  The dirty solution to start would be to simply not JIT the code
that uses computed gotos.
...
The main problem is the compiler has no support for labels so
there would be work here.
I don't mind doing the work but to my way of thinking "goto X" is pretty
basic and is thus best handled at the VM/byte code level.  Anything else is
doing in a complicated way something that is fairly simple.  Of course
changing the VM/byte codes by even a single byte code is a major deal
unless done when the VM/byte codes are initially created.  Alas I must deal
with what already exists.  Even so, my preference is to work with the VM if
at all possible.
...
...
For example, one such language is that of regular expressions, which I
wish to translate into finite state machines implemented in VM code.
In this case I need case 2) gotos where coll is a collection of
associations, possibly a Dictionary.  I also plan to write a debugger for
this (and other languages) but that is another story.
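
To make the case 2) shape concrete, here is a small sketch in Python, not
VM code, of a finite state machine for the regular expression ab*c.  Each
state owns a dictionary from input character to next state, and the lookup
in that dictionary plays the role of "goto (coll at: y)".  The regular
expression, the state numbering and the names TRANSITIONS, ACCEPTING and
matches are all made up for illustration.

```python
# State machine for the regular expression ab*c (hypothetical example).
# TRANSITIONS[state] plays the role of `coll`; the input character is `y`.
TRANSITIONS = {
    0: {"a": 1},          # start: expect 'a'
    1: {"b": 1, "c": 2},  # after 'a': loop on 'b', finish on 'c'
    2: {},                # accepting state: no further input allowed
}
ACCEPTING = {2}


def matches(text):
    """Return True if the whole of text is accepted by the machine."""
    state = 0
    for ch in text:
        next_state = TRANSITIONS[state].get(ch)
        if next_state is None:
            return False  # no transition for this character: reject
        state = next_state  # the "computed goto"
    return state in ACCEPTING


for sample in ["ac", "abbbc", "ab", "abca"]:
    print(sample, matches(sample))
```

In generated bytecodes the loop and the state variable would disappear:
each state would be a labeled stretch of code ending in a jump whose
target comes from the table.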
I realize that the Cog VM is being built for Smalltalk (Squeak? Pharo?),
for which the goto instructions are not needed and thus, I assume,
unavailable.  But there is something to be said for viewing a virtual
machine as general purpose and thus the target of multiple languages, as is
the case for the Java virtual machine.
If the Cog VM is viewed this way then I argue there is a need for my goto
instructions because some languages have need for them.
For example, many languages have case statements.  (I am all for object
oriented but I would be willing to accept a case statement in Smalltalk
too; the one implemented in Squeak code doesn't cut it.)
...
I've occasionally thought about this for many years.  A computed jump might
be nice.  E.g. index an Array literal of pcs with the integer on top of
stack, falling through on bad type or out of range.
This is the way I am thinking.  If there are other reasons for a computed
jumpTo as well, all the better.
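
A rough sketch of how such an instruction might behave, as a toy stack
machine in Python.  This is not Cog or Squeak code.  The opcode names, the
decision to pop the index, and the 0-based indexing are assumptions made
for the sketch (a Smalltalk Array would be 1-based).

```python
def run(program, max_steps=1000):
    """Interpret a list of (opcode, argument) pairs; return emitted values."""
    stack, output, pc = [], [], 0
    for _ in range(max_steps):
        if pc >= len(program):
            break
        op, arg = program[pc]
        pc += 1
        if op == "push":
            stack.append(arg)
        elif op == "emit":
            output.append(arg)
        elif op == "jump":
            pc = arg
        elif op == "jumpTable":
            # arg is the literal table of target pcs.
            index = stack.pop()
            is_integer = isinstance(index, int) and not isinstance(index, bool)
            if is_integer and 0 <= index < len(arg):
                pc = arg[index]
            # Otherwise fall through to the next instruction.
        elif op == "halt":
            break
        else:
            raise ValueError(f"unknown opcode: {op}")
    return output


def program_for(selector):
    """A three-way switch: two table entries plus a fall-through default."""
    return [
        ("push", selector),      # 0
        ("jumpTable", [4, 6]),   # 1
        ("emit", "default"),     # 2: reached on bad type or out of range
        ("halt", None),          # 3
        ("emit", "case0"),       # 4
        ("halt", None),          # 5
        ("emit", "case1"),       # 6
        ("halt", None),          # 7
    ]


for selector in [0, 1, 7, "x"]:
    print(selector, run(program_for(selector)))
```

Falling through rather than raising an error lets the code after the
instruction act as the default case, so a case statement needs no separate
range check.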
...
Anyway, I am not arguing to change Squeak or Smalltalk but I am arguing
to have my goto instructions in Cog VM. Is there any chance of this?????
There's no chance of me spending time implementing this any time soon.  I
have too many high-priority tasks to tackle this.  But I want to encourage
you or others to have a go implementing it.  It's fun!
I understand and am willing to be the one to add one or more computed jump
instructions, including working on the JIT code generator if needed.
As you say it should be fun (and also educational).  But
   1)  I am pretty busy now too and probably won't get to this for a year.
   2)  If I am to do this it would be great if someone can write a
        specification as to what is to be done.  If someone can write this
        now that would be great but if they write it when I post that I am
        ready to do the work that would also be fine.
   3)  I don't want to just have my own private VM/byte codes.  I want
        users of my parser generator tool to be able to load it into a
        standard version of Squeak and run it there including the possible
        generation of compilers for compiling their domain specific
        language programs into byte codes if desired.
...
...
I don't know the Squeak VM or the Cog VM either but I assume these
instructions don't exist because I see no need of them when the source
language is Squeak or any version of Smalltalk for that matter.  I also
assume that there is already a full list of 256 instructions in the Cog VM
and thus no room for my goto instructions unless some instructions are
removed.
Are there Cog VM instructions that are so rarely used that they could be
removed without unreasonably slowing down the Cog VM interpretation of byte
codes generated from Squeak source code?????
...
The current set has 3 unused bytecodes, one of which Spur uses, so
effectively there are two unused bytecodes.
Levente Uzonyi in his posting pointed out that only one instruction is
needed.  I don't like having to push the address to jump to onto the stack,
preferring a byte code with an argument, but I could live with his solution
if that is what is decided.
In the case of  goto  coll at: X  the address is likely to end up on top of
the stack anyway so Levente's jumpToTop instruction looks good in any case.
...
The Cog VMs support multiple bytecode sets.  If you look at the
BytecodeSets package on VMMaker you can read the class comments of the
BytecodeEncoder subclasses such as EncoderForSistaV1.  These bytecode sets
have a few more unused bytecodes.  This multiple bytecode set support is
better implemented in Spur where there is only one compiled method header
format and support for 64k literals.  So let me encourage you to move to
Spur and to look at the Sista set.  The class comment of each encoder class
specifies the instruction set it targets.
I am prepared to work with Spur and the Sista set.  I am looking for
someone to say that, if I do this work, incorporating it into Spur will be
seriously considered.
Ralph Boland

Re: [Vm-dev] goto instruction with Cog VM