<div dir="ltr">Is this a request for a jump table? (switch statement in C)<br><br></div><div class="gmail_extra"><br><div class="gmail_quote">2014-11-03 2:48 GMT-03:00 Ralph Boland <span dir="ltr">&lt;<a href="mailto:rpboland@gmail.com" target="_blank">rpboland@gmail.com</a>&gt;</span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> <br><div dir="ltr"><div>&gt;&gt;<br>

&gt;&gt; I am working on a parser generator tool (a replacement for SmaCC) and<br>

&gt;&gt; one of the things I a m interested in is the direct translation of<br>

&gt;&gt; language specifications into the virtual machine code (SmaCC and my<br>

&gt;&gt; current version of it use Squeak as the target language).<br>

&gt;&gt;<br>

<br>

&gt; First, a different approach than compiling to Smalltalk is to compile to a<br>

&gt; parse tree.  We do this in the pseudo JavaScript compiler we&#39;ve implemented<br>

&gt; at Cadence, translating an AST into a Squeak compiler parse tree for code<br>

&gt; generation.  Targeting a parse tree gives you much more freedom; you can<br>

&gt; express things that aren&#39;t expressible in Smalltalk.  And if you target<br>

&gt; bytecodes you can do even more.<br><br><br></div><div>I never considered using a parse tree as the target.  An interesting idea<br></div><div>which in many instances may be the best approach.  But for my regular expression<br></div><div>example I would still want to generate byte codes.  In any case I wouldn&#39;t want to<br>restrict users of my parser generator tool to any one of the three options (Smalltalk code,<br></div><div>parse tree, byte code).  It is my responsibility to make all three options as easy and<br>efficient as reasonably possible for users of the parser generator tool.  Haven&#39;t put<br>much thought into this yet though.   So far Smalltalk (Squeak) is the only option.<br></div><div>

<br>

<br>

&gt;&gt; One of the problems I have is that, for some languages, the natural<br>

&gt;&gt; translation<br>

&gt;&gt; into VM code uses computed gotos.<br>

&gt;&gt; There are two scenarios here:<br>

&gt;&gt;<br>

&gt;&gt;      1) goto X  where X is a variable.<br>

&gt;&gt;      2) goto  (coll at: y)  where coll is a Collection.<br>

&gt;&gt;<br>

<br>

&gt; There are several ways of implementing this without computed bytecodes in<br>

&gt; the instruction set, but there is also the possibility of implementing it<br>

&gt; directly in the instruction set.<br>

<br>

&gt; Off the top of my head one could<br>

<br>

&gt; - map to perform: using some mangled selector.  Yes this is problematic<br>&gt; because one has to split the scope between the two methods, so in general<br>

&gt; it&#39;s not a solution<br>

<br>Doesn&#39;t appeal to me.<br><br>

&gt; - map to a case statement, which is what Squeak does. Just map it to a<br>

&gt; sequence of compare and branches.  Or better still, to a binary tree.<br>

&gt; Coincidentally this is used by the JIT to implement block dispatch in<br>

&gt; methods that contain more than one block.  I know of other VM<br>

&gt; implementations using it for PIC dispatch with really good performance.<br><br></div><div>I don&#39;t know what you mean my Squeak mapping to a case statement since<br>there is no case statement in  Squeak/Smalltalk and I can&#39;t off hand think of<br>where one is needed (some Squeak users might feel they need one but that is a<br></div><div>different matter).  The use of compare and branches might be OK in some cases<br></div><div>but a mess for the finite state machines generated from regular expressions.<br></div><div>Actually, even with computed gotos  FSMs are somewhat messy but without them<br></div><div>it&#39;s worse.  I don&#39;t know what  &#39;PIC dispatch&#39; is.<br></div><div>To use a binary tree don&#39;t I need some kind of computed goto for when I reach<br>a leaf of the tree????<br></div><div>

<br>

&gt; - use thisContext pc: value.<br><br>This would be a possibility for me to experiment with for now.  When I have a working<br>parser generator tool I could campaign for my computed goto instructions to be added<br>to the VM.  <br><br>&gt; This /should/ be fine in the stack VM, but<br>

&gt; slooooow in the JIT because internally mapping bytecode pcs to machine code<br>

&gt; pcs is slow, and currently slower still because the frame will be converted<br>

&gt; to a pure context and then converted back into a frame on the return from<br>

&gt; pc:.  But this solution isn&#39;t to be rejected out-of-hand.  It can be<br>

&gt; optimized to avoid the frame conversion and the JIT might be able to<br>

&gt; optimize it.<br><br></div><div>I assume that if computed gotos were used the translation to machine code would require a direct<br>mapping of (virtually labeled) bytecode locations to machine code locations.  I think this can be done<br>in a reasonable amount of time but others such as yourself clearly understand the issues far better than<br>I do.  The dirty solution to start would be to simply not JIT the code that uses computed gotos.<br></div><div><br>&gt; The main problem is the compiler has no support for labels so<br>

&gt; there would be work here.<br><br></div><div>I don&#39;t mind doing the work but to my way of thinking  &quot;goto X&quot; is pretty basic<br>and is thus best handled at the VM/byte code level.  Anything else is doing<br>in a complicated way something that is fairly simple.  Of course changing<br></div><div>the VM/byte codes by even a single byte code is a major deal unless done<br>when the VM/byte codes are initially created.  Alas I must deal with what<br>already exists.  Even so, my preference is to work with the VM if at all<br></div><div>possible.<br></div><div>

<br>

<br>

&gt;&gt; For example, one such language is that of regular expressions, which I<br>

&gt;&gt; wish to translate into finite state machines implemented in VM code.<br>

&gt;&gt; In this case I need case 2) gotos where coll is a collection of<br>

&gt;&gt; associations, possibly a<br>

&gt;&gt; Dictionary. I also plan to write a debugger for this (and other languages)<br>

&gt;&gt; but that is another story.<br>

&gt;&gt;<br>

&gt;&gt; I realize that the Cog VM is being built for Smalltalk (Squeak? Pharo?)<br>

&gt;&gt; for which the goto instructions are not needed and thus I assume<br>

&gt;&gt; unavailable. But there is something to<br>

&gt;&gt; viewing a virtual machine as general purpose and thus the target of<br>

&gt;&gt; multiple languages as is<br>

&gt;&gt; the case for the Java virtual machine.<br>

&gt;&gt; If the Cog VM is viewed this way then I argue there is a need for my goto<br>

&gt;&gt; instructions<br>

&gt;&gt; because some languages have need for them.<br>

&gt;&gt; For example, many languages have case statements.  (I am all for object<br>

&gt;&gt; oriented<br>

&gt;&gt; but I would be willing to accept a case statement in Smalltalk too;  the<br>

&gt;&gt; Squeak code<br>

&gt;&gt; implemented one in Squeak doesn&#39;t cut it).<br>

&gt;<br>

<br>

&gt; I&#39;ve occasionally thought about this for many years.  A computed jump might<br>&gt; be nice.  Eg index an Array literal of pcs with the integer on top of<br>

&gt; stack, falling through on bad type or out of range.<br>

<br></div><div>This is the way I am thinking.  If there are other reasons for a computed jumpTo<br>as well all the better.<br></div><div>

<br>

&gt; Anyway, I am not arguing to Change Squeak or Smalltalk but I am arguing<br>

&gt; to have my goto instructions in Cog VM. Is there any chance of this?????<br>

&gt;<br>

<br>

There&#39;s no chance of me spending time implementing this any time soon.  I<br>

have too much high-priority tasks to tackle this.  But I want to encourage<br>

you or others to have a go implementing it.  It&#39;s fun!<br><br></div><div>I understand and am willing to be the one to add one or more computed jump<br>instructions, including working on the JIT code generator if needed.<br>As you say it should be fun (and also educational).  But<br></div><div>   1)  I am pretty busy now too and probably won&#39;t get to this for a year.<br></div><div>   2)  If I am to do this it would be great if someone can write a specification as to<br></div><div>        what is to be done.  If someone can write this now that would be great but<br></div><div>        if they write it when I post that I am ready to do the work that would also<br></div><div>        be fine.<br></div><div>   3)  I don&#39;t want to just have my own private VM/byte codes.  I want users of my<br></div><div>        parser generator tool to be able to load it into a standard version of Squeak<br></div><div>        and run it there including the possible generation of compilers for compiling<br>        their domain specific language programs into byte codes if desired.<br></div><div>

<br>

&gt;&gt; I don&#39;t know the Squeak VM or the Cog VM either but I assume these<br>

&gt;&gt; instructions don&#39;t exist because I see no need of them when the source<br>

&gt;&gt; language is<br>

&gt;&gt; Squeak or any version of Smalltalk for that matter. I also assume that<br>

&gt;&gt; there is already<br>

&gt;&gt; a full list of 256 instructions in the Cog VM and thus no room for my goto<br>

&gt;&gt; instructions<br>

&gt;&gt; unless some instructions are removed.<br>

&gt;&gt;<br>

&gt;&gt; Are there Cog VM instructions that are so rarely used that they could be<br>

&gt;&gt; removed without<br>

&gt;&gt; unreasonably slowing down the Cog VM interpretation of byte codes<br>

&gt;&gt; generated from Squeak source code?????<br>

&gt;&gt;<br>

<br>

&gt; The current set has 3 unused bytecodes, one of which Spur uses, so<br>

&gt; effectively there are two unused bytecodes.<br>

<br>Levente Uzonyi  in his posting pointed out that only one instruction is needed.<br></div><div>I don&#39;t like having to push the address to jump to onto the stack, preferring a byte<br>code with an argument, but I could live with his solution if that is what is decided.<br></div><div>In the case of  goto  coll at: X  the address is likely to end up on top of the stack<br>anyway so Levente&#39;s  jumpToTop instruction looks good in any case.<br></div><div><br>

&gt; The Cog VMs support multiple bytecode sets.  If you look at the<br>

&gt; BytecodeSets package on VMMaker you can read the class comments of the<br>

&gt; BytecodeEncoder subclasses such as EncoderForSistaV1.  These bytecode sets<br>

&gt; have a few more unused bytecodes.  This multiple bytecode set support is<br>

&gt; better implemented in Spur where there is only one compiled method header<br>

&gt; format and support for 64k literals.  So let me encourage you to move to<br>

&gt; Spur and to look at the Sista set.  The class comment of each encoder class<br>

&gt; specifies the instruction set it targets.<br><br></div><div>I am prepared to work with Spur and the Sista set.  I am looking for someone to<br>say that if I do this work that incorporating the work into Spur will be seriously<br>considered.<br><br></div><div>Ralph Boland<br></div><div>

<br>

</div></div>

<br></blockquote></div><br></div>