<div dir="ltr">Is this a request for a jump table? (switch statement in C)<br><br></div><div class="gmail_extra"><br><div class="gmail_quote">2014-11-03 2:48 GMT-03:00 Ralph Boland <span dir="ltr"><<a href="mailto:rpboland@gmail.com" target="_blank">rpboland@gmail.com</a>></span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> <br><div dir="ltr"><div>>><br>
>> I am working on a parser generator tool (a replacement for SmaCC) and<br>
>> one of the things I a m interested in is the direct translation of<br>
>> language specifications into the virtual machine code (SmaCC and my<br>
>> current version of it use Squeak as the target language).<br>
>><br>
<br>
> First, a different approach than compiling to Smalltalk is to compile to a<br>
> parse tree. We do this in the pseudo JavaScript compiler we've implemented<br>
> at Cadence, translating an AST into a Squeak compiler parse tree for code<br>
> generation. Targeting a parse tree gives you much more freedom; you can<br>
> express things that aren't expressible in Smalltalk. And if you target<br>
> bytecodes you can do even more.<br><br><br></div><div>I never considered using a parse tree as the target. An interesting idea<br></div><div>which in many instances may be the best approach. But for my regular expression<br></div><div>example I would still want to generate byte codes. In any case I wouldn't want to<br>restrict users of my parser generator tool to any one of the three options (Smalltalk code,<br></div><div>parse tree, byte code). It is my responsibility to make all three options as easy and<br>efficient as reasonably possible for users of the parser generator tool. Haven't put<br>much thought into this yet though. So far Smalltalk (Squeak) is the only option.<br></div><div>
<br>
<br>
>> One of the problems I have is that, for some languages, the natural<br>
>> translation<br>
>> into VM code uses computed gotos.<br>
>> There are two scenarios here:<br>
>><br>
>> 1) goto X where X is a variable.<br>
>> 2) goto (coll at: y) where coll is a Collection.<br>
>><br>
<br>
> There are several ways of implementing this without computed bytecodes in<br>
> the instruction set, but there is also the possibility of implementing it<br>
> directly in the instruction set.<br>
<br>
> Off the top of my head one could<br>
<br>
> - map to perform: using some mangled selector. Yes this is problematic<br>> because one has to split the scope between the two methods, so in general<br>
> it's not a solution<br>
<br>Doesn't appeal to me.<br><br>
> - map to a case statement, which is what Squeak does. Just map it to a<br>
> sequence of compare and branches. Or better still, to a binary tree.<br>
> Coincidentally this is used by the JIT to implement block dispatch in<br>
> methods that contain more than one block. I know of other VM<br>
> implementations using it for PIC dispatch with really good performance.<br><br></div><div>I don't know what you mean my Squeak mapping to a case statement since<br>there is no case statement in Squeak/Smalltalk and I can't off hand think of<br>where one is needed (some Squeak users might feel they need one but that is a<br></div><div>different matter). The use of compare and branches might be OK in some cases<br></div><div>but a mess for the finite state machines generated from regular expressions.<br></div><div>Actually, even with computed gotos FSMs are somewhat messy but without them<br></div><div>it's worse. I don't know what 'PIC dispatch' is.<br></div><div>To use a binary tree don't I need some kind of computed goto for when I reach<br>a leaf of the tree????<br></div><div>
<br>
> - use thisContext pc: value.<br><br>This would be a possibility for me to experiment with for now. When I have a working<br>parser generator tool I could campaign for my computed goto instructions to be added<br>to the VM. <br><br>> This /should/ be fine in the stack VM, but<br>
> slooooow in the JIT because internally mapping bytecode pcs to machine code<br>
> pcs is slow, and currently slower still because the frame will be converted<br>
> to a pure context and then converted back into a frame on the return from<br>
> pc:. But this solution isn't to be rejected out-of-hand. It can be<br>
> optimized to avoid the frame conversion and the JIT might be able to<br>
> optimize it.<br><br></div><div>I assume that if computed gotos were used the translation to machine code would require a direct<br>mapping of (virtually labeled) bytecode locations to machine code locations. I think this can be done<br>in a reasonable amount of time but others such as yourself clearly understand the issues far better than<br>I do. The dirty solution to start would be to simply not JIT the code that uses computed gotos.<br></div><div><br>> The main problem is the compiler has no support for labels so<br>
> there would be work here.<br><br></div><div>I don't mind doing the work but to my way of thinking "goto X" is pretty basic<br>and is thus best handled at the VM/byte code level. Anything else is doing<br>in a complicated way something that is fairly simple. Of course changing<br></div><div>the VM/byte codes by even a single byte code is a major deal unless done<br>when the VM/byte codes are initially created. Alas I must deal with what<br>already exists. Even so, my preference is to work with the VM if at all<br></div><div>possible.<br></div><div>
<br>
<br>
>> For example, one such language is that of regular expressions, which I<br>
>> wish to translate into finite state machines implemented in VM code.<br>
>> In this case I need case 2) gotos where coll is a collection of<br>
>> associations, possibly a<br>
>> Dictionary. I also plan to write a debugger for this (and other languages)<br>
>> but that is another story.<br>
>><br>
>> I realize that the Cog VM is being built for Smalltalk (Squeak? Pharo?)<br>
>> for which the goto instructions are not needed and thus I assume<br>
>> unavailable. But there is something to<br>
>> viewing a virtual machine as general purpose and thus the target of<br>
>> multiple languages as is<br>
>> the case for the Java virtual machine.<br>
>> If the Cog VM is viewed this way then I argue there is a need for my goto<br>
>> instructions<br>
>> because some languages have need for them.<br>
>> For example, many languages have case statements. (I am all for object<br>
>> oriented<br>
>> but I would be willing to accept a case statement in Smalltalk too; the<br>
>> Squeak code<br>
>> implemented one in Squeak doesn't cut it).<br>
><br>
<br>
> I've occasionally thought about this for many years. A computed jump might<br>> be nice. Eg index an Array literal of pcs with the integer on top of<br>
> stack, falling through on bad type or out of range.<br>
<br></div><div>This is the way I am thinking. If there are other reasons for a computed jumpTo<br>as well all the better.<br></div><div>
<br>
> Anyway, I am not arguing to Change Squeak or Smalltalk but I am arguing<br>
> to have my goto instructions in Cog VM. Is there any chance of this?????<br>
><br>
<br>
There's no chance of me spending time implementing this any time soon. I<br>
have too much high-priority tasks to tackle this. But I want to encourage<br>
you or others to have a go implementing it. It's fun!<br><br></div><div>I understand and am willing to be the one to add one or more computed jump<br>instructions, including working on the JIT code generator if needed.<br>As you say it should be fun (and also educational). But<br></div><div> 1) I am pretty busy now too and probably won't get to this for a year.<br></div><div> 2) If I am to do this it would be great if someone can write a specification as to<br></div><div> what is to be done. If someone can write this now that would be great but<br></div><div> if they write it when I post that I am ready to do the work that would also<br></div><div> be fine.<br></div><div> 3) I don't want to just have my own private VM/byte codes. I want users of my<br></div><div> parser generator tool to be able to load it into a standard version of Squeak<br></div><div> and run it there including the possible generation of compilers for compiling<br> their domain specific language programs into byte codes if desired.<br></div><div>
<br>
>> I don't know the Squeak VM or the Cog VM either but I assume these<br>
>> instructions don't exist because I see no need of them when the source<br>
>> language is<br>
>> Squeak or any version of Smalltalk for that matter. I also assume that<br>
>> there is already<br>
>> a full list of 256 instructions in the Cog VM and thus no room for my goto<br>
>> instructions<br>
>> unless some instructions are removed.<br>
>><br>
>> Are there Cog VM instructions that are so rarely used that they could be<br>
>> removed without<br>
>> unreasonably slowing down the Cog VM interpretation of byte codes<br>
>> generated from Squeak source code?????<br>
>><br>
<br>
> The current set has 3 unused bytecodes, one of which Spur uses, so<br>
> effectively there are two unused bytecodes.<br>
<br>Levente Uzonyi in his posting pointed out that only one instruction is needed.<br></div><div>I don't like having to push the address to jump to onto the stack, preferring a byte<br>code with an argument, but I could live with his solution if that is what is decided.<br></div><div>In the case of goto coll at: X the address is likely to end up on top of the stack<br>anyway so Levente's jumpToTop instruction looks good in any case.<br></div><div><br>
> The Cog VMs support multiple bytecode sets. If you look at the<br>
> BytecodeSets package on VMMaker you can read the class comments of the<br>
> BytecodeEncoder subclasses such as EncoderForSistaV1. These bytecode sets<br>
> have a few more unused bytecodes. This multiple bytecode set support is<br>
> better implemented in Spur where there is only one compiled method header<br>
> format and support for 64k literals. So let me encourage you to move to<br>
> Spur and to look at the Sista set. The class comment of each encoder class<br>
> specifies the instruction set it targets.<br><br></div><div>I am prepared to work with Spur and the Sista set. I am looking for someone to<br>say that if I do this work that incorporating the work into Spur will be seriously<br>considered.<br><br></div><div>Ralph Boland<br></div><div>
<br>
</div></div>
<br></blockquote></div><br></div>