Jeremy,<div><br></div><div>Smalltalk-80 (and Squeak) opcodes are for a spaghetti stack machine where each activation is a separate object. These activation objects are called contexts, and they chain together through the sender field. Each context has a fixed-size stack (in Squeak there are small and large contexts, maximum size 52 stack slots). Each activation holds onto a compiled method, which is a vector of literal objects and a vector of bytecodes. In Squeak and Smalltalk-80 these two vectors are encoded in a single flat object, half references to other objects (literals) and half bytes (opcodes). Since both contexts and compiled methods are objects, the system implements its compiler and meta-level interpreter in Smalltalk itself; these in turn require a real machine (the virtual machine) to execute. If you run a Squeak or Pharo system you will be able to browse the classes that implement the compiler and the meta-level interpreter. In particular:</div>
<div><br></div><div>The classes EncoderForV3 & EncoderForV3PlusClosures implement the back-end of the compiler, generating concrete opcodes for abstract bytecodes such as pushReceiver, send:numArgs:, etc.</div><div>Instances of class CompiledMethod are generated by the compiler (see MethodNode>>generate:using:) using an instance of EncoderForV3PlusClosures.</div>
<div><br></div><div>The class InstructionClient defines all the abstract opcodes for the current V3 plus closures instruction set.</div><div>The class InstructionStream decodes/interprets CompiledMethod instances, dispatching sends of the messages understood by InstructionClient to itself. InstructionStream has several subclasses which respond to the sends of the opcodes in different ways.</div>
<div><br></div><div>Most importantly ContextPart and its subclass MethodContext implement the InstructionClient api by simulating execution. Hence ContextPart and MethodContext provide a specification in Smalltalk of the semantics of the bytecodes. EncoderForV3 & EncoderForV3PlusClosures serve as a convenient reference for opcode encodings, and are well-commented.</div>
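<div><br></div><div>As a rough illustration of that decoder/client split (a Python sketch, not Squeak code): one decoder walks the bytes and sends an abstract opcode to whatever client it is given, so a printer and a simulator can share it. The class and function names here are hypothetical; the four opcode values are the ones that appear in the printOn: listing in this message, plus the Blue Book's 0x10-0x1F range for pushTemp.</div>
<pre>
```python
# Hypothetical sketch of the decoder/client pattern; names are not Squeak's.

class Printer:
    """Client that records a symbolic name for each decoded opcode."""

    def __init__(self):
        self.lines = []

    def push_receiver(self):
        self.lines.append("self")

    def push_temp(self, index):
        self.lines.append("pushTemp: %d" % index)

    def pop(self):
        self.lines.append("pop")

    def return_receiver(self):
        self.lines.append("returnSelf")


class Simulator:
    """Client that executes the same opcodes against a tiny stack."""

    def __init__(self, receiver, temps):
        self.receiver = receiver
        self.temps = temps
        self.stack = []
        self.result = None

    def push_receiver(self):
        self.stack.append(self.receiver)

    def push_temp(self, index):
        self.stack.append(self.temps[index])

    def pop(self):
        self.stack.pop()

    def return_receiver(self):
        self.result = self.receiver


def interpret(bytecodes, client):
    """Decode each byte and send the matching abstract opcode to client."""
    for byte in bytecodes:
        if byte == 0x70:
            client.push_receiver()
        elif byte in range(0x10, 0x20):
            client.push_temp(byte - 0x10)
        elif byte == 0x87:
            client.pop()
        elif byte == 0x78:
            client.return_receiver()
            return
        else:
            raise ValueError("opcode not covered by this sketch: 0x%02X" % byte)


printer = Printer()
interpret(bytes([0x70, 0x11, 0x87, 0x78]), printer)
print(printer.lines)
```
</pre>
<div>Passing a Simulator instead of a Printer to the same interpret function runs the bytes rather than naming them, which is the sense in which the simulating classes serve as a specification of the semantics.</div>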
<div><br></div><div>By the way, InstructionClient's subclass InstructionPrinter responds to the API by disassembling a compiled method; hence aCompiledMethod symbolic prints opcodes, e.g.</div><div>(Object >> #printOn:) symbolic evaluates to the string</div>
<div><div>'37 <70> self</div><div>38 <C7> send: class</div><div>39 <D0> send: name</div><div>40 <69> popIntoTemp: 1</div><div>41 <10> pushTemp: 0</div><div>42 <88> dup</div><div>43 <11> pushTemp: 1</div>
<div>44 <D5> send: first</div><div>45 <D4> send: isVowel</div><div>46 <99> jumpFalse: 49</div><div>47 <23> pushConstant: ''an ''</div><div>48 <90> jumpTo: 50</div><div>49 <22> pushConstant: ''a ''</div>
<div>50 <E1> send: nextPutAll:</div><div>51 <87> pop</div><div>52 <11> pushTemp: 1</div><div>53 <E1> send: nextPutAll:</div><div>54 <87> pop</div><div>55 <78> returnSelf</div><div>'</div>
</div><div><br></div><div><br></div><div>and InstructionStream's subclass Decompiler implements the api by reconstructing a compiler parse tree for the compiled method, so e.g.</div><div><div>(Object >> #printOn:) decompile prints as</div>
<div><div>printOn: t1 </div><div><span class="Apple-tab-span" style="white-space:pre">        </span>| t2 |</div><div><span class="Apple-tab-span" style="white-space:pre">        </span>t2 := self class name.</div><div><span class="Apple-tab-span" style="white-space:pre">        </span>t1</div>
<div><span class="Apple-tab-span" style="white-space:pre">                </span>nextPutAll: (t2 first isVowel</div><div><span class="Apple-tab-span" style="white-space:pre">                                </span>ifTrue: ['an ']</div><div><span class="Apple-tab-span" style="white-space:pre">                                </span>ifFalse: ['a ']);</div>
<div><span class="Apple-tab-span" style="white-space:pre">                </span> nextPutAll: t2</div></div><div>whereas the source code for the same method ((Object >> #printOn:) getSourceFromFile) evaluates to a Text for</div><div>
<div>'printOn: aStream</div><div><span class="Apple-tab-span" style="white-space:pre">        </span>"Append to the argument, aStream, a sequence of characters that </div><div><span class="Apple-tab-span" style="white-space:pre">        </span>identifies the receiver."</div>
<div><br></div><div><span class="Apple-tab-span" style="white-space:pre">        </span>| title |</div><div><span class="Apple-tab-span" style="white-space:pre">        </span>title := self class name.</div><div><span class="Apple-tab-span" style="white-space:pre">        </span>aStream</div>
<div><span class="Apple-tab-span" style="white-space:pre">                </span>nextPutAll: (title first isVowel ifTrue: [''an ''] ifFalse: [''a '']);</div><div><span class="Apple-tab-span" style="white-space:pre">                </span>nextPutAll: title'</div>
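<div><br></div><div>If you want the numbers themselves, the symbolic listing earlier in this message is enough to reconstruct a small part of the opcode table. The sketch below (Python, hypothetical names) covers only the single-byte opcodes that the listing exercises; the range boundaries follow the Blue Book layout, the literal indices are inferred from the listing, and the multi-byte extended opcodes are deliberately left out.</div>
<pre>
```python
# Literal frame inferred from the printOn: listing (index = low bits of the opcode).
LITERALS = ["name", "nextPutAll:", "'a '", "'an '", "isVowel", "first"]

# The 19 bytecodes of the listing, pc 37 through 55.
BYTECODES = bytes.fromhex(
    "70 C7 D0 69 10 88 11 D5 D4 99 23 90 22 E1 87 11 E1 87 78"
)


def build_table(literals):
    """Return {opcode byte: function(pc) giving the symbolic text}."""
    table = {
        0x70: lambda pc: "self",
        0x78: lambda pc: "returnSelf",
        0x87: lambda pc: "pop",
        0x88: lambda pc: "dup",
        0xC7: lambda pc: "send: class",  # one of the 0xC0-0xCF special selectors
    }
    for i in range(16):
        table[0x10 + i] = lambda pc, i=i: "pushTemp: %d" % i
    for i in range(8):
        table[0x68 + i] = lambda pc, i=i: "popIntoTemp: %d" % i
        # Short jumps skip 1..8 bytes, counted from the next instruction.
        table[0x90 + i] = lambda pc, i=i: "jumpTo: %d" % (pc + 2 + i)
        table[0x98 + i] = lambda pc, i=i: "jumpFalse: %d" % (pc + 2 + i)
    for i, lit in enumerate(literals[:16]):
        table[0x20 + i] = lambda pc, lit=lit: "pushConstant: " + lit
        table[0xD0 + i] = lambda pc, lit=lit: "send: " + lit  # 0 arguments
        table[0xE0 + i] = lambda pc, lit=lit: "send: " + lit  # 1 argument
        table[0xF0 + i] = lambda pc, lit=lit: "send: " + lit  # 2 arguments
    return table


def disassemble(bytecodes, start_pc, literals):
    """Return a list of (pc, hex byte, symbolic text) tuples."""
    table = build_table(literals)
    listing = []
    for offset, byte in enumerate(bytecodes):
        pc = start_pc + offset
        if byte not in table:
            raise ValueError("opcode 0x%02X is outside this sketch" % byte)
        listing.append((pc, "%02X" % byte, table[byte](pc)))
    return listing


for pc, hexbyte, text in disassemble(BYTECODES, 37, LITERALS):
    print(pc, hexbyte, text)
```
</pre>
<div>Each entry maps one byte to one routine, which is the shape a jump table wants. EncoderForV3PlusClosures remains the place to check the full set, including the opcodes that take following bytes.</div>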
</div></div><div><br></div><div>So if you want to find a current, comprehensible specification of the Squeak/Pharo opcode set, I recommend browsing EncoderForV3, EncoderForV3PlusClosures, InstructionClient, InstructionStream, ContextPart and MethodContext. Further, I recommend exploring existing CompiledMethod instances using doits such as</div>
<div><br></div><div> SystemNavigation new browseAllSelect: [:m| m scanFor: 137]</div><div><br></div><div>HTH</div><div>Eliot</div><div><br><div class="gmail_quote">On Mon, Apr 16, 2012 at 10:03 AM, Jeremy Kajikawa <span dir="ltr"><<a href="mailto:jeremy.kajikawa@gmail.com">jeremy.kajikawa@gmail.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><br>
Colin: thanks... something like that... just trying to work out the<br>
octet numbers and formatting for what data goes where.<br>
<br>
as I am trying to encode this at assembler level where each opcode value<br>
has a specific routine that is called from a opCodeVector JumpTable<br>
<br>
Each Entry in the JumpTable is directly executed by the processor with<br>
a second JumpTable encoded similarly for basic microcode Read/Write<br>
functions to deal with various standard DataTypes in fixed formats<br>
<br>
this is to plug into the generic Interpreter engine I already have.<br>
<br>
the first test of this was to Emulate an Intel 80486 on a Motorola<br>
68040 processor with the Host running at 25MHz.<br>
<br>
I managed to get an average speed rating of between 16MHz to 20MHz<br>
performance even with "real world" code being run through<br>
<br>
I am currently re-implementing this engine on top of a PPC host and<br>
would like to expand its modularity to additional languages and<br>
targets.<br>
<br>
If at all possible I would like to make the equivalent "machine level"<br>
interpretation of the opcode numbers possible even if there is inline<br>
data and addresses present as well.<br>
<br>
With having no prior experience with Smalltalk any usage of terms I<br>
know in a different context won't make any sense initially and trying to<br>
get to grips with Smalltalk by using the Environment ... I already<br>
tried this unsuccessfully.<br>
<br>
I'm more interested in the number codes that each operation is<br>
represented by and making routines to match within set ranges, and<br>
where one operation is multiple codes chained, being able to have a<br>
listing starting with 0x00 is opcode "somename" and has N octets of<br>
immediate values following it formatted as ?:? bitstrings.<br>
<br>
If that makes any sense?<br>
<br>
as for stack or message information, I'm willing to work out what is<br>
needed to make those happen if they are needed as bytecode level<br>
information.<br>
<br>
On Tue, Apr 17, 2012 at 4:20 AM, Colin Putney <<a href="mailto:colin@wiresong.com">colin@wiresong.com</a>> wrote:<br>
><br>
><br>
> On 2012-04-16, at 8:14 AM, Jeremy Kajikawa wrote:<br>
><br>
> I am somewhat dogmatically minded about technical details, so I am<br>
> unlikely to wade through buckets of documentation about Smalltalk as a<br>
> language and how to use it if it is not answering the question about<br>
> what I am looking up.<br>
><br>
><br>
> I'm confused. You want to implement a Smalltalk interpreter, but you're not interested in the details of the language? Perhaps you should tell us what your overall goal is. That way we can provide more useful information.<br>
><br>
> As for documentation of the bytecode set, you may find the Blue Book useful. It's the canonical description of how Smalltalk works, including the interpreter. Squeak is a descendant of this implementation. The section on the interpreter is here:<br>
><br>
> <a href="http://www.mirandabanda.org/bluebook/bluebook_chapter28.html#StackBytecodes28" target="_blank">http://www.mirandabanda.org/bluebook/bluebook_chapter28.html#StackBytecodes28</a><br>
><br>
> Hope this helps,<br>
><br>
> Colin<br>
><br>
> PS. Since this has nothing to do with Ubuntu, I've changed the subject to something more appropriate<br>
><br>
</blockquote></div><br><br clear="all"><div><br></div>-- <br>best,<div>Eliot</div><br>
</div>