[Vm-dev] Bytecode set (was Ubuntu unit issues)

Jeremy Kajikawa jeremy.kajikawa at gmail.com
Mon Apr 16 18:17:28 UTC 2012

It would be nice to actually see the formatted data without the
language-specific semantics,

since I am writing the VM itself
 (there is no other VM on the target, so any requirement to run an
existing SqueakVM ...)

Nice to know that this detail was dutifully ignored...

Glad to see that the opcode values are presented... but you skipped
the formatting of the objects themselves, or is this left as an
exercise for the reader?

I am NOT a classically trained programmer; I am self-taught for the
most part, and I am trying to phrase my question as best I know how.

Use of the Pharo or Squeak virtual machines **IS NOT** available, no ifs,
buts, maybes or otherwise.

So unless I implement the VM myself, the language simply won't be
available.

So... is there a defined list giving the format of each instruction,
with a description of any strings of one or more octets it modifies
or moves?
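The opcode table in chapter 28 of the Smalltalk-80 Blue Book is essentially such a list. The C sketch below is one transcription of it, not anything shipped with Squeak: each row gives an opcode range, a short description (i, j, k are bit fields), and the number of operand octets that follow the opcode. The names `op_range`, `blue_book_ops` and `lookup_op` are hypothetical. Squeak's V3 set keeps most of this layout but gives 0x84 a different meaning, takes one operand octet after 0x86 instead of two, and uses 0x8A-0x8F for closure bytecodes, so treat those rows as assumptions to check against EncoderForV3PlusClosures.

```c
#include <stddef.h>
#include <stdint.h>

/* One row per opcode range in the Blue Book (Smalltalk-80) bytecode set. */
struct op_range {
    uint8_t first, last;   /* inclusive opcode range */
    const char *what;      /* short description; i/j/k are bit fields */
    int extra;             /* operand octets that follow the opcode */
};

static const struct op_range blue_book_ops[] = {
    {0x00, 0x0F, "push receiver variable #iiii", 0},
    {0x10, 0x1F, "push temporary #iiii", 0},
    {0x20, 0x3F, "push literal constant #iiiii", 0},
    {0x40, 0x5F, "push literal variable #iiiii", 0},
    {0x60, 0x67, "pop into receiver variable #iii", 0},
    {0x68, 0x6F, "pop into temporary #iii", 0},
    {0x70, 0x77, "push self, true, false, nil, -1, 0, 1, 2", 0},
    {0x78, 0x7B, "return self, true, false, nil", 0},
    {0x7C, 0x7D, "return stack top from method, block", 0},
    {0x80, 0x80, "extended push: jjkkkkkk", 1},
    {0x81, 0x81, "extended store: jjkkkkkk", 1},
    {0x82, 0x82, "extended pop and store: jjkkkkkk", 1},
    {0x83, 0x83, "send literal #kkkkk with jjj args: jjjkkkkk", 1},
    {0x84, 0x84, "send literal #k..k with j..j args: 2 octets", 2},
    {0x85, 0x85, "super send #kkkkk with jjj args: jjjkkkkk", 1},
    {0x86, 0x86, "super send #k..k with j..j args: 2 octets", 2},
    {0x87, 0x87, "pop stack top", 0},
    {0x88, 0x88, "duplicate stack top", 0},
    {0x89, 0x89, "push active context (thisContext)", 0},
    {0x90, 0x97, "jump forward iii + 1", 0},
    {0x98, 0x9F, "pop, jump forward iii + 1 if false", 0},
    {0xA0, 0xA7, "jump (iii - 4) * 256 + next octet", 1},
    {0xA8, 0xAB, "pop, jump ii * 256 + next octet if true", 1},
    {0xAC, 0xAF, "pop, jump ii * 256 + next octet if false", 1},
    {0xB0, 0xBF, "send arithmetic selector #iiii", 0},
    {0xC0, 0xCF, "send special selector #iiii", 0},
    {0xD0, 0xDF, "send literal #iiii, 0 args", 0},
    {0xE0, 0xEF, "send literal #iiii, 1 arg", 0},
    {0xF0, 0xFF, "send literal #iiii, 2 args", 0},
};

/* Return the row covering op, or NULL for opcodes the Blue Book
 * leaves unused (0x7E-0x7F and 0x8A-0x8F). */
static const struct op_range *lookup_op(uint8_t op)
{
    for (size_t i = 0; i < sizeof blue_book_ops / sizeof blue_book_ops[0]; i++)
        if (op >= blue_book_ops[i].first && op <= blue_book_ops[i].last)
            return &blue_book_ops[i];
    return NULL;
}
```

For example, `lookup_op(0xA3)` lands on the unconditional long-jump row, whose single operand octet completes the jump offset.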

Or am I stuck needing to learn Smalltalk itself to understand the
semantics of any answers given on this list?

Use of Smalltalk to actually produce Smalltalk is not an option at this point.

The target host is neither POSIX, Win32 nor bare metal.

The first programming language I learned was C, from the K&R book;
  however, I have only ever used the host OS routines and never
relied on the standard C library.

Thank you at least for trying to answer the request,


Beware of Assumptions,  the Hee then Haw before kicking your head in...

On Tue, Apr 17, 2012 at 5:32 AM, Eliot Miranda <eliot.miranda at gmail.com> wrote:
> Jeremy,
>     Smalltalk-80 (and Squeak) opcodes are for a spaghetti stack machine where each activation is a separate object.  These activation objects are called contexts, and chain together through the sender field.  Each context has a fixed-size stack (in Squeak there are small and large contexts, maximum size 52 stack slots).  Each activation holds onto a compiled method which is a vector of literal objects and a vector of bytecodes.  In Squeak and Smalltalk-80 these two vectors are encoded in a single flat object, half references to other objects (literals), half bytes (opcodes).  Since both contexts and compiled methods are objects, the system implements its compiler and meta-level interpreter in Smalltalk itself, which in turn require a real machine (the virtual machine) to execute.  If you run a Squeak or Pharo system you will be able to browse the classes that implement the compiler and the meta-level interpreter.  In particular:
> - The classes EncoderForV3 & EncoderForV3PlusClosures implement the back-end of the compiler, generating concrete opcodes for abstract bytecodes such as pushReceiver: send:numArgs: etc.
> - Instances of class CompiledMethod are generated by the compiler (see MethodNode>generate:using:) using an instance of EncoderForV3PlusClosures.
> - The class InstructionClient defines all the abstract opcodes for the current V3 plus closures instruction set.
> - The class InstructionStream decodes/interprets CompiledMethod instances, dispatching sends of the messages understood by InstructionClient to itself.  InstructionStream has several subclasses which respond to the sends of the opcodes in different ways.
> - Most importantly, ContextPart and its subclass MethodContext implement the InstructionClient API by simulating execution.  Hence ContextPart and MethodContext provide a specification in Smalltalk of the semantics of the bytecodes.  EncoderForV3 & EncoderForV3PlusClosures serve as a convenient reference for opcode encodings, and are well-commented.
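To make the spaghetti-stack description concrete for a C implementer, here is a minimal sketch assuming a hypothetical `struct context` whose fields are chosen for illustration and do not reproduce Squeak's real object layout. Each activation is a separate record linked to its caller through `sender`; in this sketch the temporaries occupy the bottom of the context's own fixed-size stack; and `step` simulates a handful of one-octet stack bytecodes, in the spirit of (not copied from) the ContextPart simulation mentioned above. The names `chain_depth`, `push` and `step` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

typedef uintptr_t oop;          /* hypothetical: an object reference */

#define MAX_STACK_SLOTS 52      /* large-context limit quoted in the message */

/* Hypothetical activation record.  Each context is its own object;
 * the caller is reached through 'sender', not by popping a machine stack. */
struct context {
    struct context *sender;     /* activation that sent the message */
    const uint8_t *bytecodes;   /* bytecode half of the compiled method */
    const oop *literals;        /* literal half (unused in this sketch) */
    int pc;                     /* index of the next bytecode */
    int sp;                     /* occupied slots; temporaries come first */
    oop receiver;
    oop stack[MAX_STACK_SLOTS];
};

/* Number of activations from ctx down to the bottom of the chain. */
static int chain_depth(const struct context *ctx)
{
    int depth = 0;
    for (; ctx != NULL; ctx = ctx->sender)
        depth++;
    return depth;
}

/* Push with a bounds check, since each context's stack is fixed-size. */
static int push(struct context *ctx, oop value)
{
    if (ctx->sp >= MAX_STACK_SLOTS)
        return -1;
    ctx->stack[ctx->sp++] = value;
    return 0;
}

/* Execute one bytecode from a small subset of the one-octet stack
 * operations.  Returns 0 on success, -1 otherwise. */
static int step(struct context *ctx)
{
    uint8_t op = ctx->bytecodes[ctx->pc++];

    if (op >= 0x10 && op <= 0x1F)                   /* pushTemp: n */
        return push(ctx, ctx->stack[op & 0x0F]);
    if (op == 0x70)                                 /* push self */
        return push(ctx, ctx->receiver);
    if (op == 0x88 && ctx->sp > 0)                  /* dup */
        return push(ctx, ctx->stack[ctx->sp - 1]);
    if (op == 0x87 && ctx->sp > 0) {                /* pop */
        ctx->sp--;
        return 0;
    }
    if (op >= 0x68 && op <= 0x6F && ctx->sp > 0) {  /* popIntoTemp: n */
        ctx->stack[op & 0x07] = ctx->stack[--ctx->sp];
        return 0;
    }
    return -1;  /* stack underflow, or an opcode not simulated here */
}
```

Because each context is a separate record rather than a frame on one contiguous stack, a context can outlive the call that created it, which is what lets thisContext and blocks refer to it later.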
> By the way, InstructionClient's subclass InstructionPrinter responds to the API by disassembling a compiled method, hence aCompiledMethod symbolic prints opcodes, e.g.
> (Object >> #printOn:) symbolic evaluates to the string
> '37 <70> self
> 38 <C7> send: class
> 39 <D0> send: name
> 40 <69> popIntoTemp: 1
> 41 <10> pushTemp: 0
> 42 <88> dup
> 43 <11> pushTemp: 1
> 44 <D5> send: first
> 45 <D4> send: isVowel
> 46 <99> jumpFalse: 49
> 47 <23> pushConstant: ''an ''
> 48 <90> jumpTo: 50
> 49 <22> pushConstant: ''a ''
> 50 <E1> send: nextPutAll:
> 51 <87> pop
> 52 <11> pushTemp: 1
> 53 <E1> send: nextPutAll:
> 54 <87> pop
> 55 <78> returnSelf
> '
> and InstructionStream's subclass Decompiler implements the api by reconstructing a compiler parse tree for the compiled method, so e.g.
> (Object >> #printOn:) decompile prints as
> printOn: t1
>     | t2 |
>     t2 := self class name.
>     t1
>         nextPutAll: (t2 first isVowel
>                 ifTrue: ['an ']
>                 ifFalse: ['a ']);
>         nextPutAll: t2
> whereas the source code for the same method ((Object >> #printOn:) getSourceFromFile) evaluates to a Text for
> 'printOn: aStream
>     "Append to the argument, aStream, a sequence of characters that
>     identifies the receiver."
>     | title |
>     title := self class name.
>     aStream
>         nextPutAll: (title first isVowel ifTrue: [''an ''] ifFalse: [''a '']);
>         nextPutAll: title'
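For a reader who wants the byte-to-mnemonic mapping without the Smalltalk tooling, here is a minimal C sketch of what `symbolic` does for the one-octet opcodes in the `Object >> #printOn:` listing. The function name `render` and its arguments are hypothetical. The special-selector names for 0xC0-0xCF follow the Blue Book table; literal names must be supplied by the caller because they live in the method's literal frame, and for this method they can be read off the listing itself (literal 0 is `name`, 1 `nextPutAll:`, 2 `'a '`, 3 `'an '`, 4 `isVowel`, 5 `first`). A short jump's target is the address of the following instruction plus the low three bits plus one, which is how 46 <99> becomes jumpFalse: 49.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Special selectors for opcodes 0xC0-0xCF, in Blue Book order. */
static const char *const special_selectors[16] = {
    "at:", "at:put:", "size", "next", "nextPut:", "atEnd", "==", "class",
    "blockCopy:", "value", "value:", "do:", "new", "new:", "x", "y"
};

/* Format the one-octet instruction 'op' found at 'pc' in the style of
 * the symbolic listing, e.g. "46 <99> jumpFalse: 49".  'lits' holds the
 * printable literal frame and must cover every index the code uses. */
static void render(uint8_t op, int pc, const char *const *lits,
                   char *out, size_t n)
{
    char text[64];
    int next = pc + 1;                  /* address of the following instruction */

    if (op >= 0x10 && op <= 0x1F)
        snprintf(text, sizeof text, "pushTemp: %d", op & 0x0F);
    else if (op >= 0x20 && op <= 0x3F)
        snprintf(text, sizeof text, "pushConstant: %s", lits[op & 0x1F]);
    else if (op >= 0x68 && op <= 0x6F)
        snprintf(text, sizeof text, "popIntoTemp: %d", op & 0x07);
    else if (op == 0x70)
        snprintf(text, sizeof text, "self");
    else if (op == 0x78)
        snprintf(text, sizeof text, "returnSelf");
    else if (op == 0x87)
        snprintf(text, sizeof text, "pop");
    else if (op == 0x88)
        snprintf(text, sizeof text, "dup");
    else if (op >= 0x90 && op <= 0x97)  /* short jump: offset is iii + 1 */
        snprintf(text, sizeof text, "jumpTo: %d", next + (op & 0x07) + 1);
    else if (op >= 0x98 && op <= 0x9F)  /* pop, short jump if false */
        snprintf(text, sizeof text, "jumpFalse: %d", next + (op & 0x07) + 1);
    else if (op >= 0xC0 && op <= 0xCF)
        snprintf(text, sizeof text, "send: %s", special_selectors[op & 0x0F]);
    else if (op >= 0xD0)                /* literal selector, 0-2 args */
        snprintf(text, sizeof text, "send: %s", lits[op & 0x0F]);
    else
        snprintf(text, sizeof text, "(not covered by this sketch)");

    snprintf(out, n, "%d <%02X> %s", pc, (unsigned)op, text);
}
```

Calling `render` on the nineteen octets 70 C7 D0 69 10 88 11 D5 D4 99 23 90 22 E1 87 11 E1 87 78 with pc running from 37 to 55 reproduces the listing line by line, except that the string constants print with single rather than doubled quotes.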
> So if you want to find a current, comprehensible specification of the Squeak/Pharo opcode set, I recommend browsing EncoderForV3, EncoderForV3PlusClosures, InstructionClient, InstructionStream, ContextPart and MethodContext.  Further, I recommend exploring existing CompiledMethod instances using doits such as
>     SystemNavigation new browseAllSelect: [:m| m scanFor: 137]
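137 is 0x89, the push-active-context (thisContext) opcode, so that doit selects methods whose bytecode pushes thisContext. Any such scan has to step over operand octets rather than test every byte, or an operand that happens to equal 137 (for example the offset octet of a long jump) would be reported as an opcode. A minimal C sketch of that idea follows; `operand_octets` transcribes the Blue Book instruction lengths, and `scan_for` is a hypothetical analogue of scanFor: that returns an index instead of a Boolean.

```c
#include <stddef.h>
#include <stdint.h>

/* Operand octets after each opcode in the Blue Book layout.  Squeak's
 * V3 set differs at 0x86 and 0x8A-0x8F; adjust this before scanning
 * real Squeak methods. */
static int operand_octets(uint8_t op)
{
    if (op >= 0x80 && op <= 0x83) return 1;   /* extended push/store, send */
    if (op == 0x84 || op == 0x86) return 2;   /* two-octet sends */
    if (op == 0x85) return 1;                 /* single extended super send */
    if (op >= 0xA0 && op <= 0xAF) return 1;   /* long jumps */
    return 0;
}

/* Walks one instruction at a time, so an operand octet equal to
 * 'target' is never mistaken for an opcode.
 * Returns the index of the first matching opcode, or -1. */
static long scan_for(const uint8_t *code, size_t len, uint8_t target)
{
    size_t pc = 0;
    while (pc < len) {
        if (code[pc] == target)
            return (long)pc;
        pc += 1 + (size_t)operand_octets(code[pc]);
    }
    return -1;
}
```

With the octets A0 89 89, a naive byte search for 0x89 would stop at index 1, inside the long jump; `scan_for` skips that operand and reports index 2.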
> Eliot

> On Mon, Apr 16, 2012 at 10:03 AM, Jeremy Kajikawa <jeremy.kajikawa at gmail.com> wrote:
>> Colin: thanks... something like that... just trying to work out the
>> octet numbers and formatting for what data goes where.
>> as I am trying to encode this at the assembler level, where each opcode
>> value has a specific routine that is called from an opCodeVector jump table.
>> Each entry in the jump table is directly executed by the processor, with
>> a second jump table encoded similarly for basic microcode read/write
>> functions that deal with various standard data types in fixed formats.
>> This is to plug into the generic interpreter engine I already have.
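The jump-table arrangement described here maps naturally onto the Smalltalk-80 encoding, because most opcodes come in contiguous families whose low bits carry an index. A minimal C sketch, assuming a hypothetical `struct vm` and hypothetical handler names: one routine per family is installed across its whole range, and the opcode is passed in so the routine can extract the index itself. Only three families are filled in; every other opcode falls through to a catch-all.

```c
#include <stdint.h>

enum family { F_UNKNOWN, F_PUSH_TEMP, F_PUSH_CONSTANT, F_SEND0 };

/* Hypothetical interpreter state; a real VM would also hold the active
 * context, stack pointer, and so on. */
struct vm {
    const uint8_t *code;
    int pc;
    enum family family;   /* which routine ran last (illustration only) */
    int operand;          /* index taken from the opcode's low bits */
};

typedef void (*handler)(struct vm *vm, uint8_t op);

/* One routine per opcode family.  Each receives the opcode itself so it
 * can extract the embedded index without a second table lookup. */
static void op_push_temp(struct vm *vm, uint8_t op)
{ vm->family = F_PUSH_TEMP;     vm->operand = op & 0x0F; }
static void op_push_constant(struct vm *vm, uint8_t op)
{ vm->family = F_PUSH_CONSTANT; vm->operand = op & 0x1F; }
static void op_send0(struct vm *vm, uint8_t op)
{ vm->family = F_SEND0;         vm->operand = op & 0x0F; }
static void op_unknown(struct vm *vm, uint8_t op)
{ vm->family = F_UNKNOWN;       vm->operand = op; }

static handler dispatch[256];

/* Point every opcode in [first, last] at the same routine. */
static void fill_range(int first, int last, handler h)
{
    for (int op = first; op <= last; op++)
        dispatch[op] = h;
}

static void init_dispatch(void)
{
    fill_range(0x00, 0xFF, op_unknown);       /* default for unlisted opcodes */
    fill_range(0x10, 0x1F, op_push_temp);     /* push temporary #iiii */
    fill_range(0x20, 0x3F, op_push_constant); /* push literal constant #iiiii */
    fill_range(0xD0, 0xDF, op_send0);         /* send literal #iiii, 0 args */
}

/* Fetch one opcode and jump through the table. */
static void dispatch_one(struct vm *vm)
{
    uint8_t op = vm->code[vm->pc++];
    dispatch[op](vm, op);
}
```

Filling the table by range keeps the number of routines small while the table itself stays flat, so each bytecode still costs one fetch and one indirect jump.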
>> The first test of this was to emulate an Intel 80486 on a Motorola
>> 68040 processor with the host running at 25 MHz.
>> I managed an average performance equivalent to between 16 MHz and
>> 20 MHz, even with "real world" code being run through it.
>> I am currently re-implementing this engine on top of a PPC host and
>> would like to extend its modularity to additional languages and
>> targets.
>> If at all possible, I would like to make the equivalent "machine level"
>> interpretation of the opcode numbers possible, even where inline
>> data and addresses are present as well.
>> Having no prior experience with Smalltalk, any use of terms I
>> know in a different way won't make any sense initially, and as for
>> getting to grips with Smalltalk by using the environment ... I already
>> tried this unsuccessfully.
>> I'm more interested in the number code that represents each operation,
>> and in making routines to match within set ranges; and, where one
>> operation is multiple codes chained, in being able to have a listing
>> such as: 0x00 is opcode "somename" and has N octets of immediate
>> values following it, formatted as ?:? bitstrings.
>> If that makes any sense?
>> As for stack or message information, I'm willing to work out what is
>> needed to make that happen if it is needed as bytecode-level
>> information.
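Where an opcode is followed by operand octets, the Blue Book packs several bit fields into each octet. A minimal C sketch of the common cases, with hypothetical names (`struct ext`, `decode_extended_var`, `decode_single_send`, `long_jump_offset`, `cond_long_jump_offset`): 0x80-0x82 take one octet `jjkkkkkk`, 0x83 and 0x85 take `jjjkkkkk`, and the long jumps 0xA0-0xAF combine low bits of the opcode with the next octet. These layouts come from the Blue Book table; Squeak's V3 set uses the same ones for these opcodes, but its 0x84 and 0x86 formats differ and are not covered here.

```c
#include <stdint.h>

/* Two fields unpacked from one operand octet (hypothetical helper type). */
struct ext {
    int kind;   /* jj or jjj: variable kind or argument count */
    int index;  /* kkkkkk or kkkkk: variable or literal index */
};

/* 0x80-0x82 extended push/store/pop-store, operand jjkkkkkk.
 * jj: 0 receiver variable, 1 temporary, 2 literal constant
 * (push only), 3 literal variable.  kkkkkk: index 0-63. */
static struct ext decode_extended_var(uint8_t b)
{
    struct ext e = { b >> 6, b & 0x3F };
    return e;
}

/* 0x83 send and 0x85 super send, operand jjjkkkkk.
 * jjj: argument count 0-7.  kkkkk: literal selector index 0-31. */
static struct ext decode_single_send(uint8_t b)
{
    struct ext e = { b >> 5, b & 0x1F };
    return e;
}

/* 0xA0-0xA7 unconditional long jump: signed offset (iii - 4) * 256 + b,
 * relative to the pc after this two-octet instruction. */
static int long_jump_offset(uint8_t op, uint8_t b)
{
    return ((op & 0x07) - 4) * 256 + b;
}

/* 0xA8-0xAB (pop, jump if true) and 0xAC-0xAF (pop, jump if false):
 * forward offset ii * 256 + b. */
static int cond_long_jump_offset(uint8_t op, uint8_t b)
{
    return (op & 0x03) * 256 + b;
}
```

For instance, the pair A3 FF is an unconditional jump with offset (3 - 4) * 256 + 255 = -1, and 80 45 pushes temporary 5 (jj = 01, kkkkkk = 000101).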
>> On Tue, Apr 17, 2012 at 4:20 AM, Colin Putney <colin at wiresong.com> wrote:
>> >
>> >
>> > On 2012-04-16, at 8:14 AM, Jeremy Kajikawa wrote:
>> >
>> > I am somewhat dogmatically minded about technical details, so I am
>> > unlikely to wade through buckets of documentation about Smalltalk as a
>> > language and how to use it if it is not answering the question about
>> > what I am looking up.
>> >
>> >
>> > I'm confused. You want to implement a Smalltalk interpreter, but you're not interested in the details of the language? Perhaps you should tell us what your overall goal is. That way we can provide more useful information.
>> >
>> > As for documentation of the bytecode set, you may find the Blue Book useful. It's the canonical description of how Smalltalk works, including the interpreter. Squeak is a descendant of this implementation. The section on the interpreter is here:
>> >
>> > http://www.mirandabanda.org/bluebook/bluebook_chapter28.html#StackBytecodes28
>> >
>> > Hope this helps,
>> >
>> > Colin
>> >
>> > PS. Since this has nothing to do with Ubuntu, I've changed the subject to something more appropriate
>> >
> --
> best,
> Eliot
