[Vm-dev] Bytecode set (was Ubuntu unit issues)

Eliot Miranda eliot.miranda at gmail.com
Mon Apr 16 17:32:25 UTC 2012


    Smalltalk-80 (and Squeak) opcodes are for a spaghetti stack machine
where each activation is a separate object.  These activation objects are
called contexts, and chain together through the sender field.  Each context
has a fixed-size stack (in Squeak there are small and large contexts,
maximum size 52 stack slots).  Each activation holds onto a compiled method,
which is a vector of literal objects and a vector of bytecodes.  In Squeak
and Smalltalk-80 these two vectors are encoded in a single flat object,
half references to other objects (literals) and half bytes (opcodes).  Since
both contexts and compiled methods are objects, the system implements its
compiler and meta-level interpreter in Smalltalk itself, which require a
real machine (the virtual machine) to execute.  If you run a Squeak or
Pharo system you will be able to browse the classes that implement the
compiler and the meta-level interpreter.  In particular:

The classes EncoderForV3 & EncoderForV3PlusClosures implement the back-end
of the compiler, generating concrete opcodes for abstract bytecodes such as
pushReceiver, send:numArgs:, etc.
Instances of class CompiledMethod are generated by the compiler (see
MethodNode>generate:using:) using an instance of EncoderForV3PlusClosures.
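
To give a flavour of what such a back end does, here is a minimal Python sketch of the choice between a one-byte short form and a two-byte extended form. It assumes the Blue Book encodings for push temporary (16-31, else 128 jjkkkkkk with jj=01), pop-and-store temporary (104-111, else 130 jjkkkkkk) and literal-selector sends (208-255, else 131 jjjkkkkk). The gen_* names are hypothetical and are not EncoderForV3's actual selectors; special-selector sends and all other instructions are left out.

```python
# Hypothetical, much-reduced encoder in the spirit of EncoderForV3:
# emit the one-byte short form when the operand fits, otherwise the
# two-byte extended form. Only a handful of instructions are covered.

def gen_push_receiver() -> list[int]:
    return [0x70]  # 112: push self


def gen_return_receiver() -> list[int]:
    return [0x78]  # 120: return self


def gen_push_temp(index: int) -> list[int]:
    if index < 16:
        return [0x10 + index]  # 16-31: push temporary #index
    if index < 64:
        return [0x80, 0x40 | index]  # 128 jjkkkkkk with jj=01 (temporary)
    raise ValueError("temporary index too large for this sketch")


def gen_pop_into_temp(index: int) -> list[int]:
    if index < 8:
        return [0x68 + index]  # 104-111: pop into temporary #index
    if index < 64:
        return [0x82, 0x40 | index]  # 130 jjkkkkkk with jj=01 (temporary)
    raise ValueError("temporary index too large for this sketch")


def gen_send(literal_index: int, num_args: int) -> list[int]:
    if literal_index < 16 and num_args <= 2:
        return [0xD0 + 16 * num_args + literal_index]  # 208-255
    if literal_index < 32 and num_args < 8:
        return [0x83, (num_args << 5) | literal_index]  # 131 jjjkkkkk
    raise ValueError("send needs a longer form than this sketch supports")


# "aStream nextPutAll: title" as in the printOn: example further down
# (temp 0 = aStream, temp 1 = title, literal 1 = #nextPutAll:)
print([hex(b) for b in gen_push_temp(0) + gen_push_temp(1) + gen_send(1, 1)])
```

This prints ['0x10', '0x11', '0xe1'], the same three bytes as pcs 52-53 plus the pushTemp: 0 at pc 41 in the listing below.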

The class InstructionClient defines all the abstract opcodes for the
current V3 plus closures instruction set.
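
For anyone approaching this from the opcode-number side, the ranges can be written down as a 256-entry dispatch table, one entry per opcode byte. The sketch below is a summary of the Squeak V3-plus-closures set as I understand it from the Blue Book and the Squeak encoders. The labels are descriptive names made up for this sketch (they are not the InstructionClient selectors), and the operand counts (extra octets after the opcode byte) are assumptions to check against EncoderForV3PlusClosures before relying on them. Note that 138-143, the closure bytecodes, were unused in the original Blue Book set.

```python
# (first, last, label, extra operand octets following the opcode byte)
# Labels are descriptive names invented for this sketch.
RANGES = [
    (0x00, 0x0F, "pushReceiverVariable", 0),
    (0x10, 0x1F, "pushTemporaryVariable", 0),
    (0x20, 0x3F, "pushLiteralConstant", 0),
    (0x40, 0x5F, "pushLiteralVariable", 0),
    (0x60, 0x67, "popIntoReceiverVariable", 0),
    (0x68, 0x6F, "popIntoTemporaryVariable", 0),
    (0x70, 0x77, "pushSpecialConstant", 0),  # self true false nil -1 0 1 2
    (0x78, 0x7B, "returnSpecial", 0),        # self true false nil
    (0x7C, 0x7C, "returnTopFromMethod", 0),
    (0x7D, 0x7D, "returnTopFromBlock", 0),
    (0x7E, 0x7F, "unused", 0),
    (0x80, 0x80, "extendedPush", 1),         # jjkkkkkk: jj = kind, k = index
    (0x81, 0x81, "extendedStore", 1),
    (0x82, 0x82, "extendedPopAndStore", 1),
    (0x83, 0x83, "singleExtendedSend", 1),   # jjjkkkkk: j = args, k = literal
    (0x84, 0x84, "doubleExtendedDoAnything", 2),
    (0x85, 0x85, "singleExtendedSuperSend", 1),
    (0x86, 0x86, "secondExtendedSend", 1),
    (0x87, 0x87, "pop", 0),
    (0x88, 0x88, "dup", 0),
    (0x89, 0x89, "pushActiveContext", 0),
    (0x8A, 0x8A, "pushNewArray", 1),
    (0x8B, 0x8B, "unused", 0),
    (0x8C, 0x8C, "pushRemoteTemp", 2),
    (0x8D, 0x8D, "storeRemoteTemp", 2),
    (0x8E, 0x8E, "popIntoRemoteTemp", 2),
    (0x8F, 0x8F, "pushClosure", 3),          # counts byte + 16-bit block size
    (0x90, 0x97, "shortJump", 0),            # jump 1-8 bytes forward
    (0x98, 0x9F, "shortJumpIfFalse", 0),
    (0xA0, 0xA7, "longJump", 1),
    (0xA8, 0xAB, "longJumpIfTrue", 1),
    (0xAC, 0xAF, "longJumpIfFalse", 1),
    (0xB0, 0xBF, "sendArithmeticSelector", 0),
    (0xC0, 0xCF, "sendSpecialSelector", 0),
    (0xD0, 0xDF, "sendLiteralSelector0Args", 0),
    (0xE0, 0xEF, "sendLiteralSelector1Arg", 0),
    (0xF0, 0xFF, "sendLiteralSelector2Args", 0),
]


def build_table():
    """Expand the ranges into a 256-entry table indexed by opcode byte."""
    table = [None] * 256
    for first, last, label, extra in RANGES:
        for opcode in range(first, last + 1):
            assert table[opcode] is None, f"overlap at {opcode:#04x}"
            table[opcode] = (label, extra)
    missing = [op for op, entry in enumerate(table) if entry is None]
    assert not missing, f"unassigned opcodes: {missing}"
    return table


TABLE = build_table()

for opcode in (0x10, 0x80, 0x89, 0x8F, 0xE1):
    label, extra = TABLE[opcode]
    print(f"{opcode:#04x} {label}: {extra} operand octet(s)")
```

In an assembler-level interpreter the same table would hold routine addresses instead of labels; the operand count tells each routine how many bytes to consume before the next dispatch.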
The class InstructionStream decodes/interprets CompiledMethod instances,
dispatching sends of the messages understood by InstructionClient to
itself.  InstructionStream has several subclasses which respond to these
opcode messages in different ways.
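
The decode-and-dispatch idea can be sketched in a few lines of Python. This is a loose, much-reduced echo of InstructionStream's interpretNextInstructionFor:, handling only some single-byte opcodes and assuming the Blue Book encodings. The snake_case method names on the client are hypothetical stand-ins for InstructionClient selectors, not the real ones.

```python
class InstructionStream:
    """Decodes one instruction at a time and sends a message describing it
    to a client object (reduced sketch; single-byte opcodes only)."""

    def __init__(self, code: bytes):
        self.code = code
        self.pc = 0

    def at_end(self) -> bool:
        return self.pc >= len(self.code)

    def interpret_next_instruction_for(self, client):
        byte = self.code[self.pc]
        self.pc += 1
        if byte == 0x70:
            return client.push_receiver()
        if 0x10 <= byte <= 0x1F:
            return client.push_temporary_variable(byte - 0x10)
        if 0x68 <= byte <= 0x6F:
            return client.pop_into_temporary_variable(byte - 0x68)
        if 0xC0 <= byte <= 0xCF:
            return client.send_special(byte - 0xC0)
        if byte >= 0xD0:
            # 208-255: three groups of 16 for 0, 1 and 2 arguments
            num_args, literal_index = divmod(byte - 0xD0, 16)
            return client.send(literal_index, num_args)
        raise NotImplementedError(f"opcode {byte:#04x} not handled in this sketch")


class TracingClient:
    """A client that merely records what it was asked to do."""

    def __init__(self):
        self.trace = []

    def push_receiver(self):
        self.trace.append(("pushReceiver",))

    def push_temporary_variable(self, index):
        self.trace.append(("pushTemp", index))

    def pop_into_temporary_variable(self, index):
        self.trace.append(("popIntoTemp", index))

    def send_special(self, index):
        self.trace.append(("sendSpecial", index))

    def send(self, literal_index, num_args):
        self.trace.append(("send", literal_index, num_args))


# The first four instructions of the printOn: listing further down.
stream = InstructionStream(bytes([0x70, 0xC7, 0xD0, 0x69]))
client = TracingClient()
while not stream.at_end():
    stream.interpret_next_instruction_for(client)
print(client.trace)
```

Swapping in a different client (a printer, a simulator, a decompiler) changes what the same decoding loop does, which is the role the InstructionStream subclasses play.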

Most importantly, ContextPart and its subclass MethodContext implement
the InstructionClient api by simulating execution.  Hence ContextPart and
MethodContext provide a specification in Smalltalk of the semantics of the
bytecodes.  EncoderForV3 & EncoderForV3PlusClosures serve as a convenient
reference for opcode encodings, and are well-commented.
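
As a toy illustration of specifying semantics by simulation (far simpler than ContextPart, with no sends and no contexts), the Python sketch below executes a handful of single-byte opcodes on an explicit stack: special-constant pushes, pop, dup, the short jumps and the returns. It assumes the Blue Book encodings; jump offsets are taken relative to the pc after the jump, which matches the jumpFalse:/jumpTo: targets in the listing below.

```python
# Toy simulator for a few single-byte V3 opcodes (no sends, no contexts).
SPECIAL_PUSHES = {0x71: True, 0x72: False, 0x73: None,
                  0x74: -1, 0x75: 0, 0x76: 1, 0x77: 2}


def run(code: bytes, receiver=None):
    stack = []
    pc = 0
    while True:
        opcode = code[pc]
        pc += 1  # pc now addresses the following instruction
        if opcode == 0x70:                 # push self
            stack.append(receiver)
        elif opcode in SPECIAL_PUSHES:     # push true/false/nil/-1/0/1/2
            stack.append(SPECIAL_PUSHES[opcode])
        elif opcode == 0x87:               # pop
            stack.pop()
        elif opcode == 0x88:               # dup
            stack.append(stack[-1])
        elif 0x90 <= opcode <= 0x97:       # short jump forward 1-8 bytes
            pc += opcode - 0x90 + 1
        elif 0x98 <= opcode <= 0x9F:       # pop; jump 1-8 bytes if false
            condition = stack.pop()
            if condition is False:
                pc += opcode - 0x98 + 1
            elif condition is not True:
                # Squeak sends #mustBeBoolean here; this sketch just fails.
                raise TypeError("non-Boolean in conditional jump")
        elif opcode == 0x78:               # return self
            return receiver
        elif opcode == 0x7C:               # return top of stack
            return stack.pop()
        else:
            raise NotImplementedError(f"opcode {opcode:#04x} not simulated")


# "cond ifTrue: [1] ifFalse: [2]" has the same jumpFalse/jumpTo shape
# as the isVowel test in the printOn: listing.
for cond in (0x71, 0x72):  # push true, push false
    print(run(bytes([cond, 0x99, 0x76, 0x90, 0x77, 0x7C])))
```

This prints 1 and then 2.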

By the way, InstructionClient's subclass InstructionPrinter responds to the
api by disassembling a compiled method; hence aCompiledMethod symbolic
prints its opcodes, e.g.
(Object >> #printOn:) symbolic evaluates to the string
'37 <70> self
38 <C7> send: class
39 <D0> send: name
40 <69> popIntoTemp: 1
41 <10> pushTemp: 0
42 <88> dup
43 <11> pushTemp: 1
44 <D5> send: first
45 <D4> send: isVowel
46 <99> jumpFalse: 49
47 <23> pushConstant: ''an ''
48 <90> jumpTo: 50
49 <22> pushConstant: ''a ''
50 <E1> send: nextPutAll:
51 <87> pop
52 <11> pushTemp: 1
53 <E1> send: nextPutAll:
54 <87> pop
55 <78> returnSelf'

and InstructionStream's subclass Decompiler implements the api by
reconstructing a compiler parse tree for the compiled method, so e.g.
(Object >> #printOn:) decompile prints as
printOn: t1
    | t2 |
    t2 := self class name.
    t1
        nextPutAll: (t2 first isVowel
                ifTrue: ['an ']
                ifFalse: ['a ']);
         nextPutAll: t2
whereas the source code for the same method ((Object >> #printOn:)
getSourceFromFile) evaluates to a Text for
'printOn: aStream
    "Append to the argument, aStream, a sequence of characters that
    identifies the receiver."

    | title |
    title := self class name.
    aStream
        nextPutAll: (title first isVowel ifTrue: [''an ''] ifFalse: [''a '']);
        nextPutAll: title'
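
The short-form decoding behind the symbolic listing above can be reproduced in a few lines of Python. This sketch handles only the single-byte opcodes that occur in that listing and assumes the Blue Book encodings. The literal frame is reconstructed from the listing itself (an assumption, not read from a real image), and the names for opcodes 192-207 are the Blue Book's special selectors as I recall them. String literals print with single quotes, rather than the doubled quotes needed inside the string above.

```python
# Only the single-byte opcodes that occur in the printOn: listing are decoded.
SPECIAL_SELECTORS = ["at:", "at:put:", "size", "next", "nextPut:", "atEnd",
                     "==", "class", "blockCopy:", "value", "value:", "do:",
                     "new", "new:", "x", "y"]  # opcodes 192-207 (0xC0-0xCF)


def disassemble(code: bytes, literals: list[str], initial_pc: int) -> list[str]:
    lines = []
    for offset, byte in enumerate(code):  # every opcode handled is one byte
        pc = initial_pc + offset
        next_pc = pc + 1  # jump offsets are relative to the following pc
        if byte == 0x70:
            text = "self"
        elif 0x10 <= byte <= 0x1F:
            text = f"pushTemp: {byte - 0x10}"
        elif 0x20 <= byte <= 0x3F:
            text = f"pushConstant: {literals[byte - 0x20]}"
        elif 0x68 <= byte <= 0x6F:
            text = f"popIntoTemp: {byte - 0x68}"
        elif byte == 0x78:
            text = "returnSelf"
        elif byte == 0x87:
            text = "pop"
        elif byte == 0x88:
            text = "dup"
        elif 0x90 <= byte <= 0x97:
            text = f"jumpTo: {next_pc + byte - 0x90 + 1}"
        elif 0x98 <= byte <= 0x9F:
            text = f"jumpFalse: {next_pc + byte - 0x98 + 1}"
        elif 0xC0 <= byte <= 0xCF:
            text = f"send: {SPECIAL_SELECTORS[byte - 0xC0]}"
        elif byte >= 0xD0:
            text = f"send: {literals[(byte - 0xD0) % 16]}"  # 0, 1 or 2 args
        else:
            raise NotImplementedError(f"opcode {byte:#04x} not handled here")
        lines.append(f"{pc} <{byte:02X}> {text}")
    return lines


# Literal frame reconstructed from the listing (assumed, not read from an image).
LITERALS = ["name", "nextPutAll:", "'a '", "'an '", "isVowel", "first"]
CODE = bytes([0x70, 0xC7, 0xD0, 0x69, 0x10, 0x88, 0x11, 0xD5, 0xD4, 0x99,
              0x23, 0x90, 0x22, 0xE1, 0x87, 0x11, 0xE1, 0x87, 0x78])

for line in disassemble(CODE, LITERALS, initial_pc=37):
    print(line)
```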

So if you want a current, comprehensible specification of the
Squeak/Pharo opcode set, I recommend browsing EncoderForV3,
EncoderForV3PlusClosures, InstructionClient, InstructionStream,
ContextPart and MethodContext.  Further, I recommend exploring existing
CompiledMethod instances using doits such as

    SystemNavigation new browseAllSelect: [:m| m scanFor: 137]
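
Byte 137 (0x89) is the push-active-context (thisContext) opcode, so the doit above finds methods that use thisContext. Such a scan has to step over whole instructions rather than search for the raw byte, because an operand byte can take any value. Below is a minimal Python sketch, assuming the V3-plus-closures instruction lengths as I understand them (verify against EncoderForV3PlusClosures). scan_for is a hypothetical stand-in for Squeak's scanFor:, not a copy of it, and it scans bytecodes only, ignoring the literal frame that precedes them in a real CompiledMethod.

```python
def instruction_length(opcode: int) -> int:
    """Total length in bytes of an instruction starting with `opcode`
    (assumed V3-plus-closures lengths; unused opcodes treated as 1 byte)."""
    if opcode in (0x80, 0x81, 0x82, 0x83, 0x85, 0x86, 0x8A) or 0xA0 <= opcode <= 0xAF:
        return 2   # extended push/store/sends, new array, long jumps
    if opcode in (0x84, 0x8C, 0x8D, 0x8E):
        return 3   # double-extended do-anything, remote temp access
    if opcode == 0x8F:
        return 4   # push closure: counts byte plus a 16-bit block size
    return 1


def scan_for(code: bytes, opcode: int) -> bool:
    """True if some instruction in `code` starts with `opcode`."""
    pc = 0
    while pc < len(code):
        if code[pc] == opcode:
            return True
        pc += instruction_length(code[pc])
    return False


uses_context = bytes([0x89, 0x7C])         # thisContext; return top
extended_push = bytes([0x80, 0x89, 0x7C])  # push literal constant 9; return top
print(scan_for(uses_context, 0x89), scan_for(extended_push, 0x89))  # True False
print(0x89 in extended_push)               # True: a naive byte search is fooled
```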


On Mon, Apr 16, 2012 at 10:03 AM, Jeremy Kajikawa <jeremy.kajikawa at gmail.com> wrote:

> Colin: thanks... something like that... just trying to work out the
> octet numbers and formatting for what data goes where,
> as I am trying to encode this at assembler level, where each opcode value
> has a specific routine that is called from an opCodeVector JumpTable.
> Each entry in the JumpTable is directly executed by the processor, with
> a second JumpTable encoded similarly for basic microcode Read/Write
> functions to deal with various standard DataTypes in fixed formats.
> This is to plug into the generic Interpreter engine I already have.
> The first test of this was to emulate an Intel 80486 on a Motorola
> 68040 processor with the host running at 25MHz.
> I managed to get an average speed rating of between 16MHz and 20MHz
> performance, even with "real world" code being run through it.
> I am currently re-implementing this engine on top of a PPC host and
> would like to expand its modularity to additional languages and
> targets.
> If at all possible I would like to make the equivalent "machine level"
> interpretation of the opcode numbers possible, even if there is inline
> data and addresses present as well.
> Having no prior experience with Smalltalk, any terms I know that are used
> in a different way won't make any sense to me initially, and I already
> tried, unsuccessfully, to get to grips with Smalltalk by using the
> Environment.
> I'm more interested in the number codes that each operation is
> represented by, and in making routines to match within set ranges; and,
> where one operation is multiple codes chained, being able to have a
> listing such as: 0x00 is opcode "somename" and has N octets of
> immediate values following it, formatted as ?:? bitstrings.
> If that makes any sense?
> As for stack or message information, I'm willing to work out what is
> needed to make those happen if they are needed as bytecode-level
> information.
> On Tue, Apr 17, 2012 at 4:20 AM, Colin Putney <colin at wiresong.com> wrote:
> >
> >
> > On 2012-04-16, at 8:14 AM, Jeremy Kajikawa wrote:
> >
> > I am somewhat dogmatically minded about technical details,  so I am
> > unlikely to wade through buckets of documentation about Smalltalk as a
> > language and how to use it if it is not answering the question about
> > what I am looking up.
> >
> >
> > I'm confused. You want to implement a Smalltalk interpreter, but you're
> > not interested in the details of the language? Perhaps you should tell us
> > what your overall goal is. That way we can provide more useful information.
> >
> > As for documentation of the bytecode set, you may find the Blue Book
> > useful. It's the canonical description of how Smalltalk works, including
> > the interpreter. Squeak is a descendant of this implementation. The section
> > on the interpreter is here:
> >
> > http://www.mirandabanda.org/bluebook/bluebook_chapter28.html#StackBytecodes28
> >
> > Hope this helps,
> >
> > Colin
> >
> > PS. Since this has nothing to do with Ubuntu, I've changed the subject
> > to something more appropriate.
> >
