[Vm-dev] Bytecode set (was Ubuntu unit issues)

Eliot Miranda eliot.miranda at gmail.com
Mon Apr 16 19:57:32 UTC 2012


Hi Jeremy,

   I'll try one more time :)

On Mon, Apr 16, 2012 at 11:17 AM, Jeremy Kajikawa <jeremy.kajikawa at gmail.com> wrote:

>
> It would be nice to actually see actual formatted data without the
> language specific semantics,
>
> since I am writing the VM itself
>  (there is no other VM on the target so any requirement to run an
> existing SqueakVM ...)
>
> nice to know that tidbit of detail was dutifully ignored...
>

I didn't ignore it.  I wasn't assuming you would use the existing system
for other than reading code and exploring the system.  I understand you
want to do a clean-room VM for the target (and good luck, that's a fun
project, one I cut my teeth on many years ago now).  You're sending email,
therefore you probably have access to a system that can run Squeak and
Pharo and can hence run the system and explore it so as to educate yourself
on what's involved in a Smalltalk VM implementation.  IMO the best source
of specification and documentation on the bytecode set is in the image in
the classes I mentioned.

> Glad to see that the opcode values are presented... but you skipped
> the formatting of the objects themselves,  or is this left as an
> exercise for the reader?
>

I was trying to direct your attention to specification of the opcodes.  I
can point you to the implementation of the image format, and hence the object
format, but not to a specification. But I think it is important that you
understand the bytecode and VM semantics if you don't want to waste time on
implementation details.  Only by understanding the semantics will you have
any idea of how complex the implementation is.  Send semantics and context
semantics are much much more complex than conventional processor opcode
semantics.
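To make the point about send and context semantics concrete, here is a minimal C sketch of the "spaghetti stack" described in the quoted message further down: each activation is a separately allocated record linked to its caller through a sender field, with a fixed-size stack (52 slots, the largest Squeak context size quoted there). The struct layout, the names `Context`, `activate` and `returnValue`, and the use of plain `long` values instead of object pointers are hypothetical simplifications, not the Squeak object format. A real VM must also look the method up, initialize temporaries, and handle blocks and non-local returns.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified activation record. In Squeak a context is an
   ordinary heap object; these field names and types are illustrative only. */
#define CONTEXT_STACK_SLOTS 52   /* largest Squeak context size quoted in the thread */

typedef struct Context {
    struct Context *sender;            /* caller's activation: frames are linked, not contiguous */
    int pc;                            /* index of the next bytecode in the method */
    int sp;                            /* number of stack slots in use */
    const void *method;                /* the CompiledMethod being run (opaque here) */
    long receiver;                     /* a real VM would hold an object pointer here */
    long stack[CONTEXT_STACK_SLOTS];   /* arguments and temporaries first, then the operand stack */
} Context;

/* Sketch of a send: move the receiver and numArgs arguments off the caller's
   stack into a newly allocated context whose sender is the caller.
   A real VM would first look up the method in the receiver's class and also
   reserve and nil-initialize the method's temporaries. */
static Context *activate(Context *caller, const void *method, int numArgs) {
    assert(caller->sp >= numArgs + 1);          /* receiver plus arguments must be present */
    Context *callee = calloc(1, sizeof *callee);
    if (callee == NULL) return NULL;
    int base = caller->sp - numArgs - 1;        /* receiver sits just below the arguments */
    callee->sender = caller;
    callee->method = method;
    callee->receiver = caller->stack[base];
    for (int i = 0; i < numArgs; i++)
        callee->stack[i] = caller->stack[base + 1 + i];
    callee->sp = numArgs;
    caller->sp = base;                          /* receiver and arguments are consumed */
    return callee;
}

/* Sketch of a return: push the result onto the sender's stack and resume the
   sender. The finished context is deliberately not freed: a context is a normal
   object that may still be referenced (e.g. captured via thisContext, bytecode
   137), so a real VM leaves reclaiming it to the garbage collector. */
static Context *returnValue(Context *ctx, long result) {
    Context *sender = ctx->sender;
    assert(sender != NULL);                     /* the bottom context has no sender to return to */
    assert(sender->sp < CONTEXT_STACK_SLOTS);
    sender->stack[sender->sp++] = result;
    return sender;
}
```

A send therefore costs an allocation and some copying, and a return cannot simply pop a hardware frame, which is one concrete way these semantics differ from a conventional processor call/return.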


>
> I am NOT a classically trained programmer in that I am self-taught for
> the most part...
>  and I am trying to phrase my question the best I know how.
>
> use of Pharo or Squeak virtual machines *IS NOT* available, no ifs, buts,
> maybes or otherwise.
>

So what systems are you using to send email?


> so without my actually implementing the VM itself to make the language
> available it simply won't happen.
>
> so... is there a defined list where the format of the instructions and
> any attached description where strings of one or more octets are
> modified or moved?
>

To my knowledge the only up-to-date info is in the image and in the VMMaker
package, which is difficult to read other than by using the running system.
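As a rough answer to the question quoted above (a list giving, for each opcode value, its meaning and the octets that follow it), here is a C sketch of the original Smalltalk-80 bytecode table from the Blue Book chapter Colin links further down. It covers only the Blue Book set: Squeak's current V3PlusClosures set assigns the codes this table treats as unused (138 to 143) to closure support and may encode some extended sends differently, so the EncoderForV3PlusClosures class comment remains the authority for Squeak. The type and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* One row per opcode range of the Smalltalk-80 (Blue Book) bytecode set. */
typedef struct {
    uint8_t first, last;    /* inclusive opcode range */
    uint8_t extraBytes;     /* extension octets that follow the opcode byte */
    const char *bits;       /* opcode byte, then each extension octet */
    const char *meaning;    /* Blue Book description; letters refer to fields in `bits` */
} OpcodeRange;

static const OpcodeRange blueBookOps[] = {
    {0,   15,  0, "0000iiii", "push receiver variable #iiii"},
    {16,  31,  0, "0001iiii", "push temporary location #iiii"},
    {32,  63,  0, "001iiiii", "push literal constant #iiiii"},
    {64,  95,  0, "010iiiii", "push literal variable #iiiii"},
    {96,  103, 0, "01100iii", "pop and store receiver variable #iii"},
    {104, 111, 0, "01101iii", "pop and store temporary location #iii"},
    {112, 119, 0, "01110iii", "push (receiver, true, false, nil, -1, 0, 1, 2) [iii]"},
    {120, 123, 0, "011110ii", "return (receiver, true, false, nil) [ii] from message"},
    {124, 125, 0, "0111110i", "return stack top from (message, block) [i]"},
    {128, 128, 1, "10000000 jjkkkkkk", "push (receiver variable, temporary location, literal constant, literal variable) [jj] #kkkkkk"},
    {129, 129, 1, "10000001 jjkkkkkk", "store (receiver variable, temporary location, illegal, literal variable) [jj] #kkkkkk"},
    {130, 130, 1, "10000010 jjkkkkkk", "pop and store (receiver variable, temporary location, illegal, literal variable) [jj] #kkkkkk"},
    {131, 131, 1, "10000011 jjjkkkkk", "send literal selector #kkkkk with jjj arguments"},
    {132, 132, 2, "10000100 jjjjjjjj kkkkkkkk", "send literal selector #kkkkkkkk with jjjjjjjj arguments"},
    {133, 133, 1, "10000101 jjjkkkkk", "send literal selector #kkkkk to superclass with jjj arguments"},
    {134, 134, 2, "10000110 jjjjjjjj kkkkkkkk", "send literal selector #kkkkkkkk to superclass with jjjjjjjj arguments"},
    {135, 135, 0, "10000111", "pop stack top"},
    {136, 136, 0, "10001000", "duplicate stack top"},
    {137, 137, 0, "10001001", "push active context"},
    {144, 151, 0, "10010iii", "jump iii + 1 (i.e., 1 through 8)"},
    {152, 159, 0, "10011iii", "pop and jump on false iii + 1 (i.e., 1 through 8)"},
    {160, 167, 1, "10100iii jjjjjjjj", "jump (iii - 4) * 256 + jjjjjjjj"},
    {168, 171, 1, "101010ii jjjjjjjj", "pop and jump on true ii * 256 + jjjjjjjj"},
    {172, 175, 1, "101011ii jjjjjjjj", "pop and jump on false ii * 256 + jjjjjjjj"},
    {176, 191, 0, "1011iiii", "send arithmetic message #iiii"},
    {192, 207, 0, "1100iiii", "send special message #iiii"},
    {208, 223, 0, "1101iiii", "send literal selector #iiii with no arguments"},
    {224, 239, 0, "1110iiii", "send literal selector #iiii with 1 argument"},
    {240, 255, 0, "1111iiii", "send literal selector #iiii with 2 arguments"},
};

/* Returns the row describing `op`, or NULL for codes unused in the Blue Book
   set (126-127 and 138-143; Squeak reuses 138-143 for closures). */
static const OpcodeRange *lookupOpcode(uint8_t op) {
    for (size_t i = 0; i < sizeof blueBookOps / sizeof blueBookOps[0]; i++)
        if (op >= blueBookOps[i].first && op <= blueBookOps[i].last)
            return &blueBookOps[i];
    return NULL;
}
```

Each bit pattern starts with the opcode byte; every further group of eight characters is one extension octet, with letters marking the fields named in the description. For example, opcode 131 is followed by one octet holding a 3-bit argument count (jjj) and a 5-bit literal index (kkkkk).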


> or am I trapped in the need to learn smalltalk itself to understand
> the semantics of any answers given on this list?
>
> Use of smalltalk to actually produce smalltalk is not an option at this
> point.
>

Why not?  You have no Linux, Mac or Windows systems available to you?


>
> The target host is neither posix, win32 nor bare-metal.
>

But you don't have to run the system on the target host to explore it do
you?


> the first programming language I learned was C based on K&R book
> materials...
>  however... I have only ever used the Host OS routines and never
> relied on the standard C library
>

I find this hard to believe.  You've never used printf?


> Thank you at least for trying to answer the request,
>

you're welcome.


>
> ジェレミー
>
> Beware of Assumptions,  the Hee then Haw before kicking your head in...
>
> On Tue, Apr 17, 2012 at 5:32 AM, Eliot Miranda <eliot.miranda at gmail.com> wrote:
> >
> > Jeremy,
> >
> >     Smalltalk-80 (and Squeak) opcodes are for a spaghetti stack machine
> where each activation is a separate object.  These activation objects are
> called contexts, and chain together through the sender field.  Each context
> has a fixed size stack (in Squeak there are small and large contexts,
> maximum size 52 stack slots).  Each activation holds onto a compiled method
> which is a vector of literal objects and a vector of bytecodes.  In Squeak
> and Smalltalk-80 these two vectors are encoded in a single flat object,
> half references to other objects (literals) half bytes (opcodes).  Since
> both contexts and compiled methods are objects the system implements its
> compiler and meta-level interpreter in Smalltalk itself, which require a
> real machine (the virtual machine) to execute.  If you run a Squeak or
> Pharo system you will be able to browse the classes that implement the
> compiler and the meta-level interpreter.  In particular:
> >
> > The classes EncoderForV3 & EncoderForV3PlusClosures implement the
> back-end of the compiler, generating concrete opcodes for abstract
> bytecodes such as pushReceiver: send:numArgs: etc.
> > Instances of class CompiledMethod are generated by the compiler (see
> MethodNode>generate:using:) using an instance of EncoderForV3PlusClosures.
> >
> > The class InstructionClient defines all the abstract opcodes for the
> current V3 plus closures instruction set.
> > The class InstructionStream decodes/interprets CompiledMethod instances,
> dispatching sends of the messages understood by InstructionClient to
> itself.  InstructionStream has several subclasses which respond to the sends
> of the opcodes in different ways.
> >
> > Most importantly ContextPart and its subclass MethodContext implement
> the InstructionClient api by simulating execution.  Hence ContextPart and
> MethodContext provide a specification in Smalltalk of the semantics of the
> bytecodes.  EncoderForV3 & EncoderForV3PlusClosures serve as a convenient
> reference for opcode encodings, and are well-commented.
> >
> > By the way InstructionClient's subclass InstructionPrinter responds to
> the api by disassembling a compiled method, hence aCompiledMethod symbolic
> prints opcodes, e.g.
> > (Object >> #printOn:) symbolic evaluates to the string
> > '37 <70> self
> > 38 <C7> send: class
> > 39 <D0> send: name
> > 40 <69> popIntoTemp: 1
> > 41 <10> pushTemp: 0
> > 42 <88> dup
> > 43 <11> pushTemp: 1
> > 44 <D5> send: first
> > 45 <D4> send: isVowel
> > 46 <99> jumpFalse: 49
> > 47 <23> pushConstant: ''an ''
> > 48 <90> jumpTo: 50
> > 49 <22> pushConstant: ''a ''
> > 50 <E1> send: nextPutAll:
> > 51 <87> pop
> > 52 <11> pushTemp: 1
> > 53 <E1> send: nextPutAll:
> > 54 <87> pop
> > 55 <78> returnSelf
> > '
> >
> >
> > and InstructionStream's subclass Decompiler implements the api by
> reconstructing a compiler parse tree for the compiled method, so e.g.
> > (Object >> #printOn:) decompile prints as
> > printOn: t1
> >     | t2 |
> >     t2 := self class name.
> >     t1
> >         nextPutAll: (t2 first isVowel
> >                 ifTrue: ['an ']
> >                 ifFalse: ['a ']);
> >         nextPutAll: t2
> > whereas the source code for the same method ((Object >> #printOn:)
> getSourceFromFile) evaluates to a Text for
> > 'printOn: aStream
> >     "Append to the argument, aStream, a sequence of characters that
> >     identifies the receiver."
> >
> >     | title |
> >     title := self class name.
> >     aStream
> >         nextPutAll: (title first isVowel ifTrue: [''an ''] ifFalse: [''a '']);
> >         nextPutAll: title'
> >
> > So if you want to find a current, comprehensible specification of the
> Squeak/Pharo opcode set I recommend
> browsing EncoderForV3, EncoderForV3PlusClosures, InstructionClient, InstructionStream, ContextPart and
> MethodContext.  Further, I recommend exploring existing CompiledMethod
> instances using doits such as
> >
> >     SystemNavigation new browseAllSelect: [:m| m scanFor: 137]
> >
> > HTH
> > Eliot
>
> > On Mon, Apr 16, 2012 at 10:03 AM, Jeremy Kajikawa <jeremy.kajikawa at gmail.com> wrote:
> >>
> >>
> >> Colin: thanks... something like that... just trying to work out the
> >> octet numbers and formatting for what data goes where.
> >>
> >> as I am trying to encode this at assembler level, where each opcode value
> >> has a specific routine that is called from an opCodeVector JumpTable
> >>
> >> Each Entry in the JumpTable is directly executed by the processor with
> >> a second JumpTable encoded similarly for basic microcode Read/Write
> >> functions to deal with various standard DataTypes in fixed formats
> >>
> >> this is to plug into the generic Interpreter engine I already have.
> >>
> >> the first test of this was to Emulate an Intel 80486 on a Motorola
> >> 68040 processor with the Host running at 25MHz.
> >>
> >> I managed to get an average speed rating of between 16MHz and 20MHz
> >> performance, even with "real world" code being run through it.
> >>
> >> I am currently re-implementing this engine on top of a PPC host and
> >> would like to expand its modularity to additional languages and
> >> targets.
> >>
> >> If at all possible I would like to make the equivalent "machine level"
> >> interpretation of the opcode numbers possible even if there is inline
> >> data and addresses present as well.
> >>
> >> Having no prior experience with Smalltalk, any terms I know with a
> >> different meaning won't make any sense to me initially, and as for trying
> >> to get to grips with Smalltalk by using the Environment ... I already
> >> tried this unsuccessfully.
> >>
> >> I'm more interested in the number codes that each operation is
> >> represented by and making routines to match within set ranges,  and
> >> where one operation is multiple codes chained,  being able to have a
> >> listing starting with 0x00 is opcode "somename" and has N octets of
> >> immediate values following it formatted as ?:? bitstrings.
> >>
> >> If that makes any sense?
> >>
> >> as for stack or message information,  I'm willing to work out what is
> >> needed to make those happen if they are needed as bytecode level
> >> information.
> >>
> >> On Tue, Apr 17, 2012 at 4:20 AM, Colin Putney <colin at wiresong.com> wrote:
> >> >
> >> >
> >> > On 2012-04-16, at 8:14 AM, Jeremy Kajikawa wrote:
> >> >
> >> > I am somewhat dogmatically minded about technical details,  so I am
> >> > unlikely to wade through buckets of documentation about Smalltalk as a
> >> > language and how to use it if it is not answering the question about
> >> > what I am looking up.
> >> >
> >> >
> >> > I'm confused. You want to implement a Smalltalk interpreter, but
> you're not interested in the details of the language? Perhaps you should
> tell us what your overall goal is. That way we can provide more useful
> information.
> >> >
> >> > As for documentation of the bytecode set, you may find the Blue Book
> useful. It's the canonical description of how Smalltalk works, including
> the interpreter. Squeak is a descendant of this implementation. The section
> on the interpreter is here:
> >> >
> >> >
> http://www.mirandabanda.org/bluebook/bluebook_chapter28.html#StackBytecodes28
> >> >
> >> > Hope this helps,
> >> >
> >> > Colin
> >> >
> >> > PS. Since this has nothing to do with Ubuntu, I've changed the
> subject to something more appropriate
> >> >
> >
> >
> >
> >
> > --
> > best,
> > Eliot
> >
> >
>
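For comparison with the `symbolic` listing of `Object >> #printOn:` in the quoted message above, here is a small C sketch that decodes the one-byte bytecodes of that method. It assumes the Blue Book one-byte encodings and the special-selector order in which index 7 is `class` (matching `<C7> send: class`). The literal names passed in are inferred from the listing rather than read from a real CompiledMethod, multi-byte and unused opcodes are not handled, and the function names and the mnemonics for opcodes that do not appear in the listing are hypothetical choices made for this sketch.

```c
#include <stdint.h>
#include <stdio.h>

/* Selector tables for bytecodes 0xB0-0xBF and 0xC0-0xCF, in Blue Book order. */
static const char *const arithmeticSelectors[16] = {
    "+", "-", "<", ">", "<=", ">=", "=", "~=",
    "*", "/", "\\\\", "@", "bitShift:", "//", "bitAnd:", "bitOr:"};
static const char *const specialSelectors[16] = {
    "at:", "at:put:", "size", "next", "nextPut:", "atEnd", "==", "class",
    "blockCopy:", "value", "value:", "do:", "new", "new:", "x", "y"};
static const char *const pushSpecials[8] = {
    "self", "pushConstant: true", "pushConstant: false", "pushConstant: nil",
    "pushConstant: -1", "pushConstant: 0", "pushConstant: 1", "pushConstant: 2"};
static const char *const quickReturns[4] = {
    "returnSelf", "returnTrue", "returnFalse", "returnNil"};

/* Name of literal i (the array must cover every index the code uses),
   or a placeholder such as "literal5" when no names are supplied. */
static const char *literalName(const char *const *literals, int i, char *tmp, size_t n) {
    if (literals != NULL) return literals[i];
    snprintf(tmp, n, "literal%d", i);
    return tmp;
}

/* Writes a mnemonic for the one-byte instruction `op` located at index `pc`.
   Returns 0 on success, or -1 for multi-byte or unused opcodes (not handled). */
static int describe(uint8_t op, int pc, const char *const *literals, char *out, size_t n) {
    char tmp[16];
    if (op <= 0x0F)      snprintf(out, n, "pushRcvr: %d", op & 0x0F);
    else if (op <= 0x1F) snprintf(out, n, "pushTemp: %d", op & 0x0F);
    else if (op <= 0x3F) snprintf(out, n, "pushConstant: %s", literalName(literals, op & 0x1F, tmp, sizeof tmp));
    else if (op <= 0x5F) snprintf(out, n, "pushLit: %s", literalName(literals, op & 0x1F, tmp, sizeof tmp));
    else if (op <= 0x67) snprintf(out, n, "popIntoRcvr: %d", op & 0x07);
    else if (op <= 0x6F) snprintf(out, n, "popIntoTemp: %d", op & 0x07);
    else if (op <= 0x77) snprintf(out, n, "%s", pushSpecials[op & 0x07]);
    else if (op <= 0x7B) snprintf(out, n, "%s", quickReturns[op & 0x03]);
    else if (op == 0x7C) snprintf(out, n, "returnTop");
    else if (op == 0x7D) snprintf(out, n, "blockReturnTop");
    else if (op == 0x87) snprintf(out, n, "pop");
    else if (op == 0x88) snprintf(out, n, "dup");
    else if (op == 0x89) snprintf(out, n, "pushThisContext");
    /* Short jumps skip (op & 7) + 1 bytes, counted from the next instruction at pc + 1. */
    else if (op >= 0x90 && op <= 0x97) snprintf(out, n, "jumpTo: %d", pc + 2 + (op & 0x07));
    else if (op >= 0x98 && op <= 0x9F) snprintf(out, n, "jumpFalse: %d", pc + 2 + (op & 0x07));
    else if (op >= 0xB0 && op <= 0xBF) snprintf(out, n, "send: %s", arithmeticSelectors[op & 0x0F]);
    else if (op >= 0xC0 && op <= 0xCF) snprintf(out, n, "send: %s", specialSelectors[op & 0x0F]);
    /* 0xD0-0xDF, 0xE0-0xEF, 0xF0-0xFF: literal selector #iiii with 0, 1, 2 arguments. */
    else if (op >= 0xD0) snprintf(out, n, "send: %s", literalName(literals, op & 0x0F, tmp, sizeof tmp));
    else return -1;
    return 0;
}

/* Prints "pc <XX> mnemonic" lines, stopping at the first opcode not handled above. */
static void printListing(const uint8_t *code, size_t len, int firstPc, const char *const *literals) {
    char buf[64];
    for (size_t i = 0; i < len; i++) {
        int pc = firstPc + (int)i;
        if (describe(code[i], pc, literals, buf, sizeof buf) != 0) {
            printf("%d <%02X> (not decoded by this sketch)\n", pc, code[i]);
            return;
        }
        printf("%d <%02X> %s\n", pc, code[i], buf);
    }
}
```

Calling `printListing` on the bytes 70 C7 D0 69 10 88 11 D5 D4 99 23 90 22 E1 87 11 E1 87 78 with `firstPc` 37 and the literal names name, nextPutAll:, 'a ', 'an ', isVowel and first at indices 0 to 5 yields the same lines as the quoted listing, except that string literals appear with single rather than doubled quotes.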



-- 
best,
Eliot