[Vm-dev] Bytecode set (was Ubuntu unit issues)

Jeremy Kajikawa jeremy.kajikawa at gmail.com
Tue Apr 17 06:22:12 UTC 2012


Further details then...

On Tue, Apr 17, 2012 at 7:57 AM, Eliot Miranda <eliot.miranda at gmail.com> wrote:
>
> Hi Jeremy,
>
>    I'll try one more time :)
>
> On Mon, Apr 16, 2012 at 11:17 AM, Jeremy Kajikawa <jeremy.kajikawa at gmail.com> wrote:
>>
>>
>> It would be nice to see the actual formatted data without the
>> language specific semantics,
>>
>> since I am writing the VM itself
>>  (there is no other VM on the target so any requirement to run an
>> existing SqueakVM ...)
>>
>> nice to know that tidbit of detail was dutifully ignored...
>
>
> I didn't ignore it.  I wasn't assuming you would use the existing system for other than reading code and exploring the system.  I understand you want to do a clean-room VM for the target (and good luck, that's a fun project, one I cut my teeth on many years ago now).  You're sending email, therefore you probably have access to a system that can run Squeak and Pharo and can hence run the system and explore it so as to educate yourself on what's involved in a Smalltalk VM implementation.  IMO the best source of specification and documentation on the bytecode set is in the image in the classes I mentioned.
>

Yes I am clean-room building... and not dedicated to a single target.

>> Glad to see that the opcode values are presented... but you skipped
>> the formatting of the objects themselves,  or is this left as an
>> exercise for the reader?
>
>
> I was trying to direct your attention to specification of the opcodes.  I can point you to implementation of the image format, and hence the object format, but not to specification. But I think it is important you understand the bytecode and VM semantics if you don't want to waste time on implementation details.  Only by understanding the semantics will you have any idea of how complex the implementation is.  Send semantics and context semantics are much much more complex than conventional processor opcode semantics.
>

I'm working from Classical Semantics and need to know the differences,
specifically so I can deal with them.  The other targets I am dealing
with are Classical Semantics based... so Smalltalk has to be "fit in"
with the rest without majorly redesigning the system just to deal with
it (the system I am working on has proven its speed and stability with
the existing design details for everyday reliable use).

>> I am NOT a classically trained programmer in that I am self-taught for
>> the most part...
>>  and I am trying to phrase my question the best I know how.
>>
>> use of Pharo or Squeak VirtualMachines **IS*NOT** available, no ifs, buts,
>> maybes or otherwise.
>
>
> So what systems are you using to send email?

Same system, using the Gmail web interface.

>>
>> so without my actually implementing the VM itself to make the language
>> available it simply won't happen.
>>
>> so... is there a defined list giving the format of the instructions, with
>> an attached description of how strings of one or more octets are
>> modified or moved?
>
>
> To my knowledge the only up-to-date info is in the image and in the VMMaker package which is difficult to read other than using the running system.
>
Would that not break "Clean room" building the VM system?

>>
>> or am I trapped in the need to learn smalltalk itself to understand
>> the semantics of any answers given on this list?
>>
>> Use of smalltalk to actually produce smalltalk is not an option at this point.
>
>
> Why not?  You have no linux, Mac or Windows systems available to you?

Available, yes.  Capable?  That I question, as they have enough trouble
running the OS they are installed with by default, as far as I am
concerned.

>>
>>
>> The target host is neither posix, win32 nor bare-metal.
>
>
> But you don't have to run the system on the target host to explore it do you?
>
No I don't... I need documentation on the format and specific handling
of the opcodes in a Classical sense.

>>
>> the first programming language I learned was C based on K&R book materials...
>>  however... I have only ever used the Host OS routines and never
>> relied on the std c library
>
>
> I find this hard to believe.  You've never used printf?
>
Not once, ever.  I have always had other options and been able to step
through iterating over each code modification.

I have tried to deal with the Smalltalk system within OpenCobalt,
however I found this to have its own maze of semantics, and as it
stands the documentation is no help in answering any of the questions
I have.

I will just have to find a reasonably up to date Linux kernel build
and LinuxFromScratch for Dual-Booting Linux in addition to Amiga OS
4.x

I was hoping to sort out registers, opcodes and data formatting so
that I can properly map the operations on a classical single processor
system.

As for the FPGA,  I do have the option of pushing material from being
processed by the PPC onboard to the FPGA once an applicable program is
written for it.

I'll keep these Emails and try to work out what the actual operational
changes are.
Currently Smalltalk appears as a black box and I am hoping to at least
work out getting a basic flat memory model working with classical
opcodes to Emulate other processors.
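
For reference, the context and compiled-method layout Eliot describes in
the quoted message further down can be modelled roughly as follows.  This
is only a minimal sketch in Python, not the actual Squeak object format:
the class and field names (CompiledMethod, Context, sender, capacity,
call_depth) are assumptions chosen to mirror his wording, and the only
number taken from the thread is the 52-slot maximum stack size.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Maximum stack size for a large context, as quoted in the thread.
LARGE_CONTEXT_SLOTS = 52


@dataclass
class CompiledMethod:
    """A vector of literal objects plus a vector of bytecodes."""
    literals: List[object]
    bytecodes: bytes


@dataclass
class Context:
    """One activation; activations chain together through `sender`."""
    method: CompiledMethod
    sender: Optional["Context"] = None
    pc: int = 0
    stack: List[object] = field(default_factory=list)
    capacity: int = LARGE_CONTEXT_SLOTS  # hypothetical name for the fixed size

    def push(self, value):
        # Each context has a fixed-size stack, so refuse to grow past it.
        if len(self.stack) >= self.capacity:
            raise OverflowError("context stack is full")
        self.stack.append(value)

    def pop(self):
        return self.stack.pop()


def call_depth(context):
    """Count activations by walking the sender chain back to the root."""
    depth = 0
    while context is not None:
        depth += 1
        context = context.sender
    return depth


# Two activations of the same method: `inner` was called from `outer`.
method = CompiledMethod(literals=["name"], bytecodes=bytes([0x70, 0x78]))
outer = Context(method)
inner = Context(method, sender=outer)
```

The point of the sketch is that the "stack" is a linked chain of separate
objects rather than one contiguous block, which is the main difference
from a flat memory model.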

>>
>> Thank you at least for trying to answer the request,
>
>
> you're welcome.
>
>>
>>
>> ジェレミー
>>
>> Beware of Assumptions,  the Hee then Haw before kicking your head in...
>>
>> On Tue, Apr 17, 2012 at 5:32 AM, Eliot Miranda <eliot.miranda at gmail.com> wrote:
>> >
>> > Jeremy,
>> >
>> >     Smalltalk-80 (and Squeak) opcodes are for a spaghetti stack machine where each activation is a separate object.  These activation objects are called contexts, and chain together through the sender field.  Each context has a fixed size stack (in Squeak there are small and large contexts, maximum size 52 stack slots).  Each activation holds onto a compiled method which is a vector of literal objects and a vector of bytecodes.  In Squeak and Smalltalk-80 these two vectors are encoded in a single flat object, half references to other objects (literals) half bytes (opcodes).  Since both contexts and compiled methods are objects the system implements its compiler and meta-level interpreter in Smalltalk itself, which require a real machine (the virtual machine) to execute.  If you run a Squeak or Pharo system you will be able to browse the classes that implement the compiler and the meta-level interpreter.  In particular:
>> >
>> > The classes EncoderForV3 & EncoderForV3PlusClosures implement the back-end of the compiler, generating concrete opcodes for abstract bytecodes such as pushReceiver: send:numArgs: etc.
>> > Instances of class CompiledMethod are generated by the compiler (see MethodNode>>generate:using:) using an instance of EncoderForV3PlusClosures.
>> >
>> > The class InstructionClient defines all the abstract opcodes for the current V3 plus closures instruction set.
>> > The class InstructionStream decodes/interprets CompiledMethod instances, dispatching sends of the messages understood by InstructionClient to itself.  InstructionStream has several subclasses which respond to the sends of the opcodes in different ways.
>> >
>> > Most importantly ContextPart and its subclass MethodContext implement the InstructionClient api by simulating execution.  Hence ContextPart and MethodContext provide a specification in Smalltalk of the semantics of the bytecodes.  EncoderForV3 & EncoderForV3PlusClosures serve as a convenient reference for opcode encodings, and are well-commented.
>> >
>> > By the way InstructionClient's subclass InstructionPrinter responds to the api by disassembling a compiled method, hence aCompiledMethod symbolic prints opcodes, e.g.
>> > (Object >> #printOn:) symbolic evaluates to the string
>> > '37 <70> self
>> > 38 <C7> send: class
>> > 39 <D0> send: name
>> > 40 <69> popIntoTemp: 1
>> > 41 <10> pushTemp: 0
>> > 42 <88> dup
>> > 43 <11> pushTemp: 1
>> > 44 <D5> send: first
>> > 45 <D4> send: isVowel
>> > 46 <99> jumpFalse: 49
>> > 47 <23> pushConstant: ''an ''
>> > 48 <90> jumpTo: 50
>> > 49 <22> pushConstant: ''a ''
>> > 50 <E1> send: nextPutAll:
>> > 51 <87> pop
>> > 52 <11> pushTemp: 1
>> > 53 <E1> send: nextPutAll:
>> > 54 <87> pop
>> > 55 <78> returnSelf
>> > '
>> >
>> >
>> > and InstructionStream's subclass Decompiler implements the api by reconstructing a compiler parse tree for the compiled method, so e.g.
>> > (Object >> #printOn:) decompile prints as
>> > printOn: t1
>> > | t2 |
>> > t2 := self class name.
>> > t1
>> > nextPutAll: (t2 first isVowel
>> > ifTrue: ['an ']
>> > ifFalse: ['a ']);
>> >  nextPutAll: t2
>> > whereas the source code for the same method ((Object >> #printOn:) getSourceFromFile) evaluates to a Text for
>> > 'printOn: aStream
>> > "Append to the argument, aStream, a sequence of characters that
>> > identifies the receiver."
>> >
>> > | title |
>> > title := self class name.
>> > aStream
>> > nextPutAll: (title first isVowel ifTrue: [''an ''] ifFalse: [''a '']);
>> > nextPutAll: title'
>> >
>> > So if you want to find a current, comprehensible specification of the Squeak/Pharo opcode set I recommend browsing EncoderForV3, EncoderForV3PlusClosures, InstructionClient, InstructionStream, ContextPart and MethodContext.  Further, I recommend exploring existing CompiledMethod instances using doits such as
>> >
>> >     SystemNavigation new browseAllSelect: [:m| m scanFor: 137]
>> >
>> > HTH
>> > Eliot
>>
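
The byte values in the <..> column of the symbolic listing quoted above
are enough to sketch a decoder for the single-byte opcodes it uses.  The
Python below is a hedged illustration only: the opcode ranges are read
off that listing and the Blue Book layout, the function names (decode,
disassemble) are made up for this note, and anything outside those
ranges is reported as not handled.  Literal selectors are printed by
index because the literal frame itself is not part of the listing.

```python
# Bytes copied from the listing; 37 is the first pc it shows.
METHOD_BYTES = bytes.fromhex("70C7D0691088" "11D5D4992390" "22E18711E18778")
FIRST_PC = 37


def decode(byte, pc):
    """Return a mnemonic for the single-byte opcode `byte` found at `pc`."""
    if 0x10 <= byte <= 0x1F:
        return f"pushTemp: {byte & 0x0F}"
    if 0x20 <= byte <= 0x3F:
        return f"pushConstant: literal {byte & 0x1F}"
    if 0x68 <= byte <= 0x6F:
        return f"popIntoTemp: {byte & 0x07}"
    if byte == 0x70:
        return "self"
    if byte == 0x78:
        return "returnSelf"
    if byte == 0x87:
        return "pop"
    if byte == 0x88:
        return "dup"
    if 0x90 <= byte <= 0x97:
        # Short jump: offset 1..8, relative to the byte after the opcode.
        return f"jumpTo: {pc + 1 + (byte & 0x07) + 1}"
    if 0x98 <= byte <= 0x9F:
        # Short conditional jump, same offset encoding.
        return f"jumpFalse: {pc + 1 + (byte & 0x07) + 1}"
    if 0xC0 <= byte <= 0xCF:
        return f"send: special selector {byte & 0x0F}"
    if 0xD0 <= byte <= 0xDF:
        return f"send: literal {byte & 0x0F} (0 args)"
    if 0xE0 <= byte <= 0xEF:
        return f"send: literal {byte & 0x0F} (1 arg)"
    return "(not handled in this sketch)"


def disassemble(code, first_pc):
    """Return one 'pc <hex> mnemonic' line per byte of `code`."""
    lines = []
    for offset, byte in enumerate(code):
        pc = first_pc + offset
        lines.append(f"{pc} <{byte:02X}> {decode(byte, pc)}")
    return lines


for line in disassemble(METHOD_BYTES, FIRST_PC):
    print(line)
```

The two jump lines come out with the same targets as in the listing (49
and 50), which is a useful check that the offset arithmetic matches.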
>> > On Mon, Apr 16, 2012 at 10:03 AM, Jeremy Kajikawa <jeremy.kajikawa at gmail.com> wrote:
>> >>
>> >>
>> >> Colin: thanks... something like that... just trying to work out the
>> >> octet numbers and formatting for what data goes where.
>> >>
>> >> as I am trying to encode this at assembler level where each opcode value
>> >> has a specific routine that is called from an opCodeVector JumpTable
>> >>
>> >> Each Entry in the JumpTable is directly executed by the processor with
>> >> a second JumpTable encoded similarly for basic microcode Read/Write
>> >> functions to deal with various standard DataTypes in fixed formats
>> >>
>> >> this is to plug into the generic Interpreter engine I already have.
>> >>
>> >> the first test of this was to Emulate an Intel 80486 on a Motorola
>> >> 68040 processor with the Host running at 25MHz.
>> >>
>> >> I managed to get an average speed rating of between 16MHz and 20MHz
>> >> performance even with "real world" code being run through
>> >>
>> >> I am currently re-implementing this engine on top of a PPC host and
>> >> would like to expand its modularity to additional languages and
>> >> targets.
>> >>
>> >> If at all possible I would like to make the equivalent "machine level"
>> >> interpretation of the opcode numbers possible even if there is inline
>> >> data and addresses present as well.
>> >>
>> >> Having no prior experience with Smalltalk, any usage of terms I
>> >> know in a different way won't make any sense initially, and trying to
>> >> get to grips with Smalltalk by using the Environment ... I already
>> >> tried this unsuccessfully.
>> >>
>> >> I'm more interested in the number codes that each operation is
>> >> represented by and making routines to match within set ranges,  and
>> >> where one operation is multiple codes chained,  being able to have a
>> >> listing starting with 0x00 is opcode "somename" and has N octets of
>> >> immediate values following it formatted as ?:? bitstrings.
>> >>
>> >> If that makes any sense?
>> >>
>> >> as for stack or message information,  I'm willing to work out what is
>> >> needed to make those happen if they are needed as bytecode level
>> >> information.
>> >>
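
The jump-table dispatch described in the quoted message above can be
sketched in a few lines.  This is a rough Python illustration under
stated assumptions, not the real interpreter: the names Frame,
JUMP_TABLE and run are hypothetical, only a handful of opcodes are
filled in (push of the special constants 0x71-0x77, the short jumps
0x90-0x9F and return-top 0x7C, as laid out in the Blue Book), and message
sends and context creation are left out entirely.

```python
class Frame:
    """Minimal interpreter state for one activation (hypothetical name)."""

    def __init__(self, code):
        self.code = code
        self.pc = 0
        self.stack = []
        self.result = None
        self.done = False


def op_unhandled(frame, opcode):
    raise NotImplementedError(f"opcode 0x{opcode:02X} not handled in this sketch")


# One entry per opcode value, so dispatch is a single indexed call.
JUMP_TABLE = [op_unhandled] * 256

# 0x71..0x77 push true, false, nil, -1, 0, 1, 2.
SPECIALS = (True, False, None, -1, 0, 1, 2)


def op_push_special(frame, opcode):
    frame.stack.append(SPECIALS[opcode - 0x71])


def op_short_jump(frame, opcode):
    # pc already points past the opcode, so the offset is relative to it.
    frame.pc += (opcode & 0x07) + 1


def op_short_jump_false(frame, opcode):
    if frame.stack.pop() is False:
        frame.pc += (opcode & 0x07) + 1


def op_return_top(frame, opcode):
    frame.result = frame.stack.pop()
    frame.done = True


for opcode in range(0x71, 0x78):
    JUMP_TABLE[opcode] = op_push_special
for opcode in range(0x90, 0x98):
    JUMP_TABLE[opcode] = op_short_jump
for opcode in range(0x98, 0xA0):
    JUMP_TABLE[opcode] = op_short_jump_false
JUMP_TABLE[0x7C] = op_return_top


def run(code):
    """Fetch, advance pc, dispatch through the table until a return."""
    frame = Frame(code)
    while not frame.done:
        opcode = frame.code[frame.pc]
        frame.pc += 1
        JUMP_TABLE[opcode](frame, opcode)
    return frame.result


# push false; jumpFalse over two bytes; push 1; jump over one byte; push 2; return top
PROGRAM = bytes([0x72, 0x99, 0x76, 0x90, 0x77, 0x7C])
print(run(PROGRAM))
```

Swapping the first byte for 0x71 (push true) takes the other branch.  The
hard part that this leaves out is the send opcodes, which need method
lookup and a new context rather than a fixed routine per opcode.
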
>> >> On Tue, Apr 17, 2012 at 4:20 AM, Colin Putney <colin at wiresong.com> wrote:
>> >> >
>> >> >
>> >> > On 2012-04-16, at 8:14 AM, Jeremy Kajikawa wrote:
>> >> >
>> >> > I am somewhat dogmatically minded about technical details,  so I am
>> >> > unlikely to wade through buckets of documentation about Smalltalk as a
>> >> > language and how to use it if it is not answering the question about
>> >> > what I am looking up.
>> >> >
>> >> >
>> >> > I'm confused. You want to implement a Smalltalk interpreter, but you're not interested in the details of the language? Perhaps you should tell us what your overall goal is. That way we can provide more useful information.
>> >> >
>> >> > As for documentation of the bytecode set, you may find the Blue Book useful. It's the canonical description of how Smalltalk works, including the interpreter. Squeak is a descendant of this implementation. The section on the interpreter is here:
>> >> >
>> >> > http://www.mirandabanda.org/bluebook/bluebook_chapter28.html#StackBytecodes28
>> >> >
>> >> > Hope this helps,
>> >> >
>> >> > Colin
>> >> >
>> >> > PS. Since this has nothing to do with Ubuntu, I've changed the subject to something more appropriate
>> >> >
>> >
>> >
>> >
>> >
>> > --
>> > best,
>> > Eliot
>> >
>> >
>
>
>
>
> --
> best,
> Eliot
>
>

