[squeak-dev] Re: Making a better Compiler for all

Eliot Miranda eliot.miranda at gmail.com
Sun Sep 7 22:54:57 UTC 2008


On Sun, Sep 7, 2008 at 3:39 PM, Kjell Godo <squeaklist at gmail.com> wrote:

> Here is an idea that might increase modularity in the compiler and VM
> and might encourage multiple compilers and VMs:
>
> Instead of having the Compiler dump byte codes directly into a ByteArray
> you have a SmalltalkByteCodeStream which displays all the different
> functionalities of the VM in a systematic way.
>
>      ***********************
> Each byte code instruction that the VM recognizes would be expressed
> as a SmalltalkByteCodeStream method that encodes that instruction.
>

This is what I've done in my Closure compiler.  There are different
back-ends for different bytecode encodings.  Subclasses of BytecodeEncoder
decide how to encode abstract opcodes such as PushReceiver, ReturnTop, etc.
These classes implement two methods for each opcode, sizeFoo and genFoo,
e.g. sizePushReceiver and genPushReceiver.
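
To illustrate the sizeFoo/genFoo pairing, here is a minimal sketch, written
in Python rather than Smalltalk so it can be run standalone.  It is not the
code from the Closure compiler: the class name SketchEncoder, the helpers
size_of and generate, and all byte values are assumptions made up for the
example.  The point is only that a sizing pass can report how long the code
will be before a generation pass emits it, which is one plausible reason to
have both methods per opcode.

```python
class SketchEncoder:
    """Hypothetical encoder: one size_* and one gen_* method per abstract opcode."""

    def __init__(self):
        self.code = bytearray()

    def size_push_receiver(self):
        return 1

    def gen_push_receiver(self):
        self.code.append(0x70)  # placeholder byte value, assumed for the sketch

    def size_push_literal(self, index):
        # Assumed rule: small indices fit a one-byte form, others need two bytes.
        return 1 if index < 32 else 2

    def gen_push_literal(self, index):
        if index < 32:
            self.code.append(0x20 + index)   # placeholder short form
        else:
            self.code.extend((0x80, index))  # placeholder long form

    def size_return_top(self):
        return 1

    def gen_return_top(self):
        self.code.append(0x7C)  # placeholder byte value


def size_of(encoder, ops):
    """Sizing pass: total length in bytes, without emitting anything."""
    return sum(getattr(encoder, "size_" + name)(*args) for name, *args in ops)


def generate(encoder, ops):
    """Generation pass: emit the bytes and return them."""
    for name, *args in ops:
        getattr(encoder, "gen_" + name)(*args)
    return bytes(encoder.code)


ops = [("push_receiver",), ("push_literal", 40), ("return_top",)]
encoder = SketchEncoder()
expected_size = size_of(encoder, ops)   # computed before any byte is emitted
method_bytes = generate(encoder, ops)
assert expected_size == len(method_bytes)
```

Because every size_* method mirrors its gen_* twin, the length is known up
front, so something like a jump distance could be computed before the bytes
it spans are written.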

I have four encoders.  One for the current V3 instruction set, and one for
its long-form subset.  One for the V3 set extended with 5 closure opcodes,
and one for its long-form subset.  I can generate code for any of these
simply by initializing the compiler with the relevant encoder.
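
Selecting a back end by handing the compiler an encoder might look roughly
like the following Python sketch.  SketchCompiler, ShortFormEncoder,
LongFormEncoder and the byte values are hypothetical names and placeholders
chosen for illustration, not the classes from the Closure compiler.

```python
class ShortFormEncoder:
    """Hypothetical back end that uses a one-byte encoding where it can."""

    def gen_push_literal(self, index):
        if index < 32:
            return bytes([0x20 + index])  # placeholder short form
        return bytes([0x80, index])       # placeholder long form


class LongFormEncoder:
    """Hypothetical back end restricted to the two-byte encoding."""

    def gen_push_literal(self, index):
        return bytes([0x80, index])


class SketchCompiler:
    """Front end that never mentions concrete byte values itself."""

    def __init__(self, encoder):
        self.encoder = encoder

    def compile_literal_pushes(self, indices):
        # All encoding decisions are delegated to the encoder given at creation.
        return b"".join(self.encoder.gen_push_literal(i) for i in indices)


short_code = SketchCompiler(ShortFormEncoder()).compile_literal_pushes([0, 1])
long_code = SketchCompiler(LongFormEncoder()).compile_literal_pushes([0, 1])
```

The same front end yields two bytes with the first encoder and four with the
second; nothing in SketchCompiler changes between the two runs.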

The code is available on my blog (as is a description of the compiler).



>
>      ***********************
>
> In this way the functionality of the VM becomes self documenting.
>
> In order to find out the public functionality of the VM you would look
> at the public methods in the SmalltalkByteCodeStream.  These
> methods should be well documented with comments after the body
> of the methods.
>
> The byte code instructions that the VM recognizes become a well
> documented public set of instructions that compiler makers can
> make their compilers on.  These instructions become the VM's
> public language.
>
> Each VM would have its own SmalltalkByteCodeStream so you
> might name them SmalltalkByteCodeStreamForVM1 and
> SmalltalkByteCodeStreamForVM2 etc.
>
> So like if you were
> encoding an ifTrue:ifFalse: expression in a naive way you could do
> it like:
>
> aSmalltalkByteCodeStreamForVM1
>      nextPutAll: aBooleanExpression ;
>      nextIfStackTopTrueSkipNextIfStackTopFalse ;
>      skipNext:( SmalltalkByteCodeStream jumpSize ) ;
>      yourselfDo:[ :sbcs | jumpToFalseMarker := sbcs makeSpaceForJump ] ;
>      nextPutAll: ifTrueBranchExpression ;
>      yourselfDo:[ :sbcs | jumpToExitMarker := sbcs makeSpaceForJump ] ;
>      yourselfDo:[ :sbcs | jumpToFalseMarker fillInCodeFor: sbcs position ] ;
>      nextPutAll: ifFalseBranchExpression ;
>      yourselfDo:[ :sbcs | jumpToExitMarker fillInCodeFor: sbcs position ]
>
> Object>>yourselfDo: aBlock
>    aBlock value: self . ^self .
>
> SmalltalkByteCodeStream>>nextPutAll: aByteCodeGenerator
>      aByteCodeGenerator generateByteCodesOn: self
>
> SmalltalkByteCodeStream>>makeSpaceForJump
>      ^( JumpInstructionMarker new
>           position: self position
>           on: self ; yourself
>      ) yourselfDo:[ :na |
>           self position:(
>                ( self position ) +
>                ( SmalltalkByteCodeStream jumpSize ) ) ]
>
> JumpInstructionMarker>>fillInCodeFor: expressionPosition
>      | oldPosition |
>      oldPosition := stream position .
>      stream position: jumpInstructionPosition ;
>                 nextPutJumpByteCode ;
>                 intoNext: ( SmalltalkByteCodeStream jumpSize ) -
>                               ( SmalltalkByteCodeStream jumpByteCodeSize )
>                     putInteger: expressionPosition ;
>                 position: oldPosition
>      "<---( stream is a WriteStream on a ByteArray or something )"
>
> On Mon, Sep 1, 2008 at 8:14 PM, David Zmick <dz0004455 at gmail.com> wrote:
>
>> So, here is an idea, start the VM from scratch, and, redo the entire
>> project to allow what we want in Squeak, and the compiler.  I know that is a
>> really crazy idea, but I think it could be possible.  I have been thinking
>> about a couple of very unlikely, but, possible, maybe, VM ideas, but, what
>> do you guys think about that?
>>
>
>
> I think that the new VM2 should be right next to the old VM1
> in the running image.  So you could use the old VM1 to make the
> new VM2.
>
> Each VM can have multiple name spaces.  Each VM generates
> a VM space with multiple name spaces inside of it.  Each name
> space could have its own Object Class and Class hierarchy.
> Or you could have a hierarchy being run by VM1 and switch it
> over to be run by VM2.  And back and forth at runtime.
>
> Perhaps the CPU could be thought of as a big VM and then the
> SmalltalkByteCodeStreamForCPU would generate machine code
> into the ByteArray that the SmalltalkByteCodeStreamForCPU
> was on.  And that ByteArray would be stuck into the
> CompiledMethodForCPU.
>
> It would be cool if each VM was an Object and you could do
> things like:
>
> ( VM2 inImage: anImage
>           inNameSpace: aNameSpace
>           usingCompiler: aCompiler
>           eval:'[ someSmalltalkCode ]' )
>
> In that way the old VM could call up the new VM and have it
> evaluate some code.  And when that code was done if ever then
> the old VM would continue from there.  Or the new Image could
> fork into a new thread. etc.
>
> You would want the debugging to be able to step into this
> expression so that you could really see how the VM2 works.
>
> ( VM2 simulation
>      inImage: anImage
>      inNameSpace: aNameSpace
>      usingCompiler: aCompiler
>      eval: '[ someSmalltalkCode ]' )
>
> would allow you to see the byte codes being evaluated before
> your very eyes.  And then the simulation is translated into
> C or machine code to make VM2.  It would be cool if Squeak
> had a portable assembler in it so you didn't have to use C at
> all.  And that portable Assembler could be
> SmalltalkByteCodeStreamForCPUIA32
> SmalltalkByteCodeStreamForCPUIA64
> etc.  Instead of the traditional arcane mnemonics used in
> assemblers we could use Smalltalk messages to
> generate that machine code.
>
> The above expression would allow you to see an image being
> loaded up and a name space within that image being selected
> and a Compiler being used to compile '[ someSmalltalk ]'
> and then being able to see Smalltalk expressions being
> evaluated in the debugger in that image and name space
> on that VM.  And when you hop into a message send then
> the byte code debugger would move to the front of the screen
> and show the byte codes being executed if desired.  It would
> be cool if there was a machine code debugger so you could
> hop into a byte code instruction and see how it is being
> evaluated.  It would interpret what was in the registers and
> what was on the stack as Objects.  There would be inspectors.
>
> Hopefully this kind of thing would allow multiple VMs and
> multiple images and multiple compilers all to be running at
> the same time.  Hopefully it would encourage VM development
> and compiler development such that Squeak could branch
> out in all different ways.
>
> You could have SmalltalkByteCodeStreamForV8 which
> would make public the functionality of the V8 JavaScript VM.
> And then you could have the V8 VM be one of the VMs
> inside of Squeak.
>
> You can switch from VM to VM at runtime.
>
> You can use the old VMs to make a new one.
>
> There are Smalltalk debuggers and byte code debuggers and
> machine code debuggers.
>
> There is the traditional Squeak VM and there are platform
> specific VMs that can all run side by side.  There are
> multiple different Windowing systems all running side by
> side.  Some native and some not.  Some the old Squeak
> way and some the Dolphin way some the Java way. etc.
>
> Squeak's portable assembler
> SmalltalkByteCodeStreamForCPU can be used to
> output an executable file that has zero or more VMs inside
> of it into a Directory on disk with zero or more image files
> and source code files for the different name spaces and
> hierarchies.  Then you fire up that executable and those
> VMs are inside of it.
>
> It would be cool if there was a PEFileStream that could
> be used to make public all the sections inside of a
> PE format executable file.  With a sequence of tests
> going from simple to complex and lots of documentation.
>
> I do think that there should be a new VM and it should
> run right alongside of the old one and be the first of many.
>
>
>>   On Mon, Sep 1, 2008 at 7:56 PM, Igor Stasenko <siguctua at gmail.com> wrote:
>>
>>> 2008/9/2 Kjell Godo <squeaklist at gmail.com>:
>>> > Where is this new compiler project?  Where is NewCompiler?  I would
>>> like to
>>> > see it.
>>> > Does anybody know where that book about the Squeak Compiler went to?
>>> >
>>> > the rest down below is all nonsense and I wouldn't read it if I were
>>> you.
>>> >
>>> > i knew this was going to cost me.
>>> >
>>> > What is atomic loading?  Does it mean no dependencies, or that
>>> > dependencies are handled?
>>> > It seems to me that there needs to be some kind of intelligent
>>> > dependency manager that works a lot better and is a lot smarter
>>> > than what has been put out there so far.
>>> >
>>>
>>> Atomic loading is not about handling dependencies (they are
>>> present and addressed as well, of course), but about installing a
>>> number of changes in the system, replacing old behavior in a single
>>> short operation, which guarantees safety from old/new behaviour
>>> conflicts.
>>>
>>> > How can I learn about how a good Squeak compiler works?  Without
>>> > years and millions of dead hours?
>>> >
>>>
>>> Sure, you need some experience in compiling things, especially
>>> Smalltalk. Or, at least, even if you don't have such experience but
>>> use the Parser/Compiler for your own purposes, your experience is
>>> valuable as well, since you can highlight different problems or
>>> propose a better interface or new functionality.
>>>
>>> > Modularity is very good.  I think that all of Squeak should be very
>>> self
>>> > explaining.  This can be done if you put your explanations of what is
>>> going
>>> > on after the body of the method.  Colored source is good too.  See
>>> Dolphin.
>>> > But without reformating.
>>> >
>>> > I am making picoLARC on sourceforge.net.  Each lisp/smalltalk
>>> expression
>>> > gets compiled by an instance of a Compiler Class.  Each expression( let
>>> if
>>> > define etc ) has its own KEGLambdaLispCompiler subClass with one
>>> > standard method and zero or more helper methods in it.  Each Compiler
>>> > outputs an instance of a subClass of the Eval Class.  An Eval can be
>>> > evaluated at runtime by >>evalWithActivationRec: or it could generate
>>> byte
>>> > codes which do the same thing via some method like
>>> > EvalSubClass>>generateByteCodesOn:usingCodeWalker: where the CodeWalker
>>> > could tie Evals together or do optimizations?  Is this not a good
>>> design?  I
>>> > know I like the part about one Compiler Class for each expression and
>>> one
>>> > corresponding Eval Class.  But I haven't done any byte code generation
>>> yet
>>> > so I don't know about that part.  One Compiler per Eval is not strict.
>>>  The
>>> > ApplicationCompiler can output several different related kinds of Evals
>>> for
>>> > the different function calls and message sends.
>>> >
>>> > What is this visitor pattern?
>>>
>>> http://en.wikipedia.org/wiki/Visitor_pattern
>>>
>>> >  I don't like the idea of putting byte code
>>> > generation into a single Class.  But I feel like maybe I don't know
>>> > what I'm talking about.  To modify the byte code generation for an
>>> > expression you would subClass the Eval Class and modify the
>>> > >>generateByteCodeOn: aCodeStream.  The initial implementor would
>>> > try to separate out the parts that might be modified by someone
>>> > into separate methods that get called by >>generateByteCodeOn: so
>>> > these helper methods would generally be overridden and not
>>> > >>generateByteCodeOn: unless that method was really simple.  So the
>>> > initial implementor has to think about reuse and the places where
>>> > modification might occur.  So you would have a lot of simple
>>> > >>generateByteCodeOn: methods instead of one big complex one.
>>> >
>>> > There are all different ways of calling a function or method or query
>>> etc in
>>> > picoLARC and these are all subClasses of
>>> KEGLambdaLispApplicationEvalV6p1
>>> > and it seems to work fine.
>>> >
>>> > But overriding >>generateByteCodesOn: is not good enough is it?  The
>>> > Compiler Classes can't have hard coded Eval instance creations either
>>> > right?  The Compiler Class has to be subClassed also and the
>>>
>>> The problem in the Squeak compiler is that you will need to override
>>> many more classes than just Compiler to emit different bytecode, for
>>> instance.
>>>
>>> > >>meaningOf:inEnviron: needs to have a
>>> > ( self createEval ) expression in it that could be subClassed and
>>> > overridden.  And then that subClass has to be easily inserted into the
>>> > expression dispatch table that pairs up expressions with expression
>>> > Compilers.  So when that table gets made there should be a
>>> > ( tableModifier modify: table ) which could stick the
>>> > < expression Compiler > pairs in that are needed.
>>> >
>>> > I think that is all that would be required to modify the compilation of
>>> an
>>> > expression.
>>> > I will have to make these changes to picoLARC so it will be more
>>> modifiable.
>>> >
>>> > I think the Compiler should be very modular.  For picoLARC one Class
>>> > per expression and one Class per Eval seems to work well.  Stuffing
>>> > lots of separate things into a single Class and doing a procedural
>>> > functional thing and not an OOP thing does not seem good to me.
>>> >
>>> > I think that the Compiler should be very clean and a best practices
>>> > example with a long comment at the bottom of each method telling all
>>> > about what it does.  Writing it out and referencing other related
>>> > methods helps to think about what is really going on and then a
>>> > better design without hacks comes out.  I don't think hacking should
>>> > be encouraged at all.  Hacking just makes a mess.
>>> >
>>> +1
>>>
>>> The design should allow replacing critical parts of the compiler by
>>> subclassing, without the need to modify the original classes.
>>> So-called extensions, or monkey patching, are a very bad practice that
>>> goes in the directly opposite direction from modularity.
>>>
>>> I am thinking that maybe at some point, to prevent monkey patching,
>>> deployed classes could disallow installing or modifying their methods.
>>>
>>>
>>> > And then this practice of not making any Package comments has got to
>>> stop.
>>> > I think that people who do that should be admonished in some way.  I
>>> think
>>> > that the Package comment for the Compiler should contain the design
>>> document
>>> > for it that tells all about how it is designed.  If it needs to be long
>>> then
>>> > it should be long.  It should include: How to understand the Compiler.
>>> > There should be a sequence of test cases that start simple and show how
>>> it
>>> > all works.
>>> >
>>> > And that should go for the VM too.  This idea that the VM can be opaque
>>> and
>>> > only recognizable to a few is not good.
>>> >
>>>
>>> VMs tend to be complex. And the complexity comes from the inability of
>>> our hardware/OS to work in the ways we need/want it to.
>>>
>>> > These should be works of art and not hacked up piles of rubbish to be
>>> hidden
>>> > away into obscurity.
>>> >
>>> > There is this idea that one should only care about what something
>>> > does.  And the insides of it are a random black box that you tweak
>>> > and pray on.  But I think that the insides should be shown to the
>>> > world.  They should be displayed on a backdrop of velvet.  Especially
>>> > the Compiler and VM and VM maker.  And then the whole Windowing thing
>>> > should be modularized so you can have multiple different Windowing
>>> > systems.
>>> >
>>> > And what about having multiple VMs?  It would be cool if picoLARC could
>>> be
>>> > inside of Squeak in that way.  It would be cool if one VM was
>>> generalized so
>>> > that it could support different dialects and languages.  And another
>>> was
>>> > specific and fast.  And you could make various kinds of VMs and images
>>> and
>>> > output them onto disk without a lot of trouble.  It would come with gcc
>>> and
>>> > all that junk all set up so it would just work.  If you already had gcc
>>> you
>>> > could tell it not to download it.
>>> >
>>>
>>> What is gcc? And why is it required to make a VM? ;)
>>>
>>> > picoLARC has simple name spaces called Nodules where you can have
>>> Nodules
>>> > inside of Nodules and Nodules can multiply inherit variables from
>>> others.
>>> > Maybe such a thing could be used in Squeak?  Then you could have
>>> multiple
>>> > VMs.  And VMs inside of VMs.
>>> >
>>> > I think that Dolphin Smalltalk could be held up as an example of
>>> pretty.
>>> >
>>>
>>> Maybe, if you know how to deal with licenses & copyrights when taking
>>> their source and blindly putting it into Squeak :)
>>>
>>> > I hope picoLARC will be another one.
>>> >
>>> > I think that Squeak is pretty in a somewhat cancerous sort of way.
>>> > The cancer is all the hacking that goes on.
>>> > The vision is great but the hacking and undocumenting gum up all those
>>> big
>>> > ideas.
>>> >
>>> > Sure it's quick but it rots away quickly too.
>>> >
>>> > Undocumented features.  In Smalltalk this is less of a problem but in
>>> like
>>> > Lisp say you make this great feature but then don't document it.  You
>>> might
>>> > as well have not even made it.
>>> >
>>>
>>>
>>>
>>> --
>>>  Best regards,
>>> Igor Stasenko AKA sig.
>>>
>>>
>>
>>
>> --
>> David Zmick
>> /dz0004455\
>> http://dz0004455.googlepages.com
>> http://dz0004455.blogspot.com
>>
>>
>>
>>
>
>
>
>