[squeak-dev] Re: Making a better Compiler for all

Kjell Godo squeaklist at gmail.com
Sun Sep 7 22:39:20 UTC 2008


Here is an idea that might increase modularity in the compiler and VM
and might encourage multiple compilers and VMs:

Instead of having the Compiler dump byte codes directly into a ByteArray,
it would write them through a SmalltalkByteCodeStream, which exposes all
of the VM's functionality in a systematic way.

     ***********************
Each byte code instruction that the VM recognizes would be expressed
as a SmalltalkByteCodeStream method that encodes that instruction.

     ***********************

In this way the functionality of the VM becomes self-documenting.

In order to find out the public functionality of the VM you would look
at the public methods in the SmalltalkByteCodeStream.  These
methods should be well documented with comments after the body
of the methods.

The byte code instructions that the VM recognizes become a well
documented public set of instructions that compiler makers can
build their compilers on.  These instructions become the VM's
public language.
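
As a rough illustration of "one method per instruction", here is a
minimal sketch.  The class SketchByteCodeStream, its instance variable
and its selectors are made up for this example and are not part of
Squeak.  The numbers 113 and 124 are, as far as I know, the classic
Squeak (Blue Book) byte codes for "push true" and "return stack top",
but treat them as assumptions and check them against the VM you target.
The methods use the same Class>>selector notation as the rest of this
mail, with the comment after the body.

```smalltalk
"Hypothetical class; 'bytes' holds a WriteStream on a ByteArray."
Object subclass: #SketchByteCodeStream
    instanceVariableNames: 'bytes'
    classVariableNames: ''
    poolDictionaries: ''
    category: 'Sketch-Compiler'

SketchByteCodeStream>>initialize
    bytes := WriteStream on: (ByteArray new: 16)

SketchByteCodeStream>>pushTrue
    bytes nextPut: 113
    "Pushes the constant true onto the stack.  One byte, no operands.
     113 is assumed to be the VM's 'push true' byte code."

SketchByteCodeStream>>returnTop
    bytes nextPut: 124
    "Returns the top of the stack from the current method.  One byte.
     124 is assumed to be the VM's 'return stack top' byte code."

SketchByteCodeStream>>contents
    ^bytes contents
    "Answers the byte codes written so far as a ByteArray."

"In a workspace, after the methods above are installed:"
SketchByteCodeStream new initialize; pushTrue; returnTop; contents
```

A compiler written against such a class never mentions a raw byte
code number, so a second VM only needs a second stream class with the
same selectors.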

Each VM would have its own SmalltalkByteCodeStream so you
might name them SmalltalkByteCodeStreamForVM1 and
SmalltalkByteCodeStreamForVM2 etc.

So, for example, if you were encoding an ifTrue:ifFalse: expression
in a naive way you could do it like this:

aSmalltalkByteCodeStreamForVM1
     nextPutAll: aBooleanExpression ;
     nextIfStackTopTrueSkipNextIfStackTopFalse ;
     skipNext:( SmalltalkByteCodeStream jumpSize ) ;
     yourselfDo:[ :sbcs | jumpToFalseMarker := sbcs makeSpaceForJump ] ;
     nextPutAll: ifTrueBranchExpression ;
     yourselfDo:[ :sbcs | jumpToExitMarker := sbcs makeSpaceForJump ] ;
     yourselfDo:[ :sbcs | jumpToFalseMarker fillInCodeFor: sbcs position ] ;
     nextPutAll: ifFalseBranchExpression ;
     yourselfDo:[ :sbcs | jumpToExitMarker fillInCodeFor: sbcs position ]
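
The marker trick used in this cascade (leave room for a jump, emit the
branch, then go back and fill in the jump once the target is known) can
be tried out in a plain workspace.  This is only a sketch under stated
assumptions: the opcode numbers are invented, the jump is two bytes
(opcode plus a relative offset) rather than the jumpSize mentioned
above, and the hole is patched in a copy of the stream contents instead
of by repositioning the stream.

```smalltalk
"Workspace sketch.  The opcode numbers are invented for the example
 and are not real Squeak byte codes."
| stream jumpOpcode holePosition targetPosition bytes |
jumpOpcode := 16r90.                       "hypothetical 'jump forward' opcode"
stream := WriteStream on: (ByteArray new: 16).

stream nextPut: 16r10.                     "pretend: push the condition"
holePosition := stream position.           "remember where the jump belongs"
stream nextPut: 0; nextPut: 0.             "leave a 2-byte hole: opcode + offset"
stream nextPut: 16r20; nextPut: 16r21.     "pretend: the branch to jump over"
targetPosition := stream position.         "the jump should land here"

"Fill in the hole now that the target is known.  Stream positions
 count from 0 and ByteArray indices from 1, hence the + 1 and + 2.
 The offset is relative to the first byte after the jump."
bytes := stream contents.
bytes at: holePosition + 1 put: jumpOpcode.
bytes at: holePosition + 2 put: targetPosition - (holePosition + 2).
bytes
```

The result has the jump opcode and an offset of 2 in the hole, which
skips exactly the two branch bytes.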

Object>>yourselfDo: aBlock
   aBlock value: self . ^self .

SmalltalkByteCodeStream>>nextPutAll: aByteCodeGenerator
     aByteCodeGenerator generateByteCodesOn: self

SmalltalkByteCodeStream>>makeSpaceForJump
     ^( JumpInstructionMarker new
          position: self position
          on: self ; yourself
     ) yourselfDo:[ :na |
          self position:(
               ( self position ) +
               ( SmalltalkByteCodeStream jumpSize ) ) ]

JumpInstructionMarker>>fillInCodeFor: expressionPosition
     | oldPosition |
     oldPosition := stream position .
     stream position: jumpInstructionPosition ;
                nextPutJumpByteCode ;
                intoNext: ( SmalltalkByteCodeStream jumpSize ) -
                              ( SmalltalkByteCodeStream jumpByteCodeSize )
                    putInteger: expressionPosition ;
                position: oldPosition
     "<---( stream is a WriteStream on a ByteArray or something )"
On Mon, Sep 1, 2008 at 8:14 PM, David Zmick <dz0004455 at gmail.com> wrote:

> So, here is an idea, start the VM from scratch, and, redo the entire
> project to allow what we want in Squeak, and the compiler.  I know that is a
> really crazy idea, but I think it could be possible.  I have been thinking
> about a couple of very unlikely, but, possible, maybe, VM ideas, but, what
> do you guys think about that?
>


I think that the new VM2 should be right next to the old VM1
in the running image.  So you could use the old VM1 to make the
new VM2.

Each VM can have multiple name spaces.  Each VM generates
a VM space with multiple name spaces inside of it.  Each name
space could have its own Object Class and Class hierarchy.
Or you could have a hierarchy being run by VM1 and switch it
over to be run by VM2.  And back and forth at runtime.

Perhaps the CPU could be thought of as a big VM and then the
SmalltalkByteCodeStreamForCPU would generate machine code
into the ByteArray that the SmalltalkByteCodeStreamForCPU
was on.  And that ByteArray would be stuck into the
CompiledMethodForCPU.

It would be cool if each VM was an Object and you could do
things like:

( VM2 inImage: anImage
          inNameSpace: aNameSpace
          usingCompiler: aCompiler
          eval:'[ someSmalltalkCode ]' )

In that way the old VM could call up the new VM and have it
evaluate some code.  And when that code was done if ever then
the old VM would continue from there.  Or the new Image could
fork into a new thread. etc.

You would want the debugger to be able to step into this
expression so that you could really see how the VM2 works.

( VM2 simulation
     inImage: anImage
     inNameSpace: aNameSpace
     usingCompiler: aCompiler
     eval: '[ someSmalltalkCode ]' )

would allow you to see the byte codes being evaluated before
your very eyes.  And then the simulation is translated into
C or machine code to make VM2.  It would be cool if Squeak
had a portable assembler in it so you didn't have to use C at
all.  And that portable Assembler could be
SmalltalkByteCodeStreamForCPUIA32
SmalltalkByteCodeStreamForCPUIA64
etc.  Instead of the traditional arcane mnemonics used in
assemblers we could use Smalltalk messages to
generate that machine code.

The above expression would allow you to see an image being
loaded up and a name space within that image being selected
and a Compiler being used to compile '[ someSmalltalk ]'
and then being able to see Smalltalk expressions being
evaluated in the debugger in that image and name space
on that VM.  And when you hop into a message send then
the byte code debugger would move to the front of the screen
and show the byte codes being executed if desired.  It would
be cool if there was a machine code debugger so you could
hop into a byte code instruction and see how it is being
evaluated.  It would interpret what was in the registers and
what was on the stack as Objects.  There would be inspectors.

Hopefully this kind of thing would allow multiple VMs and
multiple images and multiple compilers all to be running at
the same time.  Hopefully it would encourage VM development
and compiler development such that Squeak could branch
out in all different ways.

You could have SmalltalkByteCodeStreamForV8 which
would make public the functionality of the V8 JavaScript VM.
And then you could have the V8 VM be one of the VMs
inside of Squeak.

You can switch from VM to VM at runtime.

You can use the old VMs to make a new one.

There are Smalltalk debuggers and byte code debuggers and
machine code debuggers.

There is the traditional Squeak VM and there are platform
specific VMs that can all run side by side.  There are
multiple different Windowing systems all running side by
side.  Some native and some not.  Some work the old Squeak
way, some the Dolphin way, some the Java way, etc.

Squeak's portable assembler
SmalltalkByteCodeStreamForCPU can be used to
output an executable file that has zero or more VMs inside
of it into a Directory on disk with zero or more image files
and source code files for the different name spaces and
hierarchies.  Then you fire up that executable and those
VMs are inside of it.
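
To give a feel for messages in place of mnemonics, here is a small
workspace sketch that emits two IA-32 instructions into a ByteArray.
The block names are made up for the example; in a real
SmalltalkByteCodeStreamForCPUIA32 they would be methods.  The encodings
used are B8 followed by a 32-bit little-endian value for "mov eax,
imm32" and C3 for "ret".  Producing an actual executable file around
these bytes is not shown.

```smalltalk
"Workspace sketch: Smalltalk blocks standing in for assembler mnemonics."
| code movEaxImm ret |
code := WriteStream on: (ByteArray new: 8).

"mov eax, imm32 : opcode 16rB8, then the value in little-endian order"
movEaxImm := [:value |
    code nextPut: 16rB8.
    0 to: 3 do: [:i |
        code nextPut: ((value bitShift: -8 * i) bitAnd: 16rFF)]].

"ret : opcode 16rC3"
ret := [code nextPut: 16rC3].

movEaxImm value: 42.
ret value.
code contents   "six bytes: B8 2A 00 00 00 C3"
```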

It would be cool if there was a PEFileStream that could
be used to make public all the sections inside of a
PE format executable file.  With a sequence of tests
going from simple to complex and lots of documentation.

I do think that there should be a new VM and it should
run right alongside of the old one and be the first of many.


>   On Mon, Sep 1, 2008 at 7:56 PM, Igor Stasenko <siguctua at gmail.com> wrote:
>
>> 2008/9/2 Kjell Godo <squeaklist at gmail.com>:
>> > Where is this new compiler project?  Where is NewCompiler?  I would like
>> > to see it.
>> > Does anybody know where that book about the Squeak Compiler went to?
>> >
>> > the rest down below is all nonsense and I wouldn't read it if I were you.
>> >
>> > i knew this was going to cost me.
>> >
>> > What is atomic loading?  Does it mean no dependencies or dependencies are
>> > handled?
>> > It seems to me that there needs to be some kind of intelligent dependencies
>> > manager that works a lot better and is a lot smarter than what has been put
>> > out there so far.
>> >
>>
>> Atomic loading is not about handling dependencies (they are present
>> and addressed as well, of course), but about installing a number of
>> changes in the system, replacing old behavior in a single short
>> operation, which guarantees safety from old/new behaviour conflicts.
>>
>> > How can I learn about how a good Squeak compiler works?  Without years and
>> > millions of dead hours?
>> >
>>
>> Sure, you need some experience in compiling things, especially
>> Smalltalk. Or, at least, even if you don't have such experience but are
>> using Parser/Compiler for your own purposes, your experience is valuable
>> as well, since you can highlight different problems or propose a better
>> interface or new functionality.
>>
>> > Modularity is very good.  I think that all of Squeak should be very self
>> > explaining.  This can be done if you put your explanations of what is going
>> > on after the body of the method.  Colored source is good too.  See Dolphin.
>> > But without reformatting.
>> >
>> > I am making picoLARC on sourceforge.net.  Each lisp/smalltalk expression
>> > gets compiled by an instance of a Compiler Class.  Each expression( let if
>> > define etc ) has its own KEGLambdaLispCompiler subClass with one
>> > standard method and zero or more helper methods in it.  Each Compiler
>> > outputs an instance of a subClass of the Eval Class.  An Eval can be
>> > evaluated at runtime by >>evalWithActivationRec: or it could generate byte
>> > codes which do the same thing via some method like
>> > EvalSubClass>>generateByteCodesOn:usingCodeWalker: where the CodeWalker
>> > could tie Evals together or do optimizations?  Is this not a good design?  I
>> > know I like the part about one Compiler Class for each expression and one
>> > corresponding Eval Class.  But I haven't done any byte code generation yet
>> > so I don't know about that part.  One Compiler per Eval is not strict.  The
>> > ApplicationCompiler can output several different related kinds of Evals for
>> > the different function calls and message sends.
>> >
>> > What is this visitor pattern?
>>
>> http://en.wikipedia.org/wiki/Visitor_pattern
>>
>> >  I don't like the idea of putting byte code
>> > generation into a single Class.  But I feel like maybe I don't know what I'm
>> > talking about.  To modify the byte code generation for an expression you
>> > would subClass the Eval Class and modify the >>generateByteCodeOn:
>> > aCodeStream.  The initial implementor would try to separate out the parts
>> > that might be modified by someone into separate methods that get called
>> > by >>generateByteCodeOn: so these helper methods would generally be
>> > overridden and not >>generateByteCodeOn: unless that method was really
>> > simple.  So the initial implementor has to think about reuse and the places
>> > where modification might occur.  So you would have a lot of
>> > simple >>generateByteCodeOn: methods instead of one big complex one.
>> >
>> > There are all different ways of calling a function or method or query etc in
>> > picoLARC and these are all subClasses of KEGLambdaLispApplicationEvalV6p1
>> > and it seems to work fine.
>> >
>> > But overriding >>generateByteCodesOn: is not good enough is it?  The
>> > Compiler Classes can't have hard coded Eval instance creations either
>> > right?  The Compiler Class has to be subClassed also and the
>>
>> The problem with the Squeak compiler is that you will need to override
>> many more classes than just Compiler to emit different bytecode, for
>> instance.
>>
>> > >>meaningOf:inEnviron: needs to have a
>> > ( self createEval ) expression in it that could be subClassed and
>> > overridden.  And then that subClass has to be easily inserted into the
>> > expression dispatch table that pairs up expressions with expression
>> > Compilers.  So when that table gets made there should be a
>> > ( tableModifier modify: table ) which could stick the < expression Compiler >
>> > pairs in that are needed.
>> >
>> > I think that is all that would be required to modify the compilation of an
>> > expression.
>> > I will have to make these changes to picoLARC so it will be more modifiable.
>> >
>> > I think the Compiler should be very modular.  For picoLARC one Class per
>> > expression and one Class per Eval seems to work well.  Stuffing lots of
>> > separate things into a single Class and doing a procedural functional thing
>> > and not an OOP thing does not seem good to me.
>> >
>> > I think that the Compiler should be very clean and a best practices example
>> > with a long comment at the bottom of each method telling all about what it
>> > does.  Writing it out and referencing other related methods helps to think
>> > about what is really going on and then a better design without hacks comes
>> > out.  I don't think hacking should be encouraged at all.  Hacking just makes
>> > a mess.
>> >
>> +1
>>
>> The design should allow replacing critical parts of the compiler by
>> subclassing, without the need to modify the original classes.
>> So-called extensions, or monkey patching, are a very bad practice which
>> goes in the directly opposite direction from modularity.
>>
>> I am thinking that maybe at some point, to prevent monkey patching,
>> deployed classes could disallow installing or modifying their methods.
>>
>>
>> > And then this practice of not making any Package comments has got to stop.
>> > I think that people who do that should be admonished in some way.  I think
>> > that the Package comment for the Compiler should contain the design document
>> > for it that tells all about how it is designed.  If it needs to be long then
>> > it should be long.  It should include: How to understand the Compiler.
>> > There should be a sequence of test cases that start simple and show how it
>> > all works.
>> >
>> > And that should go for the VM too.  This idea that the VM can be opaque and
>> > only recognizable to a few is not good.
>> >
>>
>> VMs tend to be complex, and the complexity comes from the inability of our
>> hardware/OS to work in the ways we need/want it to.
>>
>> > These should be works of art and not hacked up piles of rubbish to be hidden
>> > away into obscurity.
>> >
>> > There is this idea that one should only care about what something does.  And
>> > the insides of it are a random black box that you tweak and pray on.  But I
>> > think that the insides should be shown to the world.  They should
>> > be displayed on a backdrop of velvet.  Especially the Compiler and VM and VM
>> > maker.  And then the whole Windowing thing should be modularized so you can
>> > have multiple different Windowing systems.
>> >
>> > And what about having multiple VMs?  It would be cool if picoLARC could be
>> > inside of Squeak in that way.  It would be cool if one VM was generalized so
>> > that it could support different dialects and languages.  And another was
>> > specific and fast.  And you could make various kinds of VMs and images and
>> > output them onto disk without a lot of trouble.  It would come with gcc and
>> > all that junk all set up so it would just work.  If you already had gcc you
>> > could tell it not to download it.
>> >
>>
>> What is gcc? And why is it required to make a VM? ;)
>>
>> > picoLARC has simple name spaces called Nodules where you can have Nodules
>> > inside of Nodules and Nodules can multiply inherit variables from others.
>> > Maybe such a thing could be used in Squeak?  Then you could have multiple
>> > VMs.  And VMs inside of VMs.
>> >
>> > I think that Dolphin Smalltalk could be held up as an example of pretty.
>> >
>>
>> Maybe, if you know how to deal with license & copyrights when taking
>> their source and blindly putting it to Squeak :)
>>
>> > I hope picoLARC will be another one.
>> >
>> > I think that Squeak is pretty in a somewhat cancerous sort of way.
>> > The cancer is all the hacking.  That goes on.
>> > The vision is great but the hacking and undocumenting gum up all those big
>> > ideas.
>> >
>> > Sure it's quick but it rots away quickly too.
>> >
>> > Undocumented features.  In Smalltalk this is less of a problem but in like
>> > Lisp say you make this great feature but then don't document it.  You might
>> > as well have not even made it.
>> >
>>
>>
>>
>> --
>>  Best regards,
>> Igor Stasenko AKA sig.
>>
>>
>
>
> --
> David Zmick
> /dz0004455\
> http://dz0004455.googlepages.com
> http://dz0004455.blogspot.com
>
>
>
>