[squeak-dev] Re: Making a better Compiler for all

Mon Sep 8 01:02:48 UTC 2008

 I do think that there should be a new VM and it should
run right alongside of the old one and be the first of many.

You could easily write a new VM on top of the old one using OMeta.

On Sun, Sep 7, 2008 at 5:39 PM, Kjell Godo <squeaklist at gmail.com> wrote:

> Here is an idea that might increase modularity in the compiler and VM
> and might encourage multiple compilers and VMs:
>
> Instead of having the Compiler dump byte codes directly into a ByteArray
> you have a SmalltalkByteCodeStream which displays all the different
> functionalities of the VM in a systematic way.
>
>      ***********************
> Each byte code instruction that the VM recognizes would be expressed
> as a SmalltalkByteCodeStream method that encodes that instruction.
>
>      ***********************
>
> In this way the functionality of the VM becomes self documenting.
>
> In order to find out the public functionality of the VM you would look
> at the public methods in the SmalltalkByteCodeStream.  These
> methods should be well documented with comments after the body
> of the methods.
>
> The byte code instructions that the VM recognizes become a well
> documented public set of instructions that compiler makers can
> make their compilers on.  These instructions become the VM's
> public language.
>
> Each VM would have its own SmalltalkByteCodeStream so you
> might name them SmalltalkByteCodeStreamForVM1 and
> SmalltalkByteCodeStreamForVM2 etc.
>
> So if you were encoding an ifTrue:ifFalse: expression in a naive
> way, you could do it like this:
>
> aSmalltalkByteCodeStreamForVM1
>      nextPutAll: aBooleanExpression ;
>      nextIfStackTopTrueSkipNextIfStackTopFalse ;
>      skipNext:( SmalltalkByteCodeStream jumpSize ) ;
>      yourselfDo:[ :sbcs | jumpToFalseMarker := sbcs makeSpaceForJump ] ;
>      nextPutAll: ifTrueBranchExpression ;
>      yourselfDo:[ :sbcs | jumpToExitMarker := sbcs makeSpaceForJump ] ;
>      yourselfDo:[ :sbcs | jumpToFalseMarker fillInCodeFor: sbcs position ] ;
>      nextPutAll: ifFalseBranchExpression ;
>      yourselfDo:[ :sbcs | jumpToExitMarker fillInCodeFor: sbcs position ]
>
> Object>>yourselfDo: aBlock
>    aBlock value: self . ^self .
>
> SmalltalkByteCodeStream>>nextPutAll: aByteCodeGenerator
>      aByteCodeGenerator generateByteCodesOn: self
>
> SmalltalkByteCodeStream>>makeSpaceForJump
>      ^( JumpInstructionMarker new
>           position: self position
>           on: self ; yourself
>      ) yourselfDo:[ :na |
>           self position:(
>                ( self position ) +
>                ( SmalltalkByteCodeStream jumpSize ) ) ]
>
> JumpInstructionMarker>>fillInCodeFor: expressionPosition
>      | oldPosition |
>      oldPosition := stream position .
>      stream position: jumpInstructionPosition ;
>                 nextPutJumpByteCode ;
>                 intoNext: ( SmalltalkByteCodeStream jumpSize ) -
>                               ( SmalltalkByteCodeStream jumpByteCodeSize )
>                     putInteger: expressionPosition ;
>                 position: oldPosition
>      "<---( stream is a WriteStream on a ByteArray or something )"
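The back-patching scheme in the quoted Smalltalk (reserve space for a jump, emit the branch, then fill in the jump once the target position is known) can be tried out in a few lines. What follows is a minimal Python sketch, not Squeak's real bytecode set: the opcode values, the 3-byte jump layout (one opcode byte plus a 2-byte big-endian target), the use of a plain conditional jump instead of the skip-next instruction, and the names `ByteCodeStream`, `JumpMarker`, `make_space_for_jump` and `fill_in_code_for` are all assumptions chosen to mirror the message names in the example.

```python
JUMP = 0x90            # hypothetical opcode: unconditional jump
JUMP_IF_FALSE = 0x91   # hypothetical opcode: pop, jump when false
JUMP_SIZE = 3          # assumed layout: 1 opcode byte + 2-byte target


class JumpMarker:
    """Remembers where a jump placeholder sits so it can be patched later."""

    def __init__(self, stream, position, opcode):
        self.stream = stream
        self.position = position
        self.opcode = opcode

    def fill_in_code_for(self, target):
        # Overwrite the reserved bytes in place. The buffer is indexed
        # directly, so the stream's write position is never moved.
        code = self.stream.code
        code[self.position] = self.opcode
        code[self.position + 1:self.position + 3] = target.to_bytes(2, "big")


class ByteCodeStream:
    def __init__(self):
        self.code = bytearray()

    @property
    def position(self):
        return len(self.code)

    def next_put_all(self, data):
        self.code.extend(data)
        return self

    def make_space_for_jump(self, opcode=JUMP):
        marker = JumpMarker(self, self.position, opcode)
        self.code.extend(bytes(JUMP_SIZE))   # zero-filled placeholder
        return marker


def encode_if_true_if_false(stream, condition, true_branch, false_branch):
    stream.next_put_all(condition)
    jump_to_false = stream.make_space_for_jump(JUMP_IF_FALSE)
    stream.next_put_all(true_branch)
    jump_to_exit = stream.make_space_for_jump(JUMP)
    jump_to_false.fill_in_code_for(stream.position)   # false branch starts here
    stream.next_put_all(false_branch)
    jump_to_exit.fill_in_code_for(stream.position)    # code after the if
    return stream


stream = encode_if_true_if_false(
    ByteCodeStream(), b"\x01", b"\x02\x03", b"\x04")
print(stream.code.hex(" "))
```

The marker object owns the knowledge of where its placeholder lives, so the code that emits the branches never has to do offset arithmetic itself.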
>
> On Mon, Sep 1, 2008 at 8:14 PM, David Zmick <dz0004455 at gmail.com> wrote:
>
>> So, here is an idea, start the VM from scratch, and, redo the entire
>> project to allow what we want in Squeak, and the compiler.  I know that is a
>> really crazy idea, but I think it could be possible.  I have been thinking
>> about a couple of very unlikely, but, possible, maybe, VM ideas, but, what
>> do you guys think about that?
>>
>
>
> I think that the new VM2 should be right next to the old VM1
> in the running image.  So you could use the old VM1 to make the
> new VM2.
>
> Each VM can have multiple name spaces.  Each VM generates
> a VM space with multiple name spaces inside of it.  Each name
> space could have its own Object Class and Class hierarchy.
> Or you could have a hierarchy being run by VM1 and switch it
> over to be run by VM2.  And back and forth at runtime.
>
> Perhaps the CPU could be thought of as a big VM and then the
> SmalltalkByteCodeStreamForCPU would generate machine code
> into the ByteArray that the SmalltalkByteCodeStreamForCPU
> was on.  And that ByteArray would be stuck into the
> CompiledMethodForCPU.
>
> It would be cool if each VM was an Object and you could do
> things like:
>
> ( VM2 inImage: anImage
>           inNameSpace: aNameSpace
>           usingCompiler: aCompiler
>           eval:'[ someSmalltalkCode ]' )
>
> In that way the old VM could call up the new VM and have it
> evaluate some code.  And when that code was done if ever then
> the old VM would continue from there.  Or the new Image could
> fork into a new thread. etc.
>
> You would want the debugging to be able to step into this
> expression so that you could really see how the VM2 works.
>
> ( VM2 simulation
>      inImage: anImage
>      inNameSpace: aNameSpace
>      usingCompiler: aCompiler
>      eval: '[ someSmalltalkCode ]' )
>
> would allow you to see the byte codes being evaluated before
> your very eyes.  And then the simulation is translated into
> C or machine code to make VM2.  It would be cool if Squeak
> had a portable assembler in it so you didn't have to use C at
> all.  And that portable Assembler could be
> SmalltalkByteCodeStreamForCPUIA32
> SmalltalkByteCodeStreamForCPUIA64
> etc.  Instead of the traditional arcane mnemonics used in
> assemblers we could use Smalltalk messages to generate that
> machine code.
>
> The above expression would allow you to see an image being
> loaded up and a name space within that image being selected
> and a Compiler being used to compile '[ someSmalltalk ]'
> and then being able to see Smalltalk expressions being
> evaluated in the debugger in that image and name space
> on that VM.  And when you hop into a message send then
> the byte code debugger would move to the front of the screen
> and show the byte codes being executed if desired.  It would
> be cool if there was a machine code debugger so you could
> hop into a byte code instruction and see how it is being
> evaluated.  It would interpret what was in the registers and
> what was on the stack as Objects.  There would be inspectors.
>
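One way to picture "each VM is an Object" together with a simulation mode that shows the byte codes being evaluated is a toy stack machine. This is only a Python sketch under assumptions: the three instructions, the `ToyVM` class and its `trace` flag are hypothetical and stand in for the proposed `VM2` and `VM2 simulation`; images, name spaces and compilers are left out.

```python
PUSH, ADD, MUL = "PUSH", "ADD", "MUL"   # hypothetical instruction names


class ToyVM:
    """A stack machine that is an ordinary object in the host program."""

    def __init__(self, name, trace=False):
        self.name = name
        self.trace = trace      # True plays the role of "VM2 simulation"
        self.log = []

    def eval(self, program):
        stack = []
        for pc, (op, *args) in enumerate(program):
            if op == PUSH:
                stack.append(args[0])
            elif op == ADD:
                right, left = stack.pop(), stack.pop()
                stack.append(left + right)
            elif op == MUL:
                right, left = stack.pop(), stack.pop()
                stack.append(left * right)
            else:
                raise ValueError(f"unknown instruction {op!r}")
            if self.trace:
                # Record the stack after every instruction so a debugger
                # or inspector could display it step by step.
                self.log.append((pc, op, list(stack)))
        return stack.pop()


program = [(PUSH, 3), (PUSH, 4), (ADD,), (PUSH, 5), (MUL,)]   # (3 + 4) * 5
fast_vm = ToyVM("VM2")
simulated_vm = ToyVM("VM2 simulation", trace=True)
result = fast_vm.eval(program)
traced_result = simulated_vm.eval(program)
for pc, op, stack in simulated_vm.log:
    print(pc, op, stack)
```

Both objects run the same program and give the same answer; only the traced one keeps a per-instruction record.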
> Hopefully this kind of thing would allow multiple VMs and
> multiple images and multiple compilers all to be running at
> the same time.  Hopefully it would encourage VM development
> and compiler development such that Squeak could branch
> out in all different ways.
>
> You could have SmalltalkByteCodeStreamForV8 which
> would make public the functionality of the V8 JavaScript VM.
> And then you could have the V8 VM be one of the VMs
> inside of Squeak.
>
> You can switch from VM to VM at runtime.
>
> You can use the old VMs to make a new one.
>
> There are Smalltalk debuggers and byte code debuggers and
> machine code debuggers.
>
> There is the traditional Squeak VM and there are platform
> specific VMs that can all run side by side.  There are
> multiple different Windowing systems all running side by
> side.  Some native and some not.  Some the old Squeak
> way and some the Dolphin way some the Java way. etc.
>
> Squeak's portable assembler
> SmalltalkByteCodeStreamForCPU can be used to
> output an executable file that has zero or more VMs inside
> of it into a Directory on disk with zero or more image files
> and source code files for the different name spaces and
> hierarchies.  Then you fire up that executable and those
> VMs are inside of it.
>
> It would be cool if there was a PEFileStream that could
> be used to make public all the sections inside of a
> PE format executable file.  With a sequence of tests
> going from simple to complex and lots of documentation.
>
> I do think that there should be a new VM and it should
> run right alongside of the old one and be the first of many.
>
>
>> On Mon, Sep 1, 2008 at 7:56 PM, Igor Stasenko <siguctua at gmail.com> wrote:
>>
>>> 2008/9/2 Kjell Godo <squeaklist at gmail.com>:
>>> > Where is this new compiler project?  Where is NewCompiler?  I would
>>> > like to see it.
>>> > Does anybody know where that book about the Squeak Compiler went to?
>>> >
>>> > the rest down below is all nonsense and I wouldn't read it if I were
>>> > you.
>>> >
>>> > i knew this was going to cost me.
>>> >
>>> > What is atomic loading?  Does it mean no dependencies or dependencies
>>> > are handled?
>>> > It seems to me that there needs to be some kind of intelligent
>>> > dependencies manager that works a lot better and is a lot smarter than
>>> > what has been put out there so far.
>>> >
>>>
>>> Atomic loading is not about handling dependencies (they are
>>> present and addressed as well, of course), but about installing a
>>> number of changes in the system, replacing old behavior in a single
>>> short operation, which guarantees safety from old/new behaviour
>>> conflicts.
>>>
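The idea of atomic loading can be sketched as "build the new state on the side, then swap it in with one assignment". Below is a minimal Python illustration; the `MethodTable` class and its `load_atomically` method are hypothetical, and nothing here describes how Squeak or any particular loader actually implements it.

```python
class MethodTable:
    """Holds the currently installed methods of a hypothetical system."""

    def __init__(self, methods):
        self._methods = dict(methods)

    def lookup(self, name):
        return self._methods[name]

    def load_atomically(self, changes):
        # Stage everything in a private copy. If building any change
        # raises, self._methods has not been touched at all.
        staged = dict(self._methods)
        for name, build in changes.items():
            staged[name] = build()
        # A single rebind installs all changes at once: callers see either
        # the old table or the new one, never a mix of both.
        self._methods = staged


table = MethodTable({"area": lambda w, h: w * h,
                     "describe": lambda w, h: f"{w}x{h}"})


def broken_change():
    raise ValueError("this change does not compile")


# A load in which the second change fails leaves the old behaviour intact.
try:
    table.load_atomically({"area": lambda: (lambda w, h: 2 * w * h),
                           "describe": broken_change})
except ValueError:
    pass
area_after_failed_load = table.lookup("area")(2, 3)

# A load in which every change builds replaces both methods together.
table.load_atomically({"area": lambda: (lambda w, h: 2 * w * h),
                       "describe": lambda: (lambda w, h: f"{w} by {h}")})
area_after_load = table.lookup("area")(2, 3)
description_after_load = table.lookup("describe")(2, 3)
```

The failed load never exposes the new `area` alongside the old `describe`, which is the old/new conflict the paragraph is about.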
>>> > How can I learn about how a good Squeak compiler works?  Without
>>> > years and millions of dead hours?
>>> >
>>>
>>> Sure, you need some experience in compiling things, especially
>>> Smalltalk. Or, even if you don't have such experience but are
>>> using Parser/Compiler for your own purposes, your experience is
>>> valuable as well, since you can highlight different problems or
>>> propose a better interface or new functionality.
>>>
>>> > Modularity is very good.  I think that all of Squeak should be very
>>> > self explaining.  This can be done if you put your explanations of
>>> > what is going on after the body of the method.  Colored source is
>>> > good too.  See Dolphin.  But without reformatting.
>>> >
>>> > I am making picoLARC on sourceforge.net.  Each lisp/smalltalk
>>> > expression gets compiled by an instance of a Compiler Class.  Each
>>> > expression ( let if define etc ) has its own KEGLambdaLispCompiler
>>> > subClass with one standard method and zero or more helper methods in
>>> > it.  Each Compiler outputs an instance of a subClass of the Eval
>>> > Class.  An Eval can be evaluated at runtime by
>>> > its >>evalWithActivationRec: or it could generate byte codes which do
>>> > the same thing via some method like
>>> > EvalSubClass>>generateByteCodesOn:usingCodeWalker: where the
>>> > CodeWalker could tie Evals together or do optimizations?  Is this not
>>> > a good design?  I know I like the part about one Compiler Class for
>>> > each expression and one corresponding Eval Class.  But I haven't done
>>> > any byte code generation yet so I don't know about that part.  One
>>> > Compiler per Eval is not strict.  The ApplicationCompiler can output
>>> > several different related kinds of Evals for the different function
>>> > calls and message sends.
>>> >
>>> > What is this visitor pattern?
>>>
>>> http://en.wikipedia.org/wiki/Visitor_pattern
>>>
>>> > I don't like the idea of putting byte code generation into a single
>>> > Class.  But I feel like maybe I don't know what I'm talking about.
>>> > To modify the byte code generation for an expression you would
>>> > subClass the Eval Class and modify
>>> > the >>generateByteCodeOn: aCodeStream.  The initial implementor
>>> > would try to separate out the parts that might be modified by someone
>>> > into separate methods that get called by >>generateByteCodeOn: so
>>> > these helper methods would generally be overridden and
>>> > not >>generateByteCodeOn: unless that method was really simple.  So
>>> > the initial implementor has to think about reuse and the places where
>>> > modification might occur.  So you would have a lot of
>>> > simple >>generateByteCodeOn: methods instead of one big complex one.
>>> >
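The arrangement described in the quoted paragraph, a short generateByteCodeOn: that only calls helper methods so that subclasses override the helpers rather than the main method, is usually called the template method pattern. Here is a Python sketch under assumed names: `IfEval`, `InvertedIfEval`, `generate_byte_codes_on` and the textual pseudo-instructions are made up for illustration and are not picoLARC's actual classes or output.

```python
class IfEval:
    """Eval for an if-expression; emits textual pseudo-instructions."""

    def __init__(self, condition, then_part, else_part):
        self.condition = condition
        self.then_part = then_part
        self.else_part = else_part

    def generate_byte_codes_on(self, code):
        # Kept deliberately simple: subclasses override the helpers below,
        # not this method.
        self.emit_condition(code)
        self.emit_branch(code)
        self.emit_then(code)
        self.emit_else(code)
        return code

    def emit_condition(self, code):
        code.append(f"push {self.condition}")

    def emit_branch(self, code):
        code.append("jump-if-false else")

    def emit_then(self, code):
        code.extend([f"push {self.then_part}", "jump end"])

    def emit_else(self, code):
        code.extend(["else:", f"push {self.else_part}", "end:"])


class InvertedIfEval(IfEval):
    """Changes only the branch instruction; everything else is inherited."""

    def emit_branch(self, code):
        code.append("jump-if-true else")


plain = IfEval("x", "1", "2").generate_byte_codes_on([])
inverted = InvertedIfEval("x", "1", "2").generate_byte_codes_on([])
print(plain)
print(inverted)
```

The subclass touches one helper and the two outputs differ in exactly one instruction, which is the kind of small, local modification the original implementor is supposed to plan for.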
>>> > There are all different ways of calling a function or method or query
>>> > etc in picoLARC and these are all subClasses of
>>> > KEGLambdaLispApplicationEvalV6p1 and it seems to work fine.
>>> >
>>> > But overriding >>generateByteCodesOn: is not good enough is it?  The
>>> > Compiler Classes can't have hard coded Eval instance creations either
>>> > right?  The Compiler Class has to be subClassed also and the
>>>
>>> The problem with the Squeak compiler is that you need to override
>>> many more classes than just Compiler to emit different bytecode,
>>> for instance.
>>>
>>> > >>meaningOf:inEnviron: needs to have a ( self createEval )
>>> > expression in it that could be subClassed and overridden.  And then
>>> > that subClass has to be easily inserted into the expression dispatch
>>> > table that pairs up expressions with expression Compilers.  So when
>>> > that table gets made there should be a
>>> > ( tableModifier modify: table ) which could stick the
>>> > < expression Compiler > pairs in that are needed.
>>> >
>>> > I think that is all that would be required to modify the compilation
>>> > of an expression.
>>> > I will have to make these changes to picoLARC so it will be more
>>> > modifiable.
>>> >
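The two hooks named in the quoted text, a ( self createEval ) factory method and a ( tableModifier modify: table ) step when the dispatch table is built, can be sketched as follows. This is Python with assumed names (`ExpressionCompiler`, `LoggingCompiler`, `build_dispatch_table` and the rest are hypothetical), not the picoLARC source.

```python
class Eval:
    def __init__(self, expression):
        self.expression = expression


class LoggingEval(Eval):
    """A replacement Eval that a user wants emitted instead of Eval."""


class ExpressionCompiler:
    def create_eval(self, expression):
        # The only place that names the Eval class, so a subclass can
        # substitute another one without copying meaning_of.
        return Eval(expression)

    def meaning_of(self, expression, environment):
        return self.create_eval(expression)


class LoggingCompiler(ExpressionCompiler):
    def create_eval(self, expression):
        return LoggingEval(expression)


def build_dispatch_table(table_modifier=None):
    # Pairs each expression keyword with the compiler that handles it.
    table = {"if": ExpressionCompiler(), "let": ExpressionCompiler()}
    if table_modifier is not None:
        table_modifier(table)   # plays the role of "tableModifier modify: table"
    return table


def use_logging_for_if(table):
    table["if"] = LoggingCompiler()


default_table = build_dispatch_table()
custom_table = build_dispatch_table(use_logging_for_if)
default_eval = default_table["if"].meaning_of(("if", "x", 1, 2), {})
custom_eval = custom_table["if"].meaning_of(("if", "x", 1, 2), {})
untouched_eval = custom_table["let"].meaning_of(("let", "y", 3), {})
```

Neither the base compiler nor the table-building code is edited to get the new behaviour; the change arrives as one subclass and one modifier function.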
>>> > I think the Compiler should be very modular.  For picoLARC one Class
>>> > per expression and one Class per Eval seems to work well.  Stuffing
>>> > lots of separate things into a single Class and doing a procedural
>>> > functional thing and not an OOP thing does not seem good to me.
>>> >
>>> > I think that the Compiler should be very clean and a best practices
>>> > example with a long comment at the bottom of each method telling all
>>> > about what it does.  Writing it out and referencing other related
>>> > methods helps to think about what is really going on and then a
>>> > better design without hacks comes out.  I don't think hacking should
>>> > be encouraged at all.  Hacking just makes a mess.
>>> >
>>> +1
>>>
>>> The design should allow replacing critical parts of the compiler by
>>> subclassing, without the need to modify the original classes.
>>> So-called extensions, or monkey patching, are a very bad practice that
>>> goes in the direction directly opposite to modularity.
>>>
>>> I am thinking that maybe at some point, to prevent monkey patching,
>>> deployed classes could disallow installing or modifying their methods.
>>>
>>>
>>> > And then this practice of not making any Package comments has got to
>>> > stop.  I think that people who do that should be admonished in some
>>> > way.  I think that the Package comment for the Compiler should
>>> > contain the design document for it that tells all about how it is
>>> > designed.  If it needs to be long then it should be long.  It should
>>> > include: How to understand the Compiler.  There should be a sequence
>>> > of test cases that start simple and show how it all works.
>>> >
>>> > And that should go for the VM too.  This idea that the VM can be
>>> > opaque and only recognizable to a few is not good.
>>> >
>>>
>>> VMs tend to be complex. And the complexity comes from the inability of
>>> our hardware/OS to work the way we need/want it to.
>>>
>>> > These should be works of art and not hacked up piles of rubbish to be
>>> > hidden away into obscurity.
>>> >
>>> > There is this idea that one should only care about what something
>>> > does.  And the insides of it are a random black box that you tweak
>>> > and pray on.  But I think that the insides should be shown to the
>>> > world.  They should be displayed on a backdrop of velvet.  Especially
>>> > the Compiler and VM and VM maker.  And then the whole Windowing thing
>>> > should be modularized so you can have multiple different Windowing
>>> > systems.
>>> >
>>> > And what about having multiple VMs?  It would be cool if picoLARC
>>> > could be inside of Squeak in that way.  It would be cool if one VM
>>> > was generalized so that it could support different dialects and
>>> > languages.  And another was specific and fast.  And you could make
>>> > various kinds of VMs and images and output them onto disk without a
>>> > lot of trouble.  It would come with gcc and all that junk all set up
>>> > so it would just work.  If you already had gcc you could tell it not
>>> > to download it.
>>> >
>>>
>>> What is gcc? And why is it required to make a VM? ;)
>>>
>>> > picoLARC has simple name spaces called Nodules where you can have
>>> > Nodules inside of Nodules and Nodules can multiply inherit variables
>>> > from others.  Maybe such a thing could be used in Squeak?  Then you
>>> > could have multiple VMs.  And VMs inside of VMs.
>>> >
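Nested name spaces that can inherit variables from several others might look roughly like the Python sketch below. The `Nodule` class here is hypothetical, and the lookup order (own variables first, then inherited Nodules left to right, depth first) is an assumption, since the message does not say how picoLARC resolves names.

```python
class Nodule:
    """A name space that can contain Nodules and inherit from several."""

    def __init__(self, name, inherits=()):
        self.name = name
        self.variables = {}
        self.children = {}
        self.inherits = list(inherits)

    def add_child(self, name, inherits=()):
        child = Nodule(name, inherits)
        self.children[name] = child
        return child

    def lookup(self, variable, _seen=None):
        # Own variables win; otherwise search inherited Nodules left to
        # right, depth first. _seen guards against inheritance cycles.
        seen = set() if _seen is None else _seen
        if id(self) in seen:
            raise KeyError(variable)
        seen.add(id(self))
        if variable in self.variables:
            return self.variables[variable]
        for parent in self.inherits:
            try:
                return parent.lookup(variable, seen)
            except KeyError:
                continue
        raise KeyError(variable)


root = Nodule("root")
math = root.add_child("math")
math.variables["pi"] = 3.14159
text = root.add_child("text")
text.variables["newline"] = "\n"
app = root.add_child("app", inherits=[math, text])   # multiple inheritance
app.variables["title"] = "demo"
inner = app.add_child("inner", inherits=[app])       # a Nodule inside a Nodule
print(inner.lookup("pi"), inner.lookup("title"))
```

Containment (`children`) and inheritance (`inherits`) are kept as separate relations, so a Nodule can sit inside one Nodule while taking variables from others.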
>>> > I think that Dolphin Smalltalk could be held up as an example of
>>> > pretty.
>>> >
>>>
>>> Maybe, if you know how to deal with licenses & copyrights when taking
>>> their source and blindly putting it into Squeak :)
>>>
>>> > I hope picoLARC will be another one.
>>> >
>>> > I think that Squeak is pretty in a somewhat cancerous sort of way.
>>> > The cancer is all the hacking.  That goes on.
>>> > The vision is great but the hacking and undocumenting gum up all those
>>> > big ideas.
>>> >
>>> > Sure it's quick but it rots away quickly too.
>>> >
>>> > Undocumented features.  In Smalltalk this is less of a problem but in
>>> > like Lisp say you make this great feature but then don't document it.
>>> > You might as well have not even made it.
>>> >
>>>
>>>
>>>
>>> --
>>>  Best regards,
>>> Igor Stasenko AKA sig.
>>>
>>>
>>
>>
>> --
>> David Zmick
>> /dz0004455\
>> http://dz0004455.googlepages.com
>> http://dz0004455.blogspot.com
>>
>>
>>
>>
>
>
>
>

-- 
David Zmick
/dz0004455\
http://dz0004455.googlepages.com
http://dz0004455.blogspot.com