Squid plan

Sat May 10 18:01:19 UTC 2003

Jecel Assumpcao Jr <jecel at merlintec.com> wrote:
> On Thursday 08 May 2003 23:21, Anthony Hannan wrote:
> > All modules except the Boot modules shall be implemented in Smalltalk.
>
> Why not Boot as well? You can bootstrap it from another Smalltalk as was 
> done originally for Squeak.

By Smalltalk I mean real Smalltalk, not Slang.  But you're right, we can
implement the Boot module in real Smalltalk; it just has to be
translated and stored in the native executable format (e.g. ELF).

> > After we have a working Squid kernel, we can port Squeak code to it.
>
> "Port Squeak code" can have several different meanings, all interesting.
> It might mean being able to file in .cs files created in Squeak. Or 
> being able to read project files. Or even whole images.

You're right, but I would probably start by writing a tool in Squeak to
generate Squid module files.  This way the tool can interact with the
user where necessary to resolve semantic differences.

> > Boot
> >
> > Parses the command line, loads the named squid file, and executes its
> > boot method with the rest of the command line string as it argument.
> 
> One interesting variation I had in Merlin was to check if the system was 
> already running. If so, the command line is passed on to it and the 
> boot module dies. That made the other module files look like double 
> clickable applications in Windows or Mac. Of course, for this illusion 
> to work you need to be able to have multiple "native" windows.

We probably still want to support launching multiple Squids.  Your idea
above can be implemented using an alternative boot program.

> > Method Execution
> >
> > Method bytecodes are translated to machine code (if not already)
> > then executed.  Each method maintains a pointer to its machine code.
> > Translated code still manipulates its own Smalltalk object stacks.
> > Bytecodes are low-level like machine code.
> 
> Would this be essentially the send cache?

The send cache is the responsibility of the compiler, not the VM.
Bytecodes are a universal machine code.  The bytecode call instruction
doesn't get translated to the native call instruction, however, because
the stack is not in its own protected segment and it grows up, not down.
Calls are implemented in their long form of pushing, jumping, then
testing overflow.  The stack is a regular Smalltalk object.
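
To make the long form concrete, here is a rough sketch in Python rather
than Squid bytecode.  The names (StackOverflow, call, ret) and the limit
are made up for illustration; the only points taken from the plan are
that the stack is an ordinary object that grows upward and that a call
pushes, jumps, then tests for overflow.

```python
# Illustrative only: names and the limit value are assumptions, not Squid's.

class StackOverflow(Exception):
    pass

def call(stack, limit, return_pc, target_pc):
    """Long-form call: push the return address, jump, then test overflow."""
    stack.append(return_pc)      # 1. push (the stack grows upward)
    pc = target_pc               # 2. jump to the callee
    if len(stack) > limit:       # 3. test for overflow after the push
        raise StackOverflow("stack object is full")
    return pc

def ret(stack):
    """Return: pop the saved address and jump back to it."""
    return stack.pop()

stack = []                       # the stack is a regular heap object
pc = call(stack, limit=4, return_pc=10, target_pc=100)
```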

> > Translator
> >
> > The translator translates bytecodes to machine code using a
> > platform-specific machine code generator (conversion table).  The
> > translator and generator is written in Smalltalk.
> 
> What kinds of optimizations are you considering? Since the translator 
> translates itself (I suppose), this will have a great impact in the 
> system performance.

The translator is like an assembler: it's a one-to-one translator from
bytecodes to machine code.  I'm hoping the dynamic optimizer will
generate sufficiently fast bytecodes.  If not, we can write some methods in
straight bytecode (Smalltalk assembly).  Smalltalk assembly is how
"primitives" will be implemented.

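As a rough illustration of a table-driven, one-to-one translator, here
is a sketch in Python.  The bytecode names and the assembler-like
output templates are invented for the example; a real generator would
emit platform-specific machine code.

```python
# Hypothetical bytecodes and output templates, for illustration only.
CONVERSION_TABLE = {
    "push": "PUSH {0}",
    "pop": "POP {0}",
    "add": "ADD {0}, {1}",
    "jump": "JMP {0}",
}

def translate(bytecodes):
    """Map each (opcode, *operands) tuple to exactly one output line."""
    return [CONVERSION_TABLE[op].format(*args) for op, *args in bytecodes]

native = translate([("push", "r1"), ("add", "r1", "r2"), ("jump", "L1")])
```

Supporting a new platform would then mean swapping the table, not the
translator.
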
> > Segments
> >
> > Each squid file contains a segment of objects.  Objects can point to
> > objects in other segments, using cross pointers.  Following a cross
> > pointer causes the target segment to be loaded (if not already). 
> > Files are kept in sync with their loaded segments (mmap),
> > maintaining persistence without explicit saving.  Cross pointers may
> > cross machine boundaries.
> 
> In my design, segments are compressed. When it is loaded into memory, 
> only those objects actually referenced get expanded into the heap. When 
> they are changed, the get written into a log instead of going back into 
> the compressed segment (probably wouldn't fit anyway).
> 
> Even mmap takes a finite time to update the disk. Please consider the 
> effects of a crash.

When you write your log, you have to flush to disk, don't you?  What
about just flushing the mmap in my scheme instead?
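
For example, using Python's mmap module as a stand-in (the file name
and the 4096-byte size are arbitrary assumptions), flushing a mapped
segment looks roughly like this:

```python
import mmap
import os
import tempfile

# Hypothetical segment file created just for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "demo.squid")
with open(path, "wb") as f:
    f.write(b"\0" * 4096)

with open(path, "r+b") as f:
    with mmap.mmap(f.fileno(), 4096) as segment:
        segment[0:5] = b"hello"   # mutate the in-memory segment
        segment.flush()           # write the dirty pages back to the file

with open(path, "rb") as f:
    on_disk = f.read(5)
```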

If the application requires transaction support, this can be implemented
at a higher level using objects.  So all we care about at this low level
is maintaining object pointer consistency.  I.e., suppose object A gets a
pointer to object B; if object A is flushed to disk, then object B must
be flushed to disk as well.  If a crash happens before the flush, the
system will resume with object A pointing to its previous object (the
state before the new pointer assignment).
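
One way to read this rule is that flushing an object also flushes every
unflushed object it can reach.  Here is a small sketch in Python; the
dictionary-based object graph and the name flush_set are assumptions
for illustration, not part of the design.

```python
def flush_set(obj, pointers, dirty):
    """Return obj plus every dirty object reachable from it via dirty objects."""
    to_flush, pending = set(), [obj]
    while pending:
        current = pending.pop()
        # Clean objects are already on disk, so the walk stops at them.
        if current in to_flush or current not in dirty:
            continue
        to_flush.add(current)
        pending.extend(pointers.get(current, ()))
    return to_flush

pointers = {"A": ["B"], "B": ["C"], "C": []}
dirty = {"A", "B"}               # C is already safely on disk
result = flush_set("A", pointers, dirty)
```

Here flushing A drags B along with it, while C is left alone.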

> > Remote Pointers
> >
> > Cross pointers across machines must be established and maintain and
> > robust against failure.  When accessing a remote field or invoking a
> > remote method, execution can move to the remote machine or the
> > object can move or be replicated to the local machine.
> 
> How do you decide which of the three options should be chosen?

That is a hard decision and I believe it's an active area of research. 
In the meantime we can just choose one or come up with some simple
heuristic.

> Are remote pointers handled explicitly in the applications or do all 
> objects look alike?

The latter.  A CrossPointer object is a proxy for the remote object.  It
intercepts messages and does the appropriate action according to our
decision heuristic above.
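
A minimal sketch of such a proxy in Python, assuming the simplest
heuristic (always bring the object to the local machine on first use).
The load_target callable is a hypothetical stand-in for whatever loads
the remote segment.

```python
class CrossPointer:
    """Proxy standing in for an object that lives in another segment."""

    def __init__(self, load_target):
        self._load_target = load_target   # callable that fetches the real object
        self._target = None

    def __getattr__(self, name):
        # Reached only for names not found on the proxy itself,
        # i.e. messages meant for the remote object.
        if self._target is None:
            self._target = self._load_target()   # load on first use
        return getattr(self._target, name)

loads = []

def load():
    loads.append("loaded")
    return [3, 1, 2]

p = CrossPointer(load)
count_before = len(loads)
first = p.index(1)     # the first message triggers the load
second = p.count(3)    # later messages reuse the loaded object
```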

> > Garbage Collection
> >
> > Garbage collection is performed on each segment individually.  It is
> > written in Smalltalk and runs in its own segment, thus allowing the
> > algorithm to utilize full Smalltalk power, ie. object creation.
> 
> What are the roots for this GC? What about inter segment cycles? There 
> are some very good papers at INRIA about this kind of thing.

CrossPointers point to a roots array in the target segment (a la
ImageSegments), so all roots will be in the target segment.  The GC
algorithm will therefore look similar to today's.

>
> In my own project I simply don't ever collect at all (in theory, at 
> least).

Interesting.  I guess this is possible thanks to your selective reading
and writing of segments.
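
Back on the roots array: a per-segment mark phase could look roughly
like the Python sketch below.  The graph representation and names are
assumptions for illustration, and the sketch does not address
inter-segment cycles.

```python
def mark(roots, fields):
    """Return the set of objects reachable from the segment's roots array."""
    live, pending = set(), list(roots)
    while pending:
        obj = pending.pop()
        if obj not in live:
            live.add(obj)
            pending.extend(fields.get(obj, ()))
    return live

# Objects in one segment and the objects each one points to.
fields = {"a": ["b"], "b": [], "c": ["a"], "d": []}
roots = ["a"]                    # entries that cross pointers refer to
live = mark(roots, fields)
garbage = set(fields) - live
```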

> > C Interface
> >
> > There is no VM.  The native OS and machine are accessed via C library
> > functions called directly from Smalltalk.  C calls transfer args from
> > the Smalltalk stack to the C stack and vice versa.
> 
> How is that library linked to the rest of the runtime system?

The runtime system is written in Smalltalk and Smalltalk assembly,
instead of Slang and C.  Calls it makes to native libraries are done
through this C interface.

> > Modules
> >
> > A Smalltalk module defines a set of selector methods, classes, and
> > class methods, and imports a set of other modules whose public
> > methods and classes are accessible.  A selector method is a method
> > that stands alone and is called directly.  A class method overrides a
> > selector method and cannot be called directly.  A selector method may
> > delegate to the receiver's class if desired.  Only class method can
> > access fields of the receiver.  Delegation looks up the class method
> > in the sender's module (including visible imports) and the class's
> > module.
> 
> "A class method...cannot be called directly" ever? From outside of the 
> module? But it seems that instances of a class are outside of the 
> module that contains it. I am confused.

You can send #new to a class.  #new is a selector method.  Delegation
looks in the module the class is defined in.

But I am going to change my module scheme again.  The above is great
from the VM point of view, but it is lacking from the documentation
point of view.  Please see my next post entitled "Squid modules" for my
new scheme.

> > Compiler
> >
> > Translate Smalltalk source into Squid bytecodes.  The closure
> > compiler is a start for this.
> 
> This is the easy part :-)
> 
> > Dynamic Optimizer
> >
> > Add profilers and inline heavily used code.  The inlined code is
> > bytecode which is then translated to machine code.
> 
> Shouldn't this be part of the Translator?

The translator could add the profilers, which then trigger the
optimizer when appropriate.  The optimizer will run in a separate thread
and, when finished, will run the translator on its resulting
inlined bytecode, then replace the target method's native code with it.
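
Sketched in Python, with a call counter as the profiler and a plain
list standing in for the hand-off to the optimizer thread.  The
threshold, class, and function names are all made up, and the sketch
runs the optimizer inline rather than in a separate thread.

```python
HOT_THRESHOLD = 3   # arbitrary value chosen for the example

class Method:
    def __init__(self, name):
        self.name = name
        self.calls = 0
        self.native_code = "unoptimized"

def profile_call(method, optimize_queue):
    """Count a call; queue the method once it becomes hot."""
    method.calls += 1
    if method.calls == HOT_THRESHOLD:
        optimize_queue.append(method)

def run_optimizer(optimize_queue):
    """Stand-in for the optimizer: inline, retranslate, swap in new code."""
    while optimize_queue:
        method = optimize_queue.pop(0)
        method.native_code = "optimized"

queue = []
m = Method("foo")
for _ in range(5):
    profile_call(m, queue)
queued = len(queue)
run_optimizer(queue)
```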

> > The plan is to implement the following modules (and related ones not
> > yet realized).  If anyone wants to help, please let me know.
>
> This has a lot in common with some of the stuff I am doing, so we might 
> be able to work together. In addition, I might be able to hire somebody 
> to help with this in the second half of this year.

That would be great.  I think we should continue this discussion so we
can agree on the overall design.  Do you have any links I should check
out?

Cheers,
Anthony