[Squeak Installer] The Compiler, The Final Frontier(?)

Mark van Gulik ghoul6 at home.com
Sat Aug 25 08:30:36 UTC 2001


On Saturday, August 25, 2001, at 02:34 am, PhiHo Hoang wrote:
[...]
>     Is it true that the Squeak compiler is implemented in Squeak only, there
> is no plugin ? Is it because there is no need for speed in compiling ? Or is
> it impossible to implement the compiler outside of the image ?

I haven't been up on Squeak lately, but unless I'm really mistaken, the 
compiler within Squeak is the only one out there.

>     I need a compiler outside of the Squeak image for bootstrapping purpose.
> Is there a mechanism to translate the 'System-Compiler' category into a
> plugin ? If not, how else can I get a standalone Squeak compiler from the
> available Squeak codes ?

My fourth year project for my B.C.S. was bootstrapping a Smalltalk 
system.  I wrote a *simple* Smalltalk compiler in C and used it to 
directly grow an image from nothing more than a text file with a 
parenthesized hierarchical list of classes (with instance variable 
names), and all method source code for all the classes.

Seriously, writing the compiler in C shouldn't be hard at all (a day to 
a week, depending on experience).  The parser can be simple recursive 
descent, and your tokens don't even have to be allocated as Squeak 
objects.  You don't have to produce "optimal" code, using all the latest 
and greatest bytecodes.  My Smalltalk-in-C compiler didn't even bother 
optimizing conditionals (or maybe I added that to it later).  That kind 
of thing can be dealt with as a final "linking" stage, after all your 
modules have been compiled.
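
A minimal sketch of the recursive descent idea, written in C++ for 
brevity rather than plain C.  It handles only one Smalltalk expression 
(unary, single-character binary, and keyword sends, plus parentheses) 
and emits readable "push"/"send" lines instead of real Squeak bytecodes.  
The class name ExprCompiler and the output format are assumptions made 
up for illustration; no Squeak objects are allocated for tokens.

```cpp
#include <cctype>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical toy compiler: one Smalltalk expression in, a list of
// stack-machine instructions out. Not real Squeak bytecodes.
class ExprCompiler {
public:
    explicit ExprCompiler(const std::string& src) : src_(src) {}

    std::vector<std::string> compile() {
        keywordExpr();
        skipSpace();
        if (pos_ != src_.size()) throw std::runtime_error("unexpected input");
        return code_;
    }

private:
    std::string src_;
    std::size_t pos_ = 0;
    std::vector<std::string> code_;

    void skipSpace() {
        while (pos_ < src_.size() &&
               std::isspace(static_cast<unsigned char>(src_[pos_]))) ++pos_;
    }

    // Returns the identifier starting at pos_ without consuming it, or "".
    std::string peekWord() {
        skipSpace();
        std::size_t p = pos_;
        if (p < src_.size() && std::isalpha(static_cast<unsigned char>(src_[p])))
            while (p < src_.size() &&
                   std::isalnum(static_cast<unsigned char>(src_[p]))) ++p;
        return src_.substr(pos_, p - pos_);
    }

    // A keyword is an identifier immediately followed by ':'.
    bool isKeywordNext() {
        std::string w = peekWord();
        std::size_t end = pos_ + w.size();
        return !w.empty() && end < src_.size() && src_[end] == ':';
    }

    // primary := identifier | number | '(' expression ')'
    void primary() {
        skipSpace();
        if (pos_ >= src_.size()) throw std::runtime_error("expected operand");
        char c = src_[pos_];
        if (c == '(') {
            ++pos_;
            keywordExpr();
            skipSpace();
            if (pos_ >= src_.size() || src_[pos_] != ')')
                throw std::runtime_error("expected )");
            ++pos_;
        } else if (std::isdigit(static_cast<unsigned char>(c))) {
            std::size_t start = pos_;
            while (pos_ < src_.size() &&
                   std::isdigit(static_cast<unsigned char>(src_[pos_]))) ++pos_;
            code_.push_back("push " + src_.substr(start, pos_ - start));
        } else {
            std::string w = peekWord();
            if (w.empty()) throw std::runtime_error("expected operand");
            pos_ += w.size();
            code_.push_back("push " + w);
        }
    }

    // unary := primary identifier*   (unary sends bind tightest)
    void unaryExpr() {
        primary();
        for (;;) {
            std::string w = peekWord();
            if (w.empty() || isKeywordNext()) break;
            pos_ += w.size();
            code_.push_back("send " + w);
        }
    }

    // binary := unary (op unary)*   (left to right, no operator precedence)
    void binaryExpr() {
        unaryExpr();
        for (;;) {
            skipSpace();
            if (pos_ >= src_.size() ||
                std::string("+-*/<>=,").find(src_[pos_]) == std::string::npos) break;
            std::string op(1, src_[pos_++]);
            unaryExpr();
            code_.push_back("send " + op);
        }
    }

    // keyword := binary (keyword binary)*   (all parts form one selector)
    void keywordExpr() {
        binaryExpr();
        std::string selector;
        while (isKeywordNext()) {
            std::string w = peekWord();
            pos_ += w.size() + 1;  // consume the word and its ':'
            selector += w + ":";
            binaryExpr();
        }
        if (!selector.empty()) code_.push_back("send " + selector);
    }
};
```

Each grammar level is one function that calls the next tighter level, 
which is all that Smalltalk's unary/binary/keyword precedence needs.  
Statements, assignment, blocks, cascades, and literals other than 
integers are left out of this sketch.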

It's been a while since I wrote that code ('88-'89), but I recall an 
issue was how to survive a garbage collection during initial image 
construction (the C code had to point into the image a lot while it was 
being constructed).  If I had it to do over today, I would simply use 
smart pointers that add themselves to a global bi-directional ring in 
their constructor, and remove themselves in their destructor (I use this 
technique in my Avail primitives).  I don't think that's a good idea for 
Squeak, though, since C++ compilers aren't available on some platforms.  In my 
old Smalltalk system I simply banned garbage collection during image 
construction -- it wasn't a serious problem, even in 1MB (Atari 1040ST).
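
The smart-pointer ring described above could look roughly like the 
following sketch.  The names Root, Object, and forEach are hypothetical 
and are not taken from Avail or Squeak; the sketch only shows handles 
linking themselves into a global circular doubly linked list on 
construction and unlinking on destruction, so that a collector can walk 
every pointer the C++ code currently holds into the image.

```cpp
#include <cstddef>

struct Object;  // hypothetical heap object type; its layout is irrelevant here

// A handle that stays linked into a global ring for as long as it is alive.
class Root {
public:
    explicit Root(Object* obj = nullptr) : obj_(obj) { link(); }
    Root(const Root& other) : obj_(other.obj_) { link(); }
    Root& operator=(const Root& other) { obj_ = other.obj_; return *this; }
    ~Root() { prev_->next_ = next_; next_->prev_ = prev_; }  // unlink

    Object* get() const { return obj_; }
    void set(Object* obj) { obj_ = obj; }

    // Calls visit(slot) for every live Root, so a moving collector could
    // update *slot in place. Returns the number of roots visited.
    template <typename Visitor>
    static std::size_t forEach(Visitor visit) {
        std::size_t n = 0;
        for (Root* r = sentinel().next_; r != &sentinel(); r = r->next_, ++n)
            visit(&r->obj_);
        return n;
    }

private:
    struct SentinelTag {};
    // The sentinel is the one node that starts out linked to itself.
    explicit Root(SentinelTag) : obj_(nullptr), prev_(this), next_(this) {}

    static Root& sentinel() {
        static Root s{SentinelTag{}};
        return s;
    }

    void link() {  // insert this handle just after the sentinel
        Root& s = sentinel();
        prev_ = &s;
        next_ = s.next_;
        s.next_->prev_ = this;
        s.next_ = this;
    }

    Object* obj_;
    Root* prev_;
    Root* next_;
};
```

Because linking and unlinking are constant-time pointer swaps, handles 
can be created freely as locals; the ring never needs to be searched.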

Here's an idea:  Extend Slang to be able to translate the Squeak 
compiler.  Most of it is fairly simple code, and the stuff that's more 
complex can be made simple.  Even if everything won't translate, you can 
always fake the rest with a few C functions.  Don't worry about memory 
leaks initially.  Eventually you can use your own malloc substitute that 
allocates a "space" for the temporary structures, and then bulk 
deallocates the whole space after each method compilation.  The 
advantage of translating the existing compiler is that as bytecodes 
change, your code will continue to work.
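
A rough sketch of the malloc substitute mentioned above, in C++.  The 
class name Arena, its method names, and the 64 KB default chunk size are 
all assumptions for illustration.  Temporary structures are carved out 
of large chunks, and reset() releases every chunk at once, which is what 
would happen after each method compilation.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

// Hypothetical bump allocator: individual allocations are never freed,
// the whole "space" is released in one call to reset().
class Arena {
public:
    explicit Arena(std::size_t chunkSize = 64 * 1024) : chunkSize_(chunkSize) {}
    Arena(const Arena&) = delete;
    Arena& operator=(const Arena&) = delete;
    ~Arena() { reset(); }

    void* allocate(std::size_t size) {
        // Round up so every returned pointer is suitably aligned.
        const std::size_t align = alignof(std::max_align_t);
        size = (size + align - 1) / align * align;
        if (chunks_.empty() || used_ + size > capacity_) {
            // Start a new chunk, large enough for oversized requests.
            capacity_ = size > chunkSize_ ? size : chunkSize_;
            void* chunk = std::malloc(capacity_);
            if (!chunk) throw std::bad_alloc();
            chunks_.push_back(static_cast<char*>(chunk));
            used_ = 0;
        }
        void* p = chunks_.back() + used_;
        used_ += size;
        return p;
    }

    // Bulk deallocation: every pointer handed out so far becomes invalid.
    void reset() {
        for (char* c : chunks_) std::free(c);
        chunks_.clear();
        used_ = 0;
        capacity_ = 0;
    }

    std::size_t chunkCount() const { return chunks_.size(); }

private:
    std::size_t chunkSize_;
    std::size_t capacity_ = 0;  // size of the current (last) chunk
    std::size_t used_ = 0;      // bytes consumed in the current chunk
    std::vector<char*> chunks_;
};
```

No destructors run on reset(), so this only suits plain data such as 
parse nodes and token buffers, which is the kind of temporary structure 
a compiler pass produces.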


Here's an alternative:  Use a Squeak image to grow your fetal Squeak 
image.  The compiler produces a CompiledMethod which you can then trace 
through and copy into the new image.  SystemTracer might help you with 
that (and you might help SystemTracer with that, too).  The Smalltalk 
compiler is probably within an order of magnitude of the speed of a 
compiler written hastily in C, and that should be fast enough.
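
The "trace through and copy" step is an ordinary graph copy with an 
identity map.  In practice it would be written in Squeak itself; the 
C++ sketch below only illustrates the algorithm, and Obj, Tracer, and 
copy are hypothetical names that do not reflect SystemTracer's actual 
code.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for an object: a label plus pointer fields.
struct Obj {
    std::string label;
    std::vector<Obj*> fields;
};

// Copies everything reachable from a root into a separate "new image",
// preserving sharing and cycles.
class Tracer {
public:
    Obj* copy(const Obj* src) {
        if (!src) return nullptr;
        auto it = copies_.find(src);
        if (it != copies_.end()) return it->second;  // already copied

        newImage_.push_back(std::unique_ptr<Obj>(new Obj()));
        Obj* dst = newImage_.back().get();
        copies_[src] = dst;  // record before recursing so cycles terminate
        dst->label = src->label;
        for (const Obj* field : src->fields)
            dst->fields.push_back(copy(field));
        return dst;
    }

    std::size_t size() const { return newImage_.size(); }

private:
    std::unordered_map<const Obj*, Obj*> copies_;  // old object -> its copy
    std::vector<std::unique_ptr<Obj>> newImage_;   // owns the copied objects
};
```

The recursion depth follows the depth of the object graph, so a real 
tracer over a whole image would more likely use an explicit work list.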

You don't need to simulate image memory in an Array or anything so 
severe.  Just keep a few roots pointing to the key data structures of 
your fetal "image", and be prepared to do a little extra work separating 
your data from the running image when producing an image file.  If you 
want to do this live, create all your data structures (sharing 
immutables like Symbols if you want), then invoke a new magic primitive 
whose purpose is to do a big "context switch" of all the key Smalltalk 
roots (Processor, etc.).  Two images' worth of data can live in one actual 
image without much trouble.  Hm.  On second thought, don't share Symbols 
or you'll run into method lookup problems.  Hm.  Even SmallIntegers will 
be a problem (and you can't really build a class like that).  You'll 
have to switch method dictionaries for all the "known to the VM" classes 
atomically inside the context switch primitive.

-Mark



