Squid VM

Anthony Hannan ajh18 at cornell.edu
Sat May 3 15:23:21 UTC 2003


Hello guys,

Thanks to your feedback and some implementation work of my own, I have
changed my design from what I proposed a couple of weeks ago under the
title "Squik language features".  Below are my new VM and image format
ideas.  Tomorrow I will present my new language feature ideas.  I hope
you will continue to provide feedback.  I believe this
discussion/project can eventually lead to a better Squeak.  But for now
it will be called "Squid" (no longer "Squik" since it was too easily
confused with "Squeak").

Cheers,
Anthony

Minimal VM

The VM defines and executes a minimal set of low-level instructions
and primitives.  The instructions are: jump, jumpIfTrue/False, move,
callMethod, return, and callPrimitive.  Notice the use of callMethod
rather than send.  The VM does not assume a certain message lookup
scheme.  Rather the compiler must generate code to find the appropriate
method then call that method.  Hopefully an adaptive optimizing compiler
and JIT translator will make bytecode method lookup fast enough.
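
To illustrate the difference between callMethod and send, here is a
minimal sketch in Python.  It assumes a single-inheritance lookup, which
is only one scheme a compiler could generate; the dict-based class
layout and the names lookup, call_method and send are hypothetical and
not part of the design above.

```python
def lookup(cls, selector):
    """Method lookup written as ordinary code (what the compiler would emit).

    Assumption: classes are dicts with "superclass" and "methods" keys.
    """
    while cls is not None:
        method = cls["methods"].get(selector)
        if method is not None:
            return method
        cls = cls["superclass"]
    raise LookupError(selector)


def call_method(method, receiver, *args):
    """Stand-in for the VM's callMethod: it invokes, it never looks up."""
    return method(receiver, *args)


def send(receiver, selector, *args):
    """The two-step sequence a compiler might generate for a message send."""
    method = lookup(receiver["class"], selector)
    return call_method(method, receiver, *args)


# Hypothetical example classes and instance.
OBJECT = {"superclass": None, "methods": {"describe": lambda self: "an object"}}
POINT = {"superclass": OBJECT, "methods": {"x": lambda self: self["x"]}}
point = {"class": POINT, "x": 3}
```

Because lookup is plain code, a different image could swap in another
scheme (multiple dispatch, delegation) without touching call_method.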

Primitives include: arithmetic/logic operations, internal object
operations, loadCLibrary and callCFunction.  Primitives will be kept to
a minimum.  Those that were implemented for speed alone will be removed,
relying on the adaptive compiler and translator for speed.  Others that
aren't needed for correct VM operation (for example, named
primitives like sound primitives) will move out to independent libraries
and be called via callCFunction.  callCFunction enables direct calls from
Smalltalk to C, converting objects to C values and vice versa
automatically.  This functionality will borrow from named primitives
and FFI (foreign function interface).

Kernel Objects

The minimal VM assumes the structure of only 7 kinds of kernel objects:
Object, Class, Method, Context, Root, String, and SmallInteger.  The
first object is a Root (a la specialObjectArray) and contains pointers
to these kernel classes plus nil, true, and false.  It also points to
the image's "main" method.  This method is called when the image is
started, with the command line string as its sole argument.
It may do whatever it wants including resuming the activeProcess of
a ProcessScheduler, which is no longer controlled by the VM.  The root
also points to interrupt handler methods that get called from the VM
when interrupts occur.

Object Format

Every object has a single 32-bit header containing the class pointer in
the high 30 bits and the mark and root GC bits in the low 2 bits (I
believe there are only 2 GC bits, as revealed in the
initializeObjectHeaderConstants method, although the ObjectMemory class
comment states 3 bits: mark, old & dirty).
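
A minimal sketch of this header packing in Python follows.  It assumes
the class pointer is a 4-byte-aligned address, so its low 2 bits are
free for the GC bits.  Which low bit is mark and which is root is not
stated above, so the assignment here is illustrative, and all names are
hypothetical.

```python
MARK_BIT = 0x1          # assumed position of the mark bit
ROOT_BIT = 0x2          # assumed position of the root bit
GC_MASK = MARK_BIT | ROOT_BIT
WORD_MASK = 0xFFFFFFFF  # headers are a single 32-bit word


def make_header(class_pointer, mark=False, root=False):
    """Pack an aligned class pointer and the two GC bits into one word."""
    if not 0 <= class_pointer <= WORD_MASK:
        raise ValueError("class pointer must fit in 32 bits")
    if class_pointer & GC_MASK:
        raise ValueError("class pointer must be 4-byte aligned")
    return class_pointer | (MARK_BIT if mark else 0) | (ROOT_BIT if root else 0)


def class_pointer_of(header):
    """Recover the class pointer by clearing the low 2 GC bits."""
    return header & WORD_MASK & ~GC_MASK


def is_marked(header):
    return bool(header & MARK_BIT)


def is_root(header):
    return bool(header & ROOT_BIT)
```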

We can remove the identityHash from the object header since most objects
never use their identityHash.  When it is needed, the object can
becomeForward a mirror instance that has the original field values plus
an extra field for the identityHash (a special mirror class defines the
extra field).  Classes that anticipate identityHash use can include the
identityHash as a regular field up front.  In either case, a full
31 bits will be allocated for the identityHash, improving hash
distribution.

BecomeForward can be made fast by using a forwarder.  BecomeForward
changes the original object class to Forwarder class, and changes the
first field to point to the new object.  When a message is looked up in
a Forwarder, the message is forwarded to the object it is pointing to. 
The garbage collector may remove forwarders when traversing pointers. 
Identity equals (==) will have to test for forwarder arguments and skip
past them.  There are two forms of the Forwarder class to accommodate
different size objects.  The first has just one fixed field and is used
for single field objects.  The second has one fixed field plus variable
size indexable fields and is used for objects with more than one field. 
Objects with no fields (like nil, false, Object new) cannot use this
become forwarder.  If these special objects need an identityHash they
should add it as a regular field.
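
The forwarder mechanism can be sketched in Python as follows.  This is
a toy model under stated assumptions: objects are a class tag plus a
field list, the string FORWARDER stands in for the Forwarder class, and
the names Obj, become_forward, follow and identical are hypothetical.

```python
FORWARDER = "Forwarder"  # stand-in for the Forwarder class


class Obj:
    """Toy object: a class tag plus a list of fields."""

    def __init__(self, cls, fields):
        self.cls = cls
        self.fields = list(fields)


def become_forward(old, new):
    """Turn `old` into a forwarder to `new` without scanning the heap."""
    if not old.fields:
        raise ValueError("objects with no fields cannot become forwarders")
    old.cls = FORWARDER
    old.fields[0] = new  # first field now points at the new object


def follow(obj):
    """Skip past any chain of forwarders to reach the real object."""
    while obj.cls == FORWARDER:
        obj = obj.fields[0]
    return obj


def identical(a, b):
    """Identity test (==) that looks through forwarder arguments."""
    return follow(a) is follow(b)
```

A garbage collector could apply follow to each pointer it traverses and
store the result back, which is how forwarders would eventually
disappear.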

Object format is stored in the class, since each instance of a class has
the same format.  There are three components of the object format stored
in 4 bits in the class's first field.  They are variableType (2 bits),
isWeak (1 bit), and isBytes (1 bit).  VariableType indicates if
instances have indexable fields and if they are pointers or data or
both.  0 means no indexable fields, 1 means all indexable fields are
data (ignored by GC), 2 means all indexable fields are pointers, and 3
means indexable fields are split between pointers and data.  The split
type is used by Methods (literals then bytecodes), and Contexts (temps
then garbage).  isWeak tells if variable pointers (if any) are weak. 
isBytes tells if variable data (if any) are bytes or words.

Object size is stored in the class's fixedSize bits if the class has no
indexable fields, otherwise the object size is stored in the object's
first field.  Two bits are reserved in this field to hold padding for
byte data.  If the class supports the split type then the object's
second field holds the index of the last pointer (the split size).
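
As a sketch of how these size fields might be used, the Python below
packs the size field and computes how many leading fields the GC would
scan.  It assumes sizes are counted in 4-byte fields, that the bit
order is size (high 29 bits), byte padding (2 bits), integer tag (low
bit), and that fixed fields always hold object pointers or tagged
integers.  All function names are hypothetical.

```python
WORD_BYTES = 4
NONE, DATA, POINTERS, SPLIT = range(4)  # variableType values 0..3


def encode_size_field(size, byte_padding=0):
    """Pack size (29 bits), byte padding (2 bits) and the integer tag bit."""
    if not 0 <= size < (1 << 29):
        raise ValueError("size must fit in 29 bits")
    if not 0 <= byte_padding <= 3:
        raise ValueError("byte padding must fit in 2 bits")
    return (size << 3) | (byte_padding << 1) | 1


def decode_size_field(word):
    """Return (size, byte_padding) from a packed size field."""
    return word >> 3, (word >> 1) & 0x3


def gc_scanned_fields(variable_type, fixed_size, size, split_size=0):
    """Number of leading fields the GC treats as object pointers."""
    if variable_type in (NONE, DATA):
        return fixed_size      # only the fixed fields can hold pointers
    if variable_type == POINTERS:
        return size            # every field up to N
    if variable_type == SPLIT:
        return split_size      # fields up to the last pointer, M
    raise ValueError("variableType must be 0..3")


def data_byte_length(size, fixed_size, byte_padding):
    """Byte count of the indexable data in an all-data byte object."""
    return (size - fixed_size) * WORD_BYTES - byte_padding
```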

To summarize, the object format is:

	header: classPointer(30bits), GC(2bits).
	field1: object | variableSizeN(29bits), bytePadding(2bits), IntTag(1bit).
	field2: object | variableSplitSizeM(29bits), unused(2bits), IntTag(1bit).
	...
	fieldL: object.
	fieldL+1: object.
	...
	fieldM: object.
	fieldM+1: object | rawBits.
	...
	fieldN: object | rawBits.

and class format is:

	header:	classPointer(30bits), GC(2bits)
	field1:	variableType(2bits), isWeak(1bit), fixedSizeL(12bits), isBytes(1bit), IntTag(1bit)
	...
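
The class format word could be packed and unpacked roughly as below.
The bit positions are an assumption: they follow the listed order from
high to low with the integer tag in the lowest bit, and the remaining
high bits are left unused.  The function names are hypothetical.

```python
# Assumed shifts, reading the field list above from high bits to low bits.
VARIABLE_TYPE_SHIFT = 15  # 2 bits
IS_WEAK_SHIFT = 14        # 1 bit
FIXED_SIZE_SHIFT = 2      # 12 bits
IS_BYTES_SHIFT = 1        # 1 bit
INT_TAG = 1               # low bit marks the word as a tagged integer


def encode_class_format(variable_type, is_weak, fixed_size, is_bytes):
    """Pack the format components into a tagged-integer word."""
    if not 0 <= variable_type <= 3:
        raise ValueError("variableType must fit in 2 bits")
    if not 0 <= fixed_size < (1 << 12):
        raise ValueError("fixedSize must fit in 12 bits")
    return (
        (variable_type << VARIABLE_TYPE_SHIFT)
        | (int(bool(is_weak)) << IS_WEAK_SHIFT)
        | (fixed_size << FIXED_SIZE_SHIFT)
        | (int(bool(is_bytes)) << IS_BYTES_SHIFT)
        | INT_TAG
    )


def decode_class_format(word):
    """Unpack the format components from a class's first field."""
    return {
        "variable_type": (word >> VARIABLE_TYPE_SHIFT) & 0x3,
        "is_weak": bool((word >> IS_WEAK_SHIFT) & 1),
        "fixed_size": (word >> FIXED_SIZE_SHIFT) & 0xFFF,
        "is_bytes": bool((word >> IS_BYTES_SHIFT) & 1),
    }
```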

There are no special compact classes.  All fixed-size classes are
compact, which will make up a little for those variable size objects
that are no longer compact.  The main advantages of this format are that
all object headers are the same (no header type), and it supports mixed
objects (methods) and variable-size objects (contexts) without special
case code.


