Report from a novice VM h4x0r.

Wed Mar 31 22:47:09 UTC 2004

om,

I've been obsessed with hacking the VM for the last 5 days or so, often 
staying up till 3AM watching builds compile....

I had hoped to be able to provide you with a snapshot of my progress
but, unfortunately, I have hit some snags. I will describe these
presently as I review the things I have learned from this exercise.

My optimization plan was roughly as follows:

1. Remove all unused or redundant variables and methods.
2. Tune out all redundant computations.
3. Use C library functions where possible (or make better use of C
constructs).
4. Go through the compiler to cancel out any negative side effects from #1.
5. Go through the code and re-insert variables only where necessary to
reverse any regressions that couldn't be fixed by #4.

I was doing okay; I made it as far as steps 2-3 when I reached the
point where my image was segfaulting or hitting recursive errors (with
erroneous instruction pointer values!), and there was no way to find
what mistake I had made from the versions of the interpreter I had kept
as backups for comparison.

The 3.7 alpha image had too many problems to begin with anyway.

I started from scratch with a 3.6 image as a more stable base.
I then inserted functions from my changes into the new image one by one,
comparing it to a "clean" interp.c (which I believe to be identical to
the one that I have used as my workhorse for several months now).

I went through several of these re-inclusions, running the SU test 
runner each time.

I added a batch which caused errors and thought "Hooray! I found the
bug". I was wrong.

I backed out those changes and tried again; the error recurred. I
backed out further changes from earlier versions and still got errors!
Beginning to suspect foul play, I ran the test runner on the same image
with my workhorse VM and was able to reproduce the same errors! I have
come to the conclusion that the SU test runner is very sloppy about its
cleanup and exacerbates a problem somewhere deeper in the system...

For this reason I am unable to continue until there is some way to do
an integrity check on any given image and a way to repair any corruption
that may occur. =\

In the meantime, I have made a number of findings that may be of interest:

1. Many of the computations in the VM involve simple integer pointer
arithmetic.
I propose that, instead of addressing memory only through "char * memory",
there be several pointers of different types all set to the same base
address (e.g. a char pointer and an int pointer both pointing at &x),
which would simplify the VM code greatly...
Furthermore, making all integer values word-aligned (are they already?)
would noticeably improve memory access times on many architectures.
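A minimal C sketch of the multiple-pointer idea, assuming the heap is allocated as an array of 32-bit words. The names memoryWords, initMemory, longAtByteOffset and wordAt are hypothetical, not the interpreter's actual variables or macros. Both pointers share one base address, so word-indexed access needs no explicit byte-offset arithmetic.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical names: two views of one heap. */
static uint32_t *memoryWords; /* word-indexed view */
static char *memory;          /* byte-indexed view of the same storage */

static int initMemory(size_t wordCount) {
    memoryWords = calloc(wordCount, sizeof *memoryWords);
    if (memoryWords == NULL) return 0;
    memory = (char *)memoryWords; /* both pointers share one base address */
    return 1;
}

/* Byte-offset access: the caller does the arithmetic and must keep
   the offset word-aligned. */
static uint32_t longAtByteOffset(size_t byteOffset) {
    return *(uint32_t *)(memory + byteOffset);
}

/* Word-index access: the arithmetic disappears into the array index. */
static uint32_t wordAt(size_t wordIndex) {
    return memoryWords[wordIndex];
}
```

Reading through a char pointer is always permitted in C, so the byte view can coexist with the word view, and memory from calloc is suitably aligned for uint32_t.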

2. While the construct:

| foo |
foo _ self bar: baz.
foo = bat ifTrue: [Mars invade].

helps the inliner significantly when bar is a non-trivial method, it
becomes counterproductive when bar happens to be "integerAt:" (as it is
in several places), which is actually a compiler macro.

3. Many class variables, and a few instance variables, are treated by
the compiler as constants. Apparently this is a compiler hack which
makes the C code more readable without affecting the binary.
Unfortunately, it badly complicates the compiler class definition. Also,
the generated C code does not follow the C idiom of making all constants
ALLCAPS.

My proposal is to edit the compiler so that it will detect degenerate 
methods (methods which only produce a constant value) and to treat these 
as constants.

I believe that this is an important change because it will allow class
variables to be used as real variables by a future multithreaded
interpreter, which will probably require class variables such as
"ActiveSmalltalkInterpreters".
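To illustrate the degenerate-method proposal, here is a sketch of two possible C translations of a hypothetical Slang method headerTypeMask whose body is only ^ 3. The function form is what a direct translation might produce; the ALLCAPS macro is what a generator that recognized degenerate methods could emit instead. Neither is claimed to be the current generator's output.

```c
/* Hypothetical Slang method:
       headerTypeMask
           ^ 3
   Two possible C translations follow; the names are illustrative. */

/* A direct translation: a function that only returns a literal. */
static int headerTypeMask(void) {
    return 3;
}

/* What a generator that detects degenerate methods could emit instead:
   an ALLCAPS compile-time constant, following the usual C idiom. */
#define HEADER_TYPE_MASK 3

/* Uses the constant; the C compiler folds it directly into the AND. */
static int headerTypeOf(unsigned int header) {
    return (int)(header & HEADER_TYPE_MASK);
}
```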

4. A great many computations involve extracting bit-fields. In many
cases these bit-fields hold important scalar information such as
"sizeBits". A useful improvement would be to make these fields either
bytes or short ints and then let the C compiler figure out how best to
extract them from the words...
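A sketch contrasting shift-and-mask extraction with byte-sized fields. The header layout (bits 0-1 header type, bits 2-7 size in words, bits 8-11 format) and all names are assumptions for illustration. ByteHeader spends the whole word on three small fields, so it only shows the access pattern, not a realistic header.

```c
#include <stdint.h>

/* Assumed layout: bits 0-1 type, bits 2-7 size in words, bits 8-11 format. */
static uint32_t sizeBitsOf(uint32_t header) {
    return header & 0xFCu;       /* mask only: the size already scaled to bytes */
}

static uint32_t formatOf(uint32_t header) {
    return (header >> 8) & 0xFu; /* shift, then mask */
}

/* Alternative: each field gets its own byte, so the C compiler can use
   plain byte loads instead of shifts and masks. */
typedef struct {
    uint8_t type;
    uint8_t sizeInWords;
    uint8_t format;
    uint8_t spare;
} ByteHeader;

static uint32_t sizeInBytesOf(const ByteHeader *h) {
    return (uint32_t)h->sizeInWords * 4u;
}
```

A byte-wide size field also changes the range: 8 bits allow up to 255 words, where the 6-bit field above allows 63.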

5. I started experimenting with replacing hand-coded block moves with
calls to C's "memmove" and "memcpy" where appropriate. The motivation
is that the 386 has an ungodly fast "rep movsb" string-copy instruction
which can do the operation at close to the speed of the FSB. It would be
even better to use inline assembly for this, but to remain portable I
started inserting the C library calls mentioned above.

There are actually a number of DIFFERENT implementations of block memory
copy in the VM source. One example in the Interpreter is called by a
function which takes a byte count and divides it by 4 to make a word
count for the method call. The method then multiplies it by 4 again to
get the byte count it uses to determine the stop pointer for its for
loop!
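A sketch of the two copying styles; copyWordsLoop and copyBytes are hypothetical names. The library version takes a byte count directly, which avoids the divide-by-4-then-multiply-by-4 round trip described above.

```c
#include <stdint.h>
#include <string.h>

/* Hand-coded forward word copy, the general shape of a loop-based move. */
static void copyWordsLoop(uint32_t *dst, const uint32_t *src, size_t wordCount) {
    for (size_t i = 0; i < wordCount; i++) {
        dst[i] = src[i];
    }
}

/* Library version taking a byte count. memmove is defined for
   overlapping regions; memcpy is only safe when they cannot overlap. */
static void copyBytes(void *dst, const void *src, size_t byteCount) {
    memmove(dst, src, byteCount);
}
```

memmove is the safe default, since memcpy has undefined behavior on overlapping regions, and the forward loop above gives the wrong result for an overlapping move toward higher addresses. Whether the C library actually uses rep movsb depends on the library and the target.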

6. In many places in the VM, a whileTrue: loop is used in place of an
x to: y by: z do: [...] loop.
(The to:by:do: form makes prettier C code.)

7. The compiler emits many unnecessary gotos.

8. The compiler will compile:

foo _ foo + 1.

as foo += 1;

which isn't bad code, but a non-optimizing C compiler may turn this
into an add-immediate instruction instead of the shorter "inc x"
instruction.

9. Both the optimizer and the interpreter have a number of implicit
subclasses which are probably coded inline for performance or other
reasons. These include many tables, headers, and other constructs which
might be more maintainable as separate classes.

I would propose that a C calling convention be made so that message 
invocation such as:

foo _ Bar new.

foo baz.

would compile as Bar_baz(); (when not inlined).
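A hypothetical sketch of a Class_selector naming convention. The struct, Bar_new and Bar_baz are illustrative, and passing the receiver explicitly as the first argument is an assumption (the proposal above writes Bar_baz() without one).

```c
#include <stdlib.h>

/* Hypothetical class Bar with one instance variable. */
typedef struct Bar {
    int count;
} Bar;

/* Smalltalk: Bar new */
static Bar *Bar_new(void) {
    Bar *self = malloc(sizeof *self);
    if (self != NULL) self->count = 0;
    return self;
}

/* Smalltalk: foo baz   (receiver passed explicitly) */
static void Bar_baz(Bar *self) {
    self->count += 1;
}
```

A keyword selector such as at:put: would need some mangling rule as well (for example Bar_at_put); that rule is likewise only an assumption.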

10. I came across an opcode in the interpreter with a comment that
stated it was supposed to be obsolete after 2.6!!!

11. It might be profitable to use parts of the internal Smalltalk
compiler in the cCode generator (if it doesn't already).

12. One of my original motives for this was the observation that
certain bitwise logic primitives on SmallIntegers fail unnecessarily
when the values are negative... A more general method of accessing any
32-bit int would seem to be appropriate.
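A sketch assuming the classic 32-bit Squeak encoding of SmallIntegers as 31-bit values tagged with a 1 in the low bit; the function names are hypothetical. Untagging to a signed C integer before the bitwise operation handles negative operands with no special case, and ANDing two 31-bit values always yields a result that fits back into 31 bits. Full 32-bit values would still need a boxed representation, which this sketch does not cover.

```c
#include <stdint.h>

/* Assumed encoding: value v stored as 2*v + 1 (31-bit SmallInteger). */
typedef int32_t oop;

static int isSmallInt(oop x) { return (x & 1) != 0; }

/* (x - 1) / 2 is exact because x - 1 is even, so this avoids relying
   on an arithmetic right shift of a negative value. */
static int32_t untag(oop x) { return (x - 1) / 2; }

/* Caller ensures v is in [-2^30, 2^30 - 1]. */
static oop tag(int32_t v) { return 2 * v + 1; }

/* bitAnd: on the full signed value: negatives need no special case. */
static oop primBitAnd(oop a, oop b) {
    return tag(untag(a) & untag(b));
}
```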

13. The add opcode seems to attempt the add without changing the stack
pointer and changes it only after succeeding. The logic operations (and
some others) change the stack pointer 3 times on success or failure!
(They do two pops followed by a push.)
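A minimal sketch of the two stack disciplines on a toy operand stack. Stack, addPeekThenCommit and addPopPopPush are hypothetical names, and 32-bit signed overflow stands in for the interpreter's real SmallInteger range check.

```c
#include <stdint.h>

/* Toy operand stack; sp indexes the top element, -1 when empty. */
typedef struct {
    int32_t slots[64];
    int sp;
} Stack;

static int32_t stackValue(const Stack *s, int offset) { return s->slots[s->sp - offset]; }
static int32_t stackTop(const Stack *s) { return s->slots[s->sp]; }
static int32_t pop(Stack *s) { return s->slots[s->sp--]; }
static void push(Stack *s, int32_t v) { s->slots[++s->sp] = v; }

/* Peek at both operands; only on success adjust sp, once. On failure
   the stack is untouched, so nothing needs undoing. */
static int addPeekThenCommit(Stack *s) {
    int64_t sum = (int64_t)stackValue(s, 1) + stackTop(s);
    if (sum < INT32_MIN || sum > INT32_MAX) return 0;
    s->sp -= 1;
    s->slots[s->sp] = (int32_t)sum;
    return 1;
}

/* Pop-pop-push: three sp changes on success; on failure the operands
   are pushed back, which costs two more. */
static int addPopPopPush(Stack *s) {
    int32_t arg = pop(s);
    int32_t rcvr = pop(s);
    int64_t sum = (int64_t)rcvr + arg;
    if (sum < INT32_MIN || sum > INT32_MAX) {
        push(s, rcvr);
        push(s, arg);
        return 0;
    }
    push(s, (int32_t)sum);
    return 1;
}
```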

14. Many of the opcodes that access the stack without pushing use
"stackValue: 0" instead of the cleaner "stackTop". This probably won't
affect the binary, but it adds a lot of constant arithmetic to the
generated C code. It also indicates that the cCompiler relies too much
on the native compiler to optimize away constant arithmetic...

15. C99 adds a number of interesting features, among them improved
libraries for dealing with integer types on a variety of platforms as
well as a new native boolean type.

Squeak obviously uses separate classes for true and false. It is not
clear to me whether there is any way to profit from this enhancement to
the underlying language.

C99 also adds complex math to the language. It would seem that adding
primitives to take advantage of these enhancements would be of benefit.
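A short sketch of the three C99 headers mentioned above. sqWord, isWordAligned and complexMultiply are hypothetical names, not existing Squeak types or primitives. Note that C11 later made complex support optional (compilers advertise its absence with __STDC_NO_COMPLEX__).

```c
#include <complex.h>
#include <stdbool.h>
#include <stdint.h>

/* <stdint.h>: an exact-width type makes "32-bit word" explicit everywhere. */
typedef uint32_t sqWord;

/* <stdbool.h>: a native boolean for C-level flags. Smalltalk's true and
   false stay objects; this only affects the generated C. */
static bool isWordAligned(uintptr_t address) {
    return (address % sizeof(sqWord)) == 0;
}

/* <complex.h>: a complex multiply that a primitive could wrap. */
static double complex complexMultiply(double complex a, double complex b) {
    return a * b;
}
```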

16. C99 allows declarations to be mixed with statements, including in
the initializer of a for loop. The current cCompiler takes the locals of
all inlined methods, as well as loop variables, and puts them at the
beginning of the generated function. Since any current C compiler (and
even older C++ compilers) will accept for (int i = 0; i < x; i++), it
would be much better form to use this instead of heaping all variables
at the beginning of the function.

17. The last major revision to the interpreter was 5 years ago. =\

om

I am still fairly moronic with regard to matters of the interpreter, but
I hope that the people who have bothered to read this will find it to
be of some use.

Thanks for Squeak!