Report from a novice VM h4x0r.
Alan Grimes
alangrimes at starpower.net
Wed Mar 31 22:47:09 UTC 2004
om,
I've been obsessed with hacking the VM for the last 5 days or so, often
staying up till 3AM watching builds compile....
I had hoped to be able to provide you with a snapshot of my progress
but, unfortunately, I have hit some snags. I will describe these
presently as I review the things I have learned from this exercise.
My optimization plan was roughly as follows:
1. Remove all unused or redundant variables and methods.
2. Tune out all redundant computations.
3. Use C library functions where possible (or make better use of C
constructs).
4. Go through the compiler to cancel out any negative side effects from #1.
5. Go through the code and re-insert variables only where necessary to
reverse any regressions that couldn't be fixed by #4.
I was doing okay; I made it as far as steps 2-3 when I reached the
point where my image was segfaulting or hitting recursive errors (with
erroneous instruction pointer values!), and there was no way to find
what mistake I had made from the versions of the interpreter I had kept
as backups for comparison.
The 3.7 alpha image had too many problems to begin with anyway.
I started from scratch with a 3.6 image as a more stable base.
I then inserted functions from my changes into the new image one by one,
comparing it against a "clean" interp.c (which I believe to be identical
to the one I have used as my workhorse for several months now).
I went through several of these re-inclusions, running the SU test
runner each time.
I added a batch which caused errors and thought "Hooray! I found the
bug". I was wrong.
I backed out those changes and tried again; the error recurred. I
backed out further changes from earlier versions and still got errors!
Beginning to suspect foul play, I ran the test runner on the same image
with my workhorse VM and was able to produce the same errors! I have
come to the conclusion that the SU test runner is very sloppy about its
cleanup and exacerbates a problem somewhere deeper in the system....
For this reason I am unable to continue until there is some way to do
an integrity check on any given image and a way to repair any corruption
that may occur. =\
In the meantime, I have made a number of findings that may be of interest:
1. Many of the computations in the VM involve simple integer pointer
arithmetic.
I propose that instead of addressing everything through "char * memory"
alone, there be several pointers to the same block,
char[] = int[] = &x (a byte view and an int view of one address), which
would simplify the VM code greatly...
Furthermore, making all integer values word-aligned (are they already?)
would tremendously improve memory access times on many architectures.
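A minimal C sketch of item 1, assuming the heap comes from malloc: a
byte view (memory) and a word view (words) alias the same block, so word
accesses can index words directly instead of adding a byte offset and
casting. The names heap_init, long_at, word_at and word_at_put are
hypothetical, not the interpreter's own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Two views of one heap block (hypothetical names). */
static unsigned char *memory; /* byte view, like the VM's "char *memory" */
static uint32_t *words;       /* 32-bit word view of the same block */

int heap_init(size_t wordCount) {
    /* malloc returns storage suitably aligned for uint32_t */
    memory = malloc(wordCount * sizeof(uint32_t));
    if (memory == NULL) return 0;
    words = (uint32_t *)memory;
    return 1;
}

/* Current style: byte offset arithmetic followed by a cast. */
uint32_t long_at(size_t byteOffset) {
    assert(byteOffset % sizeof(uint32_t) == 0); /* word-aligned access only */
    return *(uint32_t *)(memory + byteOffset);
}

/* Proposed style: index the word view directly, no offset arithmetic. */
uint32_t word_at(size_t wordIndex) { return words[wordIndex]; }

void word_at_put(size_t wordIndex, uint32_t value) { words[wordIndex] = value; }

void heap_free(void) {
    free(memory);
    memory = NULL;
    words = NULL;
}
```

Because every word index maps to an aligned address, the alignment
question goes away for the word view, and the byte view can still reach
any individual byte, since character types may alias other objects in C.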
2. While the construct:
| foo |
foo _ self bar: baz.
foo = bat ifTrue: [Mars invade].
will significantly help the inliner when bar is a non-trivial method, it
becomes counterproductive when bar happens to be "integerAt:" (which it
is in several places), which is actually a compiler macro.
3. Many class variables, and a few instance variables, are treated by
the compiler as constants. Apparently this is a compiler hack which
makes the C code more readable without affecting the binary.
Unfortunately, it badly complicates the compiler's class definition.
Also, the generated C code does not follow the C idiom of making all
constants ALLCAPS.
My proposal is to edit the compiler so that it detects degenerate
methods (methods which only return a constant value) and treats these
as constants.
I believe that this is an important change because it would free class
variables for use by a future multithreaded interpreter, which will
probably require class variables such as "ActiveSmalltalkInterpreters".
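To illustrate the proposal, here is a hedged sketch of two forms a
translator could emit for a degenerate method whose body only returns a
constant. HEADER_TYPE_MASK and headerTypeMask are hypothetical names;
with optimization enabled, a C compiler will normally fold the inline
call to the literal, so both forms should give the same machine code.

```c
#include <assert.h>
#include <stdint.h>

/* Option A: emit the degenerate method as an ALLCAPS preprocessor constant. */
#define HEADER_TYPE_MASK 3u

/* Option B: emit it as a static inline function returning the constant;
   an optimizing compiler folds the call away. */
static inline uint32_t headerTypeMask(void) { return 3u; }

uint32_t header_type_a(uint32_t header) { return header & HEADER_TYPE_MASK; }
uint32_t header_type_b(uint32_t header) { return header & headerTypeMask(); }
```

Either way the constant no longer occupies a class variable, which is
the point of the proposal.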
4. A great many computations involve extracting bit fields. In many
cases these bit fields hold important scalar information such as
"sizeBits". A useful improvement would be to make these either bytes or
short ints and then let the C compiler figure out how best to extract
them from the words...
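A sketch of item 4 under an assumed header layout (bits 0-1 header type,
bits 2-7 size in words, bits 8-11 format), chosen only for illustration.
The first functions show the shift-and-mask style; the hypothetical
UnpackedHeader struct shows the same scalars stored as whole bytes, so
each one becomes a plain byte load.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout, for illustration only. */
#define TYPE_MASK    0x3u
#define SIZE_SHIFT   2
#define SIZE_MASK    0x3Fu
#define FORMAT_SHIFT 8
#define FORMAT_MASK  0xFu

/* Current style: shift and mask by hand on every access. */
uint32_t size_bits(uint32_t header) { return (header >> SIZE_SHIFT) & SIZE_MASK; }
uint32_t format_bits(uint32_t header) { return (header >> FORMAT_SHIFT) & FORMAT_MASK; }

/* Proposed style (hypothetical): one byte per scalar field. */
typedef struct {
    uint8_t headerType;
    uint8_t sizeInWords;
    uint8_t format;
} UnpackedHeader;

UnpackedHeader unpack(uint32_t header) {
    UnpackedHeader h = {0};
    h.headerType = (uint8_t)(header & TYPE_MASK);
    h.sizeInWords = (uint8_t)size_bits(header);
    h.format = (uint8_t)format_bits(header);
    return h;
}
```

Storing fields this way would change the object format, so it would
presumably affect the image as well as the VM.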
5. I started experimenting with replacing long-coded block moves with
calls to C's "memmove" and "memcpy" where appropriate. The motivation
is that the 386 has an ungodly fast "rep movsb" compound instruction
which does the operation at the speed of the FSB. It would be even
better to use inline assembly for this, but to remain portable I
started inserting the C library calls mentioned.
There are actually a number of DIFFERENT implementations of block memory
copy in the VM source. One example in the Interpreter is called by a
function which takes a byte count and divides it by 4 to make a word
count for the method call. When the method processes it, it is
multiplied again to make the byte count used to determine the stop
pointer for the for loop!
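The redundant conversion described above, and the library alternative,
might look roughly like this in C. The function names are hypothetical
and the loop is a simplified stand-in for the VM's own code; the
assumption is that the buffers are word-aligned and the byte count is a
multiple of 4.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Callee: receives a word count and multiplies it back into bytes
   only to compute the loop's stop pointer. */
static void copy_words(uint32_t *dst, const uint32_t *src, size_t wordCount) {
    const unsigned char *stop = (const unsigned char *)src + wordCount * 4;
    while ((const unsigned char *)src < stop) {
        *dst++ = *src++;
    }
}

/* Caller: receives a byte count and divides it by 4 for the call. */
void copy_bytes_hand_rolled(void *dst, const void *src, size_t byteCount) {
    assert(byteCount % 4 == 0);
    copy_words(dst, src, byteCount / 4);
}

/* Library version: the byte count goes straight to memmove, which is also
   safe when the regions overlap (memcpy is not). */
void copy_bytes_library(void *dst, const void *src, size_t byteCount) {
    memmove(dst, src, byteCount);
}
```

Whether memmove actually compiles down to "rep movsb" depends on the C
library and compiler, so it is worth measuring on each platform.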
6. At many places in the VM, a whileTrue loop is used in place of an
x to: y by: z do: [] loop.
(It makes prettier C code.)
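For comparison, here is roughly the C each loop style corresponds to,
written by hand rather than generated; the function names are
hypothetical.

```c
#include <assert.h>

/* whileTrue: shape: the counter is initialised, tested and stepped
   in three separate places. */
int sum_while(const int *a, int n) {
    int i = 0;
    int sum = 0;
    while (i < n) {
        sum += a[i];
        i += 1;
    }
    return sum;
}

/* to:do: shape: the bounds and the step sit together in one for statement. */
int sum_for(const int *a, int n) {
    int sum = 0;
    for (int i = 0; i < n; i++) {
        sum += a[i];
    }
    return sum;
}
```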
7. The compiler emits many unnecessary gotos.
8. The compiler will compile:
foo _ foo + 1.
as "foo += 1;",
which isn't bad code, but on a non-optimizing C compiler this may turn
into an add [immediate] opcode instead of the shorter inc opcode.
9. Both the optimizer and the interpreter have a number of implicit
subclasses which are probably coded inline for performance or other
reasons. These include many tables, headers, and other constructs which
might be more maintainable as separate classes.
I would propose that a C calling convention be adopted so that a
message send such as:
foo _ Bar new.
foo baz.
would compile as Bar_baz(); (when not inlined).
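A minimal sketch of the proposed naming convention, assuming the
receiver is passed explicitly as the first argument (the proposal above
only fixes the Bar_baz name). Bar, Bar_new and the counter field are
hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* A Smalltalk class becomes a C struct (hypothetical). */
typedef struct Bar {
    int counter;
} Bar;

/* foo _ Bar new.  =>  Bar *foo = Bar_new(); */
Bar *Bar_new(void) {
    return calloc(1, sizeof(Bar));
}

/* foo baz.  =>  Bar_baz(foo);  (receiver passed as first argument) */
int Bar_baz(Bar *self) {
    self->counter += 1;
    return self->counter;
}
```

Keyword selectors would need a rule for the colons, for example mapping
at:put: to Bar_at_put; that mapping is likewise only an assumption.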
10. I came across an opcode in the interpreter with a comment stating
that it was supposed to be obsolete after 2.6!!!
11. It might be profitable to use parts of the internal Smalltalk
compiler in the cCode generator (if it doesn't already).
12. One of my original motives for this was the observation that
certain bitwise logic primitives on SmallIntegers will fail unnecessarily
if their values are negative... A more general method of accessing any
32-bit int would seem to be appropriate.
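For reference, here is a sketch assuming the 32-bit SmallInteger
encoding, in which a value n is stored as 2n+1 (low tag bit set), giving
the range -2^30 to 2^30-1. It shows that bitAnd:, bitOr: and bitXor:
can operate on the tagged words directly, with no special case for
negative values, on a two's-complement machine. All helper names are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed encoding: n is stored as 2n+1; range is -2^30 .. 2^30-1. */
#define SMALLINT_MIN (-(INT32_C(1) << 30))
#define SMALLINT_MAX ((INT32_C(1) << 30) - 1)

bool is_small_int_value(int64_t n) { return n >= SMALLINT_MIN && n <= SMALLINT_MAX; }

int32_t int_to_oop(int32_t n) {
    assert(is_small_int_value(n));
    return 2 * n + 1;
}

int32_t oop_to_int(int32_t oop) {
    /* oop - 1 is even, so the division is exact and needs no
       sign-dependent right shift */
    return (oop - 1) / 2;
}

/* Tagged operands combine directly: the tag bits give 1 & 1 == 1 and
   1 | 1 == 1; for xor the tag is restored by or-ing in 1. */
int32_t tagged_bit_and(int32_t a, int32_t b) { return a & b; }
int32_t tagged_bit_or(int32_t a, int32_t b) { return a | b; }
int32_t tagged_bit_xor(int32_t a, int32_t b) { return (a ^ b) | 1; }
```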
13. The add opcode seems to attempt the add without changing the stack
pointer and changes it only after succeeding. The logic operations (and
some others) will change the stack pointer 3 times on success or
failure! (doing two pops followed by a push)
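The two stack disciplines can be compared with a toy operand stack. All
names here (stack, sp, push, pop, stackValue, stackTop, is_small_int)
are hypothetical stand-ins for the interpreter's own, and the failure
rule (both operands must carry the SmallInteger tag bit) is an
assumption made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy operand stack (hypothetical). */
static int32_t stack[64];
static int sp = -1; /* index of the current top element */

static void push(int32_t v) { assert(sp + 1 < 64); stack[++sp] = v; }
static int32_t pop(void) { return stack[sp--]; }
static int32_t stackValue(int offset) { return stack[sp - offset]; }
static int32_t stackTop(void) { return stack[sp]; } /* same as stackValue(0) */

/* Assumed failure rule: both operands must have the low tag bit set. */
static bool is_small_int(int32_t oop) { return (oop & 1) != 0; }

/* Pop/pop/push style: sp moves three times whether the primitive
   succeeds (pop, pop, push) or fails (pop, pop, restore). */
bool bit_and_pop_push(void) {
    int32_t arg = pop();
    int32_t rcvr = pop();
    if (!is_small_int(rcvr) || !is_small_int(arg)) {
        sp += 2; /* restore both operands; their slots were not overwritten */
        return false;
    }
    push(rcvr & arg); /* the tag bit survives & */
    return true;
}

/* Peek-then-commit style: read the operands in place and move sp once,
   only after the operation has succeeded. */
bool bit_and_peek(void) {
    int32_t arg = stackTop();
    int32_t rcvr = stackValue(1);
    if (!is_small_int(rcvr) || !is_small_int(arg)) return false; /* stack untouched */
    sp -= 1;
    stack[sp] = rcvr & arg;
    return true;
}
```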
14. Many of the opcodes that access the stack without pushing will use
the method "stackValue: 0" instead of the cleaner "stackTop". This
probably won't affect the binary, but it adds a lot of constant
arithmetic to the generated C code. It also indicates that the
cCompiler relies too much on the native compiler to optimize out
constant arithmetic...
15. C99 adds a number of interesting features, among them improved
libraries for dealing with integer types on a variety of platforms
(<stdint.h>) as well as a native boolean type (_Bool, <stdbool.h>).
Squeak obviously uses separate classes for true and false. It is not
clear to me whether there is any way to profit from this enhancement to
the underlying language.
C99 also adds complex math to the language (<complex.h>). It would seem
that adding primitives to take advantage of these enhancements would be
of benefit.
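One hedged way to use <stdbool.h> and <stdint.h> without touching the
image: keep C99 bool inside primitives and convert it to the true/false
objects only when a result is pushed. The oop values TRUE_OBJECT and
FALSE_OBJECT below are made-up placeholders; a real VM would read them
from the image. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder oops for the true and false objects (made up). */
typedef uintptr_t oop;
static const oop TRUE_OBJECT = 0x1000;
static const oop FALSE_OBJECT = 0x2000;

/* bool is used internally and converted only at the boundary. */
oop bool_to_oop(bool b) { return b ? TRUE_OBJECT : FALSE_OBJECT; }

/* Exact-width types make an overflow check portable across platforms. */
bool add_fits_int32(int32_t a, int32_t b) {
    int64_t sum = (int64_t)a + b;
    return sum >= INT32_MIN && sum <= INT32_MAX;
}

oop primitive_less_than(int32_t a, int32_t b) { return bool_to_oop(a < b); }
```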
16. C99 allows declarations to be mixed with code, including in the
initializer of a for loop. The current cCompiler takes the variables of
all inlined methods, as well as loop variables, and puts them at the
beginning of the generated function. Since any current C compiler, and
even older C++ compilers, will support for (int i = 0; i < x; i++), it
would be much better form to use this instead of heaping all variables
at the beginning of the function.
17. The last major revision to the interpreter was 5 years ago. =\
om
I am still fairly moronic with regard to matters of the interpreter, but
I hope that the people who have bothered to read these will find them to
be of some use.
Thanks for Squeak!