Traits approaching mainstream Squeak

Bryce Kampjes bryce at kampjes.demon.co.uk
Fri Sep 2 21:54:16 UTC 2005


Tony Garnock-Jones writes:
 > I'm extremely keen to help out on this front. Could you perhaps write a
 > paragraph or two on the process of running the stress tests? I'd like to
 > see it crashing, and to start to get the experience needed to fix it
 > when it does.

First, Exupery only runs on Linux/x86 at the moment. If that's what
you're running, follow the instructions for either building or
installing Exupery here:

http://minnow.cc.gatech.edu/squeak/3842

If you're planning to debug the compiler, you'll want to build your
own kernel. Debugging normally starts by analysing the generated
machine code in gdb, and I'm not sure how well that would work without
a local build.

Once you have a working local version of Exupery, evaluate:


      ExuperyProfiler profileAndCompile: 
         [5 timesRepeat: [ExuperyBenchmarks new compilerBenchmark]]

This profiles the given expression and compiles the 10 most used
methods. It doesn't do primitive inlining yet, so don't expect your
bytecode performance to beat VW (VisualWorks). I'll add that as soon
as the stress test passes.
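The selection step can be pictured with a small C sketch. It assumes the profiler ends up with one sample count per method, then sorts by count and keeps the top `limit` entries. `ProfileEntry` and `select_hot_methods` are hypothetical names; Exupery's real profiler is Smalltalk code and may choose methods differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* One profile entry (hypothetical): a method and how many samples hit it. */
typedef struct {
    const char *method;
    unsigned samples;
} ProfileEntry;

/* qsort comparator: larger sample counts sort first. */
static int by_samples_desc(const void *a, const void *b)
{
    unsigned sa = ((const ProfileEntry *)a)->samples;
    unsigned sb = ((const ProfileEntry *)b)->samples;
    return (sa < sb) - (sa > sb);
}

/* Sorts entries hottest-first in place and returns how many of them to
   compile: the first min(n, limit) entries. */
size_t select_hot_methods(ProfileEntry *entries, size_t n, size_t limit)
{
    assert(entries != NULL || n == 0);
    qsort(entries, n, sizeof entries[0], by_samples_desc);
    return n < limit ? n : limit;
}
```

With `limit` set to 10 this mirrors the "10 most used methods" rule described above.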

      ExuperyProfiler stressTest

This runs the stress test, which will crash somewhere between 1 and
10 minutes in. Some of the tests freeze; I think this is due to the
progress dialog morph, but I haven't investigated. Pressing Alt-. (the
user interrupt) and then proceeding will get past it. The progress
dialog lock-up is easy to detect: Squeak drops to 0% CPU without
crashing.
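If you leave the stress test running unattended, the 0% CPU symptom can be checked mechanically. This Linux-only C sketch reads a process's accumulated user plus system CPU time (fields 14 and 15 of `/proc/<pid>/stat`, in clock ticks). The function names are hypothetical; nothing like this ships with Exupery.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Parses utime + stime (fields 14 and 15 of /proc/<pid>/stat, in clock
   ticks) from one stat line. Returns 0 on success, -1 if malformed. */
int parse_cpu_ticks(const char *stat_line, unsigned long *ticks)
{
    /* Field 2 (the command name) is in parentheses and may contain
       spaces, so parsing starts after the last ')'. */
    const char *p = strrchr(stat_line, ')');
    unsigned long utime, stime;
    char state;
    assert(ticks != NULL);
    if (p == NULL)
        return -1;
    /* state, then skip fields 4..13, then read utime and stime. */
    if (sscanf(p + 1, " %c %*d %*d %*d %*d %*d %*u %*lu %*lu %*lu %*lu %lu %lu",
               &state, &utime, &stime) != 3)
        return -1;
    *ticks = utime + stime;
    return 0;
}

/* Reads the CPU ticks used so far by process pid. Returns -1 if the
   process no longer exists (it crashed or exited) or the read fails. */
int read_cpu_ticks(pid_t pid, unsigned long *ticks)
{
    char path[64], line[1024];
    FILE *f;
    snprintf(path, sizeof path, "/proc/%d/stat", (int)pid);
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fgets(line, sizeof line, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return parse_cpu_ticks(line, ticks);
}
```

Sampling the Squeak process a few seconds apart: if `read_cpu_ticks` starts failing, Squeak has most likely crashed; if the tick count stops changing while the process is still there, it has probably hit the progress dialog lock-up.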

Once it has crashed, look in the Exupery.log file.

If you're lucky, the log shows that 10 methods have been compiled since
the code cache was initialised. Try compiling these methods manually
to find out which one is causing the crash. I then normally write a
test that reliably reproduces the crash.
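When the log lists several candidates, halving the set is quicker than trying them one at a time. This C sketch assumes exactly one method triggers the crash on its own, and that a hypothetical `crash_oracle` callback can compile a subset and report whether it crashes (in practice that "callback" is you, rerunning the test by hand). `fake_crashes` is a stand-in oracle for trying the search.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical oracle: nonzero if compiling exactly methods[0..n-1]
   reproduces the crash. */
typedef int (*crash_oracle)(const int *methods, size_t n, void *ctx);

/* Returns the index of the culprit in methods[0..n-1], assuming a single
   culprit and that compiling all n methods reproduces the crash. */
size_t find_culprit(const int *methods, size_t n, crash_oracle crashes, void *ctx)
{
    size_t lo = 0, hi = n;   /* invariant: culprit index is in [lo, hi) */
    assert(n > 0);
    while (hi - lo > 1) {
        size_t mid = lo + (hi - lo) / 2;
        if (crashes(methods + lo, mid - lo, ctx))
            hi = mid;        /* culprit is in the first half */
        else
            lo = mid;        /* culprit is in the second half */
    }
    return lo;
}

/* Stand-in oracle: "crashes" whenever the method id in *ctx is compiled. */
static int fake_crashes(const int *methods, size_t n, void *ctx)
{
    int bad = *(const int *)ctx;
    for (size_t i = 0; i < n; i++)
        if (methods[i] == bad)
            return 1;
    return 0;
}
```

For 10 candidates this needs at most four rounds of compiling and rerunning, against up to ten when trying each method in turn.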

If you're unlucky, the crash was caused by memory corruption. A stack
backtrace will probably show that the GC was running, and there may be
no compiled methods. Memory corruption crashes can be a big pain
because the corruption may have happened long before the crash.

Once you have a test that reproduces the bug, look at the crash
itself. It's worth examining both the current active context
(foo->activeContext on Linux) and the C stack trace.

A context looks like:

(gdb) x/20x foo->activeContext  
0x434d5ccc:     0x1736e35f      0x434d5c70      0x000000b1      0x0000000b
0x434d5cdc:     0x40da1f8c      0x808b415b      0x434dcc50      0x00000015
0x434d5cec:     0x434dcee0      0x4035b004      0x4035b004      0x434dcc50
0x434d5cfc:     0x434dced8      0x00000003      0x000001ff      0x000001ff
0x434d5d0c:     0x0000010d      0x00002b3f      0x434a1ab4      0x434a1ab4

0x000000b1 is the bytecode program counter.
0x0000000b is the stack pointer.

0x808b415b is Exupery's return address.
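The program counter and stack pointer words are tagged SmallIntegers: in Squeak's 32-bit object memory a SmallInteger oop has its low bit set and stores the value shifted left by one. This C sketch decodes them. `small_int_value` is a hypothetical helper, and `context_words` simply copies the first six words of the dump above.

```c
#include <assert.h>
#include <stdint.h>

/* First six words of the gdb dump above: word 2 is the pc and word 3
   the stack pointer, as annotated. */
const uint32_t context_words[] = {
    0x1736e35f, 0x434d5c70, 0x000000b1, 0x0000000b,
    0x40da1f8c, 0x808b415b
};

/* A SmallInteger oop has its low bit set; object pointers do not. */
int is_small_int(uint32_t oop)
{
    return (oop & 1u) != 0;
}

/* Recovers the integer stored in a SmallInteger oop. Assumes the
   compiler does an arithmetic right shift on signed values (gcc on x86
   does), so negative SmallIntegers stay negative. */
int32_t small_int_value(uint32_t oop)
{
    assert(is_small_int(oop));
    return (int32_t)oop >> 1;
}
```

Decoded this way, 0x000000b1 is 88 and 0x0000000b is 5, while a pointer such as 0x434d5c70 has its low bit clear.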

From this it's possible to figure out that it crashed while executing
a compiled context, and to trace back to where in the method it was
and which basic block it last entered. Exupery only updates the
context when it leaves a method, so the stack pointer and Exupery
basic block pointer will point to where it re-entered the method.
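Mapping a return address such as 0x808b415b back to a basic block amounts to finding the last block that starts at or below that address. This C sketch assumes a hypothetical table of block start addresses sorted in ascending order; how Exupery actually records its basic blocks isn't described here, and the sketch doesn't check that the address falls before the end of the method.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record: where one basic block's machine code starts. */
typedef struct {
    uint32_t start;
    int block_id;
} BlockStart;

/* Returns the id of the block containing addr (the last block whose
   start is <= addr), or -1 if addr lies before the first block. */
int block_containing(const BlockStart *blocks, size_t n, uint32_t addr)
{
    size_t lo = 0, hi = n;   /* binary search for the first start > addr */
    assert(blocks != NULL || n == 0);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (blocks[mid].start <= addr)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo == 0 ? -1 : blocks[lo - 1].block_id;
}
```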

From here, the options include exploring with gdb breakpoints, adding
printf statements (in Slang, self cCode: 'printf("Entering method\n")'
adds a printf), or adding calls to validation code (Interpreter
methods in the "debug support" category).
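Validation code is most useful against memory corruption because it can run right after suspect code, catching the damage close to where it happened rather than in a later GC. This C sketch uses a deliberately simplified heap: a header word holding the slot count, followed by the slots, where odd words are immediates and even words are references encoded as a header index shifted left by one. That is not Squeak's object format, and `validate_heap` is hypothetical, not one of the Interpreter's debug support methods.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Checks a toy heap of n words. Returns -1 if every reference is sane,
   -2 if memory for the check could not be allocated, otherwise the index
   of the first bad word: a header that overruns the heap, or a slot that
   refers to something other than an object header. */
long validate_heap(const uint32_t *heap, size_t n)
{
    unsigned char *is_start = calloc(n ? n : 1, 1);
    long bad = -1;
    size_t i;
    assert(heap != NULL || n == 0);
    if (is_start == NULL)
        return -2;
    /* Pass 1: walk the headers to find where each object starts. */
    for (i = 0; i < n; i += 1 + heap[i]) {
        if (heap[i] > n - 1 - i) {   /* object would run past the end */
            bad = (long)i;
            goto done;
        }
        is_start[i] = 1;
    }
    /* Pass 2: every even (reference) slot must name an object start. */
    for (i = 0; i < n; i += 1 + heap[i]) {
        for (size_t s = i + 1; s <= i + heap[i]; s++) {
            uint32_t v = heap[s];
            if ((v & 1u) == 0 && ((v >> 1) >= n || !is_start[v >> 1])) {
                bad = (long)s;
                goto done;
            }
        }
    }
done:
    free(is_start);
    return bad;
}
```

Calling a check like this every time a compiled method exits is slow, but it turns "the GC crashed some time later" into "the heap was fine before this method and broken after it".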

Staring at the generated methods sometimes helps, though less so when
debugging random bugs. Place a halt at the end of Exupery>>run, then
try opening inspectors on the instance variables holding compiled
methods (most of them). I save every stage's results in instance
variables to make debugging easier. The inspector opens a graphical
view of the method: an animated Connectors graph laid out with springs
and repulsion. The explorers are the normal ones.

After figuring out why the machine code is crashing, we need to find
where Exupery went wrong. This is a game of chasing the bug back
through the different versions of the method until we find the first
version with a fault.
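Chasing the bug back can be written as a walk from the final, known-faulty version toward earlier ones, stopping at the first clean version. In this C sketch the `stage_is_faulty` callback is hypothetical (in practice it means inspecting each stage's result kept in the Exupery instance variables), and it assumes that once a stage is faulty every later stage is too. `faulty_from` is a stand-in checker for trying it out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical checker: nonzero if the version of the method produced by
   stage number `stage` already contains the fault. */
typedef int (*stage_is_faulty)(size_t stage, void *ctx);

/* Walks back from the last stage, assumed faulty, and returns the
   earliest stage of the unbroken faulty run. The bug was introduced by
   the step from stage (result - 1) to stage result. */
size_t first_faulty_stage(size_t stages, stage_is_faulty faulty, void *ctx)
{
    size_t i;
    assert(stages > 0);
    i = stages - 1;
    while (i > 0 && faulty(i - 1, ctx))
        i--;
    return i;
}

/* Stand-in checker: every stage from *ctx onward is faulty. */
static int faulty_from(size_t stage, void *ctx)
{
    return stage >= *(const size_t *)ctx;
}
```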

Bryce

P.S. I'm currently working at London Bridge; that's reasonably close
to you, isn't it? If you're serious about debugging, some pairing may
help.


