Hi All,<div><br></div><div> I'm responding to Andrew here because this is of general interest to the vm-list.<br><br><div class="gmail_quote">On Mon, Sep 26, 2011 at 11:06 AM, Andrew Gaylard <span dir="ltr"><<a href="mailto:apg@4dst.com">apg@4dst.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">Hmmm. Thanks for the advice -- we now build with -O3, and all's well.<br>
I've run the VM at full load (mostly compiling) for 30 hours without a<br>
hiccup. Interesting that -O2 is problematic, but -O3 isn't; I assumed<br>
that higher optimisations would make things less stable, not more so.<br>
And we get a 17% speed increase.<br>
<br>
My GCC is:<br>
$ gcc --version<br>
gcc (Ubuntu 4.4.3-4ubuntu5) 4.4.3<br></blockquote><div><br></div><div>This really surprises me, since we see exactly the same thing with gcc version 3.4.6 20060404 (Red Hat 3.4.6-3). If we compile with -O1 or -O3 we get functional Cog VMs, but -O2 crashes on start-up or soon thereafter. I'm surprised that two very different versions of gcc show the same behaviour, but I guess I shouldn't be. At some point some of us (me included) really should put in the effort to understand what the issue is. It could be a gcc bug, or it could be that we're generating C code with ill-defined behaviour. I have to say that I suspect the latter, given how different gcc 3.4.x and gcc 4.4.x are (BTW Andrew also sees the same issue with gcc 4.1.x).</div>
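<div><br></div><div>To make "ill-defined behaviour" concrete, here is a purely illustrative sketch of one common category, signed integer overflow. Nothing in this thread identifies the actual cause, and both function names are hypothetical rather than taken from the Cog sources. The first function relies on wraparound that C does not guarantee for signed ints, so an optimiser is allowed to assume the overflow never happens; the second does the same check in unsigned arithmetic, which is well defined (assuming the usual two's-complement representation).</div>

```c
/* Hypothetical helpers for illustration only; not from the Cog sources. */

/* Ill-defined: for b >= 0 this tries to detect overflow by wraparound.
   Signed overflow is undefined in C, so a compiler may assume it never
   occurs and treat this test as always false. */
static int naive_add_overflows(int a, int b)
{
    return a + b < a;
}

/* Well-defined: unsigned arithmetic wraps modulo 2^N. The signed sum
   overflowed exactly when the sign of the wrapped result differs from
   the signs of both operands. Assumes two's complement, no padding bits. */
static int add_overflows(int a, int b)
{
    unsigned ua = (unsigned)a;
    unsigned ub = (unsigned)b;
    unsigned sum = ua + ub;
    unsigned sign_bit = ~(~0u >> 1);   /* only the top bit set */
    return ((ua ^ sum) & (ub ^ sum) & sign_bit) != 0;
}
```

<div>If something of this kind is involved, rebuilding at -O2 with gcc's -fwrapv (or, for aliasing problems, -fno-strict-aliasing) and seeing whether the crash goes away might help narrow it down. That is a suggestion to try, not a diagnosis.</div>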
<div><br></div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
<font color="#888888"><br>
- Andrew<br>
</font><div class="im"><br>
On 2011.09.25 23:12:50 -0700, Eliot Miranda <<a href="mailto:eliot.miranda@gmail.com">eliot.miranda@gmail.com</a>> wrote:<br>
> On Sat, Sep 24, 2011 at 9:02 AM, Andrew Gaylard <<a href="mailto:apg@4dst.com">apg@4dst.com</a>> wrote:<br>
><br>
> > Actually, it looks like I was wrong. After rebuilding everything from<br>
> > scratch, I've been unable to reproduce these crashes, except for the<br>
> > one with unix-4.4.7.image.<br>
> ><br>
> > Sorry for the false alarm. r2495 looks pretty good, at both -O0 and<br>
> > -O1. It still crashes at -O2, but that's not a huge concern.<br>
> ><br>
><br>
> Which gcc are you using? Here at Cadence on a much older 32-bit machine<br>
> using gcc 3.4.x we see crashes at -O2 but no crashes at -O0 -O1 & -O3 :)<br>
><br>
><br>
> ><br>
> ><br>
</div><div><div></div><div class="h5">> > On 2011.09.24 08:07:47 +0200, Andrew Gaylard <<a href="mailto:apg@4dst.com">apg@4dst.com</a>> wrote:<br>
> > > On 2011.09.23 13:26:06 -0700, Eliot Miranda <<a href="mailto:eliot.miranda@gmail.com">eliot.miranda@gmail.com</a>><br>
> > wrote:<br>
> > > > Thank you, Andrew, you nailed it. I've found the bug via your stack<br>
> > trace<br>
> > > > below. Huge relief. Thanks! New VMs and explanation to the list<br>
> > soon.<br>
> > ><br>
> > > Alas, we spoke too soon. -2495 exhibits the same symptoms; traces and<br>
> > > gdb transcripts are attached.<br>
> > ><br>
> > > - vm-*-2495.0.txt are from our basic.image, running the test-runner.<br>
> > > - vm-*-2495.1.txt are from Squeak4.2-10966.image, running the<br>
> > test-runner.<br>
> > > - vm-*-2495.2.txt are from unix-4.4.7.image, having just started up the<br>
> > VM.<br>
> > ><br>
> > > The first two of these appear to be the same problem I encountered<br>
> > > with -2493. The backtraces certainly look very similar.<br>
> > ><br>
> > > The third one is rather different. Looking at the stack trace, the<br>
> > > 'rcvr' variable in ceSendsupertonumArgs is 17039140, which is<br>
> > > de-referenced in line 10733, causing a SEGV; the handler duly confirms<br>
> > > the faulting address as si_addr = 0x103ff24:<br>
> > ><br>
> > > $ perl -e 'print 0x103ff24'<br>
> > > 17039140<br>
</div></div></blockquote></div><br><br clear="all"><div><br></div>-- <br>best,<div>Eliot</div><br>
</div>