[Vm-dev] Linux 4.4.7.2357 VM crash under memory pressure

Mon May 21 02:32:04 UTC 2012

On Sun, May 20, 2012 at 08:55:18PM +0200, Bert Freudenberg wrote:
> 
> 
> On 20.05.2012, at 19:35, David T. Lewis wrote:
> 
> > 
> > On Sun, May 20, 2012 at 03:08:20PM +0200, Bert Freudenberg wrote:
> >> 
> >> Hilaire discovered that his newest DrGeo segfaults on the XO-1. It works fine elsewhere, including the XO-1.5, which has pretty much the same OS.
> >> 
> >> We narrowed down the problem to the XO-1 having only 256 MB of RAM and no swap space. I can reproduce the crash in a virtual Ubuntu 12 with 768 MB RAM (!) but no swap. Top reports:
> >> 
> >> Mem:    766204k total,   601588k used,   164616k free,    45624k buffers
> >> Swap:        0k total,        0k used,        0k free,   277024k cached
> >> 
> >> but DrGeo still crashes. Etoys runs fine using the same Squeak VM on the same system (and on XO-1). DrGeo is based on Pharo 1.4, using a closure image. Etoys still is pre-closure. 
> >> 
> > 
> > I recall some recent discussion on the Pharo list about some "strange objects"
> > that had entered the image for a period of time. It was something to do with
> > a mismatch in the number of instance variable slots. The VM crash is happening
> > in a method that is stepping through the fields of an object, so if something
> > was out of whack there it might well lead to problems.
> > 
> > The discussion started here:
> > 
> >  http://lists.gforge.inria.fr/pipermail/pharo-project/2012-May/064539.html
> > 
> > It would be worth checking if the DrGeo image might have this issue, in
> > case those objects might for some reason be interacting badly with the
> > garbage collector.
> > 
> > Dave
> 
> Interesting. However, it's strange that this would manifest only on machines with less memory - shouldn't the VM topple over at the first GC no matter what?
>

I don't see any obvious reason why the small memory would make a difference
either. I'm really just trying to think of things that might be different in
this case. The failure seems to be happening in GC code that has not changed
in recent years, but it's happening after calling a method that depends on
object header format (#lastPointerWhileForwarding:), and it is happening in
code that iterates over the fields of an object. Beyond that I'm just totally
guessing.

Dumb question - do you know if any other closure-enabled images have this
problem on small memory systems? I'd have thought someone would have noticed
by now, but maybe not.

Dave