[Vm-dev] Fwd: [Pharo-fuel] Possible collections problem

David T. Lewis lewis at mail.msen.com
Thu May 16 12:33:23 UTC 2013


First to check to see if I understand the problem:

- You are forking an image using OSProcess, and the failures are occurring in
the child VM/image that you created using one of the #forkSqueak methods.

- You are serializing some objects in the child image so that they can be
written to a serialized output file. You are doing this in the background
child VM/image so that your main image can continue doing whatever it was
doing without being impacted by the serialization process.

- If you do this 10 times, it will fail about 4 times out of the 10 on average.

From the stack traces, it looks like your VM is hanging up in a normal
garbage collection sweep that is happening as a side effect of allocating
a new object. The first thing that comes to mind is that your VM is trying
to allocate more memory than the OS can provide, and it appears to be stuck
while waiting for memory from the OS. I'm suggesting this because I've seen
similar symptoms when I intentionally try to allocate very large object
memories, so perhaps there is something in your runtime environment (OS
settings) that limits available memory. This is only a guess, but it might
give you some ideas of where to look.
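For example, on Linux you can inspect the relevant limits directly (a hypothetical diagnostic session using the standard /proc filesystem; adjust for your Debian setup):

```shell
# Per-process virtual memory limit (in KB); "unlimited" means no per-process cap
ulimit -v

# Kernel overcommit policy: 0 = heuristic, 1 = always allow, 2 = strict accounting
cat /proc/sys/vm/overcommit_memory

# Overall memory and swap figures
grep -E '^(MemTotal|MemFree|SwapTotal|SwapFree)' /proc/meminfo
```

A low `ulimit -v` or a strict overcommit policy would be one way the OS could refuse the allocation the VM is waiting on.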

This might be a VM problem, or it might be a symptom of some other problem
that just puts the VM and OS into a condition where they are unexpectedly
trying to use a very large amount of memory.

If possible, it would be good to try to rule out the OSProcess forkImage as
a possible contributing factor. If you were to do the serialization 10 times
in the main image, rather than in the background child image, would it
still fail about 4 times out of 10? This could be a factor in memory use.
Although forkSqueak is very memory efficient, if the child and parent
images do a lot of different things (such as try to serialize a bunch of
stuff in the child image), then the OS will be forced to map in enough
real memory to satisfy both of the images.

Another thing to check, if possible, is whether the OS is doing a
lot of disk swapping at the time of the failure. If you see all of the
system memory in use, along with a lot of disk activity, then you are
very likely looking at a case of the VM trying to allocate more memory
than the system can reasonably provide.
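One way to watch for this on Linux without extra tools (assuming a standard /proc filesystem) is to sample the kernel's swap counters before and during a hang and compare:

```shell
# Cumulative pages swapped in/out since boot; rising deltas while the child
# VM appears "hung" indicate the system is thrashing rather than computing.
grep -E '^pswp(in|out)' /proc/vmstat
```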

FWIW, here is what your VM is doing at the time of the "hang up". This
involves scanning object memory through memEnd, which might be a very
large value for one reason or another:

ObjectMemory>>updatePointersInRootObjectsFrom: memStart to: memEnd
    "update pointers in root objects"
    | oop |
    <inline: false>
    1 to: rootTableCount do: [:i |
        oop := rootTable at: i.
        ((self oop: oop isLessThan: memStart)
                or: [self oop: oop isGreaterThanOrEqualTo: memEnd])
            ifTrue: ["Note: must not remap the fields of any object twice!"
                "remap this oop only if not in the memory range covered below"
                self remapFieldsAndClassOf: oop]]
HTH,

Dave

On Thu, May 16, 2013 at 09:29:42AM +0200, Max Leske wrote:
>  
> Hi
> 
> I'm forwarding this because I'd like to rule out a VM problem. Short summary:
> I fork a Squeak image and then serialize objects with Fuel. In roughly 40% of the cases the fork suddenly locks up and consumes 100% CPU. The trace I most often see with gdb in that case is the one with
> "#0  0x08060453 in updatePointersInRootObjectsFromto ()" at the top.
> 
> The object being processed when the lockup occurs is always of class Timestamp, although that doesn't necessarily mean anything. Maybe it's more about the number of objects.
> 
> I'm working on Debian, 32-bit, and I can reproduce the problem with SqueakVM 4.4.7-2364 and 4.0.3-2202 (the newer ones won't run because of glibc). I haven't tried Cog yet.
> 
> I also just checked that the problem occurs even if I don't serialize any timestamps (nor Process, Delay, Monitor, Semaphore; just to be sure).
> 
> So if anyone has a clue as to what might be going on I'd really appreciate the help.
> 
> Cheers,
> Max
> 
> Begin forwarded message:
> 
> > From: Mariano Martinez Peck <marianopeck at gmail.com>
> > Subject: Re: [Pharo-fuel] Possible collections problem
> > Date: 15. Mai 2013 16:53:10 MESZ
> > To: The Fuel Project <pharo-fuel at lists.gforge.inria.fr>
> > Reply-To: The Fuel Project <pharo-fuel at lists.gforge.inria.fr>
> > 
> > I cannot see anything in particular. Too much GC stuff.
> > So I wouldn't spend more time trying to debug. I would try the non-large collections. Then I would try with the latest Cog and the latest StackVM.
> > 
> > 
> > 
> > On Wed, May 15, 2013 at 11:47 AM, Max Leske <maxleske at gmail.com> wrote:
> > I've had several forks hanging around just now. Apart from one, all of these were locked. I attached gdb and generated the C stack for each of them. Not sure if there's anything really interesting in there, although clearly a lot of time is spent in GC and in object creation. That doesn't have to mean anything though.
> > 
> > I haven't yet tried your suggestion Mariano.
> > 
> > Cheers,
> > Max
> > 
> > 
> > [Thread debugging using libthread_db enabled]
> > 0x08060453 in updatePointersInRootObjectsFromto ()
> > 
> > #0  0x08060453 in updatePointersInRootObjectsFromto ()
> > #1  0x08060a77 in mapPointersInObjectsFromto ()
> > #2  0x08060bb0 in incCompBody ()
> > #3  0x08065fa7 in incrementalGC ()
> > #4  0x080661a4 in sufficientSpaceAfterGC ()
> > #5  0x08069420 in primitiveNew ()
> > #6  0x0806de15 in interpret ()
> > #7  0x08073dfe in main ()
> > 
> > 
> > 
> > [Thread debugging using libthread_db enabled]
> > 0x08060453 in updatePointersInRootObjectsFromto ()
> > 
> > #0  0x08060453 in updatePointersInRootObjectsFromto ()
> > #1  0x08060a77 in mapPointersInObjectsFromto ()
> > #2  0x08060bb0 in incCompBody ()
> > #3  0x08065fa7 in incrementalGC ()
> > #4  0x080661a4 in sufficientSpaceAfterGC ()
> > #5  0x08069420 in primitiveNew ()
> > #6  0x0806de15 in interpret ()
> > #7  0x08073dfe in main ()
> > 
> > 
> > 
> > 
> > [Thread debugging using libthread_db enabled]
> > 0x08060453 in updatePointersInRootObjectsFromto ()
> > 
> > #0  0x08060453 in updatePointersInRootObjectsFromto ()
> > #1  0x08060a77 in mapPointersInObjectsFromto ()
> > #2  0x08060bb0 in incCompBody ()
> > #3  0x08065fa7 in incrementalGC ()
> > #4  0x080661a4 in sufficientSpaceAfterGC ()
> > #5  0x0806fed2 in clone ()
> > #6  0x08070095 in primitiveClone ()
> > #7  0x0806de15 in interpret ()
> > #8  0x08073dfe in main ()
> > 
> > 
> > [Thread debugging using libthread_db enabled]
> > 0x08060453 in updatePointersInRootObjectsFromto ()
> > 
> > #0  0x08060453 in updatePointersInRootObjectsFromto ()
> > #1  0x08060a77 in mapPointersInObjectsFromto ()
> > #2  0x08060bb0 in incCompBody ()
> > #3  0x08065fa7 in incrementalGC ()
> > #4  0x080661a4 in sufficientSpaceAfterGC ()
> > #5  0x08069270 in primitiveNewWithArg ()
> > #6  0x0806de15 in interpret ()
> > #7  0x08073dfe in main ()
> > 
> > 
> > [Thread debugging using libthread_db enabled]
> > 0xb76f0f68 in select () from /lib/libc.so.6
> > 
> > #0  0xb76f0f68 in select () from /lib/libc.so.6
> > #1  0x08070880 in aioPoll ()
> > #2  0xb762419e in ?? () from /usr/lib/squeak/4.0.3-2202//so.vm-display-X11
> > #3  0x08073595 in ioRelinquishProcessorForMicroseconds ()
> > #4  0x08061f24 in primitiveRelinquishProcessor ()
> > #5  0x0806de15 in interpret ()
> > #6  0x08073dfe in main ()
> > 
> > 
> > [Thread debugging using libthread_db enabled]
> > 0x08060453 in updatePointersInRootObjectsFromto ()
> > 
> > #0  0x08060453 in updatePointersInRootObjectsFromto ()
> > #1  0x08060a77 in mapPointersInObjectsFromto ()
> > #2  0x08060bb0 in incCompBody ()
> > #3  0x08065fa7 in incrementalGC ()
> > #4  0x080661a4 in sufficientSpaceAfterGC ()
> > #5  0x08069420 in primitiveNew ()
> > #6  0x0806de15 in interpret ()
> > #7  0x08073dfe in main ()
> > 
> > 
> > 
> > [Thread debugging using libthread_db enabled]
> > 0x08064e7e in markAndTrace ()
> > 
> > #0  0x08064e7e in markAndTrace ()
> > #1  0x0806593a in markPhase ()
> > #2  0x08065f60 in incrementalGC ()
> > #3  0x080661a4 in sufficientSpaceAfterGC ()
> > #4  0x0806fed2 in clone ()
> > #5  0x08070095 in primitiveClone ()
> > #6  0x0806de15 in interpret ()
> > #7  0x08073dfe in main ()
> > 
> > 
> > 
> > 
> > On 15.05.2013, at 13:59, Mariano Martinez Peck <marianopeck at gmail.com> wrote:
> > 
> >> Ok. So, the first thing you should try is to replace the uses of LargeIdentityDictionary with IdentityDictionary, and LargeIdentitySet with IdentitySet.
> >> If the problem disappears, then yes, there is something wrong with the large collections. If there is a problem with them, try updating the VM, since they use a particular primitive.
> >> Let us know!
> >> 
> >> 
> >> On Tue, May 14, 2013 at 9:29 AM, Max Leske <maxleske at gmail.com> wrote:
> >> 
> >> On 14.05.2013, at 13:52, Mariano Martinez Peck <marianopeck at gmail.com> wrote:
> >> 
> >>> Hi Max. Question, are you able to reproduce the problem?
> >> 
> >> Yes, but not "on purpose". The situation usually happens once or twice a day and then with consistent log entries. That's why I want to use gdb the next time it happens.
> >> 
> >>> 
> >>> 
> >>> On Tue, Apr 30, 2013 at 3:57 PM, Max Leske <maxleske at gmail.com> wrote:
> >>> Hi guys
> >>> 
> >>> I have a problem serializing a graph. Sometimes (not always) the image will consume +/- 100% CPU and stop responding. I was able to pin the problem down a bit:
> >>> - it always fails in FLIteratingCluster>>registerIndexesOn: when called from FLFixedObjectCluster with class TimeStamp (this might not actually be relevant but it's consistent)
> >>> - the problem *might* be in FLLargeIdentityDictionary>>at:put: (or further up the stack)
> >>> 
> >>> I've done extensive logging to a file, but even with flushing after every write the results are not consistent. Sometimes the image locks after leaving #at:put:, sometimes it does so somewhere in the middle or in #registerIndexesOn: (but remember: the logging might not be precise).
> >>> 
> >>> It's probably not the size of the objects in the cluster (the graph is big but not overly large), since there are other clusters with more objects.
> >>> 
> >>> What I did find is that the #grow operation for HashedCollections can be *very* slow (up to 20 seconds or more), while at other times the snapshot runs through in no time.
> >>> 
> >>> So here's my theory: there might be a VM problem with HashedCollections.
> >>> Now, the VM is a rather old one and I haven't had a chance to test this with a newer one (but I'll probably have to). The version is Squeak4.0.3-2202 running on 32-bit Debian Squeeze.
> >>> 
> >>> I'll try some more but if anyone has any ideas I'd be very happy :)
> >>> 
> >>> Cheers,
> >>> Max
> >>> _______________________________________________
> >>> Pharo-fuel mailing list
> >>> Pharo-fuel at lists.gforge.inria.fr
> >>> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-fuel
> >>> 
> >>> 
> >>> 
> >>> -- 
> >>> Mariano
> >>> http://marianopeck.wordpress.com
> >> 
> >> 
> >> 
> >> 
> >> 
> >> 
> > 
> > 
> > 
> > 
> > 
> > 
> 


