[Vm-dev] [Pharo-users] macOS VM builds with AddressSanitizer/LeakSanitizer

Guillermo Polito guillermopolito at gmail.com
Wed Feb 27 11:05:56 UTC 2019


Hi Manuel,

Disclaimer: I'm not an objectiveC expert either :)

When I did that (and compiled, and tested), I remember having seen that it
was using the automatic reference counting from objectiveC.
Actually, putting the [release] was producing for me a segmentation fault.
Did you try it yourself and did it fix the problem for you?


On Wed, Feb 27, 2019 at 10:52 AM Manuel Leuenberger <
leuenberger at inf.unibe.ch> wrote:

>
> Not sure if I found something, but looking
> at sqSqueakOSXApplication+events.m I find
>
> - (void) pumpRunLoopEventSendAndSignal:(BOOL)signal {
>        NSEvent *event;
>        NSMutableArray *alienEventQueue = [[NSMutableArray alloc] init];
>        while ((event = [NSApp nextEventMatchingMask:NSEventMaskAny
>                              untilDate:nil
>                              inMode:NSEventTrackingRunLoopMode
>                              dequeue:YES])) {
>          // If the event is not a system event or an event of *this*
> window, queue the event
>          // Otherwise treat the event normally and send it to the app
>          if (event.window && event.window != gDelegateApp.window){
>            [alienEventQueue addObject: event];
>          }else{
>            [NSApp sendEvent: event];
>            if (signal) {
>
>  interpreterProxy->signalSemaphoreWithIndex(gDelegateApp.squeakApplication.inputSemaphoreIndex);
>            }
>          }
>      }
>
>      // Put back in the queue all events that did not belong to this window
>      // They will be managed by other windowing systems
>      // (or by a subsequent invocation to this function)
>      while ((event = [alienEventQueue firstObject])) {
>        [NSApp postEvent: event atStart: NO];
>        [alienEventQueue removeObject: event];
>      }
> }
>
> I have no clue about Objective-C, but I think
>
>  NSMutableArray *alienEventQueue = [[NSMutableArray alloc] init];
>
> makes me own the queue on the heap, isn't there a
>
> [alienEventQueue release];
>
> missing at the end?
>
> Cheers,
> Manuel
>
> On 26 Feb 2019, at 23:31, Manuel Leuenberger <leuenberger at inf.unibe.ch>
> wrote:
>
> Alright, back in the game, this time using gperftools.
>
> I finally managed to reproduce a leaking run, but it took me hours to get
> there, randomly rendering GtCoders and messing around. gperftools
> instruments malloc and friends, capturing heap profiles every 30 seconds.
> Now I have profiles from a freshly started, small image, before it went
> berserk on memory hogging, and when it allocates over 3GB. Now, I have
> visualizations of allocation sites at these instants, and diffs between
> them, as well as some thread dumps on the berserk state. I managed to rule
> out the in-image objects to contribute to memory by cleaning images with
> garbage collect and cache resets. Some handy resets are these:
>
> NECController reset.
> FreeTypeCache current removeAll.
> ASTCache reset.
> Smalltalk garbageCollect.
>
> These get rid of a leak in NECController, resets the font cache (creates a
> lot of bitmaps), and ASTs, created by the coders. With
>
> SpaceTally printSpaceAnalysis.
>
> I get objects stats from within the image, and I managed to keep them in
> the same ballpark for all instants I took profiles of, by inspecting
> over-populated instances, like this
>
> instances := WeakOrderedCollection withAll: RBMessageNode allInstances.
> ReferenceFinder findPathTo: instances atRandom.
>
> With
>
> FFIExternalResourceManager uniqueInstance instVarNamed: #registry
>
> I also made sure that external objects are clean.
>
> Now, I am pretty confident that I can clean images, so the memory-overload
> should be sourced in the VM. Still, I cannot reliably reproduce the leak.
> It just happens after running an image for hours, doing stuff, idling,
> going to sleep, wake up.
>
> Unfortunately, I did not run it on a debug VM, so nearly no symbols, but I
> prepared a summary of the analysis anyway, as it already gives some hints
> that some of you can maybe interpret better than me. Have look at the PDFs
> in particular, they show the allocation sites at the three interesting
> instants, (clean, before berserk, after berserk), as well as the diffs in
> between.
>
> https://www.dropbox.com/s/zusnfn5mhyw3rmp/memory-analysis.zip?dl=0
>
> I find that berserk mode is caused by a huge pile of Objective-C objects.
> The thread samples show that while idling, it seems to be busy processing
> OS events. But I do not know what kind of objects they actually are.
>
> My current hypothesis is:
>
> The leak is caused by processing a huge pile of OS events being processed
> (reprocessed?). For some reason, this only happens after a long run time.
>
> Now, how can I accept/reject this hypothesis? Is there a way to log events
> received by the VM? I also have a core dump of the 3GB process, but I have
> no idea how I could check it for Objective-C objects grouped by type. If
> anybody knows how to do that, I might be able to come up with a
> comprehensive list of objects that lead to the berserk mode. Maybe it is
> not even are real leak, but just accumulating objects that should be thrown
> away at some point. Also, I can still not fully reject that the image is
> involved, maybe it forgets external resources it should not forget at some
> point.
>
> As I now know again that I can trigger berserk mode, I will switch to the
> debug VM/Moz2D again with ASan. More symbols, leak reports, might reveal
> better information.
>
> Anyway, if anybody made it through the whole email and maybe even looked
> at the ZIP, I would appreciate ideas on how to proceed, what to test, what
> to inspect. I recently had a few incidents where I realized that one of my
> implicit assumptions/axioms does not hold, and I kept digging in the wrong
> direction for too long, so any feedback on this is welcome.
>
> Cheers,
> Manuel
>
> On 30 Jan 2019, at 18:40, Manuel Leuenberger <leuenberger at inf.unibe.ch>
> wrote:
>
> I thought about adding a flag for all ASan magic in the build scripts,
> then document the procedure in the readme, maybe also more extensive
> somewhere else. You probably have a better idea where a good place would be
> than I do.
>
> I dug into this today again and wrote a parser for the reports (
> https://github.com/maenu/leak-reporter). It has a not-so-good
> visualization of merged stacks showing heavy frames as big and solid, it
> also includes example reports (
> https://github.com/maenu/leak-reporter/tree/master/res). The reported
> leaks only amount for about 1% of leaked memory, not too many objects
> leaking, but I am not sure if ASan just counts object layout size or
> objects graphs (probably not). What I notice is that OS events are leaking,
> as they are copied. Might be an issue. But I am not entirely sure if this
> explains everything. Maybe the VM just allocates too many objects of some
> type, but frees them on shutdown, so they might not even appear in the
> reports. Can a C heap analyzer give me stats about struct instances per
> struct type? Might be useful to analyze core dumps then.
>
> Cheers,
> Manuel
>
> <Screen Shot 2019-01-30 at 18.27.22.png>
>
> Direct leak of 219512 byte(s) in 1193 object(s) allocated from:
>     #0 0x10bb1fffc in __sanitizer_mz_malloc (libclang_rt.asan_osx_dynamic.dylib:x86_64h+0x5dffc)
>     #1 0x7fffdd224281 in malloc_zone_malloc (libsystem_malloc.dylib:x86_64+0x2281)
>     #2 0x7fffc6de33d5 in AllocateAndInitializeEvent(__CFAllocator const*) (HIToolbox:x86_64+0x33d5)
>     #3 0x7fffc6e3fc4d in CopyEventInternal(__CFAllocator const*, OpaqueEventRef*) (HIToolbox:x86_64+0x5fc4d)
>     #4 0x7fffc6e49228 in CopyEventAs (HIToolbox:x86_64+0x69228)
>     #5 0x7fffc6e1d349 in CreateEventWithCGEvent (HIToolbox:x86_64+0x3d349)
>     #6 0x7fffc6e1b0e2 in CreateAndPostEventWithCGEvent(__CGEvent*, unsigned int, unsigned char, __CFMachPortBoost*) (HIToolbox:x86_64+0x3b0e2)
>     #7 0x7fffc6e275cd in Convert1CGEvent(unsigned char) (HIToolbox:x86_64+0x475cd)
>     #8 0x7fffc6e2745e in MainLoopObserver(unsigned int, OpaqueEventRef*, void*) (HIToolbox:x86_64+0x4745e)
>     #9 0x7fffc6de8368 in _NotifyEventLoopObservers (HIToolbox:x86_64+0x8368)
>     #10 0x7fffc6e10ea5 in RunCurrentEventLoopInMode (HIToolbox:x86_64+0x30ea5)
>     #11 0x7fffc6e10bf8 in ReceiveNextEventCommon (HIToolbox:x86_64+0x30bf8)
>     #12 0x7fffc6e10b25 in _BlockUntilNextEventMatchingListInModeWithFilter (HIToolbox:x86_64+0x30b25)
>     #13 0x7fffc53a5a53 in _DPSNextEvent (AppKit:x86_64+0x46a53)
>     #14 0x7fffc5b217ed in -[NSApplication(NSEvent) _nextEventMatchingEventMask:untilDate:inMode:dequeue:] (AppKit:x86_64+0x7c27ed)
>     #15 0x10b7ce07c in -[sqSqueakOSXApplication(events) pumpRunLoopEventSendAndSignal:] sqSqueakOSXApplication+events.m:78
>     #16 0x10b7ce7d9 in -[sqSqueakOSXApplication(events) pumpRunLoop] sqSqueakOSXApplication+events.m:105
>     #17 0x10b7e4898 in vmIOProcessEvents sqSqueakEventsAPI.m:80
>     #18 0x10b7e49c7 in ioProcessEvents sqSqueakEventsAPI.m:103
>     #19 0x10b5b5653 in checkForEventsMayContextSwitch gcc3x-cointerp.c:62680
>     #20 0x10b5c17ed in ceCheckForInterrupts gcc3x-cointerp.c:15188
>     #21 0x119a75794  (<unknown module>)
>     #22 0x10b566712 in interpret gcc3x-cointerp.c:2754
>     #23 0x10b7ebada in -[sqSqueakMainApplication runSqueak] sqSqueakMainApplication.m:201
>     #24 0x7fffc93786fc in __NSFirePerformWithOrder (Foundation:x86_64+0xd76fc)
>     #25 0x7fffc78cfc56 in __CFRUNLOOP_IS_CALLING_OUT_TO_AN_OBSERVER_CALLBACK_FUNCTION__ (CoreFoundation:x86_64h+0xa6c56)
>     #26 0x7fffc78cfbc6 in __CFRunLoopDoObservers (CoreFoundation:x86_64h+0xa6bc6)
>     #27 0x7fffc78b05f8 in __CFRunLoopRun (CoreFoundation:x86_64h+0x875f8)
>     #28 0x7fffc78b0033 in CFRunLoopRunSpecific (CoreFoundation:x86_64h+0x87033)
>     #29 0x7fffc6e10ebb in RunCurrentEventLoopInMode (HIToolbox:x86_64+0x30ebb)
>
>
>
>
> On 29 Jan 2019, at 20:19, Alistair Grant <akgrant0710 at gmail.com> wrote:
>
>
> On Tue, 29 Jan 2019 at 15:32, Eliot Miranda <eliot.miranda at gmail.com>
> wrote:
>
>
>
> ...
>
> So this is great.  The final step is seeing which FFI method invokes
> moz2d_draw_target_create_for_data_type and then the storage leak can be
> fixed.
> What would be wonderful is to capture your experience in diagnosing this
> storage leak as a guide to others.  But in what form?  One would be a blog
> post.  I’m nervous though because blogs sometimes disappear.  One would be
> some section in one of the READMEs in the source tree.  One would be a
> carefully written email to this list recapitulating or summarizing the
> experience.
>
> The advantage of the mailing list is that it won’t go away any time soon.
> So how about writing a new message in this thread which summarizes?  It
> could be written as more of a guide “Storage leaks come in more than one
> variety...  To diagnose a leak in the C heap... To link against a C leak
> checker...” and include links to the other messages in this thread on the
> mail list serve for details “Specifics on how to modify the Mac build to
> link in the leak checker are ...”.
>
> What do you think?  What to others think?  How do we best capture this for
> posterity?
>
>
> I would split this in two:
>
> - The basic instructions on compiling with the leak checker should go
> in the HowToBuild.
> - The description you're suggesting above should go in
> http://opensmalltalk.org/ as Ben suggested, or a wiki page in the
> OpenSmalltalk-VM repository.
>
> Each should provide a link to the other.
>
> I'm not a big fan of github wiki pages, but they have two advantages:
>
> - They keep the content with the code.
> - They're relatively easy to download and back-up offline (in the
> event that we decide to move the code elsewhere).
>
> My preference would be http://opensmalltalk.org/, but I'm not sure
> what the access rights on it are.  Actually, I didn't even know it
> existed, thanks Ben!
>
> As Tim said, email lists are hard to search, and can't be easily
> updated.  Blog posts don't really lend themselves to being updated
> either.
>
> My 2c...
>
> Cheers,
> Alistair
>
>
>
>
>

-- 



Guille Polito

Research Engineer

Centre de Recherche en Informatique, Signal et Automatique de Lille

CRIStAL - UMR 9189

French National Center for Scientific Research - *http://www.cnrs.fr
<http://www.cnrs.fr>*


*Web:* *http://guillep.github.io* <http://guillep.github.io>

*Phone: *+33 06 52 70 66 13
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.squeakfoundation.org/pipermail/vm-dev/attachments/20190227/c8a6314f/attachment-0001.html>


More information about the Vm-dev mailing list