[Vm-dev] [Pharo-users] macOS VM builds with AddressSanitizer/LeakSanitizer

Ben Coman btc at openinworld.com
Thu Feb 28 13:48:07 UTC 2019


Great work Manuel. Thanks for your perseverance.

cheers -ben

On Thu, 28 Feb 2019 at 18:57, Manuel Leuenberger <leuenberger at inf.unibe.ch>
wrote:

>
> Found it, reproduced it, fixed it.
>
> https://twitter.com/maenuleu/status/1101068937750634496
> https://github.com/OpenSmalltalk/opensmalltalk-vm/pull/373
>
> I think memory consumption should be a point in the review process, and
> there should be memory performance tests (are there any already?).
>
> Took me two months to write the necessary two lines of code.
>
> Cheers,
> Manuel
>
> On 28 Feb 2019, at 02:52, John M McIntosh <johnmci at smalltalkconsulting.com>
> wrote:
>
> Hi, so if you point your git tool to the squeak vm tree at commit
> fae3d7eb403158f177c306f668cb461c35c99419
> The you can download the JMM/IOS64 branch which should let you build a OSX
> VM from xCode and do address sanitation, leak detection etc, from the
> comforts of xCode and Instruments. (check Instruments out) allocations &
> generations etc.
>
> if you turn on address sanitizer, detect use of stack after return,
> undefined behavior sanitizer, (pause on  issues), Main Thread checker
> (Pause on issues)  Likely there is 'stuff' there that could be fixed...
>
>
> As for the alloc/init  the vm can be build as arc or non-arc. And or a
> mixture used at a file level, plus of course bridging from core foundation
> to swift and indicating who has ownership of the object.
>
>
>
> ....
> John M. McIntosh. Corporate Smalltalk Consulting Ltd
> https://www.linkedin.com/in/smalltalk
>
>
> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> On Wednesday, February 27, 2019 3:05 AM, Guillermo Polito <
> guillermopolito at gmail.com> wrote:
>
> Hi Manuel,
>
> Disclaimer: I'm not an objectiveC expert either :)
>
> When I did that (and compiled, and tested), I remember having seen that it
> was using the automatic reference counting from objectiveC.
> Actually, putting the [release] was producing for me a segmentation fault.
> Did you try it yourself and did it fix the problem for you?
>
>
> On Wed, Feb 27, 2019 at 10:52 AM Manuel Leuenberger <
> leuenberger at inf.unibe.ch> wrote:
>
>>
>> Not sure if I found something, but looking
>> at sqSqueakOSXApplication+events.m I find
>>
>> - (void) pumpRunLoopEventSendAndSignal:(BOOL)signal {
>>        NSEvent *event;
>>        NSMutableArray *alienEventQueue = [[NSMutableArray alloc] init];
>>        while ((event = [NSApp nextEventMatchingMask:NSEventMaskAny
>>                              untilDate:nil
>>                              inMode:NSEventTrackingRunLoopMode
>>                              dequeue:YES])) {
>>          // If the event is not a system event or an event of *this*
>> window, queue the event
>>          // Otherwise treat the event normally and send it to the app
>>          if (event.window && event.window != gDelegateApp.window){
>>            [alienEventQueue addObject: event];
>>          }else{
>>            [NSApp sendEvent: event];
>>            if (signal) {
>>
>>  interpreterProxy->signalSemaphoreWithIndex(gDelegateApp.squeakApplication.inputSemaphoreIndex);
>>            }
>>          }
>>      }
>>
>>      // Put back in the queue all events that did not belong to this
>> window
>>      // They will be managed by other windowing systems
>>      // (or by a subsequent invocation to this function)
>>      while ((event = [alienEventQueue firstObject])) {
>>        [NSApp postEvent: event atStart: NO];
>>        [alienEventQueue removeObject: event];
>>      }
>> }
>>
>> I have no clue about Objective-C, but I think
>>
>>  NSMutableArray *alienEventQueue = [[NSMutableArray alloc] init];
>>
>> makes me own the queue on the heap, isn't there a
>>
>> [alienEventQueue release];
>>
>> missing at the end?
>>
>> Cheers,
>> Manuel
>>
>> On 26 Feb 2019, at 23:31, Manuel Leuenberger <leuenberger at inf.unibe.ch>
>> wrote:
>>
>> Alright, back in the game, this time using gperftools.
>>
>> I finally managed to reproduce a leaking run, but it took me hours to get
>> there, randomly rendering GtCoders and messing around. gperftools
>> instruments malloc and friends, capturing heap profiles every 30 seconds.
>> Now I have profiles from a freshly started, small image, before it went
>> berserk on memory hogging, and when it allocates over 3GB. Now, I have
>> visualizations of allocation sites at these instants, and diffs between
>> them, as well as some thread dumps on the berserk state. I managed to rule
>> out the in-image objects to contribute to memory by cleaning images with
>> garbage collect and cache resets. Some handy resets are these:
>>
>> NECController reset.
>> FreeTypeCache current removeAll.
>> ASTCache reset.
>> Smalltalk garbageCollect.
>>
>> These get rid of a leak in NECController, resets the font cache (creates
>> a lot of bitmaps), and ASTs, created by the coders. With
>>
>> SpaceTally printSpaceAnalysis.
>>
>> I get objects stats from within the image, and I managed to keep them in
>> the same ballpark for all instants I took profiles of, by inspecting
>> over-populated instances, like this
>>
>> instances := WeakOrderedCollection withAll: RBMessageNode allInstances.
>> ReferenceFinder findPathTo: instances atRandom.
>>
>> With
>>
>> FFIExternalResourceManager uniqueInstance instVarNamed: #registry
>>
>> I also made sure that external objects are clean.
>>
>> Now, I am pretty confident that I can clean images, so the
>> memory-overload should be sourced in the VM. Still, I cannot reliably
>> reproduce the leak. It just happens after running an image for hours, doing
>> stuff, idling, going to sleep, wake up.
>>
>> Unfortunately, I did not run it on a debug VM, so nearly no symbols, but
>> I prepared a summary of the analysis anyway, as it already gives some hints
>> that some of you can maybe interpret better than me. Have look at the PDFs
>> in particular, they show the allocation sites at the three interesting
>> instants, (clean, before berserk, after berserk), as well as the diffs in
>> between.
>>
>> https://www.dropbox.com/s/zusnfn5mhyw3rmp/memory-analysis.zip?dl=0
>>
>> I find that berserk mode is caused by a huge pile of Objective-C objects.
>> The thread samples show that while idling, it seems to be busy processing
>> OS events. But I do not know what kind of objects they actually are.
>>
>> My current hypothesis is:
>>
>> The leak is caused by processing a huge pile of OS events being processed
>> (reprocessed?). For some reason, this only happens after a long run time.
>>
>> Now, how can I accept/reject this hypothesis? Is there a way to log
>> events received by the VM? I also have a core dump of the 3GB process, but
>> I have no idea how I could check it for Objective-C objects grouped by
>> type. If anybody knows how to do that, I might be able to come up with a
>> comprehensive list of objects that lead to the berserk mode. Maybe it is
>> not even are real leak, but just accumulating objects that should be thrown
>> away at some point. Also, I can still not fully reject that the image is
>> involved, maybe it forgets external resources it should not forget at some
>> point.
>>
>> As I now know again that I can trigger berserk mode, I will switch to the
>> debug VM/Moz2D again with ASan. More symbols, leak reports, might reveal
>> better information.
>>
>> Anyway, if anybody made it through the whole email and maybe even looked
>> at the ZIP, I would appreciate ideas on how to proceed, what to test, what
>> to inspect. I recently had a few incidents where I realized that one of my
>> implicit assumptions/axioms does not hold, and I kept digging in the wrong
>> direction for too long, so any feedback on this is welcome.
>>
>> Cheers,
>> Manuel
>>
>> On 30 Jan 2019, at 18:40, Manuel Leuenberger <leuenberger at inf.unibe.ch>
>> wrote:
>>
>> I thought about adding a flag for all ASan magic in the build scripts,
>> then document the procedure in the readme, maybe also more extensive
>> somewhere else. You probably have a better idea where a good place would be
>> than I do.
>>
>> I dug into this today again and wrote a parser for the reports (
>> https://github.com/maenu/leak-reporter). It has a not-so-good
>> visualization of merged stacks showing heavy frames as big and solid, it
>> also includes example reports (
>> https://github.com/maenu/leak-reporter/tree/master/res). The reported
>> leaks only amount for about 1% of leaked memory, not too many objects
>> leaking, but I am not sure if ASan just counts object layout size or
>> objects graphs (probably not). What I notice is that OS events are leaking,
>> as they are copied. Might be an issue. But I am not entirely sure if this
>> explains everything. Maybe the VM just allocates too many objects of some
>> type, but frees them on shutdown, so they might not even appear in the
>> reports. Can a C heap analyzer give me stats about struct instances per
>> struct type? Might be useful to analyze core dumps then.
>>
>> Cheers,
>> Manuel
>>
>> <Screen Shot 2019-01-30 at 18.27.22.png>
>>
>> Direct leak of 219512 byte(s) in 1193 object(s) allocated from:
>>     #0 0x10bb1fffc in __sanitizer_mz_malloc (libclang_rt.asan_osx_dynamic.dylib:x86_64h+0x5dffc)
>>     #1 0x7fffdd224281 in malloc_zone_malloc (libsystem_malloc.dylib:x86_64+0x2281)
>>     #2 0x7fffc6de33d5 in AllocateAndInitializeEvent(__CFAllocator const*) (HIToolbox:x86_64+0x33d5)
>>     #3 0x7fffc6e3fc4d in CopyEventInternal(__CFAllocator const*, OpaqueEventRef*) (HIToolbox:x86_64+0x5fc4d)
>>     #4 0x7fffc6e49228 in CopyEventAs (HIToolbox:x86_64+0x69228)
>>     #5 0x7fffc6e1d349 in CreateEventWithCGEvent (HIToolbox:x86_64+0x3d349)
>>     #6 0x7fffc6e1b0e2 in CreateAndPostEventWithCGEvent(__CGEvent*, unsigned int, unsigned char, __CFMachPortBoost*) (HIToolbox:x86_64+0x3b0e2)
>>     #7 0x7fffc6e275cd in Convert1CGEvent(unsigned char) (HIToolbox:x86_64+0x475cd)
>>     #8 0x7fffc6e2745e in MainLoopObserver(unsigned int, OpaqueEventRef*, void*) (HIToolbox:x86_64+0x4745e)
>>     #9 0x7fffc6de8368 in _NotifyEventLoopObservers (HIToolbox:x86_64+0x8368)
>>     #10 0x7fffc6e10ea5 in RunCurrentEventLoopInMode (HIToolbox:x86_64+0x30ea5)
>>     #11 0x7fffc6e10bf8 in ReceiveNextEventCommon (HIToolbox:x86_64+0x30bf8)
>>     #12 0x7fffc6e10b25 in _BlockUntilNextEventMatchingListInModeWithFilter (HIToolbox:x86_64+0x30b25)
>>     #13 0x7fffc53a5a53 in _DPSNextEvent (AppKit:x86_64+0x46a53)
>>     #14 0x7fffc5b217ed in -[NSApplication(NSEvent) _nextEventMatchingEventMask:untilDate:inMode:dequeue:] (AppKit:x86_64+0x7c27ed)
>>     #15 0x10b7ce07c in -[sqSqueakOSXApplication(events) pumpRunLoopEventSendAndSignal:] sqSqueakOSXApplication+events.m:78
>>     #16 0x10b7ce7d9 in -[sqSqueakOSXApplication(events) pumpRunLoop] sqSqueakOSXApplication+events.m:105
>>     #17 0x10b7e4898 in vmIOProcessEvents sqSqueakEventsAPI.m:80
>>     #18 0x10b7e49c7 in ioProcessEvents sqSqueakEventsAPI.m:103
>>     #19 0x10b5b5653 in checkForEventsMayContextSwitch gcc3x-cointerp.c:62680
>>     #20 0x10b5c17ed in ceCheckForInterrupts gcc3x-cointerp.c:15188
>>     #21 0x119a75794  (<unknown module>)
>>     #22 0x10b566712 in interpret gcc3x-cointerp.c:2754
>>     #23 0x10b7ebada in -[sqSqueakMainApplication runSqueak] sqSqueakMainApplication.m:201
>>     #24 0x7fffc93786fc in __NSFirePerformWithOrder (Foundation:x86_64+0xd76fc)
>>     #25 0x7fffc78cfc56 in __CFRUNLOOP_IS_CALLING_OUT_TO_AN_OBSERVER_CALLBACK_FUNCTION__ (CoreFoundation:x86_64h+0xa6c56)
>>     #26 0x7fffc78cfbc6 in __CFRunLoopDoObservers (CoreFoundation:x86_64h+0xa6bc6)
>>     #27 0x7fffc78b05f8 in __CFRunLoopRun (CoreFoundation:x86_64h+0x875f8)
>>     #28 0x7fffc78b0033 in CFRunLoopRunSpecific (CoreFoundation:x86_64h+0x87033)
>>     #29 0x7fffc6e10ebb in RunCurrentEventLoopInMode (HIToolbox:x86_64+0x30ebb)
>>
>>
>>
>>
>> On 29 Jan 2019, at 20:19, Alistair Grant <akgrant0710 at gmail.com> wrote:
>>
>>
>> On Tue, 29 Jan 2019 at 15:32, Eliot Miranda <eliot.miranda at gmail.com>
>> wrote:
>>
>>
>>
>> ...
>>
>> So this is great.  The final step is seeing which FFI method invokes
>> moz2d_draw_target_create_for_data_type and then the storage leak can be
>> fixed.
>> What would be wonderful is to capture your experience in diagnosing this
>> storage leak as a guide to others.  But in what form?  One would be a blog
>> post.  I’m nervous though because blogs sometimes disappear.  One would be
>> some section in one of the READMEs in the source tree.  One would be a
>> carefully written email to this list recapitulating or summarizing the
>> experience.
>>
>> The advantage of the mailing list is that it won’t go away any time
>> soon.  So how about writing a new message in this thread which summarizes?
>> It could be written as more of a guide “Storage leaks come in more than one
>> variety...  To diagnose a leak in the C heap... To link against a C leak
>> checker...” and include links to the other messages in this thread on the
>> mail list serve for details “Specifics on how to modify the Mac build to
>> link in the leak checker are ...”.
>>
>> What do you think?  What to others think?  How do we best capture this
>> for posterity?
>>
>>
>> I would split this in two:
>>
>> - The basic instructions on compiling with the leak checker should go
>> in the HowToBuild.
>> - The description you're suggesting above should go in
>> http://opensmalltalk.org/ as Ben suggested, or a wiki page in the
>> OpenSmalltalk-VM repository.
>>
>> Each should provide a link to the other.
>>
>> I'm not a big fan of github wiki pages, but they have two advantages:
>>
>> - They keep the content with the code.
>> - They're relatively easy to download and back-up offline (in the
>> event that we decide to move the code elsewhere).
>>
>> My preference would be http://opensmalltalk.org/, but I'm not sure
>> what the access rights on it are.  Actually, I didn't even know it
>> existed, thanks Ben!
>>
>> As Tim said, email lists are hard to search, and can't be easily
>> updated.  Blog posts don't really lend themselves to being updated
>> either.
>>
>> My 2c...
>>
>> Cheers,
>> Alistair
>>
>>
>>
>>
>>
>
> --
>
> Guille Polito
> Research Engineer
>
> Centre de Recherche en Informatique, Signal et Automatique de Lille
> CRIStAL - UMR 9189
> French National Center for Scientific Research - *http://www.cnrs.fr
> <http://www.cnrs.fr/>*
>
> *Web:* *http://guillep.github.io* <http://guillep.github.io/>
> *Phone: *+33 06 52 70 66 13
>
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.squeakfoundation.org/pipermail/vm-dev/attachments/20190228/815a7354/attachment-0001.html>


More information about the Vm-dev mailing list