[Vm-dev] [Pharo-users] macOS VM builds with AddressSanitizer/LeakSanitizer

Manuel Leuenberger leuenberger at inf.unibe.ch
Thu Feb 28 12:11:12 UTC 2019


Yeah, I am not sure if my fix only has the effects wanted to release the alienEventQueue, we need expert judgment here.

> On 28 Feb 2019, at 12:19, Tobias Pape <Das.Linux at gmx.de> wrote:
> 
> 
> Hi,
> 
>> On 28.02.2019, at 11:57, Manuel Leuenberger <leuenberger at inf.unibe.ch <mailto:leuenberger at inf.unibe.ch>> wrote:
>> 
>> Found it, reproduced it, fixed it.
>> 
>> https://twitter.com/maenuleu/status/1101068937750634496 <https://twitter.com/maenuleu/status/1101068937750634496>
>> https://github.com/OpenSmalltalk/opensmalltalk-vm/pull/373 <https://github.com/OpenSmalltalk/opensmalltalk-vm/pull/373>
>> 
>> I think memory consumption should be a point in the review process, and there should be memory performance tests (are there any already?).
>> 
>> Took me two months to write the necessary two lines of code.
> 
> Read it. Makes sense.
> I first had the suspicion that the thing allocated ehre may be sent/processed somewhere else, too. But I read around and it looks good.
> 
> Have added a few comments to the PR. Am fine with the PR.
> 
> Best regards
> 	-tObias
> 
> 
>> 
>> Cheers,
>> Manuel
>> 
>>> On 28 Feb 2019, at 02:52, John M McIntosh <johnmci at smalltalkconsulting.com> wrote:
>>> 
>>> Hi, so if you point your git tool to the squeak vm tree at commit 
>>> fae3d7eb403158f177c306f668cb461c35c99419
>>> The you can download the JMM/IOS64 branch which should let you build a OSX VM from xCode and do address sanitation, leak detection etc, from the comforts of xCode and Instruments. (check Instruments out) allocations & generations etc. 
>>> 
>>> if you turn on address sanitizer, detect use of stack after return, undefined behavior sanitizer, (pause on  issues), Main Thread checker (Pause on issues)  Likely there is 'stuff' there that could be fixed...
>>> 
>>> 
>>> As for the alloc/init  the vm can be build as arc or non-arc. And or a mixture used at a file level, plus of course bridging from core foundation to swift and indicating who has ownership of the object. 
>>> 
>>> 
>>> 
>>> ....
>>> John M. McIntosh. Corporate Smalltalk Consulting Ltd https://www.linkedin.com/in/smalltalk
>>> 
>>> 
>>> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
>>> On Wednesday, February 27, 2019 3:05 AM, Guillermo Polito <guillermopolito at gmail.com> wrote:
>>> 
>>>> Hi Manuel,
>>>> 
>>>> Disclaimer: I'm not an objectiveC expert either :) 
>>>> 
>>>> When I did that (and compiled, and tested), I remember having seen that it was using the automatic reference counting from objectiveC.
>>>> Actually, putting the [release] was producing for me a segmentation fault.
>>>> Did you try it yourself and did it fix the problem for you?
>>>> 
>>>> 
>>>> On Wed, Feb 27, 2019 at 10:52 AM Manuel Leuenberger <leuenberger at inf.unibe.ch> wrote:
>>>> 
>>>> Not sure if I found something, but looking at sqSqueakOSXApplication+events.m I find 
>>>> 
>>>> - (void) pumpRunLoopEventSendAndSignal:(BOOL)signal {
>>>>       NSEvent *event;
>>>>       NSMutableArray *alienEventQueue = [[NSMutableArray alloc] init];
>>>>       while ((event = [NSApp nextEventMatchingMask:NSEventMaskAny
>>>>                             untilDate:nil
>>>>                             inMode:NSEventTrackingRunLoopMode
>>>>                             dequeue:YES])) {
>>>>         // If the event is not a system event or an event of *this* window, queue the event
>>>>         // Otherwise treat the event normally and send it to the app
>>>>         if (event.window && event.window != gDelegateApp.window){
>>>>           [alienEventQueue addObject: event];
>>>>         }else{
>>>>           [NSApp sendEvent: event];
>>>>           if (signal) {
>>>>               interpreterProxy->signalSemaphoreWithIndex(gDelegateApp.squeakApplication.inputSemaphoreIndex);
>>>>           }
>>>>         }
>>>>     }
>>>> 
>>>>     // Put back in the queue all events that did not belong to this window
>>>>     // They will be managed by other windowing systems
>>>>     // (or by a subsequent invocation to this function)
>>>>     while ((event = [alienEventQueue firstObject])) {
>>>>       [NSApp postEvent: event atStart: NO];
>>>>       [alienEventQueue removeObject: event];
>>>>     }
>>>> }
>>>> 
>>>> I have no clue about Objective-C, but I think 
>>>> 
>>>> NSMutableArray *alienEventQueue = [[NSMutableArray alloc] init];
>>>> 
>>>> makes me own the queue on the heap, isn't there a
>>>> 
>>>> [alienEventQueue release];
>>>> 
>>>> missing at the end?
>>>> 
>>>> Cheers,
>>>> Manuel
>>>> 
>>>>> On 26 Feb 2019, at 23:31, Manuel Leuenberger <leuenberger at inf.unibe.ch> wrote:
>>>>> 
>>>>> Alright, back in the game, this time using gperftools.
>>>>> 
>>>>> I finally managed to reproduce a leaking run, but it took me hours to get there, randomly rendering GtCoders and messing around. gperftools instruments malloc and friends, capturing heap profiles every 30 seconds. Now I have profiles from a freshly started, small image, before it went berserk on memory hogging, and when it allocates over 3GB. Now, I have visualizations of allocation sites at these instants, and diffs between them, as well as some thread dumps on the berserk state. I managed to rule out the in-image objects to contribute to memory by cleaning images with garbage collect and cache resets. Some handy resets are these:
>>>>> 
>>>>> NECController reset.
>>>>> FreeTypeCache current removeAll.
>>>>> ASTCache reset.
>>>>> Smalltalk garbageCollect.
>>>>> 
>>>>> These get rid of a leak in NECController, resets the font cache (creates a lot of bitmaps), and ASTs, created by the coders. With
>>>>> 
>>>>> SpaceTally printSpaceAnalysis.
>>>>> 
>>>>> I get objects stats from within the image, and I managed to keep them in the same ballpark for all instants I took profiles of, by inspecting over-populated instances, like this
>>>>> 
>>>>> instances := WeakOrderedCollection withAll: RBMessageNode allInstances.
>>>>> ReferenceFinder findPathTo: instances atRandom.
>>>>> 
>>>>> With
>>>>> 
>>>>> FFIExternalResourceManager uniqueInstance instVarNamed: #registry
>>>>> 
>>>>> I also made sure that external objects are clean.
>>>>> 
>>>>> Now, I am pretty confident that I can clean images, so the memory-overload should be sourced in the VM. Still, I cannot reliably reproduce the leak. It just happens after running an image for hours, doing stuff, idling, going to sleep, wake up.
>>>>> 
>>>>> Unfortunately, I did not run it on a debug VM, so nearly no symbols, but I prepared a summary of the analysis anyway, as it already gives some hints that some of you can maybe interpret better than me. Have look at the PDFs in particular, they show the allocation sites at the three interesting instants, (clean, before berserk, after berserk), as well as the diffs in between.
>>>>> 
>>>>> https://www.dropbox.com/s/zusnfn5mhyw3rmp/memory-analysis.zip?dl=0
>>>>> 
>>>>> I find that berserk mode is caused by a huge pile of Objective-C objects. The thread samples show that while idling, it seems to be busy processing OS events. But I do not know what kind of objects they actually are.
>>>>> 
>>>>> My current hypothesis is:
>>>>> 
>>>>> The leak is caused by processing a huge pile of OS events being processed (reprocessed?). For some reason, this only happens after a long run time.
>>>>> 
>>>>> Now, how can I accept/reject this hypothesis? Is there a way to log events received by the VM? I also have a core dump of the 3GB process, but I have no idea how I could check it for Objective-C objects grouped by type. If anybody knows how to do that, I might be able to come up with a comprehensive list of objects that lead to the berserk mode. Maybe it is not even are real leak, but just accumulating objects that should be thrown away at some point. Also, I can still not fully reject that the image is involved, maybe it forgets external resources it should not forget at some point.
>>>>> 
>>>>> As I now know again that I can trigger berserk mode, I will switch to the debug VM/Moz2D again with ASan. More symbols, leak reports, might reveal better information.
>>>>> 
>>>>> Anyway, if anybody made it through the whole email and maybe even looked at the ZIP, I would appreciate ideas on how to proceed, what to test, what to inspect. I recently had a few incidents where I realized that one of my implicit assumptions/axioms does not hold, and I kept digging in the wrong direction for too long, so any feedback on this is welcome.
>>>>> 
>>>>> Cheers,
>>>>> Manuel
>>>>> 
>>>>>> On 30 Jan 2019, at 18:40, Manuel Leuenberger <leuenberger at inf.unibe.ch> wrote:
>>>>>> 
>>>>>> I thought about adding a flag for all ASan magic in the build scripts, then document the procedure in the readme, maybe also more extensive somewhere else. You probably have a better idea where a good place would be than I do.
>>>>>> 
>>>>>> I dug into this today again and wrote a parser for the reports (https://github.com/maenu/leak-reporter). It has a not-so-good visualization of merged stacks showing heavy frames as big and solid, it also includes example reports (https://github.com/maenu/leak-reporter/tree/master/res). The reported leaks only amount for about 1% of leaked memory, not too many objects leaking, but I am not sure if ASan just counts object layout size or objects graphs (probably not). What I notice is that OS events are leaking, as they are copied. Might be an issue. But I am not entirely sure if this explains everything. Maybe the VM just allocates too many objects of some type, but frees them on shutdown, so they might not even appear in the reports. Can a C heap analyzer give me stats about struct instances per struct type? Might be useful to analyze core dumps then.
>>>>>> 
>>>>>> Cheers,
>>>>>> Manuel
>>>>>> 
>>>>>> <Screen Shot 2019-01-30 at 18.27.22.png>
>>>>>> Direct leak of 219512 byte(s) in 1193 object(s) allocated from:
>>>>>>    #0 0x10bb1fffc in __sanitizer_mz_malloc (libclang_rt.asan_osx_dynamic.dylib:x86_64h+0x5dffc)
>>>>>>    #1 0x7fffdd224281 in malloc_zone_malloc (libsystem_malloc.dylib:x86_64+0x2281)
>>>>>>    #2 0x7fffc6de33d5 in AllocateAndInitializeEvent(__CFAllocator const*) (HIToolbox:x86_64+0x33d5)
>>>>>>    #3 0x7fffc6e3fc4d in CopyEventInternal(__CFAllocator const*, OpaqueEventRef*) (HIToolbox:x86_64+0x5fc4d)
>>>>>>    #4 0x7fffc6e49228 in CopyEventAs (HIToolbox:x86_64+0x69228)
>>>>>>    #5 0x7fffc6e1d349 in CreateEventWithCGEvent (HIToolbox:x86_64+0x3d349)
>>>>>>    #6 0x7fffc6e1b0e2 in CreateAndPostEventWithCGEvent(__CGEvent*, unsigned int, unsigned char, __CFMachPortBoost*) (HIToolbox:x86_64+0x3b0e2)
>>>>>>    #7 0x7fffc6e275cd in Convert1CGEvent(unsigned char) (HIToolbox:x86_64+0x475cd)
>>>>>>    #8 0x7fffc6e2745e in MainLoopObserver(unsigned int, OpaqueEventRef*, void*) (HIToolbox:x86_64+0x4745e)
>>>>>>    #9 0x7fffc6de8368 in _NotifyEventLoopObservers (HIToolbox:x86_64+0x8368)
>>>>>>    #10 0x7fffc6e10ea5 in RunCurrentEventLoopInMode (HIToolbox:x86_64+0x30ea5)
>>>>>>    #11 0x7fffc6e10bf8 in ReceiveNextEventCommon (HIToolbox:x86_64+0x30bf8)
>>>>>>    #12 0x7fffc6e10b25 in _BlockUntilNextEventMatchingListInModeWithFilter (HIToolbox:x86_64+0x30b25)
>>>>>>    #13 0x7fffc53a5a53 in _DPSNextEvent (AppKit:x86_64+0x46a53)
>>>>>>    #14 0x7fffc5b217ed in -[NSApplication(NSEvent) _nextEventMatchingEventMask:untilDate:inMode:dequeue:] (AppKit:x86_64+0x7c27ed)
>>>>>>    #15 0x10b7ce07c in -[sqSqueakOSXApplication(events) pumpRunLoopEventSendAndSignal:] sqSqueakOSXApplication+events.m:78
>>>>>>    #16 0x10b7ce7d9 in -[sqSqueakOSXApplication(events) pumpRunLoop] sqSqueakOSXApplication+events.m:105
>>>>>>    #17 0x10b7e4898 in vmIOProcessEvents sqSqueakEventsAPI.m:80
>>>>>>    #18 0x10b7e49c7 in ioProcessEvents sqSqueakEventsAPI.m:103
>>>>>>    #19 0x10b5b5653 in checkForEventsMayContextSwitch gcc3x-cointerp.c:62680
>>>>>>    #20 0x10b5c17ed in ceCheckForInterrupts gcc3x-cointerp.c:15188
>>>>>>    #21 0x119a75794  (<unknown module>)
>>>>>>    #22 0x10b566712 in interpret gcc3x-cointerp.c:2754
>>>>>>    #23 0x10b7ebada in -[sqSqueakMainApplication runSqueak] sqSqueakMainApplication.m:201
>>>>>>    #24 0x7fffc93786fc in __NSFirePerformWithOrder (Foundation:x86_64+0xd76fc)
>>>>>>    #25 0x7fffc78cfc56 in __CFRUNLOOP_IS_CALLING_OUT_TO_AN_OBSERVER_CALLBACK_FUNCTION__ (CoreFoundation:x86_64h+0xa6c56)
>>>>>>    #26 0x7fffc78cfbc6 in __CFRunLoopDoObservers (CoreFoundation:x86_64h+0xa6bc6)
>>>>>>    #27 0x7fffc78b05f8 in __CFRunLoopRun (CoreFoundation:x86_64h+0x875f8)
>>>>>>    #28 0x7fffc78b0033 in CFRunLoopRunSpecific (CoreFoundation:x86_64h+0x87033)
>>>>>>    #29 0x7fffc6e10ebb in RunCurrentEventLoopInMode (HIToolbox:x86_64+0x30ebb)
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>>> 
>>>> 
>>>> -- 
>>>> 
>>>> Guille Polito
>>>> Research Engineer
>>>> 
>>>> Centre de Recherche en Informatique, Signal et Automatique de Lille
>>>> CRIStAL - UMR 9189
>>>> French National Center for Scientific Research - http://www.cnrs.fr <http://www.cnrs.fr/>
>>>> 
>>>> Web: http://guillep.github.io <http://guillep.github.io/>
>>>> Phone: +33 06 52 70 66 13

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.squeakfoundation.org/pipermail/vm-dev/attachments/20190228/a1870cd7/attachment-0001.html>


More information about the Vm-dev mailing list