[Vm-dev] [Pharo-users] macOS VM builds with AddressSanitizer/LeakSanitizer

Eliot Miranda eliot.miranda at gmail.com
Tue Jan 29 14:32:11 UTC 2019


Hi Manuel,

> On Jan 29, 2019, at 3:16 AM, Manuel Leuenberger <leuenberger at inf.unibe.ch> wrote:
> 
> As it turns out, I am just a special kind of an idiot.
> 
> After Alistair's hint at just stepping through the VM, I threw all my fears of gdb etc. overboard and stepped around in lldb. The failed primitive was indeed due to the missing SurfacePlugin, which I had in the standard build, but not in my leak build. So I looked at my git diff again and found that I disabled the CroquetPlugin, because it caused some trouble a while ago. It did not seem too harmful, as most stuff worked, and I thought I did not need it, as it seemed related to the Croquet project for multi-user applications. What could possibly go wrong? Well, apparently I couldn't really draw without it. It's funny how initially seemingly unrelated decisions lead to side effects that seem unexplainable because I forget that I made the decision.
> 
> Anyway, now I have those finger-licking good leak reports from ASan for GToolkit builds including resolved symbols and source line numbers for both the VM and Moz2D. Time for some leak hunting now, I'll keep you posted if I find something in the VM that needs attention.
> 
> Sorry for the spam and that you had to observe an idiot trying to hammer the ball into the square hole. Maybe it was at least entertaining for you.
> 
> Thanks for all the help, you managed to take away my fear of the VM, it no longer intimidates me.

Wonderful!!!! Welcome! But see below.

> Cheers,
> Manuel
> 
> Direct leak of 184 byte(s) in 1 object(s) allocated from:
>     #0 0x10c58748c in wrap_malloc (libclang_rt.asan_osx_dynamic.dylib:x86_64h+0x5d48c)
>     #1 0x12599889d in moz_xmalloc mozalloc.cpp:83
>     #2 0x123ce2895 in mozilla::gfx::Factory::CreateDrawTargetForData(mozilla::gfx::BackendType, unsigned char*, mozilla::gfx::IntSizeTyped<mozilla::gfx::UnknownUnits> const&, int, mozilla::gfx::SurfaceFormat, bool) mozalloc.h:194
>     #3 0x126282c4b in moz2d_draw_target_create_for_data_type draw_target.cpp:35
>     #4 0x10b76eca6 in primitiveCalloutWithArgs (Pharo:x86_64+0x10035cca6)
>     #5 0x10b47f472 in primitiveExternalCall gcc3x-cointerp.c:76887
>     #6 0x10b4765ed in executeNewMethod gcc3x-cointerp.c:22341
>     #7 0x10b47b7fd in ceSendsupertonumArgs gcc3x-cointerp.c:16540
>     #8 0x1151a8134  (<unknown module>)
>     #9 0x10b414ac9 in interpret gcc3x-cointerp.c:2754
>     #10 0x10b672afa in -[sqSqueakMainApplication runSqueak] sqSqueakMainApplication.m:201
>     #11 0x7fffc93786fc in __NSFirePerformWithOrder (Foundation:x86_64+0xd76fc)
>     #12 0x7fffc78cfc56 in __CFRUNLOOP_IS_CALLING_OUT_TO_AN_OBSERVER_CALLBACK_FUNCTION__ (CoreFoundation:x86_64h+0xa6c56)
>     #13 0x7fffc78cfbc6 in __CFRunLoopDoObservers (CoreFoundation:x86_64h+0xa6bc6)
>     #14 0x7fffc78b05f8 in __CFRunLoopRun (CoreFoundation:x86_64h+0x875f8)
>     #15 0x7fffc78b0033 in CFRunLoopRunSpecific (CoreFoundation:x86_64h+0x87033)
>     #16 0x7fffc6e10ebb in RunCurrentEventLoopInMode (HIToolbox:x86_64+0x30ebb)
>     #17 0x7fffc6e10bf8 in ReceiveNextEventCommon (HIToolbox:x86_64+0x30bf8)
>     #18 0x7fffc6e10b25 in _BlockUntilNextEventMatchingListInModeWithFilter (HIToolbox:x86_64+0x30b25)
>     #19 0x7fffc53a5a53 in _DPSNextEvent (AppKit:x86_64+0x46a53)
>     #20 0x7fffc5b217ed in -[NSApplication(NSEvent) _nextEventMatchingEventMask:untilDate:inMode:dequeue:] (AppKit:x86_64+0x7c27ed)
>     #21 0x7fffc539a3da in -[NSApplication run] (AppKit:x86_64+0x3b3da)
>     #22 0x7fffc5364e0d in NSApplicationMain (AppKit:x86_64+0x5e0d)
>     #23 0x7fffdd0a2234 in start (libdyld.dylib:x86_64+0x5234)

So this is great.  The final step is seeing which FFI method invokes moz2d_draw_target_create_for_data_type and then the storage leak can be fixed.  What would be wonderful is to capture your experience in diagnosing this storage leak as a guide to others.  But in what form?  One would be a blog post.  I’m nervous though because blogs sometimes disappear.  One would be some section in one of the READMEs in the source tree.  One would be a carefully written email to this list recapitulating or summarizing the experience.  
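
To make the shape of such a fix concrete, here is a minimal sketch in C of the create/release pairing the report hints at. All names in it (DrawTarget, draw_target_create_for_data, draw_target_release, draw_targets_alive) are hypothetical stand-ins, not the Moz2D or VM API, and it assumes the leak comes from a create call whose result is never released, which the trace alone does not prove.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a draw target; not the real Moz2D type. */
typedef struct DrawTarget {
    unsigned char *data;   /* pixel buffer owned by the caller */
    int width;
    int height;
    int stride;
} DrawTarget;

/* Illustrative bookkeeping only: counts targets created but not yet released. */
static int live_targets = 0;

/* Allocates a target wrapping caller-owned pixels; returns NULL on failure. */
DrawTarget *draw_target_create_for_data(unsigned char *data,
                                        int width, int height, int stride) {
    DrawTarget *target = malloc(sizeof *target);
    if (target == NULL) {
        return NULL;
    }
    target->data = data;
    target->width = width;
    target->height = height;
    target->stride = stride;
    live_targets++;
    return target;
}

/* Frees the target itself; the pixel buffer stays with the caller. */
void draw_target_release(DrawTarget *target) {
    if (target == NULL) {
        return;
    }
    free(target);
    live_targets--;
}

/* A non-zero result at shutdown means some create had no matching release. */
int draw_targets_alive(void) {
    return live_targets;
}
```

Each create needs exactly one matching release from the image side, for example from a finalizer on the object holding the handle. If the release is skipped, a build compiled with -fsanitize=address and run with leak detection enabled would be expected to report the malloc inside the create function as a direct leak, much like the report above.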

The advantage of the mailing list is that it won’t go away any time soon.  So how about writing a new message in this thread which summarizes?  It could be written as more of a guide (“Storage leaks come in more than one variety...  To diagnose a leak in the C heap... To link against a C leak checker...”) and include links to the other messages in this thread on the mailing list server for details (“Specifics on how to modify the Mac build to link in the leak checker are ...”).

What do you think?  What do others think?  How do we best capture this for posterity?

> 
>> On 29 Jan 2019, at 00:18, Manuel Leuenberger <leuenberger at inf.unibe.ch> wrote:
>> 
>> 
>> Hi Alistair,
>> 
>> Thanks for the hint, the primitive error is indeed nil. And it seems to be my sanitizer flags that cause the trouble, because I cannot reproduce the error when I build the VM without them. I tried different versions of clang (Apple clang 9, MacPorts clang 6, MacPorts clang 7, Chromium clang 9) with different sanitizer flags, but either could not even start up the VM properly due to a reported stack-buffer-underflow, or the primitive failed.
>> 
>> Debugging revealed that createManualSurface returns -1 because registerSurface is not initialized. Halting in initSurfacePluginFunctionPointers then shows that findOrLoadModule("SurfacePlugin", 0) is 0, so there seems to be no plugin to be found. I need to continue digging.
>> 
>> Cheers,
>> Manuel
>> 
>>> On 28 Jan 2019, at 19:29, Alistair Grant <akgrant0710 at gmail.com> wrote:
>>> 
>>> 
>>> Hi Manuel,
>>> 
>>> On Mon, 28 Jan 2019 at 11:51, Manuel Leuenberger
>>> <leuenberger at inf.unibe.ch> wrote:
>>>> 
>>>> BlExternalForm >> primCreateManualSurfaceWidth: width height: height rowPitch: rowPitch depth: depth isMSB: isMSB
>>>> <primitive: 'primitiveCreateManualSurface' module: 'SqueakFFIPrims'>
>>>> self primitiveFailed
>>>> 
>>>> How can I debug this?
>>> 
>>> My first suggestion would be to see if the primitive is returning a
>>> descriptive error.  The method would become (untested):
>>> 
>>> BlExternalForm >> primCreateManualSurfaceWidth: width height: height rowPitch: rowPitch depth: depth isMSB: isMSB
>>>   <primitive: 'primitiveCreateManualSurface' module: 'SqueakFFIPrims' error: error>
>>>   self primitiveFailed
>>> 
>>> When the primitive fails, open the debugger, inspect this method in
>>> the call stack and check the value of error.  If the primitive hasn't
>>> been found, error will be something like #'not found' (from memory).
>>> If it's something else other than nil, report back with the value.
>>> 
>>> If error is nil it will be a matter of diving in to the Slang / C code
>>> and figuring it out :-).  If you've successfully built a debug version
>>> of the VM, maybe the easiest next step is to set a breakpoint in
>>> primitiveCreateManualSurface and step through?
>>> 
>>> Cheers,
>>> Alistair
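
For anyone reading the thread bottom-up: the failure Manuel traced above (registerSurface never initialized because the module lookup returned 0) follows a common lazy-lookup pattern. Here is a minimal C sketch of that pattern. Every name in it (lookup_register_surface, set_surface_plugin_available, create_manual_surface) is a hypothetical illustration, not the VM's actual code, and the "plugin" is faked with a flag.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Signature of the (hypothetical) surface registration entry point. */
typedef int (*RegisterSurfaceFn)(void *handle, int *surface_id);

/* Fakes whether the plugin was compiled into this build. */
static int surface_plugin_available = 0;

void set_surface_plugin_available(int available) {
    surface_plugin_available = available;
}

/* Stand-in for the plugin's registration function. */
static int fake_register_surface(void *handle, int *surface_id) {
    (void)handle;
    *surface_id = 1;
    return 1; /* non-zero means success in this sketch */
}

/* Stand-in for the module lookup: NULL when the plugin is absent. */
static RegisterSurfaceFn lookup_register_surface(const char *module_name) {
    if (surface_plugin_available && strcmp(module_name, "SurfacePlugin") == 0) {
        return fake_register_surface;
    }
    return NULL;
}

/* Cached function pointer, resolved on first use. */
static RegisterSurfaceFn register_surface = NULL;

/* Returns a surface id, or -1 when the plugin cannot be found or fails. */
int create_manual_surface(void *pixels) {
    int surface_id = -1;
    if (register_surface == NULL) {
        register_surface = lookup_register_surface("SurfacePlugin");
    }
    if (register_surface == NULL) {
        return -1; /* plugin missing: the caller only sees a generic failure */
    }
    if (!register_surface(pixels, &surface_id)) {
        return -1;
    }
    return surface_id;
}
```

The point of the sketch is that a missing plugin and a failing registration both collapse into the same -1, which is why stepping through in a debugger was needed to tell them apart.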

_,,,^..^,,,_ (phone)