[Vm-dev] Difficult to debug VM crash with full blocks and Sista V1

Eliot Miranda eliot.miranda at gmail.com
Fri Oct 4 19:00:31 UTC 2019


Hi Nicolas,

On Thu, Oct 3, 2019 at 1:10 PM Nicolas Cellier <
nicolas.cellier.aka.nice at gmail.com> wrote:

> I confirm that each crash follows a SIGIO -> forceInterruptCheck() with
> corrupted stack pointer pointing to methodZone (that is executable zone so
> program counter).
> I did:
> 1) remove static declaration of methodZoneBase and limitAddress in
> cogitX64Sysv.c so as to make them exported globals
> 2) declared them as extern sqInt in gcc-cointerp.c
> 3) instrumented forceInterruptCheck() so as to emit a warning and/or set a
> breakpoint:
>
>     if ( getsp() > methodZoneBase && getsp() <= limitAddress ) {
>         printf("pb in forceInterruptCheck\n");
>     }
>
> Unlike what I said, forceInterruptCheck() does nothing special and
> certainly not invoke Smalltalk code nor JITTED code.
> The interruptCheckChain function that it could invoke is generally null
> (except for debug/simulation purpose), and always null in the crash i
> observed.
> The signal handler also generally operate on same stack (unless
> specifically asked via sigaction() - which is not the case).
> My earler diagnostic was driven by the false assumption that handler would
> use the interrupt stack pointer (IST), and thus that only special stack
> pointer manipulation happening in trampoline could corrupt it, hmmm that
> was late night blurred minded ;)
>
> What comes to my mind:
> - the signal SIGIO might happen while we are manipulating the stack
> pointer, hence the heisenbug
> - the stack pointer could be offseted by some incorrect code (but why no
> error would happen before SIGIO then?)
>
> I don't have clear understanding of all spur machinery details that would
> help further diagnosis
> Eliot, doesn't it sound interesting?
>

It looks to me like the right cause.  The corruption I was seeing in the
methodSZOne doesn't;t look like a bug in the JIT; it looks like
overwriting.  I had thought we were using sigaltstack to ensure the signal
handler runs on a dedicated stack, but it appears we don't.  If I'm right
that we're not using sigaltstack then this may be very easy to fix :-)

[I have to mention ALSA and its awful SIGIO sign al handling code;
see platforms/unix//vm-sound-ALSA/sqUnixSoundALSA.c.  I don't think this
affects the Mac, but it could be an issue on Linux]


>
> Le jeu. 3 oct. 2019 à 08:10, Nicolas Cellier <
> nicolas.cellier.aka.nice at gmail.com> a écrit :
>
>> Maybe I draw conclusions too quickly.
>> But clearly, the signal handler has a stack pointer pointing in
>> methodZone.
>>
>> (lldb) p/x $rsp
>> (unsigned long) $211 = 0x0000000107c92da0
>> (lldb) p/x methodZoneBase
>> (sqInt) $212 = 0x0000000107c01d30
>> (lldb) p/x limitAddress
>> (usqInt) $213 = 0x0000000107d5e000
>>
>> It remains to see how that happened.
>>
>> Le jeu. 3 oct. 2019 à 05:49, Nicolas Cellier <
>> nicolas.cellier.aka.nice at gmail.com> a écrit :
>>
>>> Hi all,
>>> So I finally got it!
>>> That's our specific handling of SIGIO with forceInterruptCheck.
>>> Don't you JIT the signal handler!
>>>
>>> The signal handler is using a different stack pointer so as to not
>>> interfere with the program.
>>> Until Cog decide to jump into JITTED code from within the signal handler.
>>> It now share a stack pointer pointing somewhere in cog methodZone, and
>>> anything it does with it's stack (call a function, save registers...) will
>>> destroy the JITTED code.
>>> It even happened when I used the debugger which shares the interrupt
>>> handler stack pointer!
>>>
>>> Why does it not happen in linux, and why only Mac OS 64, I have no
>>> clue...
>>> But we can't decently live very long with such handling.
>>>
>>> Le mer. 2 oct. 2019 à 23:29, Nicolas Cellier <
>>> nicolas.cellier.aka.nice at gmail.com> a écrit :
>>>
>>>> Hi Eliot,
>>>> so I have set watchpoint to 4 previous crash and got this:
>>>>
>>>> (lldb) run ../../image/crash.image
>>>> There is a running process, kill it and restart?: [Y/n]
>>>> Process 98150 exited with status = 9 (0x00000009)
>>>> Process 98164 launched:
>>>> '/Users/nicolas/Smalltalk/OpenSmalltalk/opensmalltalk-vm/build.macos64x64/squeak.cog.spur/SqueakDebug.app/Contents/MacOS/Squeak'
>>>> (x86_64)
>>>> Process 98164 stopped
>>>> * thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint
>>>> 1.1
>>>>     frame #0: 0x00000001001618ec Squeak`-[sqSqueakMainApplication
>>>> setupMenus](self=0x000000010076e1a0, _cmd="setupMenus") at
>>>> sqSqueakMainApplication.m:123
>>>>    120
>>>>    121 - (void) setupMenus {
>>>>    122 //    nothing to do so far since the menu is setup in the
>>>> MainMenu.nib file
>>>> -> 123 }
>>>>    124
>>>>    125 - (void) setupTimers {
>>>>    126 extern void SetUpTimers(void);
>>>> Target 0: (Squeak) stopped.
>>>> (lldb) wa se e -s 1 -- 0x107c992b4
>>>> Watchpoint created: Watchpoint 35: addr = 0x107c992b4 size = 1 state =
>>>> enabled type = w
>>>> (lldb) wa se e -s 1 -- 0x107c929e0
>>>> Watchpoint created: Watchpoint 34: addr = 0x107c929e0 size = 1 state =
>>>> enabled type = w
>>>> (lldb) wa se e -s 1 -- 0x0000000107c945df
>>>> Watchpoint created: Watchpoint 36: addr = 0x107c945df size = 1 state =
>>>> enabled type = w
>>>> (lldb) wa se e -s 1 -- 0x0000000107c92da3
>>>> Watchpoint created: Watchpoint 37: addr = 0x107c92da3 size = 1 state =
>>>> enabled type = w
>>>>     new value: -3689348814741910324
>>>> (lldb) c
>>>> Process 98164 resuming
>>>>
>>>> Watchpoint 34 hit:
>>>> new value: -3689348814741910456
>>>> Process 98164 stopped
>>>> * thread #1, queue = 'com.apple.main-thread', stop reason = watchpoint
>>>> 34
>>>>     frame #0: 0x0000000100102067
>>>> Squeak`byteAtPointerput(ptr="H???????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????,
>>>> val=72) at sqMemoryAccess.h:136
>>>>    133  /* Use static inline functions when the compiler produces
>>>> efficient code for small accessors.
>>>>    134     These are preferred because static type checking will
>>>> prevent inadvertent confusion of pointers and oops. */
>>>>    135  static inline sqInt byteAtPointer(char *ptr) { return
>>>> (sqInt)(*((unsigned char *)ptr)); }
>>>> -> 136  static inline sqInt byteAtPointerput(char *ptr, int val) {
>>>> return (sqInt)(*((unsigned char *)ptr)= (unsigned char)val); }
>>>>    137  static inline sqInt shortAtPointer(char *ptr) { return
>>>> (sqInt)(*((short *)ptr)); }
>>>>    138  static inline sqInt shortAtPointerput(char *ptr, int val) {
>>>> return (sqInt)(*((short *)ptr)= (short)val); }
>>>>    139  static inline sqInt intAtPointer(char *ptr) { return
>>>> (sqInt)(*((int *)ptr)); }
>>>> Target 0: (Squeak) stopped.
>>>> (lldb)
>>>> Process 98164 resuming
>>>>
>>>> Watchpoint 37 hit:
>>>> old value: -3689348814741910324
>>>> new value: -3689912642157150198
>>>> Process 98164 stopped
>>>> * thread #1, queue = 'com.apple.main-thread', stop reason = watchpoint
>>>> 37
>>>>     frame #0: 0x000000010012de84
>>>> Squeak`fillInMethodHeadersizeselector(method=0x0000000107c92da0, size=1240,
>>>> selector=4435376352) at cogitX64SysV.c:6321
>>>>    6318
>>>>    6319 (method->cmType = CMMethod);
>>>>    6320 (method->objectHeader = nullHeaderForMachineCodeMethod());
>>>> -> 6321 (method->blockSize = size);
>>>>    6322 (method->methodObject = methodObj);
>>>>    6323
>>>>    6324 /* If the method has already been cogged (e.g. Newspeak
>>>> accessors) then
>>>> Target 0: (Squeak) stopped.
>>>> (lldb)
>>>> Process 98164 resuming
>>>>
>>>> Watchpoint 36 hit:
>>>> new value: -3689348814741910520
>>>> Process 98164 stopped
>>>> * thread #1, queue = 'com.apple.main-thread', stop reason = watchpoint
>>>> 36
>>>>     frame #0: 0x0000000100102067
>>>> Squeak`byteAtPointerput(ptr="\b???????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????,
>>>> val=8) at sqMemoryAccess.h:136
>>>>    133  /* Use static inline functions when the compiler produces
>>>> efficient code for small accessors.
>>>>    134     These are preferred because static type checking will
>>>> prevent inadvertent confusion of pointers and oops. */
>>>>    135  static inline sqInt byteAtPointer(char *ptr) { return
>>>> (sqInt)(*((unsigned char *)ptr)); }
>>>> -> 136  static inline sqInt byteAtPointerput(char *ptr, int val) {
>>>> return (sqInt)(*((unsigned char *)ptr)= (unsigned char)val); }
>>>>    137  static inline sqInt shortAtPointer(char *ptr) { return
>>>> (sqInt)(*((short *)ptr)); }
>>>>    138  static inline sqInt shortAtPointerput(char *ptr, int val) {
>>>> return (sqInt)(*((short *)ptr)= (short)val); }
>>>>    139  static inline sqInt intAtPointer(char *ptr) { return
>>>> (sqInt)(*((int *)ptr)); }
>>>> Target 0: (Squeak) stopped.
>>>> (lldb)
>>>> Process 98164 resuming
>>>>
>>>> Watchpoint 35 hit:
>>>> new value: -3689348814741910456
>>>> Process 98164 stopped
>>>> * thread #1, queue = 'com.apple.main-thread', stop reason = watchpoint
>>>> 35
>>>>     frame #0: 0x0000000100102067
>>>> Squeak`byteAtPointerput(ptr="H???????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????,
>>>> val=72) at sqMemoryAccess.h:136
>>>>    133  /* Use static inline functions when the compiler produces
>>>> efficient code for small accessors.
>>>>    134     These are preferred because static type checking will
>>>> prevent inadvertent confusion of pointers and oops. */
>>>>    135  static inline sqInt byteAtPointer(char *ptr) { return
>>>> (sqInt)(*((unsigned char *)ptr)); }
>>>> -> 136  static inline sqInt byteAtPointerput(char *ptr, int val) {
>>>> return (sqInt)(*((unsigned char *)ptr)= (unsigned char)val); }
>>>>    137  static inline sqInt shortAtPointer(char *ptr) { return
>>>> (sqInt)(*((short *)ptr)); }
>>>>    138  static inline sqInt shortAtPointerput(char *ptr, int val) {
>>>> return (sqInt)(*((short *)ptr)= (short)val); }
>>>>    139  static inline sqInt intAtPointer(char *ptr) { return
>>>> (sqInt)(*((int *)ptr)); }
>>>> Target 0: (Squeak) stopped.
>>>> (lldb)
>>>> Process 98164 resuming
>>>>
>>>> Watchpoint 37 hit:
>>>> old value: -3689912642157150198
>>>> new value: -5066732099722018553
>>>> Process 98164 stopped
>>>> * thread #1, queue = 'com.apple.main-thread', stop reason = watchpoint
>>>> 37
>>>>     frame #0: 0x000000010002e801 Squeak`forceInterruptCheck at
>>>> gcc3x-cointerp.c:64151
>>>>    64148 /* StackInterpreter>>#forceInterruptCheck */
>>>>    64149 sqInt
>>>>    64150 forceInterruptCheck(void)
>>>> -> 64151 {   DECL_MAYBE_SQ_GLOBAL_STRUCT
>>>>    64152    void (*iccFunc)();
>>>>    64153    StackPage *thePage;
>>>>    64154
>>>> Target 0: (Squeak) stopped.
>>>> (lldb) bt
>>>> * thread #1, queue = 'com.apple.main-thread', stop reason = watchpoint
>>>> 37
>>>>   * frame #0: 0x000000010002e801 Squeak`forceInterruptCheck at
>>>> gcc3x-cointerp.c:64151
>>>>     frame #1: 0x00007fff6ab9af5a libsystem_platform.dylib`_sigtramp + 26
>>>>     frame #2: 0x0000000107c10699
>>>>     frame #3: 0x0000000107c9646f
>>>>     frame #4: 0x0000000107c9633e
>>>>     frame #5: 0x0000000107c962ac
>>>>     frame #6: 0x0000000107c961e9
>>>>     frame #7: 0x0000000107c8fcf8
>>>>     frame #8: 0x0000000107c98605
>>>>     frame #9: 0x0000000107c8bbfe
>>>>     frame #10: 0x0000000107c1100e
>>>>     frame #11: 0x0000000107c8c4ad
>>>>     frame #12: 0x0000000107c0cef6
>>>>     frame #13: 0x0000000107c8c2a5
>>>>     frame #14: 0x0000000107c8bb4c
>>>>     frame #15: 0x0000000107c0cef6
>>>>     frame #16: 0x0000000107c8ba05
>>>>     frame #17: 0x0000000107c8b85b
>>>>     frame #18: 0x0000000107c1100e
>>>>     frame #19: 0x0000000107c8b7e1
>>>>     frame #20: 0x0000000107c57018
>>>>     frame #21: 0x0000000107c018c8
>>>> (lldb) call printCallStack()
>>>>
>>>> Watchpoint 34 hit:
>>>> old value: -3689348814741910456
>>>> new value: 4435376368
>>>> error: Execution was interrupted, reason: watchpoint 34.
>>>> The process has been returned to the state before expression evaluation.
>>>> (lldb) call printCogMethodFor((void*)0x0000000107c10699)
>>>>
>>>> Watchpoint 34 hit:
>>>> old value: 4435376368
>>>> new value: 0
>>>> error: Execution was interrupted, reason: watchpoint 34.
>>>> The process has been returned to the state before expression evaluation.
>>>>
>>>> I don't know how to proceed, but maybe there's enough clue for you...
>>>>
>>>> Le lun. 30 sept. 2019 à 01:28, Eliot Miranda <eliot.miranda at gmail.com>
>>>> a écrit :
>>>>
>>>>>
>>>>> Hi Nicolas,
>>>>>
>>>>> On Wed, Sep 25, 2019 at 12:32 AM Nicolas Cellier <
>>>>> nicolas.cellier.aka.nice at gmail.com> wrote:
>>>>>
>>>>>>
>>>>>> I noticed that many crash happen in testSendTimeout.
>>>>>> This test is organizing a race:
>>>>>> - Smalltalk fill the socket send buffer
>>>>>> - the OS tries to drain
>>>>>>
>>>>>> The other thing I noticed is that [SocketTest suite run] takes 7 to
>>>>>> 8s on macos while only 2 to 3s on ubuntu and windows
>>>>>> So it might be that some test times out on macos, while it doesn't on
>>>>>> other OS.
>>>>>>
>>>>>> This would explain that many crash also happen in JITted Timer loop,
>>>>>> and that we cannot observe it in other OSes.
>>>>>>
>>>>>
>>>>> Yes, that is, I think, an important clue.  The other one is that,
>>>>> given that the crash always happens only early in a run, plus the address
>>>>> at which it crashes in machine code, suggests to me that the crash occurs
>>>>> when the interpreter invokes a block in the JITted startTimerLoop.
>>>>>
>>>>> What worries me is that the JITted code for startTimerLoop looks
>>>>> completely corrupted, not merely moved or overwritten (the method header
>>>>> looks OK up to the selector field, which is the last field in the header).
>>>>> So I'm wondering whether it could be a bug in support code overwriting
>>>>> machine code, and hence that not might be related to a 32-bit vs 64-bit
>>>>> code issue.  I was hoping your recent sqUnixSocket.c changes might work but
>>>>> I'm sure they have no bearing.
>>>>>
>>>>> The access to OS ressources and the race may also explain some
>>>>>> randomization of the crash...
>>>>>>
>>>>>> So one idea would be to make the test timeout in linux too, see if we
>>>>>> can make it crash then try using rr.
>>>>>> Maybe it's possible with a huge Socket buffer, a smaller image side
>>>>>> buffer (we could reduce the size from 1000 to 1 so as to increase overhead).
>>>>>>
>>>>>
>>>>> Yes, that's a good idea.  But I do think we need to set up a reliable
>>>>> remote server, som thing that acts like an infinite sink for receiving a
>>>>> par4ticular large Monticello package, so that ev very time the example runs
>>>>> the same package is indeed copied to a remote server.  Would it be possible
>>>>> to construct such a fake Monticello server?
>>>>>
>>>>> What I'm wanting the server to do is
>>>>> a) take a while to respond to the Monticello package contents request
>>>>> b) to never include a particular package which the example would
>>>>> always try and upload
>>>>>
>>>>> The test case would be to copy a particular version of VMMaker from
>>>>> package-cache to the remote "fake" server
>>>>>
>>>>>
>>>>>>
>>>>>> It's just theories I tried to construct while biking, not in front of
>>>>>> a keyboard, and I won't be able to verify until this evening...
>>>>>> But the watchpoint strategy is too much vain shooting in the dark, we
>>>>>> need more ellaborated plans ;)
>>>>>>
>>>>>
>>>>> Does anyone see the crash on macOS in 32 bits?
>>>>>
>>>>> Le mar. 24 sept. 2019 à 00:28, Nicolas Cellier <
>>>>>> nicolas.cellier.aka.nice at gmail.com> a écrit :
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Le mar. 24 sept. 2019 à 00:25, Nicolas Cellier <
>>>>>>> nicolas.cellier.aka.nice at gmail.com> a écrit :
>>>>>>>
>>>>>>>> Follow up: so we have a problem of repeatability...
>>>>>>>> I found nothing like rr for macos, and CPU watchpoint capability is
>>>>>>>> too restricted to be usefull.
>>>>>>>>
>>>>>>>> So what I did is save a crash.image attempting to crash at
>>>>>>>> resumption time:
>>>>>>>>     Smalltalk saveAs: 'crash'.
>>>>>>>>     Smalltalk snapshot: true andQuit: true.
>>>>>>>>     5 timesRepeat: [SocketTest suite run].
>>>>>>>>
>>>>>>>> Then I launched the vm under lldb and gathered $pc stats to see if
>>>>>>>> I could observe some reproducibility
>>>>>>>> (lldb) p/x $pc - methodZoneBase
>>>>>>>> 0x00000000000a0e94
>>>>>>>> 0x0000000000090f70
>>>>>>>> 0x000000000009f824
>>>>>>>> 0x000000000009bfc8
>>>>>>>> 0x000000000009d030
>>>>>>>> 0x000000000009c7c0
>>>>>>>> 0x00000000000951e0
>>>>>>>> 0x00000000000949c8 <- once
>>>>>>>> 0x0000000000094418
>>>>>>>> 0x00000000000949c8 <— twice
>>>>>>>> 0x000000000009b848
>>>>>>>> 0x00000000000a0e94 <- twice
>>>>>>>> 0x000000000009a176
>>>>>>>> 0x000000006dfff2e0 ???
>>>>>>>> 0x00000000000996d1
>>>>>>>> 0x00000000000949c8 <- thrice good candidate for watchpoint set
>>>>>>>> 0x000000000009ae70
>>>>>>>> 0x000000000009ae70
>>>>>>>> 0x00000000000a18e7
>>>>>>>> 0x0000000000095470
>>>>>>>> 0x000000000009c458
>>>>>>>> 0x00000000000a0a7c
>>>>>>>> 0x0000000000093e89
>>>>>>>> 0x000000000009beb0
>>>>>>>> 0x0000000000095a37
>>>>>>>> 0x0000000000095408
>>>>>>>> 0x00000000000a06bc
>>>>>>>> 0x000000000009cd48
>>>>>>>> 0x0000000000095408 <- twice
>>>>>>>> 0x000000000009ba41
>>>>>>>> 0x0000000000095408 <- thrice new candidate for watchpoint set
>>>>>>>> 0x0000000000045258   <— smallest ever
>>>>>>>> 0x000000000009c7c0
>>>>>>>> 0x0000000000095408 <- fourth ! the watch point was set here, why
>>>>>>>> didn’t it stop??? argh methodZoneBase was incremented after I set the
>>>>>>>> watchpoint!!!
>>>>>>>> 0x00000000000a12dc
>>>>>>>> 0x00000000000951e0
>>>>>>>> 0x00000000000a0e24
>>>>>>>> 0x000000000009ff8c
>>>>>>>> 0x000000000009c7f6
>>>>>>>> 0x000000000009a626
>>>>>>>>
>>>>>>>> Then I have chosen to watch the most frequent crash address,
>>>>>>>> that is methodZoneBase+0x0000000000095408,
>>>>>>>> Unfortunately, s commented above, I missed one (I set the
>>>>>>>> watchpoint too early, before methodZoneBase took its definitive address).
>>>>>>>>
>>>>>>>> And after a few (a bunch of)  trials, here is what I found:
>>>>>>>>
>>>>>>>> (lldb) run ../../image/crash.image
>>>>>>>> There is a running process, kill it and restart?: [Y/n]
>>>>>>>> Process 28175 exited with status = 9 (0x00000009)
>>>>>>>> Process 28185 launched:
>>>>>>>> '/Users/nicolas/Smalltalk/OpenSmalltalk/opensmalltalk-vm/build.macos64x64/squeak.cog.spur/Squeak.app/Contents/MacOS/Squeak'
>>>>>>>> (x86_64)
>>>>>>>> Squeak was compiled with optimization - stepping may behave oddly;
>>>>>>>> variables may not be available.
>>>>>>>> Process 28185 stopped
>>>>>>>> * thread #1, queue = 'com.apple.main-thread', stop reason =
>>>>>>>> breakpoint 6.1
>>>>>>>>     frame #0: 0x000000010007bc22 Squeak`-[sqSqueakOSXApplication
>>>>>>>> doHeadlessSetup](self=0x000000010066fc00, _cmd="doHeadlessSetup") at
>>>>>>>> sqSqueakOSXApplication.m:125 [opt]
>>>>>>>>    122 //No super call here as we've already set headless from
>>>>>>>> command line or info.plist
>>>>>>>>    123 extern BOOL gSqueakHeadless;
>>>>>>>>    124    // Notice that setActivationPolicy: is available in OSX
>>>>>>>> 10.6 and later
>>>>>>>> -> 125    if ([NSApp respondsToSelector:@selector(setActivationPolicy:)])
>>>>>>>> {
>>>>>>>>    126        if (gSqueakHeadless) {
>>>>>>>>    127            [NSApp
>>>>>>>> setActivationPolicy:NSApplicationActivationPolicyProhibited];
>>>>>>>>    128        } else {
>>>>>>>> Target 0: (Squeak) stopped.
>>>>>>>> (lldb) wa s e -s 1 -- methodZoneBase+0x0000000000095408
>>>>>>>> Watchpoint created: Watchpoint 33: addr = 0x107c58138 size = 1
>>>>>>>> state = enabled type = w
>>>>>>>>     new value: -3689348814741910324
>>>>>>>> (lldb) cont
>>>>>>>> Process 28185 resuming
>>>>>>>>
>>>>>>>> Watchpoint 33 hit:
>>>>>>>> old value: -3689348814741910324
>>>>>>>> new value: -3689348814741910836
>>>>>>>> Process 28185 stopped
>>>>>>>> * thread #1, queue = 'com.apple.main-thread', stop reason =
>>>>>>>> watchpoint 33
>>>>>>>>     frame #0: 0x00000001000731dc
>>>>>>>> Squeak`fillInMethodHeadersizeselector(method=0x0000000107c58130, size=544,
>>>>>>>> selector=4435118304) at cogitX64SysV.c:6320 [opt]
>>>>>>>>    6317    sqInt rawHeader;
>>>>>>>>    6318
>>>>>>>>    6319 (method->cmType = CMMethod);
>>>>>>>> -> 6320 (method->objectHeader = nullHeaderForMachineCodeMethod());
>>>>>>>>    6321 (method->blockSize = size);
>>>>>>>>    6322 (method->methodObject = methodObj);
>>>>>>>>    6323
>>>>>>>> Target 0: (Squeak) stopped.
>>>>>>>> (lldb)
>>>>>>>> Process 28185 resuming
>>>>>>>>
>>>>>>>> Watchpoint 33 hit:
>>>>>>>> old value: -3689348814741910836
>>>>>>>> new value: -3689571654825360894
>>>>>>>> Process 28185 stopped
>>>>>>>> * thread #1, queue = 'com.apple.main-thread', stop reason =
>>>>>>>> watchpoint 33
>>>>>>>>     frame #0: 0x0000000100073255
>>>>>>>> Squeak`fillInMethodHeadersizeselector(method=0x0000000107c58130, size=544,
>>>>>>>> selector=4435118304) at cogitX64SysV.c:6339 [opt]
>>>>>>>>    6336 (method->selector = selector);
>>>>>>>>    6337 (method->cmNumArgs =
>>>>>>>> argumentCountOfMethodHeader(methodHeader));
>>>>>>>>    6338 (method->cmHasMovableLiteral = hasMovableLiteral);
>>>>>>>> -> 6339 if ((method->cmRefersToYoung = hasYoungReferent)) {
>>>>>>>>    6340 addToYoungReferrers(method);
>>>>>>>>    6341 }
>>>>>>>>    6342 (method->cmUsageCount = initialMethodUsageCount());
>>>>>>>> Target 0: (Squeak) stopped.
>>>>>>>> (lldb) bt
>>>>>>>> * thread #1, queue = 'com.apple.main-thread', stop reason =
>>>>>>>> watchpoint 33
>>>>>>>>   * frame #0: 0x0000000100073255
>>>>>>>> Squeak`fillInMethodHeadersizeselector(method=0x0000000107c58130, size=544,
>>>>>>>> selector=4435118304) at cogitX64SysV.c:6339 [opt]
>>>>>>>>     frame #1: 0x000000010004f90c Squeak`compileCogMethod [inlined]
>>>>>>>> generateCogMethod(selector=<unavailable>) at cogitX64SysV.c:7098 [opt]
>>>>>>>>     frame #2: 0x000000010004f7a5
>>>>>>>> Squeak`compileCogMethod(selector=4435118304) at cogitX64SysV.c:27780 [opt]
>>>>>>>>     frame #3: 0x000000010004bc54
>>>>>>>> Squeak`cogselector(aMethodObj=4468481712, aSelectorOop=4435118304) at
>>>>>>>> cogitX64SysV.c:4833 [opt]
>>>>>>>>     frame #4: 0x0000000100013d4d Squeak`primitiveClosureValue at
>>>>>>>> gcc3x-cointerp.c:27237 [opt]
>>>>>>>>     frame #5: 0x0000000107bd1925
>>>>>>>>     frame #6: 0x000000010000236b Squeak`interpret at
>>>>>>>> gcc3x-cointerp.c:2767 [opt]
>>>>>>>>     frame #7: 0x000000010008ed55 Squeak`-[sqSqueakMainApplication
>>>>>>>> runSqueak](self=0x000000010066fc00, _cmd=<unavailable>) at
>>>>>>>> sqSqueakMainApplication.m:201 [opt]
>>>>>>>>     frame #8: 0x00007fff44b077b8
>>>>>>>> Foundation`__NSFirePerformWithOrder + 360
>>>>>>>>     frame #9: 0x00007fff4297cf57
>>>>>>>> CoreFoundation`__CFRUNLOOP_IS_CALLING_OUT_TO_AN_OBSERVER_CALLBACK_FUNCTION__
>>>>>>>> + 23
>>>>>>>>     frame #10: 0x00007fff4297ce7f
>>>>>>>> CoreFoundation`__CFRunLoopDoObservers + 527
>>>>>>>>     frame #11: 0x00007fff4295f3f8 CoreFoundation`__CFRunLoopRun +
>>>>>>>> 1240
>>>>>>>>     frame #12: 0x00007fff4295ec93
>>>>>>>> CoreFoundation`CFRunLoopRunSpecific + 483
>>>>>>>>     frame #13: 0x00007fff41c49d96
>>>>>>>> HIToolbox`RunCurrentEventLoopInMode + 286
>>>>>>>>     frame #14: 0x00007fff41c49a0f HIToolbox`ReceiveNextEventCommon
>>>>>>>> + 366
>>>>>>>>     frame #15: 0x00007fff41c49884
>>>>>>>> HIToolbox`_BlockUntilNextEventMatchingListInModeWithFilter + 64
>>>>>>>>     frame #16: 0x00007fff3fef9a73 AppKit`_DPSNextEvent + 2085
>>>>>>>>     frame #17: 0x00007fff4068fe34 AppKit`-[NSApplication(NSEvent)
>>>>>>>> _nextEventMatchingEventMask:untilDate:inMode:dequeue:] + 3044
>>>>>>>>     frame #18: 0x00007fff3feee885 AppKit`-[NSApplication run] + 764
>>>>>>>>     frame #19: 0x00007fff3febda72 AppKit`NSApplicationMain + 804
>>>>>>>>     frame #20: 0x00007fff6a88c015 libdyld.dylib`start + 1
>>>>>>>> (lldb)
>>>>>>>>
>>>>>>>> Here I don't know how to proceed, because I have no knowledge of
>>>>>>>> this part of the VM, it's also quite late now...
>>>>>>>> Eliot, does it ring some bell?
>>>>>>>>
>>>>>>>>
>>>>>>> Ah, forget it, I thought it was overwriting, but it is the first
>>>>>>> write on yet virgin methodeZone...
>>>>>>> (it's just that the scenario was different from all the previous run)
>>>>>>>
>>>>>>> Le dim. 22 sept. 2019 à 11:21, Jan Vrany <jan.vrany at fit.cvut.cz> a
>>>>>>>> écrit :
>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 2019-09-22 10:57, Nicolas Cellier wrote:
>>>>>>>>> > Hi Jan,
>>>>>>>>> > Thanks, a recording debugger would indeed be a tremendous help!
>>>>>>>>> > Unfortunately, I'm unable to reproduce the bug under linux, only
>>>>>>>>> > macosx...
>>>>>>>>> > Do you know something equivalent to rr for mac?
>>>>>>>>>
>>>>>>>>> No, I don't. There might be something though, I never had a mac...
>>>>>>>>>
>>>>>>>>> Jan
>>>>>>>>>
>>>>>>>>> >
>>>>>>>>> > Le dim. 22 sept. 2019 à 09:55, Jan Vrany <jan.vrany at fit.cvut.cz>
>>>>>>>>> a
>>>>>>>>> > écrit :
>>>>>>>>> >
>>>>>>>>> >> Hi,
>>>>>>>>> >>
>>>>>>>>> >>>
>>>>>>>>> >>> - can anyone think of any other strategies I might take to try
>>>>>>>>> and
>>>>>>>>> >>> reproduce this?
>>>>>>>>> >>>
>>>>>>>>> >> at the risk of stating the obvious, in cases similar to this rr
>>>>>>>>> (+
>>>>>>>>> >> VDB)
>>>>>>>>> >> helped
>>>>>>>>> >> me many times. See https://www.youtube.com/watch?v=Qbu4ED8wPYE
>>>>>>>>> >>
>>>>>>>>> >> Never used with opensmalltalk VM, though.
>>>>>>>>> >>
>>>>>>>>> >> HTH, Jan
>>>>>>>>>
>>>>>>>>
>>>>>
>>>>> --
>>>>> _,,,^..^,,,_
>>>>> best, Eliot
>>>>>
>>>>

-- 
_,,,^..^,,,_
best, Eliot
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.squeakfoundation.org/pipermail/vm-dev/attachments/20191004/fa96662b/attachment-0001.html>


More information about the Vm-dev mailing list