Hi Eliot, so I have set watchpoint to 4 previous crash and got this:
(lldb) run ../../image/crash.image There is a running process, kill it and restart?: [Y/n] Process 98150 exited with status = 9 (0x00000009) Process 98164 launched: '/Users/nicolas/Smalltalk/OpenSmalltalk/opensmalltalk-vm/build.macos64x64/squeak.cog.spur/SqueakDebug.app/Contents/MacOS/Squeak' (x86_64) Process 98164 stopped * thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1 frame #0: 0x00000001001618ec Squeak`-[sqSqueakMainApplication setupMenus](self=0x000000010076e1a0, _cmd="setupMenus") at sqSqueakMainApplication.m:123 120 121 - (void) setupMenus { 122 // nothing to do so far since the menu is setup in the MainMenu.nib file -> 123 } 124 125 - (void) setupTimers { 126 extern void SetUpTimers(void); Target 0: (Squeak) stopped. (lldb) wa se e -s 1 -- 0x107c992b4 Watchpoint created: Watchpoint 35: addr = 0x107c992b4 size = 1 state = enabled type = w (lldb) wa se e -s 1 -- 0x107c929e0 Watchpoint created: Watchpoint 34: addr = 0x107c929e0 size = 1 state = enabled type = w (lldb) wa se e -s 1 -- 0x0000000107c945df Watchpoint created: Watchpoint 36: addr = 0x107c945df size = 1 state = enabled type = w (lldb) wa se e -s 1 -- 0x0000000107c92da3 Watchpoint created: Watchpoint 37: addr = 0x107c92da3 size = 1 state = enabled type = w new value: -3689348814741910324 (lldb) c Process 98164 resuming
Watchpoint 34 hit: new value: -3689348814741910456 Process 98164 stopped * thread #1, queue = 'com.apple.main-thread', stop reason = watchpoint 34 frame #0: 0x0000000100102067 Squeak`byteAtPointerput(ptr="H???????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????, val=72) at sqMemoryAccess.h:136 133 /* Use static inline functions when the compiler produces efficient code for small accessors. 134 These are preferred because static type checking will prevent inadvertent confusion of pointers and oops. */ 135 static inline sqInt byteAtPointer(char *ptr) { return (sqInt)(*((unsigned char *)ptr)); } -> 136 static inline sqInt byteAtPointerput(char *ptr, int val) { return (sqInt)(*((unsigned char *)ptr)= (unsigned char)val); } 137 static inline sqInt shortAtPointer(char *ptr) { return (sqInt)(*((short *)ptr)); } 138 static inline sqInt shortAtPointerput(char *ptr, int val) { return (sqInt)(*((short *)ptr)= (short)val); } 139 static inline sqInt intAtPointer(char *ptr) { return (sqInt)(*((int *)ptr)); } Target 0: (Squeak) stopped. (lldb) Process 98164 resuming
Watchpoint 37 hit: old value: -3689348814741910324 new value: -3689912642157150198 Process 98164 stopped * thread #1, queue = 'com.apple.main-thread', stop reason = watchpoint 37 frame #0: 0x000000010012de84 Squeak`fillInMethodHeadersizeselector(method=0x0000000107c92da0, size=1240, selector=4435376352) at cogitX64SysV.c:6321 6318 6319 (method->cmType = CMMethod); 6320 (method->objectHeader = nullHeaderForMachineCodeMethod()); -> 6321 (method->blockSize = size); 6322 (method->methodObject = methodObj); 6323 6324 /* If the method has already been cogged (e.g. Newspeak accessors) then Target 0: (Squeak) stopped. (lldb) Process 98164 resuming
Watchpoint 36 hit: new value: -3689348814741910520 Process 98164 stopped * thread #1, queue = 'com.apple.main-thread', stop reason = watchpoint 36 frame #0: 0x0000000100102067 Squeak`byteAtPointerput(ptr="\b???????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????, val=8) at sqMemoryAccess.h:136 133 /* Use static inline functions when the compiler produces efficient code for small accessors. 134 These are preferred because static type checking will prevent inadvertent confusion of pointers and oops. */ 135 static inline sqInt byteAtPointer(char *ptr) { return (sqInt)(*((unsigned char *)ptr)); } -> 136 static inline sqInt byteAtPointerput(char *ptr, int val) { return (sqInt)(*((unsigned char *)ptr)= (unsigned char)val); } 137 static inline sqInt shortAtPointer(char *ptr) { return (sqInt)(*((short *)ptr)); } 138 static inline sqInt shortAtPointerput(char *ptr, int val) { return (sqInt)(*((short *)ptr)= (short)val); } 139 static inline sqInt intAtPointer(char *ptr) { return (sqInt)(*((int *)ptr)); } Target 0: (Squeak) stopped. (lldb) Process 98164 resuming
Watchpoint 35 hit: new value: -3689348814741910456 Process 98164 stopped * thread #1, queue = 'com.apple.main-thread', stop reason = watchpoint 35 frame #0: 0x0000000100102067 Squeak`byteAtPointerput(ptr="H???????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????, val=72) at sqMemoryAccess.h:136 133 /* Use static inline functions when the compiler produces efficient code for small accessors. 134 These are preferred because static type checking will prevent inadvertent confusion of pointers and oops. */ 135 static inline sqInt byteAtPointer(char *ptr) { return (sqInt)(*((unsigned char *)ptr)); } -> 136 static inline sqInt byteAtPointerput(char *ptr, int val) { return (sqInt)(*((unsigned char *)ptr)= (unsigned char)val); } 137 static inline sqInt shortAtPointer(char *ptr) { return (sqInt)(*((short *)ptr)); } 138 static inline sqInt shortAtPointerput(char *ptr, int val) { return (sqInt)(*((short *)ptr)= (short)val); } 139 static inline sqInt intAtPointer(char *ptr) { return (sqInt)(*((int *)ptr)); } Target 0: (Squeak) stopped. (lldb) Process 98164 resuming
Watchpoint 37 hit: old value: -3689912642157150198 new value: -5066732099722018553 Process 98164 stopped * thread #1, queue = 'com.apple.main-thread', stop reason = watchpoint 37 frame #0: 0x000000010002e801 Squeak`forceInterruptCheck at gcc3x-cointerp.c:64151 64148 /* StackInterpreter>>#forceInterruptCheck */ 64149 sqInt 64150 forceInterruptCheck(void) -> 64151 { DECL_MAYBE_SQ_GLOBAL_STRUCT 64152 void (*iccFunc)(); 64153 StackPage *thePage; 64154 Target 0: (Squeak) stopped. (lldb) bt * thread #1, queue = 'com.apple.main-thread', stop reason = watchpoint 37 * frame #0: 0x000000010002e801 Squeak`forceInterruptCheck at gcc3x-cointerp.c:64151 frame #1: 0x00007fff6ab9af5a libsystem_platform.dylib`_sigtramp + 26 frame #2: 0x0000000107c10699 frame #3: 0x0000000107c9646f frame #4: 0x0000000107c9633e frame #5: 0x0000000107c962ac frame #6: 0x0000000107c961e9 frame #7: 0x0000000107c8fcf8 frame #8: 0x0000000107c98605 frame #9: 0x0000000107c8bbfe frame #10: 0x0000000107c1100e frame #11: 0x0000000107c8c4ad frame #12: 0x0000000107c0cef6 frame #13: 0x0000000107c8c2a5 frame #14: 0x0000000107c8bb4c frame #15: 0x0000000107c0cef6 frame #16: 0x0000000107c8ba05 frame #17: 0x0000000107c8b85b frame #18: 0x0000000107c1100e frame #19: 0x0000000107c8b7e1 frame #20: 0x0000000107c57018 frame #21: 0x0000000107c018c8 (lldb) call printCallStack()
Watchpoint 34 hit: old value: -3689348814741910456 new value: 4435376368 error: Execution was interrupted, reason: watchpoint 34. The process has been returned to the state before expression evaluation. (lldb) call printCogMethodFor((void*)0x0000000107c10699)
Watchpoint 34 hit: old value: 4435376368 new value: 0 error: Execution was interrupted, reason: watchpoint 34. The process has been returned to the state before expression evaluation.
I don't know how to proceed, but maybe there's enough clue for you...
Le lun. 30 sept. 2019 à 01:28, Eliot Miranda eliot.miranda@gmail.com a écrit :
Hi Nicolas,
On Wed, Sep 25, 2019 at 12:32 AM Nicolas Cellier < nicolas.cellier.aka.nice@gmail.com> wrote:
I noticed that many crash happen in testSendTimeout. This test is organizing a race:
- Smalltalk fill the socket send buffer
- the OS tries to drain
The other thing I noticed is that [SocketTest suite run] takes 7 to 8s on macos while only 2 to 3s on ubuntu and windows So it might be that some test times out on macos, while it doesn't on other OS.
This would explain that many crash also happen in JITted Timer loop, and that we cannot observe it in other OSes.
Yes, that is, I think, an important clue. The other one is that, given that the crash always happens only early in a run, plus the address at which it crashes in machine code, suggests to me that the crash occurs when the interpreter invokes a block in the JITted startTimerLoop.
What worries me is that the JITted code for startTimerLoop looks completely corrupted, not merely moved or overwritten (the method header looks OK up to the selector field, which is the last field in the header). So I'm wondering whether it could be a bug in support code overwriting machine code, and hence that not might be related to a 32-bit vs 64-bit code issue. I was hoping your recent sqUnixSocket.c changes might work but I'm sure they have no bearing.
The access to OS ressources and the race may also explain some
randomization of the crash...
So one idea would be to make the test timeout in linux too, see if we can make it crash then try using rr. Maybe it's possible with a huge Socket buffer, a smaller image side buffer (we could reduce the size from 1000 to 1 so as to increase overhead).
Yes, that's a good idea. But I do think we need to set up a reliable remote server, som thing that acts like an infinite sink for receiving a par4ticular large Monticello package, so that ev very time the example runs the same package is indeed copied to a remote server. Would it be possible to construct such a fake Monticello server?
What I'm wanting the server to do is a) take a while to respond to the Monticello package contents request b) to never include a particular package which the example would always try and upload
The test case would be to copy a particular version of VMMaker from package-cache to the remote "fake" server
It's just theories I tried to construct while biking, not in front of a keyboard, and I won't be able to verify until this evening... But the watchpoint strategy is too much vain shooting in the dark, we need more ellaborated plans ;)
Does anyone see the crash on macOS in 32 bits?
Le mar. 24 sept. 2019 à 00:28, Nicolas Cellier <
nicolas.cellier.aka.nice@gmail.com> a écrit :
Le mar. 24 sept. 2019 à 00:25, Nicolas Cellier < nicolas.cellier.aka.nice@gmail.com> a écrit :
Follow up: so we have a problem of repeatability... I found nothing like rr for macos, and CPU watchpoint capability is too restricted to be usefull.
So what I did is save a crash.image attempting to crash at resumption time: Smalltalk saveAs: 'crash'. Smalltalk snapshot: true andQuit: true. 5 timesRepeat: [SocketTest suite run].
Then I launched the vm under lldb and gathered $pc stats to see if I could observe some reproducibility (lldb) p/x $pc - methodZoneBase 0x00000000000a0e94 0x0000000000090f70 0x000000000009f824 0x000000000009bfc8 0x000000000009d030 0x000000000009c7c0 0x00000000000951e0 0x00000000000949c8 <- once 0x0000000000094418 0x00000000000949c8 <— twice 0x000000000009b848 0x00000000000a0e94 <- twice 0x000000000009a176 0x000000006dfff2e0 ??? 0x00000000000996d1 0x00000000000949c8 <- thrice good candidate for watchpoint set 0x000000000009ae70 0x000000000009ae70 0x00000000000a18e7 0x0000000000095470 0x000000000009c458 0x00000000000a0a7c 0x0000000000093e89 0x000000000009beb0 0x0000000000095a37 0x0000000000095408 0x00000000000a06bc 0x000000000009cd48 0x0000000000095408 <- twice 0x000000000009ba41 0x0000000000095408 <- thrice new candidate for watchpoint set 0x0000000000045258 <— smallest ever 0x000000000009c7c0 0x0000000000095408 <- fourth ! the watch point was set here, why didn’t it stop??? argh methodZoneBase was incremented after I set the watchpoint!!! 0x00000000000a12dc 0x00000000000951e0 0x00000000000a0e24 0x000000000009ff8c 0x000000000009c7f6 0x000000000009a626
Then I have chosen to watch the most frequent crash address, that is methodZoneBase+0x0000000000095408, Unfortunately, s commented above, I missed one (I set the watchpoint too early, before methodZoneBase took its definitive address).
And after a few (a bunch of) trials, here is what I found:
(lldb) run ../../image/crash.image There is a running process, kill it and restart?: [Y/n] Process 28175 exited with status = 9 (0x00000009) Process 28185 launched: '/Users/nicolas/Smalltalk/OpenSmalltalk/opensmalltalk-vm/build.macos64x64/squeak.cog.spur/Squeak.app/Contents/MacOS/Squeak' (x86_64) Squeak was compiled with optimization - stepping may behave oddly; variables may not be available. Process 28185 stopped
- thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint
6.1 frame #0: 0x000000010007bc22 Squeak`-[sqSqueakOSXApplication doHeadlessSetup](self=0x000000010066fc00, _cmd="doHeadlessSetup") at sqSqueakOSXApplication.m:125 [opt] 122 //No super call here as we've already set headless from command line or info.plist 123 extern BOOL gSqueakHeadless; 124 // Notice that setActivationPolicy: is available in OSX 10.6 and later -> 125 if ([NSApp respondsToSelector:@selector(setActivationPolicy:)]) { 126 if (gSqueakHeadless) { 127 [NSApp setActivationPolicy:NSApplicationActivationPolicyProhibited]; 128 } else { Target 0: (Squeak) stopped. (lldb) wa s e -s 1 -- methodZoneBase+0x0000000000095408 Watchpoint created: Watchpoint 33: addr = 0x107c58138 size = 1 state = enabled type = w new value: -3689348814741910324 (lldb) cont Process 28185 resuming
Watchpoint 33 hit: old value: -3689348814741910324 new value: -3689348814741910836 Process 28185 stopped
- thread #1, queue = 'com.apple.main-thread', stop reason = watchpoint
33 frame #0: 0x00000001000731dc Squeak`fillInMethodHeadersizeselector(method=0x0000000107c58130, size=544, selector=4435118304) at cogitX64SysV.c:6320 [opt] 6317 sqInt rawHeader; 6318 6319 (method->cmType = CMMethod); -> 6320 (method->objectHeader = nullHeaderForMachineCodeMethod()); 6321 (method->blockSize = size); 6322 (method->methodObject = methodObj); 6323 Target 0: (Squeak) stopped. (lldb) Process 28185 resuming
Watchpoint 33 hit: old value: -3689348814741910836 new value: -3689571654825360894 Process 28185 stopped
- thread #1, queue = 'com.apple.main-thread', stop reason = watchpoint
33 frame #0: 0x0000000100073255 Squeak`fillInMethodHeadersizeselector(method=0x0000000107c58130, size=544, selector=4435118304) at cogitX64SysV.c:6339 [opt] 6336 (method->selector = selector); 6337 (method->cmNumArgs = argumentCountOfMethodHeader(methodHeader)); 6338 (method->cmHasMovableLiteral = hasMovableLiteral); -> 6339 if ((method->cmRefersToYoung = hasYoungReferent)) { 6340 addToYoungReferrers(method); 6341 } 6342 (method->cmUsageCount = initialMethodUsageCount()); Target 0: (Squeak) stopped. (lldb) bt
- thread #1, queue = 'com.apple.main-thread', stop reason = watchpoint
33
- frame #0: 0x0000000100073255
Squeak`fillInMethodHeadersizeselector(method=0x0000000107c58130, size=544, selector=4435118304) at cogitX64SysV.c:6339 [opt] frame #1: 0x000000010004f90c Squeak`compileCogMethod [inlined] generateCogMethod(selector=<unavailable>) at cogitX64SysV.c:7098 [opt] frame #2: 0x000000010004f7a5 Squeak`compileCogMethod(selector=4435118304) at cogitX64SysV.c:27780 [opt] frame #3: 0x000000010004bc54 Squeak`cogselector(aMethodObj=4468481712, aSelectorOop=4435118304) at cogitX64SysV.c:4833 [opt] frame #4: 0x0000000100013d4d Squeak`primitiveClosureValue at gcc3x-cointerp.c:27237 [opt] frame #5: 0x0000000107bd1925 frame #6: 0x000000010000236b Squeak`interpret at gcc3x-cointerp.c:2767 [opt] frame #7: 0x000000010008ed55 Squeak`-[sqSqueakMainApplication runSqueak](self=0x000000010066fc00, _cmd=<unavailable>) at sqSqueakMainApplication.m:201 [opt] frame #8: 0x00007fff44b077b8 Foundation`__NSFirePerformWithOrder + 360 frame #9: 0x00007fff4297cf57 CoreFoundation`__CFRUNLOOP_IS_CALLING_OUT_TO_AN_OBSERVER_CALLBACK_FUNCTION__
- 23 frame #10: 0x00007fff4297ce7f CoreFoundation`__CFRunLoopDoObservers
- 527 frame #11: 0x00007fff4295f3f8 CoreFoundation`__CFRunLoopRun + 1240 frame #12: 0x00007fff4295ec93 CoreFoundation`CFRunLoopRunSpecific +
483 frame #13: 0x00007fff41c49d96 HIToolbox`RunCurrentEventLoopInMode + 286 frame #14: 0x00007fff41c49a0f HIToolbox`ReceiveNextEventCommon + 366 frame #15: 0x00007fff41c49884 HIToolbox`_BlockUntilNextEventMatchingListInModeWithFilter + 64 frame #16: 0x00007fff3fef9a73 AppKit`_DPSNextEvent + 2085 frame #17: 0x00007fff4068fe34 AppKit`-[NSApplication(NSEvent) _nextEventMatchingEventMask:untilDate:inMode:dequeue:] + 3044 frame #18: 0x00007fff3feee885 AppKit`-[NSApplication run] + 764 frame #19: 0x00007fff3febda72 AppKit`NSApplicationMain + 804 frame #20: 0x00007fff6a88c015 libdyld.dylib`start + 1 (lldb)
Here I don't know how to proceed, because I have no knowledge of this part of the VM, it's also quite late now... Eliot, does it ring some bell?
Ah, forget it, I thought it was overwriting, but it is the first write on yet virgin methodeZone... (it's just that the scenario was different from all the previous run)
Le dim. 22 sept. 2019 à 11:21, Jan Vrany jan.vrany@fit.cvut.cz a
écrit :
On 2019-09-22 10:57, Nicolas Cellier wrote:
Hi Jan, Thanks, a recording debugger would indeed be a tremendous help! Unfortunately, I'm unable to reproduce the bug under linux, only macosx... Do you know something equivalent to rr for mac?
No, I don't. There might be something though, I never had a mac...
Jan
Le dim. 22 sept. 2019 à 09:55, Jan Vrany jan.vrany@fit.cvut.cz a écrit :
> Hi, > >> >> - can anyone think of any other strategies I might take to try and >> reproduce this? >> > at the risk of stating the obvious, in cases similar to this rr (+ > VDB) > helped > me many times. See https://www.youtube.com/watch?v=Qbu4ED8wPYE > > Never used with opensmalltalk VM, though. > > HTH, Jan
-- _,,,^..^,,,_ best, Eliot