[squeak-dev] Difficult to debug VM crash with full blocks and Sista V1

Nicolas Cellier nicolas.cellier.aka.nice at gmail.com
Mon Sep 16 21:48:36 UTC 2019


Attached are a bunch of crash dumps, some of them with the same stack as your
last example...
You definitely want my internet connection (or not...)!
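
For anyone whose connection is too fast to hit the slow-server case, one
possible way to fake a "cold" server is a small local TCP proxy that sleeps
before forwarding each chunk. The following is only a sketch in Python and is
not something that was used in this thread: the name start_delay_proxy, the
5 second delay and port 8888 are made up for illustration, and whether a
delay at this level triggers the crash at all is untested.

```python
import socket
import threading
import time


def _pump(src, dst, delay_s):
    """Copy bytes from src to dst, sleeping delay_s before forwarding each chunk."""
    try:
        while True:
            data = src.recv(65536)
            if not data:
                break
            time.sleep(delay_s)  # the artificial "slow network"
            dst.sendall(data)
    except OSError:
        pass  # peer went away; just stop pumping
    finally:
        # Pass end-of-stream along so the other side sees the close.
        try:
            dst.shutdown(socket.SHUT_WR)
        except OSError:
            pass


def _handle(client, upstream_addr, delay_s):
    """Relay one client connection to the upstream server in both directions."""
    try:
        upstream = socket.create_connection(upstream_addr)
    except OSError:
        client.close()
        return
    back = threading.Thread(
        target=_pump, args=(upstream, client, delay_s), daemon=True
    )
    back.start()
    _pump(client, upstream, delay_s)
    back.join()
    client.close()
    upstream.close()


def start_delay_proxy(upstream_host, upstream_port, delay_s, listen_port=0):
    """Listen on 127.0.0.1 and forward to upstream, delaying every chunk.

    Returns (listening_socket, port). All worker threads are daemons,
    so they end when the process exits.
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind(("127.0.0.1", listen_port))
    server.listen()

    def accept_loop():
        while True:
            try:
                client, _ = server.accept()
            except OSError:
                return  # accept failed, e.g. the listening socket was closed
            threading.Thread(
                target=_handle,
                args=(client, (upstream_host, upstream_port), delay_s),
                daemon=True,
            ).start()

    threading.Thread(target=accept_loop, daemon=True).start()
    return server, server.getsockname()[1]


# Hypothetical usage (values are illustrative only):
#   server, port = start_delay_proxy("source.squeak.org", 80, delay_s=5.0,
#                                    listen_port=8888)
#   ...then point a repository URL at http://127.0.0.1:8888/VMMaker
```

One caveat with this approach: the image would then send a Host header for
127.0.0.1, which the real server may or may not accept, so a system-level
tool such as dummynet or the Network Link Conditioner mentioned below avoids
that problem.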

On Mon, Sep 16, 2019 at 23:05, Nicolas Cellier <
nicolas.cellier.aka.nice at gmail.com> wrote:

> Forgot to say: it happens in spur only, not stack.
>
> On Mon, Sep 16, 2019 at 22:52, Nicolas Cellier <
> nicolas.cellier.aka.nice at gmail.com> wrote:
>
>> No idea if this is related or not,
>> but I get regular crashes on macOS with my own compiled x64 artifact by
>> just running:
>>     SocketTest suite run.
>>
>> I first thought of suspecting some UB (undefined behavior), but this
>> happens in the debug version, so it's probably something else.
>> I did not try to simulate (but can we simulate socket tests?).
>> This does not happen on Windows 64 or Linux 64 (WSL), where I can run all
>> the tests.
>> Note that on WSL I had to evaluate (Compiler recompileAll) in
>> trunk6-64.image, otherwise a bunch of tests fail... Mysterious.
>> I also had 2 ByteSymbols differing from their interned versions.
>>
>> On Sat, Sep 14, 2019 at 09:33, Tobias Pape <Das.Linux at gmx.de> wrote:
>>
>>>
>>> > On 14.09.2019, at 06:03, Nicola Mingotti <nmingotti at gmail.com> wrote:
>>> >
>>> >
>>> > I can help you a bit only on this point:
>>> > "- is there a way of introducing network delays in Mac OS that might
>>> help me induce the bug?"
>>>
>>> Yes, it is called "Network Link Conditioner.prefPane" :D
>>>
>>>
>>> >
>>> > Yes, in theory it is possible. Some time ago I read the documentation
>>> of 'dummynet' in FreeBSD for the firewall 'ipfw'; it seemed very
>>> interesting, but I never had occasion to use it.
>>> >
>>> > Now, Apple's Unix is in large part taken from FreeBSD, so I checked
>>> whether they also took dummynet:
>>> > macOS> apropos dummynet
>>> > dummynet(4) ....
>>> >
>>> > So, yes, it is there.
>>> >
>>> > HTH
>>> >
>>> > bye
>>> > Nicola
>>> >
>>> > On 9/13/19 8:15 PM, Eliot Miranda wrote:
>>> >> Hi All,
>>> >>
>>> >>     there is a VM bug in 64-bit Spur with the Sista V1 bytecode set
>>> and full blocks.  The symptom is that when waiting for a remote Monticello
>>> repository to update and/or deliver a package version the system crashes in
>>> JITTED code after what appears to be some kind of wait.
>>> >>
>>> >> This is a reliably occurring bug but maddeningly difficult to
>>> reproduce.  The bug reliably occurs when interacting with a remote
>>> repository (e.g. http://source.squeak.org/VMMaker) when the server is
>>> "cold", and hence makes the image wait.  Every time I have tried to repeat
>>> the failing sequence the crash has not occurred, I think because the
>>> server is now "hot" and serves up the version quickly.  Today I even tried
>>> shutting down my machine for over an hour and rebooting.  But I could not
>>> get the crash to occur even though it seems to me that every time I try it
>>> for the first time in the day it does crash.
>>> >>
>>> >> This is an important bug to fix.  If it cannot be fixed then full
>>> blocks and Sista V1 are not ready for use in the upcoming Squeak release.
>>> I am looking for help in debugging this.
>>> >>
>>> >> - is anyone else using the 64-bit VM with full blocks and Sista V1
>>> who sees hard VM crashes?  If so, under what circumstances?
>>> >>
>>> >> - is it possible to flush caches in the
>>> http://source.squeak.org/VMMaker server, or could people tolerate me
>>> rebooting the server?
>>> >>
>>> >> - is there a way of introducing network delays in Mac OS that might
>>> help me induce the bug?
>>> >>
>>> >> - can anyone think of any other strategies I might take to try and
>>> reproduce this?
>>> >>
>>> >> I may have to try and reproduce the bug in the simulator to have a
>>> chance of identifying the bug.  Does anyone have a good enough mental model
>>> of the Monticello server interaction and the energy to help me figure this
>>> one out?
>>> >>
>>> >> Here is some information from the last crash I did see in the
>>> debugger (alas it is incomplete; there are a number of additional pieces of
>>> info I could have collected).
>>> >>
>>> >> (lldb) thr b
>>> >> * thread #1, queue = 'com.apple.main-thread', stop reason =
>>> EXC_BAD_INSTRUCTION (code=EXC_I386_INVOP, subcode=0x0)
>>> >>   * frame #0: 0x000000010de5700a
>>> >>     frame #1: 0x000000010dd7b174
>>> >>     frame #2: 0x000000010dd45f1c
>>> >>     frame #3: 0x000000010dd44534
>>> >>     frame #4: 0x000000010dd44c60
>>> >> (lldb) x/10i 0x000000010de5700a
>>> >>
>>> >> (lldb) call printStackCallStackOf($rbp)
>>> >>     0x7ffeefbdfc30 M Heap>upHeap: 0x11273ca90: a(n) Heap
>>> >>     0x7ffeefbdfc68 M Heap>add: 0x11273ca90: a(n) Heap
>>> >>     0x7ffeefbdfca0 M Delay class>scheduleDelay:from: 0x1123ebfb8:
>>> a(n) Delay class
>>> >>     0x7ffeefbdfcf0 M Delay class>handleTimerEvent 0x1123ebfb8: a(n)
>>> Delay class
>>> >>     0x7ffeefbdfd20 M Delay class>runTimerEventLoop 0x1123ebfb8: a(n)
>>> Delay class
>>> >>
>>> >> (lldb) x/10i 0x000000010dd7b174
>>> >>     0x10dd7b174: 48 8b 55 10  movq   0x10(%rbp), %rdx
>>> >>     0x10dd7b178: 48 89 ec     movq   %rbp, %rsp
>>> >>     0x10dd7b17b: 5d           popq   %rbp
>>> >>     0x10dd7b17c: c2 10 00     retq   $0x10
>>> >>     0x10dd7b17f: cc           int3
>>> >>     0x10dd7b180: cc           int3
>>> >>     0x10dd7b181: cc           int3
>>> >>     0x10dd7b182: cc           int3
>>> >>     0x10dd7b183: cc           int3
>>> >>     0x10dd7b184: cc           int3
>>> >> (lldb) print whereIs(0x000000010dd7b174)
>>> >> (char *) $0 = 0x00000001000f83ff " is in generated methods"
>>> >> (lldb) call printCogMethodFor((void *)0x000000010dd7b174)
>>> >>        0x10dd7afc0 <->        0x10dd7b198: method:        0x112f23c10
>>> selector:        0x112232c20 add:
>>> >> (lldb) print whereIs(0x000000010de5700a)
>>> >> (char *) $1 = 0x00000001000f83ff " is in generated methods"
>>> >> (lldb) call printCogMethodFor((void *)0x000000010de5700a)
>>> >>        0x10de56ba0 <->        0x10de57078: method:        0x1126ec218
>>> prim 23856 selector:     0x7ffeefbf3d20
>>> >>
>>> >> This method ends up being the jitted version of Delay class>>
>>> startTimerEventLoop
>>> >> _,,,^..^,,,_
>>> >> best, Eliot
>>> >>
>>> >>
>>> >
>>> >
>>>
>>>
>>>
>>>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: crash.dmp
Type: application/octet-stream
Size: 249584 bytes
Desc: not available
URL: <http://lists.squeakfoundation.org/pipermail/squeak-dev/attachments/20190916/7e3c58a6/attachment-0001.obj>

