[Vm-dev] squeak.cog.spur x64 macos & linux reproducible VM crash

Nicolas Cellier nicolas.cellier.aka.nice at gmail.com
Tue Sep 10 21:47:02 UTC 2019


This was due to my own bug in the JIT, and it was triggered by this:
[0 = $0] bench.

There are serious mistakes in Squeak's WebUtils class>>jsonNumberFrom: that
trigger this bug.
I presume Newspeak performs some similar Integer = Character test.
I don't know whether that is considered correct in Newspeak, but it would be
good to find out.
I will publish corrections and see how the CI goes (unless it still fails when
attempting to update Ubuntu packages...)
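
A side note on the value 362 (0x16a) that shows up in RDI and RCX in the
debugger sessions quoted below: the faulting instruction dereferences it as if
it were an object pointer. The workspace sketch below decodes it under an
assumption that is not stated anywhere in this thread, namely that 64-bit Spur
keeps a 3-bit tag in the low bits of an oop and that tag 2 marks an immediate
Character. The variable names are only illustrative.

```smalltalk
"Sketch only. ASSUMPTION: 64-bit Spur oops carry a 3-bit tag in their low bits,
 and tag 2 denotes an immediate Character. Evaluate in a Squeak workspace."
| oop tag payload |
oop := 362.                     "value reported in RDI (macOS, Linux) and RCX (Win64)"
tag := oop bitAnd: 7.           "low three bits => 2"
payload := oop bitShift: -3.    "remaining bits => 45"
Character value: payload.       "=> $-"
```

If that assumption holds, the register held the immediate Character $- rather
than a pointer to a heap object, which would fit an Integer = Character
comparison reaching machine code that loads an object header without first
checking for an immediate.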

On Thu, Sep 5, 2019 at 23:57, Nicolas Cellier <
nicolas.cellier.aka.nice at gmail.com> wrote:

> Hi Eliot,
> I can reproduce with a simple `WebClientServerTest new testNumbers` with
> VMMaker.oscog-nice.2550 in up-to-date trunk6-64.image.
> So putting that snippet in a file and running the Simulator as below
> also reproduces it:
>
> | cos |
> cos := CogVMSimulator newWithOptions: #(Cogit StackToRegisterMappingCogit
> "SimpleStackBasedCogit"
> ObjectMemory Spur64BitCoMemoryManager
> "ISA ARMv5" "ISA IA32").
> "cos initializeThreadSupport."
> cos desiredNumStackPages: 8. "Speeds up scavenging when simulating.  Set
> to e.g. 64 for something like the real VM."
> cos openOn: 'trunk6-64.image'.
> cos systemAttributes
> at: 2 put: 'testWebNum.st'.
> cos openAsMorph; run
>
> From there, I don't know how to debug further...
>
> On Thu, Sep 5, 2019 at 00:30, Nicolas Cellier <
> nicolas.cellier.aka.nice at gmail.com> wrote:
>
>> Hmm, it's probably me who introduced the bug with float/int comparison
>> hacks
>>
>> https://github.com/OpenSmalltalk/opensmalltalk-vm/commit/e839adea0f2a8e2c8e04706b24283c996aeff1ae
>>
>> Thanks to Ubuntu Xenial causing random failures of travis_install.sh due to
>> an incompatibility between libpango and libcairo2,
>> this went unnoticed and alas probably broke the Newspeak bootstrap.
>>
>> I hope that Eliot's eyes will be sharper than mine; I fail to see the
>> exact cause and need some rest...
>>
>> On Wed, Sep 4, 2019 at 22:07, Nicolas Cellier <
>> nicolas.cellier.aka.nice at gmail.com> wrote:
>>
>>> Hint: if I disable SmallInteger->Float conversion in the VM with the more
>>> recent VMMaker.oscog-nice.2547:
>>>
>>>     Smalltalk vmParameterAt: 75 put: false.
>>>     Smalltalk voidCogVMState.
>>>
>>> then the `WebClientServerTest suite run` snippet does not trigger the
>>> bug...
>>>
>>> I noticed another assert failure though
>>>
>>>     (((self_in_concretizeAt->maxSize)) == null) ||
>>> (((self_in_concretizeAt->maxSize)) >=
>>> ((self_in_concretizeAt->machineCodeSize))) 2124
>>>
>>> I first thought that I failed to count the number of machine code bytes
>>> generated for new ops ClzRR and BSR...
>>> But I did not hook new primitive 75 in this image, so it must be due to
>>> another instruction...
>>>
>>> Whether or not this specific VM crash is due to my own recent changes
>>> remains to be determined... I need to bisect.
>>>
>>> Note that the incorrect registers RDI and RCX are both mapped to the
>>> abstract register Arg0Reg...
>>> So we might also look into the usage of this specific register.
>>>
>>> On Wed, Sep 4, 2019 at 11:37, Nicolas Cellier <
>>> nicolas.cellier.aka.nice at gmail.com> wrote:
>>>
>>>> MBP-de-Nicolas:nsboot nicolas$ lldb
>>>> /Users/nicolas/Smalltalk/OpenSmalltalk/opensmalltalk-vm/build.macos64x64/squeak.cog.spur/SqueakDebug.app/Contents/MacOS/Squeak
>>>>
>>>> (lldb) target create
>>>> "/Users/nicolas/Smalltalk/OpenSmalltalk/opensmalltalk-vm/build.macos64x64/squeak.cog.spur/SqueakDebug.app/Contents/MacOS/Squeak"
>>>> Current executable set to
>>>> '/Users/nicolas/Smalltalk/OpenSmalltalk/opensmalltalk-vm/build.macos64x64/squeak.cog.spur/SqueakDebug.app/Contents/MacOS/Squeak'
>>>> (x86_64).
>>>> (lldb) run ../../image/trunk6-64.image
>>>> Process 76862 launched:
>>>> '/Users/nicolas/Smalltalk/OpenSmalltalk/opensmalltalk-vm/build.macos64x64/squeak.cog.spur/SqueakDebug.app/Contents/MacOS/Squeak'
>>>> (x86_64)
>>>>
>>>> Process 76862 stopped
>>>> * thread #1, queue = 'com.apple.main-thread', stop reason =
>>>> EXC_BAD_ACCESS (code=1, address=0x16a)
>>>>     frame #0: 0x0000000107c02436
>>>> ->  0x107c02436: movq   (%rdi), %r9
>>>>     0x107c02439: andq   $0x3fffff, %r9            ; imm = 0x3FFFFF
>>>>     0x107c02440: cmpq   $0x22, %r9
>>>>     0x107c02444: jne    0x107c0244d
>>>> Target 0: (Squeak) stopped.
>>>> (lldb) print $rdi
>>>> (unsigned long) $0 = 362
>>>> (lldb) call printCallStack()
>>>>     0x7ffeefbd54c8 M WebUtils class>jsonDecode: 0x109098170: a(n)
>>>> WebUtils class
>>>>     0x7ffeefbd5500 M WebClientServerTest>decode: 0x107eb0c38: a(n)
>>>> WebClientServerTest
>>>>     0x7ffeefbd5548 M WebClientServerTest>testNumbers 0x107eb0c38: a(n)
>>>> WebClientServerTest
>>>>     0x7ffeefbd5578 M WebClientServerTest(TestCase)>performTest
>>>> 0x107eb0c38: a(n) WebClientServerTest
>>>>     0x7ffeefbd55a8 M [] in WebClientServerTest(TestCase)>runCase
>>>> 0x107eb0c38: a(n) WebClientServerTest
>>>>     0x7ffeefbd55e0 M BlockClosure>on:do: 0x1081f30f8: a(n) BlockClosure
>>>>     0x7ffeefbd5630 M [] in WebClientServerTest(TestCase)>timeout:after:
>>>> 0x107eb0c38: a(n) WebClientServerTest
>>>>     0x7ffeefbd5670 M BlockClosure>ensure: 0x1081f3a08: a(n) BlockClosure
>>>>     0x7ffeefbd56c0 M WebClientServerTest(TestCase)>timeout:after:
>>>> 0x107eb0c38: a(n) WebClientServerTest
>>>>     0x7ffeefbd5700 M [] in WebClientServerTest(TestCase)>runCase
>>>> 0x107eb0c38: a(n) WebClientServerTest
>>>>     0x7ffeefbd1f60 M BlockClosure>ensure: 0x1081efb68: a(n) BlockClosure
>>>>     0x7ffeefbd1f98 M WebClientServerTest(TestCase)>runCase 0x107eb0c38:
>>>> a(n) WebClientServerTest
>>>>     0x7ffeefbd1fd0 M [] in TestResult>runCase: 0x107e66628: a(n)
>>>> TestResult
>>>>     0x7ffeefbd2008 M BlockClosure>on:do: 0x1081efa58: a(n) BlockClosure
>>>>     0x7ffeefbd2058 M [] in TestResult>runCase: 0x107e66628: a(n)
>>>> TestResult
>>>>     0x7ffeefbd2090 M BlockClosure>on:do: 0x1081ef940: a(n) BlockClosure
>>>>     0x7ffeefbd20d8 M TestResult>runCase: 0x107e66628: a(n) TestResult
>>>>     0x7ffeefbd2110 M WebClientServerTest(TestCase)>run: 0x107eb0c38:
>>>> a(n) WebClientServerTest
>>>>     0x7ffeefbd2148 M TestRunner>runTest: 0x10c7e9c98: a(n) TestRunner
>>>>     0x7ffeefbd2180 M [] in TestRunner>runSuite: 0x10c7e9c98: a(n)
>>>> TestRunner
>>>>     0x7ffeefbd21f0 M [] in
>>>> OrderedCollection(Collection)>do:displayingProgress:every: 0x107e689b8:
>>>> a(n) OrderedCollection
>>>>     0x7ffeefbd2230 M OrderedCollection>do: 0x107e689b8: a(n)
>>>> OrderedCollection
>>>>     0x7ffeefbd2290 M [] in
>>>> OrderedCollection(Collection)>do:displayingProgress:every: 0x107e689b8:
>>>> a(n) OrderedCollection
>>>>     0x7ffeefbd22e0 M [] in
>>>> MorphicUIManager>displayProgress:at:from:to:during: 0x109d48eb8: a(n)
>>>> MorphicUIManager
>>>>     0x7ffeefbd2318 M BlockClosure>on:do: 0x107e68bf0: a(n) BlockClosure
>>>>     0x7ffeefbd2370 M [] in
>>>> MorphicUIManager>displayProgress:at:from:to:during: 0x109d48eb8: a(n)
>>>> MorphicUIManager
>>>>     0x7ffeefbd23b0 M BlockClosure>ensure: 0x107e68dc0: a(n) BlockClosure
>>>>     0x7ffeefbd2408 I
>>>> MorphicUIManager>displayProgress:at:from:to:during: 0x109d48eb8: a(n)
>>>> MorphicUIManager
>>>>     0x7ffeefbd2470 I ProgressInitiationException>defaultResumeValue
>>>> 0x107e68ed8: a(n) ProgressInitiationException
>>>>     0x7ffeefbd24b8 I ProgressInitiationException(Exception)>resume
>>>> 0x107e68ed8: a(n) ProgressInitiationException
>>>>     0x7ffeefbd24f8 I ProgressInitiationException>defaultAction
>>>> 0x107e68ed8: a(n) ProgressInitiationException
>>>>     0x7ffeefbd2530 M UndefinedObject>handleSignal: 0x1085e78e0: a(n)
>>>> UndefinedObject
>>>>     0x7ffeefbd2578 I ProgressInitiationException(Exception)>signal
>>>> 0x107e68ed8: a(n) ProgressInitiationException
>>>>     0x7ffeefbd25b8 I
>>>> ProgressInitiationException>display:at:from:to:during: 0x107e68ed8: a(n)
>>>> ProgressInitiationException
>>>>     0x7ffeefbd2620 I ProgressInitiationException
>>>> class>display:at:from:to:during: 0x108b34888: a(n)
>>>> ProgressInitiationException class
>>>>     0x7ffeefbd2688 I
>>>> ByteString(String)>displayProgressAt:from:to:during: 0x108ed25d0: a(n)
>>>> ByteString
>>>>     0x7ffeefbd26e8 I ByteString(String)>displayProgressFrom:to:during:
>>>> 0x108ed25d0: a(n) ByteString
>>>>        0x107e69158 s
>>>> OrderedCollection(Collection)>do:displayingProgress:every:
>>>>        0x107e736a8 s [] in TestRunner>basicRunSuite:do:
>>>>        0x107eb01c0 s BlockClosure>ensure:
>>>>        0x107ebae20 s TestRunner>basicRunSuite:do:
>>>>        0x107e735f0 s TestRunner>runSuite:
>>>>        0x107eb00e8 s TestRunner>runAll
>>>>        0x107ebad38 s
>>>> PluggableButtonMorphPlus(PluggableButtonMorph)>performAction
>>>>        0x107ec0030 s PluggableButtonMorphPlus>performAction
>>>>        0x107ec5028 s [] in
>>>> PluggableButtonMorphPlus(PluggableButtonMorph)>mouseUp:
>>>>        0x107ec9e00 s Array(SequenceableCollection)>do:
>>>>        0x107ecb020 s
>>>> PluggableButtonMorphPlus(PluggableButtonMorph)>mouseUp:
>>>>        0x107eccab0 s PluggableButtonMorphPlus(Morph)>handleMouseUp:
>>>>        0x107ecd4f0 s MouseButtonEvent>sentTo:
>>>>        0x107ece660 s PluggableButtonMorphPlus(Morph)>handleEvent:
>>>>        0x107ecee40 s PluggableButtonMorphPlus(Morph)>handleFocusEvent:
>>>>        0x107ed1838 s
>>>> MorphicEventDispatcher>doHandlingForFocusEvent:with:
>>>>        0x107edbe30 s MorphicEventDispatcher>dispatchFocusEvent:with:
>>>>        0x107edc568 s
>>>> PluggableButtonMorphPlus(Morph)>processFocusEvent:using:
>>>>        0x107ededf8 s PluggableButtonMorphPlus(Morph)>processFocusEvent:
>>>>        0x107edf1f8 s [] in HandMorph>sendFocusEvent:to:clear:
>>>>        0x107edf458 s BlockClosure>ensure:
>>>>        0x107edf5f0 s MouseButtonEvent(MorphicEvent)>becomeActiveDuring:
>>>>        0x107edf6d0 s [] in HandMorph>sendFocusEvent:to:clear:
>>>>        0x107edf7a0 s BlockClosure>ensure:
>>>>        0x107edf8b0 s HandMorph>becomeActiveDuring:
>>>>        0x107edf990 s [] in HandMorph>sendFocusEvent:to:clear:
>>>>        0x107edfa48 s BlockClosure>ensure:
>>>>        0x107edfb38 s PasteUpMorph>becomeActiveDuring:
>>>>        0x107edfc18 s HandMorph>sendFocusEvent:to:clear:
>>>>        0x107edfcd0 s HandMorph>sendEvent:focus:clear:
>>>>        0x107edfda8 s HandMorph>sendMouseEvent:
>>>>        0x107edfe70 s HandMorph>handleEvent:
>>>>        0x107edff28 s HandMorph>processEvents
>>>>        0x107edffe0 s [] in WorldState>doOneCycleNowFor:
>>>>        0x107ee0098 s Array(SequenceableCollection)>do:
>>>>        0x107ee0188 s WorldState>handsDo:
>>>>        0x107ee0240 s WorldState>doOneCycleNowFor:
>>>>        0x107ee02f8 s WorldState>doOneCycleFor:
>>>>        0x107ee03b0 s PasteUpMorph>doOneCycle
>>>>        0x10a1bcb48 s [] in MorphicProject>spawnNewProcess
>>>>        0x10a1bc3e8 s [] in BlockClosure>newProcess
>>>> (lldb)
>>>>
>>>> To trigger this, just start an updated trunk64 squeak image and run
>>>> this: WebClientServerTest suite run.
>>>>
>>>> You might have to accept connections in the macOS dialog box on the first
>>>> run.
>>>>
>>>> I always get RDI = 362; it is the same on Linux x64...
>>>>
>>>> (gdb) print $rdi
>>>> $1 = 362
>>>> (gdb) x /4i $pc
>>>> => 0x8803c33: mov    (%rdi),%r9
>>>>    0x8803c36: and    $0x3fffff,%r9
>>>>    0x8803c3d: cmp    $0x22,%r9
>>>>    0x8803c41: jne    0x8803c4a
>>>>
>>>> Could it be a conflict in register allocation?
>>>>
>>>> On Win64, I could not get anything useful from gdb (calling
>>>> printCallStack() segfaults gdb itself...).
>>>>
>>>> The instructions are quite similar, though (the registers are different
>>>> due to the Win64 ABI)...
>>>> (gdb) x /4i $pc
>>>> => 0x4f02043:   mov    (%rcx),%r10
>>>>    0x4f02046:   and    $0x3fffff,%r10
>>>>    0x4f0204d:   cmp    $0x22,%r10
>>>>    0x4f02051:   jne    0x4f0205a
>>>> (gdb) print $rcx
>>>> $1 = 362
>>>>
>>>> This might well be related to the recurring Newspeak bootstrap crash on
>>>> the CI side...
>>>>
>>>