[Vm-dev] corruption of PC in context objects or not (?)
Eliot Miranda
eliot.miranda at gmail.com
Fri Sep 11 23:42:47 UTC 2020
Hi Andrei,
On Fri, Sep 11, 2020 at 11:48 AM Andrei Chis <chisvasileandrei at gmail.com>
wrote:
>
> Hi Eliot,
>
> Thanks for the answer. That helps to understand what is going on and it
> can explain why just adding a call to `self pc` makes the crash disappear.
>
> What may not have been obvious in my previous email is that we get this
> problem more or less randomly. We have tests that verify that tools work
> when various extensions raise exceptions (these tests copy the stack).
> Sometimes they work correctly and sometimes they crash. The crashes
> happen in various tests, and so far the only common thing we have noticed
> is that the pc of the context where the crash happens looks off. Also, the
> contexts in which this happens are at the beginning of the stack, so they
> are part of a long computation (it gets copied multiple times).
>
> Initially we suspected memory corruption somewhere due to external
> calls/memory. The fact that calling `self pc` beforehand seems to fix the
> issue makes that less likely. But who knows.
>
Well, it does look like a VM bug. The VM is somehow failing to intercept
some access, perhaps in shallow copy. Weird. I shall try and reproduce.
Is there anything special about the process you copy using copyTo: ?
(see below)
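
One way to probe this from the image side, as a minimal sketch: read the pc of a live context through the three routes that are supposed to be intercepted and compare the results. This assumes that pc is the second instance variable of Context (as in current Squeak and Pharo images) and that the snippet is evaluated in a workspace or playground, where `thisContext sender` is a live context. The variable names are illustrative only.

```smalltalk
| ctx viaAccessor viaInstVarAt viaCopy |
ctx := thisContext sender.
"Route 1: ordinary inst var access inside a Context method."
viaAccessor := ctx pc.
"Route 2: reflective access; assumes pc is inst var 2 of Context."
viaInstVarAt := ctx instVarAt: 2.
"Route 3: the shallowCopy primitive applied to a context."
viaCopy := ctx shallowCopy pc.
{viaAccessor. viaInstVarAt. viaCopy}
```

If every route is intercepted, the three elements should be the same positive bytecode pc. A negative or otherwise odd element would point at the route that leaks VM-internal state.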
> On Fri, Sep 11, 2020 at 6:36 PM Eliot Miranda <eliot.miranda at gmail.com>
> wrote:
>
>>
>> Hi Andrei,
>>
>> On Fri, Sep 11, 2020 at 8:58 AM Andrei Chis <chisvasileandrei at gmail.com>
>> wrote:
>>
>>>
>>> Hi,
>>>
>>> We often get crashes on our CI when calling `Context>copyTo:` in
>>> a GT image and a VM build from
>>> https://github.com/feenkcom/opensmalltalk-vm.
>>>
>>> To sum up: during `Context>copyTo:`, `Object>>#copy` is called on a
>>> context, leading to a segmentation fault. Looking at that context in
>>> lldb, the pc looks off. It has the value `0xfffffffffea7f6e1`.
>>>
>>> (lldb) call (void *) printOop(0x1206b6990)
>>> 0x1206b6990: a(n) Context
>>> 0x1206b6a48 0xfffffffffea7f6e1 0x9 0x1146b2e08 0x1206b6b00
>>> 0x1206b6b28 0x1206b6b50
>>>
>>>
>>> Can this indicate some corruption or is it expected to have such values?
>>> `CoInterpreter>>ensureContextHasBytecodePC:` has code that also handles
>>> negative values for the pc which suggests that this might be expected.
>>>
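
To make the "negative pc" reading concrete, here is a small workspace sketch that decodes the raw word from the printOop output above. It assumes the 64-bit Spur object format, in which the low three bits of a word are a tag and tag 1 marks an immediate SmallInteger. The variable names are illustrative only.

```smalltalk
| raw tag signed value |
"Second slot (pc) from the printOop output."
raw := 16rFFFFFFFFFEA7F6E1.
"Low three tag bits; 1 is assumed to mean SmallInteger."
tag := raw bitAnd: 7.
"Reinterpret the 64-bit word as a signed integer."
signed := raw - (1 bitShift: 64).
"Drop the tag bits with an arithmetic shift."
value := signed bitShift: -3.
{tag. value}    "expected: 1 and -2818340"
```

Under that assumption the slot holds a negative SmallInteger rather than an object pointer, which matches the negative-pc case mentioned for `CoInterpreter>>ensureContextHasBytecodePC:`.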
>>
>> The issue is that that value is expected *inside* the VM. It is the
>> frame pointer for the context. But above the VM this value should be
>> hidden. The VM should intercept all accesses to such fields in contexts and
>> automatically map them back to the appropriate values that the image
>> expects to see. [The same thing is true for CompiledMethods; inside the VM
>> methods may refer to their JITted code, but this is invisible from the
>> image]. Intercepting access to Context state already happens with inst var
>> access in methods, with the shallowCopy primitive, with instVarAt: et al,
>> etc.
>>
>> So I expect the issue here is that copyTo: invokes some primitive which
>> does not (yet) check for a context receiver and/or argument, and hence
>> accidentally it reveals the hidden state to the image and a crash results.
>> What I need to know are the definitions for copyTo: and copy, etc all the
>> way down to primitives.
>>
>
> Here is the source code:
>
Cool, nothing unusual here. This should all work perfectly. Tis a VM bug.
However...
> Context >> copyTo: aContext
>     "Copy self and my sender chain down to, but not including, aContext. End
>     of copied chain will have nil sender."
>     | copy |
>     self == aContext ifTrue: [^ nil].
>     copy := self copy.
>     self sender ifNotNil: [
>         copy privSender: (self sender copyTo: aContext)].
>     ^ copy
>
Let me suggest
Context >> copyTo: aContext
    "Copy self and my sender chain down to, but not including, aContext.
    End of copied chain will have nil sender."
    | copy |
    self == aContext ifTrue: [^ nil].
    copy := self copy.
    self sender ifNotNil:
        [:mySender| copy privSender: (mySender copyTo: aContext)].
    ^ copy
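
A minimal smoke test for either version of copyTo:, sketched under the assumption that it is evaluated in a workspace or playground. It copies the whole sender chain of the current context and collects any pc in the copy that is neither nil nor a positive integer. The variable names are illustrative, and since the reported crashes are intermittent, a clean run proves nothing.

```smalltalk
| copy ctx suspicious |
suspicious := OrderedCollection new.
"nil never matches a context, so the whole sender chain is copied."
copy := thisContext copyTo: nil.
ctx := copy.
[ctx notNil] whileTrue:
    [| pc |
     pc := ctx pc.
     "A copied context should expose either nil or a positive bytecode pc."
     (pc isNil or: [pc isInteger and: [pc > 0]])
        ifFalse: [suspicious add: pc].
     ctx := ctx sender].
suspicious
```

An empty collection is the expected result. Any element in it would be a pc that reached the image without being mapped back to a bytecode pc.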
> Object>>#copy
>     ^self shallowCopy postCopy
>
> Object >> shallowCopy
>     | class newObject index |
>     <primitive: 148>
>     class := self class.
>     class isVariable
>         ifTrue:
>             [index := self basicSize.
>             newObject := class basicNew: index.
>             [index > 0]
>                 whileTrue:
>                     [newObject basicAt: index put: (self basicAt: index).
>                     index := index - 1]]
>         ifFalse: [newObject := class basicNew].
>     index := class instSize.
>     [index > 0]
>         whileTrue:
>             [newObject instVarAt: index put: (self instVarAt: index).
>             index := index - 1].
>     ^ newObject
>
> The code of primitiveClone looks the same [1]
>
>
>>> Changing `Context>copyTo:` by adding a `self pc` before calling `self
>>> copy` leads to no more crashes. Not sure if there is a reason for that or
>>> just plain luck.
>>>
>>> A simple reduced stack is below (more details in this issue [1]). The
>>> crash always happens with contexts reified as objects (in this case
>>> 0x1206b6990 s [] in GtExamplesCommandLineHandler>runPackages).
>>> Could this suggest some kind of issue in the vm when reifying contexts,
>>> or just some other problem with memory corruption?
>>>
>>
>> This looks like an oversight in some primitive. Here for example is the
>> implementation of the shallowCopy primitive, a.k.a. clone, and you can see
>> where it explicitly intercepts access to a context.
>>
>> primitiveClone
>>     "Return a shallow copy of the receiver.
>>     Special-case non-single contexts (because of context-to-stack mapping).
>>     Can't fail for contexts cuz of image context instantiation code (sigh)."
>>
>>     | rcvr newCopy |
>>     rcvr := self stackTop.
>>     (objectMemory isImmediate: rcvr)
>>         ifTrue:
>>             [newCopy := rcvr]
>>         ifFalse:
>>             [(objectMemory isContextNonImm: rcvr)
>>                 ifTrue:
>>                     [newCopy := self cloneContext: rcvr]
>>                 ifFalse:
>>                     [(argumentCount = 0
>>                       or: [(objectMemory isForwarded: rcvr) not])
>>                         ifTrue: [newCopy := objectMemory clone: rcvr]
>>                         ifFalse: [newCopy := 0]].
>>             newCopy = 0 ifTrue:
>>                 [^self primitiveFailFor: PrimErrNoMemory]].
>>     self pop: argumentCount + 1 thenPush: newCopy
>>
>> But since Squeak doesn't have copyTo: I have no idea what primitive is
>> being used. I'm guessing 168 primitiveCopyObject, which seems to check for
>> a Context receiver, but not for a CompiledCode receiver. What does the
>> primitive failure code look like? Can you post the copyTo: implementations
>> here please?
>>
>
> The code is above. I also see that Context>>#copyTo: in Squeak calls
> Object>>copy for contexts.
>
> When a crash happens we don't get the exact same error every time. For
> example, on Mac we most often get:
>
> Process 35690 stopped
> * thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT)
>     frame #0: 0x00000001100b1004
> ->  0x1100b1004: inl $0x4c, %eax
>     0x1100b1006: leal -0x5c(%rip), %eax
>     0x1100b100c: pushq %r8
>     0x1100b100e: movabsq $0x1109e78e0, %r9 ; imm = 0x1109E78E0
> Target 0: (GlamorousToolkit) stopped.
>
> Process 29929 stopped
> * thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BREAKPOINT (code=EXC_I386_BPT, subcode=0x0)
>     frame #0: 0x00000001100fe7ed
> ->  0x1100fe7ed: int3
>     0x1100fe7ee: int3
>     0x1100fe7ef: int3
>     0x1100fe7f0: int3
> Target 0: (GlamorousToolkit) stopped.
>
>
> [1]
> https://github.com/feenkcom/opensmalltalk-vm/blob/5f7d49227c9599a35fcb93892b727c93a573482c/smalltalksrc/VMMaker/StackInterpreterPrimitives.class.st#L325
>
> Cheers,
> Andrei
>
>
>>>
>>> 0x7ffeefbb4380 M Context(Object)>copy 0x1206b6990: a(n) Context
>>> 0x7ffeefbb43b8 M Context>copyTo: 0x1206b6990: a(n) Context
>>> 0x7ffeefbb4400 M Context>copyTo: 0x1206b5ae0: a(n) Context
>>> ...
>>> 0x7ffeefba6078 M Context>copyTo: 0x110548b28: a(n) Context
>>> 0x7ffeefba60d0 I Context>copyTo: 0x110548a70: a(n) Context
>>> 0x7ffeefba6118 I MessageNotUnderstood(Exception)>freezeUpTo: 0x110548a20: a(n) MessageNotUnderstood
>>> 0x7ffeefba6160 I MessageNotUnderstood(Exception)>freeze 0x110548a20: a(n) MessageNotUnderstood
>>> 0x7ffeefba6190 M [] in GtExampleEvaluator>result 0x110544fb8: a(n) GtExampleEvaluator
>>> 0x7ffeefba61c8 M BlockClosure>cull: 0x110545188: a(n) BlockClosure
>>> 0x7ffeefba6208 M Context>evaluateSignal: 0x110548c98: a(n) Context
>>> 0x7ffeefba6240 M Context>handleSignal: 0x110548c98: a(n) Context
>>> 0x7ffeefba6278 M Context>handleSignal: 0x110548be0: a(n) Context
>>> 0x7ffeefba62b0 M MessageNotUnderstood(Exception)>signal 0x110548a20: a(n) MessageNotUnderstood
>>> 0x7ffeefba62f0 M GtDummyExamplesWithInheritanceSubclassB(Object)>doesNotUnderstand: exampleH 0x1105487d8: a(n) GtDummyExamplesWithInheritanceSubclassB
>>> 0x7ffeefba6328 M GtExampleEvaluator>primitiveProcessExample:withEvaluationContext: 0x110544fb8: a(n) GtExampleEvaluator
>>> ...
>>> 0x7ffeefbe64d0 M [] in GtExamplesHDReport class(HDReport class)>runPackages: 0x1145e41c8: a(n) GtExamplesHDReport class
>>> 0x7ffeefbe6520 M [] in Set>collect: 0x1206b5ab0: a(n) Set
>>> 0x7ffeefbe6568 M Array(SequenceableCollection)>do: 0x1206b5c50: a(n) Array
>>> 0x1206b5b98 s Set>collect:
>>> 0x1206b5ae0 s GtExamplesHDReport class(HDReport class)>runPackages:
>>> 0x1206b6990 s [] in GtExamplesCommandLineHandler>runPackages
>>> 0x1206b6a48 s BlockClosure>ensure:
>>> 0x1206b6b68 s UIManager class>nonInteractiveDuring:
>>> 0x1206b6c48 s GtExamplesCommandLineHandler>runPackages
>>> 0x1206b6d98 s GtExamplesCommandLineHandler>activate
>>> 0x1206b75d0 s GtExamplesCommandLineHandler class(CommandLineHandler class)>activateWith:
>>> 0x1207d2f00 s [] in PharoCommandLineHandler(BasicCommandLineHandler)>activateSubCommand:
>>> 0x1207e6620 s BlockClosure>on:do:
>>> 0x1207f7ab8 s PharoCommandLineHandler(BasicCommandLineHandler)>activateSubCommand:
>>> 0x120809d40 s PharoCommandLineHandler(BasicCommandLineHandler)>handleSubcommand
>>> 0x12082ca60 s PharoCommandLineHandler(BasicCommandLineHandler)>handleArgument:
>>> 0x120789938 s [] in PharoCommandLineHandler(BasicCommandLineHandler)>activate
>>> 0x1207a83e0 s BlockClosure>on:do:
>>> 0x1207b57a0 s [] in PharoCommandLineHandler(BasicCommandLineHandler)>activate
>>> 0x1207bf830 s [] in BlockClosure>newProcess
>>>
>>> Cheers,
>>> Andrei
>>>
>>>
>>> [1] https://github.com/feenkcom/gtoolkit/issues/1440
>>>
>>>
>>
>> --
>> _,,,^..^,,,_
>> best, Eliot
>>
>
--
_,,,^..^,,,_
best, Eliot