[Vm-dev] corruption of PC in context objects or not (?)

Andrei Chis chisvasileandrei at gmail.com
Fri Sep 11 18:48:15 UTC 2020


Hi Eliot,

Thanks for the answer. That helps to understand what is going on and it can
explain why just adding a call to `self pc` makes the crash disappear.

What may not have been clear from my previous email is that we hit this
problem more or less at random. We have tests verifying that tools keep
working when various extensions raise exceptions (these tests copy the
stack). Sometimes they pass and sometimes they crash. The crashes happen in
various tests, and so far the only common factor we have noticed is that the
pc of the context where the crash happens looks off. Also, the contexts in
which this happens are at the beginning of the stack, so they are part of a
long computation (and get copied multiple times).

Initially we suspected memory corruption caused by external calls or
external memory. The fact that calling `self pc` beforehand seems to fix
the issue makes that less likely. But who knows.


On Fri, Sep 11, 2020 at 6:36 PM Eliot Miranda <eliot.miranda at gmail.com>
wrote:

>
> Hi Andrei,
>
> On Fri, Sep 11, 2020 at 8:58 AM Andrei Chis <chisvasileandrei at gmail.com>
> wrote:
>
>>
>> Hi,
>>
>> We often get crashes on our CI when calling `Context>copyTo:` in
>> a GT image with a VM built from
>> https://github.com/feenkcom/opensmalltalk-vm.
>>
>> To sum up: during `Context>copyTo:`, `Object>>#copy` is called on a
>> context, leading to a segmentation fault. Looking at that context in
>> lldb, the pc looks off: it has the value `0xfffffffffea7f6e1`.
>>
>>  (lldb) call (void *) printOop(0x1206b6990)
>>     0x1206b6990: a(n) Context
>>      0x1206b6a48 0xfffffffffea7f6e1                0x9        0x1146b2e08        0x1206b6b00
>>      0x1206b6b28        0x1206b6b50
>>
>>
>> Can this indicate some corruption or is it expected to have such values?
>> `CoInterpreter>>ensureContextHasBytecodePC:` has code that also handles
>> negative values for the pc which suggests that this might be expected.
>>
>
> The issue is that that value is expected *inside* the VM.  It is the frame
> pointer for the context.  But above the VM this value should be hidden. The
> VM should intercept all accesses to such fields in contexts and
> automatically map them back to the appropriate values that the image
> expects to see.  [The same thing is true for CompiledMethods; inside the VM
> methods may refer to their JITted code, but this is invisible from the
> image].  Intercepting access to Context state already happens with inst var
> access in methods, with the shallowCopy primitive, with instVarAt: et al,
> etc.
>
> So I expect the issue here is that copyTo: invokes some primitive which
> does not (yet) check for a context receiver and/or argument, and hence
> accidentally it reveals the hidden state to the image and a crash results.
> What I need to know are the definitions for copyTo: and copy, etc all the
> way down to primitives.
>

Here is the source code:

Context >> copyTo: aContext
    "Copy self and my sender chain down to, but not including, aContext.  End
    of copied chain will have nil sender."
    | copy |
    self == aContext ifTrue: [^ nil].
    copy := self copy.
    self sender ifNotNil: [
        copy privSender: (self sender copyTo: aContext)].
    ^ copy

Object>>#copy
     ^self shallowCopy postCopy

Object >> shallowCopy
    | class newObject index |
    <primitive: 148>
    class := self class.
    class isVariable
        ifTrue:
            [index := self basicSize.
            newObject := class basicNew: index.
            [index > 0]
                whileTrue:
                    [newObject basicAt: index put: (self basicAt: index).
                    index := index - 1]]
        ifFalse: [newObject := class basicNew].
    index := class instSize.
    [index > 0]
        whileTrue:
            [newObject instVarAt: index put: (self instVarAt: index).
            index := index - 1].
    ^ newObject

The code of primitiveClone in our fork looks the same as the one you
posted [1].
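
For reference, the variant that no longer crashes for us is roughly the
following. This is only a sketch of the workaround described in my first
email: the single added statement is `self pc`, the comments are mine, and
why it helps is still an open question.

```smalltalk
Context >> copyTo: aContext
    "Same as the method above, plus one extra statement (the workaround)."
    | copy |
    self == aContext ifTrue: [^ nil].
    "Workaround under test: read the pc through the accessor before copying.
     Assumption (not verified): this lets the VM map the hidden value back
     to a bytecode pc before the shallow copy is taken."
    self pc.
    copy := self copy.
    self sender ifNotNil: [
        copy privSender: (self sender copyTo: aContext)].
    ^ copy
```

Without the `self pc` line the crash keeps showing up at random in our CI
runs.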


>> Changing `Context>copyTo:` by adding a `self pc` before calling `self
>> copy` leads to no more crashes. Not sure if there is a reason for that or
>> just plain luck.
>>
>> A simple reduced stack is below (more details in this issue [1]). The
>> crash happens always with contexts reified as objects (in this case
>> 0x1206b6990 s [] in GtExamplesCommandLineHandler>runPackages).
>> Could this suggest some kind of issue in the vm when reifying contexts,
>> or just some other problem with memory corruption?
>>
>
> This looks like an oversight in some primitive.  Here for example is the
> implementation of the shallowCopy primitive, a.k.a. clone, and you can see
> where it explicitly intercepts access to a context.
>
> primitiveClone
>     "Return a shallow copy of the receiver.
>      Special-case non-single contexts (because of context-to-stack mapping).
>      Can't fail for contexts cuz of image context instantiation code (sigh)."
>
>     | rcvr newCopy |
>     rcvr := self stackTop.
>     (objectMemory isImmediate: rcvr)
>         ifTrue:
>             [newCopy := rcvr]
>         ifFalse:
>             [(objectMemory isContextNonImm: rcvr)
>                 ifTrue:
>                     [newCopy := self cloneContext: rcvr]
>                 ifFalse:
>                     [(argumentCount = 0
>                       or: [(objectMemory isForwarded: rcvr) not])
>                         ifTrue: [newCopy := objectMemory clone: rcvr]
>                         ifFalse: [newCopy := 0]].
>             newCopy = 0 ifTrue:
>                 [^self primitiveFailFor: PrimErrNoMemory]].
>     self pop: argumentCount + 1 thenPush: newCopy
>
> But since Squeak doesn't have copyTo: I have no idea what primitive is
> being used.  I'm guessing 168 primitiveCopyObject, which seems to check for
> a Context receiver, but not for a CompiledCode receiver.  What does the
> primitive failure code look like?  Can you post the copyTo: implementations
> here please?
>

The code is above. I also see that Context>>#copyTo: in Squeak calls
Object>>copy for contexts.

When a crash happens we don't get the exact same error every time. For
example, on Mac we most often get:

Process 35690 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT)
    frame #0: 0x00000001100b1004
->  0x1100b1004: inl    $0x4c, %eax
    0x1100b1006: leal   -0x5c(%rip), %eax
    0x1100b100c: pushq  %r8
    0x1100b100e: movabsq $0x1109e78e0, %r9         ; imm = 0x1109E78E0
Target 0: (GlamorousToolkit) stopped.


Process 29929 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BREAKPOINT (code=EXC_I386_BPT, subcode=0x0)
    frame #0: 0x00000001100fe7ed
->  0x1100fe7ed: int3
    0x1100fe7ee: int3
    0x1100fe7ef: int3
    0x1100fe7f0: int3
Target 0: (GlamorousToolkit) stopped.


[1]
https://github.com/feenkcom/opensmalltalk-vm/blob/5f7d49227c9599a35fcb93892b727c93a573482c/smalltalksrc/VMMaker/StackInterpreterPrimitives.class.st#L325

Cheers,
Andrei


>
>>     0x7ffeefbb4380 M Context(Object)>copy 0x1206b6990: a(n) Context
>>     0x7ffeefbb43b8 M Context>copyTo: 0x1206b6990: a(n) Context
>>     0x7ffeefbb4400 M Context>copyTo: 0x1206b5ae0: a(n) Context
>>   ...
>>     0x7ffeefba6078 M Context>copyTo: 0x110548b28: a(n) Context
>>     0x7ffeefba60d0 I Context>copyTo: 0x110548a70: a(n) Context
>>     0x7ffeefba6118 I MessageNotUnderstood(Exception)>freezeUpTo: 0x110548a20: a(n) MessageNotUnderstood
>>     0x7ffeefba6160 I MessageNotUnderstood(Exception)>freeze 0x110548a20: a(n) MessageNotUnderstood
>>     0x7ffeefba6190 M [] in GtExampleEvaluator>result 0x110544fb8: a(n) GtExampleEvaluator
>>     0x7ffeefba61c8 M BlockClosure>cull: 0x110545188: a(n) BlockClosure
>>     0x7ffeefba6208 M Context>evaluateSignal: 0x110548c98: a(n) Context
>>     0x7ffeefba6240 M Context>handleSignal: 0x110548c98: a(n) Context
>>     0x7ffeefba6278 M Context>handleSignal: 0x110548be0: a(n) Context
>>     0x7ffeefba62b0 M MessageNotUnderstood(Exception)>signal 0x110548a20: a(n) MessageNotUnderstood
>>     0x7ffeefba62f0 M GtDummyExamplesWithInheritanceSubclassB(Object)>doesNotUnderstand: exampleH 0x1105487d8: a(n) GtDummyExamplesWithInheritanceSubclassB
>>     0x7ffeefba6328 M GtExampleEvaluator>primitiveProcessExample:withEvaluationContext: 0x110544fb8: a(n) GtExampleEvaluator
>>  ...
>>     0x7ffeefbe64d0 M [] in GtExamplesHDReport class(HDReport class)>runPackages: 0x1145e41c8: a(n) GtExamplesHDReport class
>>     0x7ffeefbe6520 M [] in Set>collect: 0x1206b5ab0: a(n) Set
>>     0x7ffeefbe6568 M Array(SequenceableCollection)>do: 0x1206b5c50: a(n) Array
>>        0x1206b5b98 s Set>collect:
>>        0x1206b5ae0 s GtExamplesHDReport class(HDReport class)>runPackages:
>>        0x1206b6990 s [] in GtExamplesCommandLineHandler>runPackages
>>        0x1206b6a48 s BlockClosure>ensure:
>>        0x1206b6b68 s UIManager class>nonInteractiveDuring:
>>        0x1206b6c48 s GtExamplesCommandLineHandler>runPackages
>>        0x1206b6d98 s GtExamplesCommandLineHandler>activate
>>        0x1206b75d0 s GtExamplesCommandLineHandler class(CommandLineHandler class)>activateWith:
>>        0x1207d2f00 s [] in PharoCommandLineHandler(BasicCommandLineHandler)>activateSubCommand:
>>        0x1207e6620 s BlockClosure>on:do:
>>        0x1207f7ab8 s PharoCommandLineHandler(BasicCommandLineHandler)>activateSubCommand:
>>        0x120809d40 s PharoCommandLineHandler(BasicCommandLineHandler)>handleSubcommand
>>        0x12082ca60 s PharoCommandLineHandler(BasicCommandLineHandler)>handleArgument:
>>        0x120789938 s [] in PharoCommandLineHandler(BasicCommandLineHandler)>activate
>>        0x1207a83e0 s BlockClosure>on:do:
>>        0x1207b57a0 s [] in PharoCommandLineHandler(BasicCommandLineHandler)>activate
>>        0x1207bf830 s [] in BlockClosure>newProcess
>>
>> Cheers,
>> Andrei
>>
>>
>> [1] https://github.com/feenkcom/gtoolkit/issues/1440
>>
>>
>
> --
> _,,,^..^,,,_
> best, Eliot
>