[Vm-dev] tempVectors use case and current issues

Denis Kudriashov dionisiydk at gmail.com
Mon Apr 8 08:04:42 UTC 2019


Hi Clement

Fri, Apr 5, 2019, 12:42 Clément Béra <bera.clement at gmail.com>:

>
> Hi,
>
> I don't know if it makes sense, but although the VM does not perform a
> read-only check when writing into a temp vector by default, it is possible
> to activate such checks through a flag I introduced last year for the
> incremental compactor. The overhead seemed to be minimal.
>

Is it a flag for compiling the VM, or is it set on the image side?
And is it a requirement for the new compactor, so that it will be enabled
by default at some point?


> Best,
>
> On Sat, Mar 30, 2019 at 3:50 AM Eliot Miranda <eliot.miranda at gmail.com>
> wrote:
>
>>
>> Hi Denis,
>>
>> On Mar 28, 2019, at 5:10 PM, Denis Kudriashov <dionisiydk at gmail.com>
>> wrote:
>>
>> Hi Eliot
>>
>> Thu, Mar 28, 2019 at 23:29, Eliot Miranda <eliot.miranda at gmail.com>:
>>
>>>
>>> Hi Denis,
>>>
>>> On Thu, Mar 28, 2019 at 2:36 PM Denis Kudriashov <dionisiydk at gmail.com>
>>> wrote:
>>>
>>>>
>>>> Hi Nicolas.
>>>>
>>>> Thu, Mar 28, 2019 at 19:44, Nicolas Cellier <
>>>> nicolas.cellier.aka.nice at gmail.com>:
>>>>
>>>>>
>>>>> Hi Denis,
>>>>> Special bytecodes don't have to be changed: just don't use them, and
>>>>> replace them with regular sends at bytecode generation (with a special
>>>>> compiler, or some IR translator).
>>>>>
>>>>
>>>> Sure, bytecode transformation will work. But it would be quite tricky
>>>> to apply to a live execution context. It would require fixing the context
>>>> stack to take the updated method bytecode into account.
>>>> Note that I am not looking for a global setting to recompile all methods
>>>> in the image. I want this logic only for a concrete method/block
>>>> activation. In my scenario the block is serialized and transferred
>>>> together with the current context. So on the remote side I need to do
>>>> something with the materialized objects to maintain normal block semantics.
>>>>
>>>>
>>>>> All can be done at image side then. Or did I miss something?
>>>>>
>>>>
>>>> I think my examples show a security hole in the VM execution logic that
>>>> allows memory bounds to be violated from the image side.
>>>>
>>>
>>> It is no different than using an inst var access bytecode on an object
>>> which doesn't have enough inst vars.  It is not a security hole so much as
>>> it is something the system must use correctly to avoid crashes.  The same
>>> can be done by e.g.
>>>
>>>     thisContext swapSender: Point basicNew
>>>
>>> There are many such "security holes".  And if you want the VM to plug
>>> them all then the VM will become very much slower.
>>>
>>>
>>>> I did not get a segfault, but I would not be surprised if it happened
>>>> in some complex real-life scenario. Maybe it looks like a specially
>>>> invented case, but I think it is quite easy to run into when using or
>>>> developing a low-level serialization library: as soon as you, by mistake
>>>> or intentionally, serialize context objects with some substitution logic.
>>>> And considering that this hole needs to be closed, it would be a good
>>>> opportunity to have another hook in the execution engine which could be
>>>> used as in my remote scenario. So back to my proposal in the first mail.
>>>>
>>>
>>> If you want to solve this, then build a transformation for the block
>>> method when you remote a block.  As others have suggested (Levente) you can
>>> transform the bytecodes into normal sends (my blog post on the entire
>>> scheme starts with implementing it using at: and at:put: before the special
>>> bytecodes are added).  But making a change to all blocks breaks much of the
>>> Sista adaptive optimizer.  We have to have the freedom to access indirect
>>> temp vectors via special case bytecodes if we are to be able to
>>> aggressively optimize code.  If indirect temp vectors are to be treated as
>>> general purpose objects, then we are prevented from making many significant
>>> optimizations.
>>>
>>
>> Ok. I expected such answers :) but I asked on the chance that some cheap
>> trick is possible, like my readOnly example. It shows that there is at
>> least a write barrier check during this operation. If it signalled an
>> error, that could be used to do the job.
>> Method transformation would be quite complex to use because it needs to
>> be applied dynamically to a live context, and it requires stack
>> modifications on the fly. Just compiling the method in advance is not
>> appropriate for my goal. I don't want to change the compiler globally or
>> force the user to do it for a concrete method/class. That would not be a
>> transparent solution.
>>
>>
>> Well maybe.  But transforming a block and its activations is
>> straightforward:
>>
>> - it is easy to construct a transformation from tempVector bytecode
>> blocks (TVBB) to tempVector message blocks (TVMB) because there are no
>> suspension points in the bytecodes and the stack heights at the start and
>> end of the bytecodes are the same as for the message versions.  So some
>> form of
>>    store indirect temp bytecode =>
>>    dup (now value exists twice)
>>    dup (now value exists thrice)
>>    push indirect temp
>>    pop store into value location that was duped
>>    push index
>>    pop store into 2nd value
>>    send at:put:
>> will reimplement it.  And then it’s just a matter of remapping PCs from one
>> to the other and lengthening jumps.  A day’s work or two at most.
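
A minimal sketch of the PC remapping step mentioned above, under loud
assumptions: instructions are modelled as plain (oldPC -> size) associations
rather than real CompiledMethod bytecodes, all sizes are made-up numbers, and
the names oldInstructions, expandedSizes and pcMap are hypothetical. It only
shows how old PCs shift once one temp vector bytecode is replaced by a longer
send sequence.

```smalltalk
"Illustration only: not real bytecode decoding. Sizes are invented."
| oldInstructions expandedSizes pcMap newPC |
oldInstructions := { 1 -> 1. 2 -> 3. 5 -> 1. 6 -> 2 }.   "oldPC -> old size in bytes"
expandedSizes := Dictionary new.
expandedSizes at: 2 put: 8.   "assume the store at old PC 2 grows to an 8-byte send sequence"
pcMap := Dictionary new.
newPC := 1.
oldInstructions do: [:each |
	"Record where this instruction now starts, then advance by its new size."
	pcMap at: each key put: newPC.
	newPC := newPC + (expandedSizes at: each key ifAbsent: [each value])].
pcMap at: 6   "=> 11"
```

A PC saved in a context or closure, and every jump target, would then be
translated through pcMap; a jump whose new distance no longer fits its
encoding is the case that needs lengthening.
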
>>
>>  If the transformation is done in the marshaller that remotes objects
>> then it will be easy to substitute the transformed method and map any PCs
>> in contexts and closures (the JIT does this kind of mapping routinely).
>>
>> The only problem transforming in the other direction (if you ever need
>> to) is in advancing computation past the message send sequence for the TVMB
>> access.  That can be coded with Context’s single step facility along with
>> the pc map from which you can detect where the end of the sequence is.
>>
>> Anyway thanks all for answers.
>>
>>
>>> So, as the doctor said, "don't do that".
>>>
>>>
>>>>
>>>>
>>>>>
>>>>> On Thu, Mar 28, 2019 at 20:05, Denis Kudriashov <dionisiydk at gmail.com>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>> Hi.
>>>>>>
>>>>>> I found an interesting case where tempVectors can be used in remote
>>>>>> scenarios. The store into a remote temp can be really remote (not just
>>>>>> a store into an outer context).
>>>>>> I played with the following example:
>>>>>>
>>>>>> | temp |
>>>>>> temp := 10.
>>>>>> remote evaluate: [temp := temp + 1].
>>>>>> temp.
>>>>>>
>>>>>>
>>>>>> For the moment, forget about the remote part and look at it as a
>>>>>> normal local case:
>>>>>> the temp var here is managed indirectly through a tempVector. You can
>>>>>> see it by evaluating this expression after the first assignment:
>>>>>>
>>>>>> thisContext at: 1 "=>#(10)"
>>>>>>
>>>>>>
>>>>>> So the value is in fact stored in an Array instance and read from it.
>>>>>> But because of the optimization this happens outside the array's
>>>>>> control. No #at: or #at:put: messages are sent while this code runs.
>>>>>> The VM changes the state of this array directly (there are special
>>>>>> bytecodes for this).
>>>>>>
>>>>>> Now my remote use case. Imagine that the VM actually sent #at: and
>>>>>> #at:put: messages to the tempVector. Then the remoting engine could
>>>>>> transfer the temp vector (as part of the context) as a proxy. So on the
>>>>>> remote side the block [temp := temp + 1] would actually ask the sender
>>>>>> (client) for the value and for the storage. All block semantics would
>>>>>> be supported, and the temp in the remote outer context would be
>>>>>> modified. I think it would be super cool if such transparency were
>>>>>> possible.
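
To make the idea concrete, here is a minimal sketch of what the block would
do if the temp vector were accessed with real messages. The explicit
tempVector variable is only an illustration (the compiler normally keeps it
hidden), and this is not how the VM executes the block today.

```smalltalk
"Illustration only: tempVector stands in for the hidden indirect temp vector."
| tempVector |
tempVector := Array new: 1.
tempVector at: 1 put: 10.                               "temp := 10"
[tempVector at: 1 put: (tempVector at: 1) + 1] value.   "temp := temp + 1"
tempVector at: 1   "=> 11"
```

Because #at: and #at:put: are ordinary sends in this form, an object
substituted for the Array, such as a remote proxy, would receive them and
could forward both the read and the store.
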
>>>>>>
>>>>>> I played with this example using Seamless in Pharo. It already works
>>>>>> in the way I described, but due to the VM optimization it does not
>>>>>> provide the expected behavior. Worse than that, it actually corrupts
>>>>>> the transferred proxy, because a proxy instance is materialized in
>>>>>> place of the array.
>>>>>>
>>>>>> This leads us to the issue of the safety of tempVector operations.
>>>>>> The following example shows how we can affect the state of a
>>>>>> tempVector using reflection:
>>>>>>
>>>>>> | temp |
>>>>>> temp := 10.
>>>>>> (thisContext at: 1) at: 1 put: 50.
>>>>>> [temp := temp + 1] value.
>>>>>> temp. "==>51"
>>>>>>
>>>>>> It is cool that we can do it. But there is no safety check at the
>>>>>> VM level on the tempVector object:
>>>>>>
>>>>>> | temp |
>>>>>> temp := 10.
>>>>>> thisContext at: 1 put: Object new.
>>>>>> [temp := temp + 1] value.
>>>>>> temp.
>>>>>>
>>>>>>
>>>>>> It breaks with a DNU: #+ is sent to nil. The temp became nil.
>>>>>>
>>>>>>
>>>>>> | temp |
>>>>>> temp := 10.
>>>>>> thisContext at: 1 put: #() copy.
>>>>>> [temp := temp + 1] value.
>>>>>> temp.
>>>>>>
>>>>>>
>>>>>> Sometimes it breaks with the same error. Sometimes it returns a random
>>>>>> number.
>>>>>> I guess in these cases the VM crosses the memory boundary of the
>>>>>> tempVector.
>>>>>>
>>>>>> And two exotic cases:
>>>>>>
>>>>>>
>>>>>> | temp |
>>>>>> temp := 10.
>>>>>> (thisContext at: 1) beReadOnlyObject.
>>>>>> [temp := temp + 1] value.
>>>>>> temp.
>>>>>>
>>>>>>
>>>>>> It silently returns 11. It does not break the read-only protection,
>>>>>> but no error is signalled.
>>>>>>
>>>>>> | temp |
>>>>>> temp := 10.
>>>>>> (thisContext at: 1) become: #() copy.
>>>>>> [temp := temp + 1] value.
>>>>>> temp.
>>>>>>
>>>>>>
>>>>>> It returns #().  (In Pharo  #() + 1 = #()  ).
>>>>>> I use become: to check how forwarding works in that case (it works
>>>>>> fine when the array has the correct size).
>>>>>>
>>>>>> How can we improve this behavior? How would it affect performance?
>>>>>> My proposal is to send real messages to the tempVector when it is not
>>>>>> an Array instance. Then the image will decide what to do.
>>>>>>
>>>>>> Best regards,
>>>>>> Denis
>>>>>>
>>>>> _,,,^..^,,,_
>>> best, Eliot
>>>
>>
>> _,,,^..^,,,_ (phone)
>>
>
>
> --
> Clément Béra
> https://clementbera.github.io/
> https://clementbera.wordpress.com/
>