[Vm-dev] tempVectors use case and current issues

Fri Apr 5 11:42:12 UTC 2019

Hi,

I don't know if this helps, but although the VM does not perform
read-only checks on writes into temp vectors by default, it is possible
to activate such checks through a flag I introduced last year for the
incremental compactor. The overhead seemed to be minimal.

Best,

On Sat, Mar 30, 2019 at 3:50 AM Eliot Miranda <eliot.miranda at gmail.com>
wrote:

>
> Hi Denis,
>
> On Mar 28, 2019, at 5:10 PM, Denis Kudriashov <dionisiydk at gmail.com>
> wrote:
>
> Hi Eliot
>
> On Thu, Mar 28, 2019 at 23:29, Eliot Miranda <eliot.miranda at gmail.com>
> wrote:
>
>>
>> Hi Denis,
>>
>> On Thu, Mar 28, 2019 at 2:36 PM Denis Kudriashov <dionisiydk at gmail.com>
>> wrote:
>>
>>>
>>> Hi Nicolas.
>>>
>>> On Thu, Mar 28, 2019 at 19:44, Nicolas Cellier <
>>> nicolas.cellier.aka.nice at gmail.com> wrote:
>>>
>>>>
>>>> Hi Denis,
>>>> Special bytecodes don't have to be changed: just don't use them, and
>>>> replace them with regular sends at bytecode generation (with a special
>>>> compiler, or some IR translator).
>>>>
>>>
>>> Sure, bytecode transformation will work. But it would be quite tricky to
>>> apply to a live execution context. It would require fixing the context
>>> stack to take the updated method bytecode into account.
>>> Notice that I am not looking for a global setting that recompiles all
>>> methods in the image. I want this logic only for a concrete method/block
>>> activation. In my scenario the block is serialized and transferred
>>> together with the current context. So on the remote side I need to do
>>> something with the materialized objects to maintain normal block semantics.
>>>
>>>
>>>> All can be done at image side then. Or did I miss something?
>>>>
>>>
>>> I think my examples show a security hole in the VM execution logic which
>>> allows violating memory bounds from the image side.
>>>
>>
>> It is no different than using an inst var access bytecode on an object
>> which doesn't have enough inst vars.  It is not a security hole so much as
>> it is something the system must use correctly to avoid crashes.  The same
>> can be done by e.g.
>>
>>     thisContext swapSender: Point basicNew
>>
>> There are many such "security holes".  And if you want the VM to plug
>> them all then the VM will become very much slower.
>>
>>
>>> I did not get a segfault, but I would not be surprised if it happened
>>> in some complex real-life scenario. Maybe it looks like a specially
>>> invented case, but I think it is quite easy to hit when using or
>>> developing a low-level serialization library: as soon as you, by mistake
>>> or intentionally, serialize context objects with some substitution logic.
>>> And considering that this hole needs to be closed, it would be a good
>>> opportunity to have another hook in the execution engine which can be
>>> used as in my remote scenario. So back to the proposal in my first mail.
>>>
>>
>> If you want to solve this, then build a transformation for the block
>> method when you remote a block.  As others have suggested (Levente) you can
>> transform the bytecodes into normal sends (my blog post on the entire
>> scheme starts with implementing it using at: and at:put: before the special
>> bytecodes are added).  But making a change to all blocks breaks much of the
>> Sista adaptive optimizer.  We have to have the freedom to access indirect
>> temp vectors via special case bytecodes if we are to be able to
>> aggressively optimize code.  If indirect temp vectors are to be treated as
>> general purpose objects, then we are prevented from making many significant
>> optimizations.
>>
>
> Ok. I expected such answers :) but I asked on the chance that some cheap
> trick is possible, like my readOnly example. It shows that there is at
> least a write-barrier check during this operation. If it signalled an
> error, that could be used to do the job.
> Method transformation would be quite complex to use because it needs to be
> applied dynamically to a live context, and it requires stack modifications
> on the fly. Just compiling the method in advance is not appropriate for my
> goal. I don't want to change the compiler globally or force the user to do
> it for a concrete method/class. It would not be a transparent solution.
>
>
> Well maybe.  But transforming a block and its activations is
> straightforward:
>
> - it is easy to construct a transformation from tempVector bytecode blocks
> (TVBB) to tempVector message blocks (TVMB) because there are no suspension
> points in the bytecodes and the stack heights at the start and end of the
> bytecodes are the same as for the message versions.  So some form of
>    store indirect temp bytecode =>
>    dup (now value exists twice)
>    dup (now value exists thrice)
>    push indirect temp
>    pop store into value location that was duped
>    push index
>    pop store into 2nd value
>    send at:put:
> will reimplement it.  And then it’s just a matter of remapping PCs from one
> to the other and lengthening jumps.  A day’s work or two at most.
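>
> A minimal sketch of the PC remapping step, in workspace Smalltalk.  The
> instruction sizes below are made up for illustration, and PCs are counted
> from 1 for simplicity; a real implementation would obtain both by decoding
> the original and the transformed method.
>
> ```smalltalk
> | oldSizes newSizes pcMap oldPC newPC |
> "Hypothetical byte sizes of five instructions, before and after expansion.
>  The 3-byte entries stand for temp vector accesses that grow into sends."
> oldSizes := #(1 3 1 3 1).
> newSizes := #(1 8 1 8 1).
> pcMap := Dictionary new.
> oldPC := 1.
> newPC := 1.
> oldSizes with: newSizes do: [:oldSize :newSize |
>     "Each instruction start in the old method maps to its new start."
>     pcMap at: oldPC put: newPC.
>     oldPC := oldPC + oldSize.
>     newPC := newPC + newSize].
> pcMap
> ```
>
> With these assumed sizes the old instruction starts 1, 2, 5, 6, 9 map to
> 1, 2, 10, 11, 19.  PCs stored in contexts and closures, and jump targets,
> can then be looked up through the same map.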
>
>  If the transformation is done in the marshaller that remotes objects then
> it will be easy to substitute the transformed method and map any PCs in
> contexts and closures (the JIT does this kind of mapping routinely).
>
> The only problem transforming in the other direction (if you ever need to)
> is in advancing computation past the message send sequence for the TVMB
> access.  That can be coded with Context’s single step facility along with
> the pc map from which you can detect where the end of the sequence is.
>
> Anyway thanks all for answers.
>
>
>> So, as the doctor said, "don't do that".
>>
>>
>>>
>>>
>>>>
>>>> On Thu, Mar 28, 2019 at 20:05, Denis Kudriashov <dionisiydk at gmail.com>
>>>> wrote:
>>>>
>>>>>
>>>>> Hi.
>>>>>
>>>>> I found an interesting case where tempVectors can be used in remote
>>>>> scenarios. The store into a remote temp can be really remote (not just
>>>>> in an outer context).
>>>>> I played with the following example:
>>>>>
>>>>> | temp |
>>>>> temp := 10.
>>>>> remote evaluate: [temp := temp + 1].
>>>>> temp.
>>>>>
>>>>>
>>>>> For the moment forget about the remote part and look at it as a normal
>>>>> local case:
>>>>> the temp var here is managed indirectly through a tempVector. You can
>>>>> see it by evaluating this expression after the first assignment:
>>>>>
>>>>> thisContext at: 1 "=>#(10)"
>>>>>
>>>>>
>>>>> So the value is in fact stored in an Array instance and read from it.
>>>>> But because of an optimization this happens outside the array's
>>>>> control. No #at: or #at:put: messages are sent by this code. The VM
>>>>> magically changes the state of the array (there are special bytecodes
>>>>> for this).
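>>>>>
>>>>> As a minimal sketch of what this indirection amounts to (hand-written
>>>>> for illustration, not what the compiler emits; the name tempVector is
>>>>> an assumption), the example behaves as if the temp lived in a
>>>>> one-element Array, except that the real code uses special bytecodes
>>>>> instead of the sends shown here:
>>>>>
>>>>> ```smalltalk
>>>>> | tempVector |
>>>>> "One slot per captured, assigned temp; here only 'temp'."
>>>>> tempVector := Array new: 1.
>>>>> "temp := 10"
>>>>> tempVector at: 1 put: 10.
>>>>> "[temp := temp + 1] value"
>>>>> [tempVector at: 1 put: (tempVector at: 1) + 1] value.
>>>>> "temp"
>>>>> tempVector at: 1
>>>>> ```
>>>>>
>>>>> Evaluated in a workspace, the last expression answers 11.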
>>>>>
>>>>> Now my remote use case. Imagine that the VM actually sent #at: and
>>>>> #at:put: messages to the tempVector. Then a remoting engine could
>>>>> transfer the temp vector (as part of the context) as a proxy. So on the
>>>>> remote side the block [temp := temp + 1] would actually ask the sender
>>>>> (client) for the value and for the storage. So all block semantics
>>>>> would be supported: the temp in the remote outer context would be
>>>>> modified. I think it would be super cool if such transparency were
>>>>> possible.
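>>>>>
>>>>> A minimal sketch of why message sends would make this possible,
>>>>> assuming the block were compiled to send #at: and #at:put: (it is not
>>>>> today): any object that understands those two messages could then
>>>>> stand in for the temp vector. A Dictionary is used here purely as a
>>>>> stand-in for a remote proxy; no Seamless API is shown.
>>>>>
>>>>> ```smalltalk
>>>>> | standIn block |
>>>>> "Stand-in for a proxy: not an Array, but it answers #at: and #at:put:."
>>>>> standIn := Dictionary new.
>>>>> standIn at: 1 put: 10.
>>>>> "Hand-written message-send form of [temp := temp + 1]."
>>>>> block := [standIn at: 1 put: (standIn at: 1) + 1].
>>>>> block value.
>>>>> standIn at: 1
>>>>> ```
>>>>>
>>>>> The last expression answers 11, and the block never needed to know
>>>>> what kind of object was holding the value.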
>>>>>
>>>>> I played with this example using Seamless in Pharo. It already works
>>>>> in the way I described, but due to the VM optimization it does not
>>>>> provide the expected behavior. Worse than that, it actually corrupts
>>>>> the transferred proxy, because the proxy instance is materialized in
>>>>> place of the array.
>>>>>
>>>>> This leads us to the issue of the safety of tempVector operations.
>>>>> The following example shows how we can affect the state of a tempVector
>>>>> using reflection:
>>>>>
>>>>> | temp |
>>>>> temp := 10.
>>>>> (thisContext at: 1) at: 1 put: 50.
>>>>> [temp := temp + 1] value.
>>>>> temp. "==>51"
>>>>>
>>>>> It is cool that we can do it. But there is no safety check at the
>>>>> VM level on the tempVector object:
>>>>>
>>>>> | temp |
>>>>> temp := 10.
>>>>> thisContext at: 1 put: Object new.
>>>>> [temp := temp + 1] value.
>>>>> temp.
>>>>>
>>>>>
>>>>> It breaks with a DNU: #+ is sent to nil. Temp became nil.
>>>>>
>>>>>
>>>>> | temp |
>>>>> temp := 10.
>>>>> thisContext at: 1 put: #() copy.
>>>>> [temp := temp + 1] value.
>>>>> temp.
>>>>>
>>>>>
>>>>> Sometimes it breaks with the same error. Sometimes it returns a random
>>>>> number.
>>>>> I guess in these cases the VM goes beyond the memory bounds of the
>>>>> tempVector.
>>>>>
>>>>> And two exotic cases:
>>>>>
>>>>>
>>>>> | temp |
>>>>> temp := 10.
>>>>> (thisContext at: 1) beReadOnlyObject.
>>>>> [temp := temp + 1] value.
>>>>> temp.
>>>>>
>>>>>
>>>>> It silently returns 11. It does not break the read-only protection, but
>>>>> no error is signalled.
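>>>>>
>>>>> For comparison, a sketch of the same store written as an explicit
>>>>> #at:put: send. This assumes a Pharo image in which a refused write to a
>>>>> read-only object signals ModificationForbidden; other images may
>>>>> signal a different error class. The symbol #writeRefused is just an
>>>>> illustrative return value.
>>>>>
>>>>> ```smalltalk
>>>>> | tempVector |
>>>>> tempVector := Array with: 10.
>>>>> tempVector beReadOnlyObject.
>>>>> "With a real send the refused write is visible to the image."
>>>>> [tempVector at: 1 put: (tempVector at: 1) + 1]
>>>>>     on: ModificationForbidden
>>>>>     do: [:error | error return: #writeRefused]
>>>>> ```
>>>>>
>>>>> Under that assumption the handler runs, so the image gets a chance to
>>>>> react to the refused write instead of it passing silently.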
>>>>>
>>>>> | temp |
>>>>> temp := 10.
>>>>> (thisContext at: 1) become: #() copy.
>>>>> [temp := temp + 1] value.
>>>>> temp.
>>>>>
>>>>>
>>>>> It returns #().  (In Pharo  #() + 1 = #()  ).
>>>>> I used become: to check how forwarding works in that case (it
>>>>> works fine when the array has the correct size).
>>>>>
>>>>> How can we improve this behavior? How would it affect performance?
>>>>> My proposal is to send real messages to the tempVector when it is not
>>>>> an Array instance. Then the image will decide what to do.
>>>>>
>>>>> Best regards,
>>>>> Denis
>>>>>
>>>> _,,,^..^,,,_
>> best, Eliot
>>
>
> _,,,^..^,,,_ (phone)
>

-- 
Clément Béra
https://clementbera.github.io/
https://clementbera.wordpress.com/