[Vm-dev] [Pharo-dev] shallowCopy problem on 64 bit Pharo ?

Tue Feb 7 05:51:59 UTC 2017

Try the following experiment.
Copy Object>>shallowCopy to a new method Object>>monitorShallowCopy and,
immediately after the <primitive: 148> pragma, add:
     Smalltalk at: #Monitor put: #Failed.
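For reference, here is a minimal sketch of what the instrumented method might look like. It does not copy the whole fallback body of Object>>shallowCopy. Instead it assumes that it is acceptable to record the failure and then delegate to the original #shallowCopy, which retries primitive 148 and, if that fails again, runs the Smalltalk fallback. The marker statement only executes when the primitive fails, because a successful primitive returns its result without running the method body.

```smalltalk
monitorShallowCopy
	"Hypothetical instrumented variant of Object>>shallowCopy.
	 The statements below the pragma run only if primitive 148 fails."
	<primitive: 148>
	Smalltalk at: #Monitor put: #Failed.
	"Delegate to the regular implementation, which retries the primitive
	 and falls back to the Smalltalk code if it fails again."
	^ self shallowCopy
```

Either way the answer is a copy of the receiver. The only difference from the plain method is the side effect on #Monitor.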

Then, in a Playground, evaluate:
    lastfail := 0.
    1 to: 100000 do: [ :n |
        | src copy |
        src := Array new: n.
        Smalltalk at: #Monitor put: #Succeeded.
        copy := src monitorShallowCopy.
        (Smalltalk at: #Monitor) == #Failed ifTrue: [
            Transcript crShow: n; tab; show: n - lastfail.
            lastfail := n ] ].
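The following variant of the same experiment collects the failing sizes instead of writing each one to the Transcript. This may make it easier to compare runs or platforms. It is a sketch that assumes Object>>monitorShallowCopy has been added as described. The `failures` collection is an illustrative name, not part of the original experiment.

```smalltalk
| failures |
failures := OrderedCollection new.
1 to: 100000 do: [ :n |
	| src |
	src := Array new: n.
	Smalltalk at: #Monitor put: #Succeeded.
	src monitorShallowCopy.
	(Smalltalk at: #Monitor) == #Failed
		ifTrue: [ failures add: n ] ].
"Report how many copies fell back to Smalltalk code and the smallest failing size."
Transcript
	crShow: failures size printString , ' failures';
	crShow: 'first failing size: ' , (failures ifEmpty: [ 'none' ] ifNotEmpty: [ failures first printString ]).
failures
```

Inspecting the resulting collection shows the same sizes as the Transcript output, in ascending order.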

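This produces the following interesting result. Each line shows the array size n at which primitive 148 failed, followed by the gap since the previous failure (n - lastfail):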

RUN1...

65559 65559
67670 2111
67685 15
67700 15
67715 15
67730 15
...
69860 15
69875 15
69890 15
69905 15
72334 2429
72348 14
72362 14
72376 14
72390 14
...
74854 14
74868 14
74882 14
74896 14
77681 2785
77694 13
77707 13
77720 13
77733 13
...
80619 13
80632 13
80645 13
80658 13
83894 3236
83906 12
83918 12
83930 12
83942 12
...
87338 12
87350 12
87362 12
87374 12
91189 3815
91200 11
91211 11
91222 11
91233 11
...
95292 11
95303 11
95314 11
95325 11
99867 4542
99877 10
99887 10
99897 10
99907 10
99917 10
99927 10
99937 10
99947 10
99957 10
99967 10
99977 10
99987 10
99997 10

RUN2...

67660 67660
67675 15
67690 15
67705 15
67720 15
...
69865 15
69880 15
69895 15
69910 15
72324 2414
72338 14
72352 14
72366 14
72380 14
...
74858 14
74872 14
74886 14
74900 14
77685 2785
77698 13
77711 13
77724 13
77737 13
...
80623 13
80636 13
80649 13
80662 13
83898 3236
83910 12
83922 12
83934 12
83946 12
...
87342 12
87354 12
87366 12
87378 12
91193 3815
91204 11
91215 11
91226 11
91237 11
...
95285 11
95296 11
95307 11
95318 11
99871 4553
99881 10
99891 10
99901 10
99911 10
99921 10
99931 10
99941 10
99951 10
99961 10
99971 10
99981 10
99991 10

This is with:
* 60375-64.zip
* cog_win64x64_squeak.stack.spur_201702021058.zip
* Windows 7 Professional SP1

cheers -ben

On Tue, Feb 7, 2017 at 10:15 AM, Ciprian Teodorov <ciprian.teodorov at gmail.com> wrote:
>
>
> Thanks Ben,
>
> <primitive: 148> seems to fail roughly 4-5% of the time in my benchmark (OS X 10.11.6, latest Pharo/Cog):
>
>     # of copy calls    Failing primitive 148    Failure rate
>     1710               77                       4.50%
>     3049               133                      4.36%
>     51562              2947                     5.72%
>
> and it does not seem to fail at all with something like:
>
> 1 to: 1000 do: [:i |
>     (1 to: 100000) asArray copy.
> ]
>
> cheers
>
> On Tue, Feb 7, 2017 at 12:42 AM, Ben Coman <btc at openinworld.com> wrote:
>>
>>
>>
>>
>> On Tue, Feb 7, 2017 at 3:05 AM, Ciprian Teodorov <ciprian.teodorov at gmail.com> wrote:
>>>
>>>
>>> It is strange: it seems to me that <primitive: 148> fails and falls back to the Smalltalk implementation (http://bit.ly/2kjYdHv).
>>> However, when copying a small array such as #(1 2 3 4) copy, I cannot step into #shallowCopy,
>>> nor when copying a big array such as (1 to: 100000) asArray copy.
>>>
>>> However, when I press cmd+. while running my benchmark, the debugger stops in #shallowCopy.
>>>
>>> Is this a debugger thing?
>>
>>
>> To check, can you add a Transcript output on the line right after the primitive pragma?
>> cheers -ben
>>
>>
>>>
>>> Or does the primitive really fail? That could explain the slowdown of more than 2.6x.
>>>
>>> best regards,
>>> cip
>>>
>>> On Mon, Feb 6, 2017 at 7:36 PM, Ciprian Teodorov <ciprian.teodorov at gmail.com> wrote:
>>>>
>>>> Thanks guys, I will try the latest version and come back with updates.
>>>>
>>>>
>>>> On Sun, Feb 5, 2017 at 8:25 PM, tim Rowledge <tim at rowledge.org> wrote:
>>>>>
>>>>>
>>>>>
>>>>> > On 05-02-2017, at 5:08 AM, Clément Bera <bera.clement at gmail.com> wrote:
>>>>> >
>>>>> > I remember there was a discussion about that somewhere, but I can't find it. I'm cc'ing vm-dev; they may have a clue.
>>>>> >
>>>>> > When copying a pointer object in 64 bits instead of 32 bits, you need to copy twice as much data, so it is going to be slower in any case.
>>>>>
>>>>> Err, not really. Probably. Assuming you have a 64-bit CPU etc., of course. And it depends on details of the memory architecture outside the CPU too; after all, many systems do not need the memory chip organisation to match the CPU word size, having multiple lanes, burst-read cache loading, even heterogeneous regions (I suspect mostly in embedded systems for that, but y’never know).
>>>>>
>>>>> Yes, you’re moving twice as much stuff, but it will still be a single read and write per word. After that you’re at the mercy of cache lines, write buffers and chip specs, not to mention the Hamsters.
>>>>>
>>>>> tim
>>>>> --
>>>>> tim Rowledge; tim at rowledge.org; http://www.rowledge.org/tim
>>>>> We can rescue a hostage or bankrupt a system. Now, what would you like us to do?
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Dr. Ciprian TEODOROV
>>>> Enseignant-chercheur
>>>> ENSTA Bretagne
>>>>
>>>> tél : 06 08 54 73 48
>>>> mail : ciprian.teodorov at gmail.com
>>>> www.teodorov.ro
>>>
>>>
>>>
>>
>>
>
>