[Vm-dev] Primitive replaceFrom:to:with:startingAt: in the JIT

Levente Uzonyi leves at caesar.elte.hu
Mon Dec 25 21:57:25 UTC 2017


Hi Clément,

I finally found the time to write some benchmarks.
I compared the output of the script below on the sqcogspur64linuxht VMs 
201710061559 and 201712221331 available on Bintray.

result := { ByteArray. DoubleByteArray. WordArray. DoubleWordArray. ByteString. WideString. FloatArray. Array } collect: [ :class |
 	| collection |
 	Smalltalk garbageCollect.
 	collection := class basicNew: 10000.
 	class -> (#(0 1 2 5 10 20 50 100 200 500 1000 2000 5000 10000) collect: [ :size |
 		| iterations time overhead |
 		iterations := (40000000 // (size max: 1) sqrt) floor.
 		overhead := [ 1 to: iterations do: [ :i | ] ] timeToRun.
 		time := [ 1 to: iterations do: [ :i |
 			collection replaceFrom: 1 to: size with: collection startingAt: 1 ] ] timeToRun.
 		{ size. iterations. time - overhead } ]) ].

I found that the quick paths are probably only implemented for byte and 
pointer collections, because there was no significant difference for 
DoubleByteArray, WordArray, DoubleWordArray, WideString and FloatArray.

For pointer and byte collections, there's a significant speedup when the 
copied portion is small. However, somewhere between 50 and 100 copied 
elements, copying byte collections becomes slower with the newer VM (up 
to 1.5x at 100k elements).
Interestingly, this doesn't happen for pointer collections: instead of a 
slowdown, there's still a 1.5x speedup even at 100k elements.

Levente

On Mon, 23 Oct 2017, Clément Bera wrote:

> Hi all,
> For a long time I had been meaning to add the primitive #replaceFrom:to:with:startingAt: to the JIT but never took the time to do it. These days I am showing the JIT to one of my students, and as an example of how one would write code in the JIT we implemented this primitive
> together, Spur-only. This is part of commit 2273.
> 
> I implemented quick paths for byte objects and array-like objects only. The rationale is that the most common cases I see in the profiler in Pharo user benchmarks are copies of Arrays and ByteStrings. Typically, application benchmarks show 3-5% of
> the time spent copying small things, and switching from the JIT runtime to the C runtime is an important part of that cost.
> 
> First evaluation shows the following speed-ups, but I've just done this quickly on my machine:
> 
> Copy of size 0
>     Array 2.85x
>     ByteString 2.7x
> Copy of size 1
>     Array 2.1x
>     ByteString 2x
> Copy of size 3
>     Array 2x
>     ByteString 1.9x
> Copy of size 8
>     Array 1.8x
>     ByteString 1.8x
> Copy of size 64
>     Array 1.1x
>     ByteString 1.1x
> Copy of size 1000
>     Array 1x
>     ByteString 1x
> 
> So I would expect some macro benchmarks to get a 1 to 3% speed-up. Not as much as I expected, but it's there.
> 
> Can someone who is good at benchmarks, such as Levente, have a look and provide us with a better evaluation of the performance difference?
> 
> Thanks.
> 
> --
> Clément Béra
> Pharo consortium engineer
> https://clementbera.wordpress.com/
> Bâtiment B 40, avenue Halley 59650 Villeneuve d'Ascq
> 
>