[squeak-dev] sorting array-like collections

Eliot Miranda eliot.miranda at gmail.com
Thu Oct 20 18:23:21 UTC 2016


Hi All,

     sorted: for Array is OK, but for Array-like subclasses of
ArrayedCollection it's suboptimal.  For example:


32-bit Mac OS X 2.2GHz Core i7 MBP

[| samples r samplesArray |
samples := Bitmap new: 256 * 1024.
r := Random new.
1 to: samples size do:
[:i|
samples at: i put: (r next * (1 << 32)) asInteger - 1].
samplesArray := samples asArray.
(1 to: 3) collect: [:i| {[samples sorted] timeToRun. [samplesArray sorted]
timeToRun}] #(#(266 243) #(293 239) #(413 247))]

64-bit Mac OS X 2.2GHz Core i7 MBP
| samples r samplesArray |
samples := Bitmap new: 256 * 1024.
r := Random new.
1 to: samples size do:
[:i|
samples at: i put: (r next * (1 << 32)) asInteger - 1].
samplesArray := samples asArray.
(1 to: 3) collect: [:i| {[samples sorted] timeToRun. [samplesArray sorted]
timeToRun}] #(#(143 134) #(300 133) #(141 137))

My point is about the difference in sorting speed for Array vs Bitmap, but
before that note that 64-bit is faster to sort because there are fewer
32-bit to LargePositiveInteger conversions, but the difference between the
Bitmap vs Array sort is larger on 64-bit because the 64-bit generation
scavenger has a performance problem (which is why I'm looking at the above,
fixing the VM profiler to work for 64-bit address spaces).


What's going on here is that Array's sorted: is

Array>>#sorted: aSortBlockOrNil
^self copy sort: aSortBlockOrNil

but for other subclasses of ArrayedCollection it is

Collection>>#sorted: aSortBlockOrNil
^self asArray sort: aSortBlockOrNil

so there is a slow copy of the Bitmap or ByteArray to an Array with
equivalent elements, but a shallow copy (which is fast) in Array>>#sorted:
to sort an Array.

When you look at the timings above you can see that when a GC occurs due to
the slow copy it takes much longer to sort the Bitmap, and that it always
takes longer to sort the Bitmap than the Array.

For many subclasses of ArrayedCollection, essentially everything other than
the RunArrays and CompiledMethod, which are specialised, there is no need
to convert to an Array, and a direct access to a shallow copy would work
just as well.

So shouldn't we instead implement this as

ArrayedCollection>>#sorted: aSortBlockOrNil
^self copy sort: aSortBlockOrNil

and in subclasses that are exceptions override as e.g.
RunArray>>#sorted: aSortBlockOrNil
^self asArray sort: aSortBlockOrNil

SparseLargeArray>>#sorted: aSortBlockOrNil
    something much cleverer....


?
_,,,^..^,,,_
best, Eliot
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.squeakfoundation.org/pipermail/squeak-dev/attachments/20161020/8ec3847b/attachment-0001.htm


More information about the Squeak-dev mailing list