[squeak-dev] Collection's #fold: vs #reduce:

Levente Uzonyi leves at elte.hu
Tue Nov 2 22:17:51 UTC 2010


On Tue, 2 Nov 2010, Eliot Miranda wrote:

> On Tue, Nov 2, 2010 at 12:25 PM, Eliot Miranda <eliot.miranda at gmail.com>wrote:
>
>>
>>
>> On Tue, Nov 2, 2010 at 11:50 AM, Eliot Miranda <eliot.miranda at gmail.com>wrote:
>>
>>>

snip

>>> The difference is due to the instantiation "Object new", and instantiation
>>> is slow on the current Cog because I haven't implemented the instantiation
>>> primitives in machine code, and this limitation is hopefully temporary.  So
>>> you might leave it be.  However, changing the code to use {} for teh
>>> instantiation speeds things up markedly:

Good to know. :)

>>>
>>> Collection>>newFold: binaryBlock
>>> | firstValue nextValue |
>>> firstValue := nextValue := {}.
>>>  self do:
>>> [:each |
>>> nextValue := firstValue == nextValue
>>>  ifTrue: [each]
>>> ifFalse: [binaryBlock value: nextValue value: each]].
>>>  ^nextValue == firstValue
>>> ifTrue: [self errorEmptyCollection]
>>> ifFalse: [nextValue]
>>>
>>> | c r |
>>> c := #('if' 'it' 'is' 'to' 'be' 'it' 'is' 'up' 'to' 'me').
>>> { Time millisecondsToRun: [1 to: 1000000 do: [:i| c fold: [:a :b | a, ' ',
>>> b]]].
>>>  Time millisecondsToRun: [1 to: 1000000 do: [:i| c newFold: [:a :b | a, '
>>> ', b]]].
>>> Time millisecondsToRun: [1 to: 1000000 do: [:i| c reduce: [:a :b | a, ' ',
>>> b]]] }
>>>  #(5076 5008 5052)
>>>
>>>
>>> | c |
>>> c := #('if').
>>> { Time millisecondsToRun: [1 to: 1000000 do: [:i| c fold: [:a :b | a, ' ',
>>> b]]].
>>>    Time millisecondsToRun: [1 to: 1000000 do: [:i| c newFold: [:a :b | a,
>>> ' ', b]]].
>>>    Time millisecondsToRun: [1 to: 1000000 do: [:i| c reduce: [:a :b | a, '
>>> ', b]]] }
>>>  #(247 226 220)
>>>
>>>
>>> (N.B.  the above are rough measurements!!).
>>>
>>> So how about changing fold: to replace Object new with {} and keep it?
>>>
>>
>> And of course one can consider using thisContext instead of Object new or
>> {}, since thisContext has to be instantiated anyway for the closure in
>> fold:, and so results in less memory pressure.  This needs careful
>> measurement; you have the test bed set up for that; my workspace hacks are
>> showing too much variability to distinguish.

You're right, it's about the allocation costs. I came up with the 
following benchmark:

#(1 10 100 1000 10000 100000 1000000) collect: [ :arraySize |
 	(1 to: 5) collect: [ :run |
 		| collection |
 		collection := Array new: arraySize.
 		Smalltalk garbageCollect.
 		[ 1 to: 10000000 // arraySize do: [ :i | collection fold: [ :a :b | ] ] ] timeToRun ] ].

But the noise is too high on my pc to get reliable results for larger 
array sizes. For smaller sizes allocation/gc time gives the difference. 
For larges sizes I can't decide if there's a difference at all. Probably 
there's none.


Levente



More information about the Squeak-dev mailing list