[squeak-dev] Re: Optimizing RunArray

Nicolas Cellier nicolas.cellier.aka.nice at gmail.com
Tue Aug 2 10:11:38 UTC 2011


To complete myself, the fast #collect: already exists and is named
#mapValues: except that it modifies the RunArray in place and also
won't coalesce...

I also gain a huge factor for #collect:as: be defining this method:

RunArray>>fillFrom: aCollection with: aBlock
	"Evaluate aBlock with each of aCollections's elements as the argument.
	Collect the resulting values into self. Answer self."
	| newRuns newValues lastLength lastValue |
	newRuns := (Array new: aCollection size) writeStream.
	newValues := (Array new: aCollection size) writeStream.
	lastLength := 0.
	lastValue := Object new.
	aCollection do: [:each |
		| value |
		value := aBlock value: each.
		lastValue = value
			ifTrue: [lastLength := lastLength + 1]
			ifFalse:
				[lastLength > 0
					ifTrue:
						[newRuns nextPut: lastLength.
						newValues nextPut: lastValue].
				lastLength := 1.
				lastValue := value]].
	lastLength > 0
		ifTrue:
			[newRuns nextPut: lastLength.
			newValues nextPut: lastValue].
	self setRuns: newRuns contents setValues: newValues contents

[ (Array new: 1000) collect: [:e | 4 atRandom] as: RunArray] bench.
BEFORE: '25.1 per second.'
AFTER:  '1,080 per second.'

It's worth a few lines of code.

Nicolas

2011/8/2 Nicolas Cellier <nicolas.cellier.aka.nice at gmail.com>:
> I played a bit with RunArray, and found some un-optimized features.
> First, I don't know why RunArray is an ArrayedCollection. It cannot
> #add: but it can #addFirst: and #addLast:.
> It cannot #add:withOccurrences: but it can #addLast:times:. Why
> inventing new selectors for old behaviours ?
> These operations will cost a realloc it the last value is different,
> so the underlying runs/values could better be an OrderedCollection if
> these operations are used often.
> A RunArray cannot remove at all.
> Very weird collection species, I don't like the implementation too much.
>
> Then, #do: loops could be far faster. They rely on ArrayedCollection
> which inlines do: loops with #to:do: and #at:
> But #at: is not that fast. Scanning the runs and counting elements
> would result in a n^2 cost.
> Fortunately there is a cache lastIndex,lastRun,lastOffset to keep a cost n.
> Nonetheless, all the tests cost, and the loop is suboptimal.
> Let use see:
>
> version 1:
> RunArray>>fastDo: aBlock
>        runs with: values do: [:r :v |
>                r timesRepeat: [aBlock value: v]].
>
> | tmp |
> tmp := ((Array new: 1000) collect: [:e | 4 atRandom]) as: RunArray.
> {
> [ tmp do: [:e |]] bench.
> [ tmp fastDo: [:e |]] bench.
> }
> #('3,220 per second.' '6,290 per second.')
>
> But timesRepeat: is slow, it is unoptimized by the compiler and costs
> a message send.
> I think we should implement BlockClosure>>repeat: and optimize that
> call in Compiler.
> But let's not do it, and rather inline by ourself:
>
> version 2:
>        runs with: values do: [:r :v |
>                1 to: r do: [:i | aBlock value: v]].
>
> | tmp |
> tmp := ((Array new: 1000) collect: [:e | 4 atRandom]) as: RunArray.
> {
> [ tmp do: [:e |]] bench.
> [ tmp do2: [:e |]] bench.
> }
> #('3,070 per second.' '25,500 per second.')
>
> We can even inline the with:do: loop itself:
> version 3:
>        1 to: runs size do: [:i |
>                | r v |
>                v := values at: i.
>                r := runs at: i.
>                [( r := r - 1) >= 0]
>                        whileTrue: [aBlock value: v]].
>
> | tmp |
> tmp := ((Array new: 1000) collect: [:e | 4 atRandom]) as: RunArray.
> {
> [ tmp do: [:e |]] bench.
> [ tmp do2: [:e |]] bench.
> }
> #('3,370 per second.' '32,200 per second.')
>
> Now the operation I wanted to use was reverseDo: so I implemented:
> RunArray>>fastReverseDo: aBlock
>        | i |
>        i := runs size.
>        [i > 0]
>                whileTrue:
>                        [ | r v |
>                        v := values at: i.
>                        r := runs at: i.
>                        i := i - 1.
>                        [( r := r - 1) >= 0]
>                                whileTrue: [aBlock value: v]].
> | tmp |
> tmp := ((Array new: 1000) collect: [:e | 4 atRandom]) as: RunArray.
> {
> [ tmp reverseDo: [:e |]] bench.
> [ tmp reverseDo2: [:e |]] bench.
> }
>  #('83.9 per second.' '32,600 per second.')
>
> Ouch! The cache is missing a lot of indices, and our loop turns into a n^2 cost.
> I know, premature optimization bla bla bla, but a factor x400 is worth
> some inlining no?
>
> I guess these features are never used.
> By now RunArray is kind of private utility for Text implementation.
> But it could / should be generic.
>
> I also have proposals for count: / select: / collect:. etc...
> It would be to evaluate the block only once per group of values.
> For example
> RunArray>>collect: aBlock
>        "Beware, the block will be evaluated only once per group of values."
>        ^(self class runs: (runs collect: aBlock) contents values: values
> copy) coalesce
> But that's controversial, it would make the RunArray behave
> differently if the block has side effects...
>
> | i tmp tmp2 tmp3 |
> tmp := ((Array new: 1000) collect: [:e | 4 atRandom]).
> i := 0.
> tmp2 := tmp collect: [:e | i := i + 1].
> i := 0.
> tmp3 := (tmp as: RunArray) collect: [:e | i := i + 1].
> tmp2 = tmp3 asArray
> ==> false
>
> Nicolas
>



More information about the Squeak-dev mailing list