[Pharo-project] [squeak-dev] Re: Optimizing RunArray

Nicolas Cellier nicolas.cellier.aka.nice at gmail.com
Tue Aug 2 21:08:55 UTC 2011


I have tried again with huge text (my 45 MByte change log).
Main contributor seems to be Text composition.
A marginal gain (2 to 3% when resizing the window) is to remove the
indirection in TextStopConditions, and replace instance variable stops
with direct integer slots (variableSubclass)

No guaranty the fileIn of attachment will work, modifying Text
processing is dangerous...

Nicolas

2011/8/2 Stéphane Ducasse <stephane.ducasse at inria.fr>:
> Yes I was wondering about the same (for athens too :))
>
> Stef
>
> On Aug 2, 2011, at 2:16 PM, Igor Stasenko wrote:
>
>> Does it impacts the text rendering/processing speed?
>>
>>
>> On 2 August 2011 12:11, Nicolas Cellier
>> <nicolas.cellier.aka.nice at gmail.com> wrote:
>>> To complete myself, the fast #collect: already exists and is named
>>> #mapValues: except that it modifies the RunArray in place and also
>>> won't coalesce...
>>>
>>> I also gain a huge factor for #collect:as: be defining this method:
>>>
>>> RunArray>>fillFrom: aCollection with: aBlock
>>>        "Evaluate aBlock with each of aCollections's elements as the argument.
>>>        Collect the resulting values into self. Answer self."
>>>        | newRuns newValues lastLength lastValue |
>>>        newRuns := (Array new: aCollection size) writeStream.
>>>        newValues := (Array new: aCollection size) writeStream.
>>>        lastLength := 0.
>>>        lastValue := Object new.
>>>        aCollection do: [:each |
>>>                | value |
>>>                value := aBlock value: each.
>>>                lastValue = value
>>>                        ifTrue: [lastLength := lastLength + 1]
>>>                        ifFalse:
>>>                                [lastLength > 0
>>>                                        ifTrue:
>>>                                                [newRuns nextPut: lastLength.
>>>                                                newValues nextPut: lastValue].
>>>                                lastLength := 1.
>>>                                lastValue := value]].
>>>        lastLength > 0
>>>                ifTrue:
>>>                        [newRuns nextPut: lastLength.
>>>                        newValues nextPut: lastValue].
>>>        self setRuns: newRuns contents setValues: newValues contents
>>>
>>> [ (Array new: 1000) collect: [:e | 4 atRandom] as: RunArray] bench.
>>> BEFORE: '25.1 per second.'
>>> AFTER:  '1,080 per second.'
>>>
>>> It's worth a few lines of code.
>>>
>>> Nicolas
>>>
>>> 2011/8/2 Nicolas Cellier <nicolas.cellier.aka.nice at gmail.com>:
>>>> I played a bit with RunArray, and found some un-optimized features.
>>>> First, I don't know why RunArray is an ArrayedCollection. It cannot
>>>> #add: but it can #addFirst: and #addLast:.
>>>> It cannot #add:withOccurrences: but it can #addLast:times:. Why
>>>> inventing new selectors for old behaviours ?
>>>> These operations will cost a realloc it the last value is different,
>>>> so the underlying runs/values could better be an OrderedCollection if
>>>> these operations are used often.
>>>> A RunArray cannot remove at all.
>>>> Very weird collection species, I don't like the implementation too much.
>>>>
>>>> Then, #do: loops could be far faster. They rely on ArrayedCollection
>>>> which inlines do: loops with #to:do: and #at:
>>>> But #at: is not that fast. Scanning the runs and counting elements
>>>> would result in a n^2 cost.
>>>> Fortunately there is a cache lastIndex,lastRun,lastOffset to keep a cost n.
>>>> Nonetheless, all the tests cost, and the loop is suboptimal.
>>>> Let use see:
>>>>
>>>> version 1:
>>>> RunArray>>fastDo: aBlock
>>>>        runs with: values do: [:r :v |
>>>>                r timesRepeat: [aBlock value: v]].
>>>>
>>>> | tmp |
>>>> tmp := ((Array new: 1000) collect: [:e | 4 atRandom]) as: RunArray.
>>>> {
>>>> [ tmp do: [:e |]] bench.
>>>> [ tmp fastDo: [:e |]] bench.
>>>> }
>>>> #('3,220 per second.' '6,290 per second.')
>>>>
>>>> But timesRepeat: is slow, it is unoptimized by the compiler and costs
>>>> a message send.
>>>> I think we should implement BlockClosure>>repeat: and optimize that
>>>> call in Compiler.
>>>> But let's not do it, and rather inline by ourself:
>>>>
>>>> version 2:
>>>>        runs with: values do: [:r :v |
>>>>                1 to: r do: [:i | aBlock value: v]].
>>>>
>>>> | tmp |
>>>> tmp := ((Array new: 1000) collect: [:e | 4 atRandom]) as: RunArray.
>>>> {
>>>> [ tmp do: [:e |]] bench.
>>>> [ tmp do2: [:e |]] bench.
>>>> }
>>>> #('3,070 per second.' '25,500 per second.')
>>>>
>>>> We can even inline the with:do: loop itself:
>>>> version 3:
>>>>        1 to: runs size do: [:i |
>>>>                | r v |
>>>>                v := values at: i.
>>>>                r := runs at: i.
>>>>                [( r := r - 1) >= 0]
>>>>                        whileTrue: [aBlock value: v]].
>>>>
>>>> | tmp |
>>>> tmp := ((Array new: 1000) collect: [:e | 4 atRandom]) as: RunArray.
>>>> {
>>>> [ tmp do: [:e |]] bench.
>>>> [ tmp do2: [:e |]] bench.
>>>> }
>>>> #('3,370 per second.' '32,200 per second.')
>>>>
>>>> Now the operation I wanted to use was reverseDo: so I implemented:
>>>> RunArray>>fastReverseDo: aBlock
>>>>        | i |
>>>>        i := runs size.
>>>>        [i > 0]
>>>>                whileTrue:
>>>>                        [ | r v |
>>>>                        v := values at: i.
>>>>                        r := runs at: i.
>>>>                        i := i - 1.
>>>>                        [( r := r - 1) >= 0]
>>>>                                whileTrue: [aBlock value: v]].
>>>> | tmp |
>>>> tmp := ((Array new: 1000) collect: [:e | 4 atRandom]) as: RunArray.
>>>> {
>>>> [ tmp reverseDo: [:e |]] bench.
>>>> [ tmp reverseDo2: [:e |]] bench.
>>>> }
>>>>  #('83.9 per second.' '32,600 per second.')
>>>>
>>>> Ouch! The cache is missing a lot of indices, and our loop turns into a n^2 cost.
>>>> I know, premature optimization bla bla bla, but a factor x400 is worth
>>>> some inlining no?
>>>>
>>>> I guess these features are never used.
>>>> By now RunArray is kind of private utility for Text implementation.
>>>> But it could / should be generic.
>>>>
>>>> I also have proposals for count: / select: / collect:. etc...
>>>> It would be to evaluate the block only once per group of values.
>>>> For example
>>>> RunArray>>collect: aBlock
>>>>        "Beware, the block will be evaluated only once per group of values."
>>>>        ^(self class runs: (runs collect: aBlock) contents values: values
>>>> copy) coalesce
>>>> But that's controversial, it would make the RunArray behave
>>>> differently if the block has side effects...
>>>>
>>>> | i tmp tmp2 tmp3 |
>>>> tmp := ((Array new: 1000) collect: [:e | 4 atRandom]).
>>>> i := 0.
>>>> tmp2 := tmp collect: [:e | i := i + 1].
>>>> i := 0.
>>>> tmp3 := (tmp as: RunArray) collect: [:e | i := i + 1].
>>>> tmp2 = tmp3 asArray
>>>> ==> false
>>>>
>>>> Nicolas
>>>>
>>>
>>>
>>
>>
>>
>> --
>> Best regards,
>> Igor Stasenko AKA sig.
>>
>
>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: TextStopConditions.st
Type: application/octet-stream
Size: 5981 bytes
Desc: not available
Url : http://lists.squeakfoundation.org/pipermail/squeak-dev/attachments/20110802/8be96f8e/TextStopConditions.obj


More information about the Squeak-dev mailing list