Trailers speedup (Re: [squeak-dev] The Trunk: Kernel-ul.362.mcz)

Sun Jan 3 22:16:27 UTC 2010

2010/1/3 Levente Uzonyi <leves at elte.hu>:
> On Sun, 3 Jan 2010, Igor Stasenko wrote:
>
>> 2010/1/3 Levente Uzonyi <leves at elte.hu>:
>>>
>>> On Sun, 3 Jan 2010, Igor Stasenko wrote:
>>>
>>>> Levente,
>>>> could you give us a comparison of how much this speeds up
>>>> source code fetching?
>>>>
>>>> [ Object selectors do: [:each | Object sourceCodeAt: each ] ] timeToRun
>>>>
>>>> Image with no trailers:
>>>> 482  481 478
>>>
>>> This must be a really old image (or non-trunk image).
>>>
>> To be precise, this is trunk image from September 2009, with updates
>> from nov/dec 2009.
>>
>
> Okay, it was unfair to say old. The read buffers were added on 6 December
> 2009.
>
>>>>
>>>> Image with trailers:
>>>> 196  197 206
>>>>
>>>> (I tested against a rather old image, which seems to have a different
>>>> number of selectors in Object and fetches them from different
>>>> places, of course.)
>>>>
>>>> But it actually shows that your efforts to gain speed there will likely
>>>> go unnoticed, because most of the time is consumed by file operations,
>>>> which are orders of magnitude slower. So no matter how fast the
>>>> compiled method trailers work, such optimizations will be
>>>> unnoticeable.
>>>
>>> Sure. I used the following benchmark:
>>> [
>>>   SystemNavigation default
>>>      allMethodsWithSourceString: '== 0'
>>>      matchCase: true ] timeToRun
>>>
>>> I don't have the exact numbers (~12 seconds before my changes and ~9.5
>>> after), but the speedup was 1.39x. The reason for this was that 3 trailer
>>> objects were created for one method. Trailer creation took ~30% of the
>>> total runtime, because of the #asSymbol send.
>>>
>> Here are my measurements, running the above code 2 times in a row in
>> a freshly started image:
>> 35576
>> 29567
>>
>> The difference is 6 seconds! And the speedup we observe here is not
>> related to Squeak at all, but to how the OS file cache works.
>> When the image has just been loaded, the OS cache is not yet populated
>> with .sources and .changes, so it takes more time to fill it with
>> chunks, which are accessed in random order.
>> Once the OS realizes that you are using these files for random access,
>> it optimizes the cache to amortize the access time.
>> On the third run I got 26548 milliseconds.
>> So I conclude that the given benchmark proves nothing, because it is not
>> representative for testing trailer speed and its variance is too high
>> (35 - 26 sec) even when running the same code without any changes to
>> the Smalltalk code.
>>
>> How can I be sure that the speedup you observed was because of your
>> changes, and not because of underlying OS behavior?
>>
>
> Running the test several times and the use of TimeProfileBrowser helps. (If
> you're using a notebook machine, you may want to evaluate something that
> makes sure that the cpu is running at maximum speed, like Smalltalk
> garbageCollect or 0 tinyBenchmarks)
>
> (1 to: 3) collect: [ :run |
>   [
>      SystemNavigation default
>         allMethodsWithSourceString: '== 0'
>         matchCase: true ] timeToRun ]
> Before speedup*: #(11735 11732 11747)
> Only CompiledMethod changes from speedup**: #(9567 9432 9518)
> Actual: #(8378 8366 8284)
>
> Narrowed benchmark (no file operations involved):
>
> (1 to: 5) collect: [ :run |
>        [ CompiledMethod allInstancesDo: #trailer ] timeToRun ]
> Before speedup**: #(1073 1067 1063 1072 1065)
> Actual: #(92 92 91 95 95)
>
> *All methods reverted in CompiledMethod and CompiledMethodTrailer
> **Only CompiledMethodTrailer >> #method: reverted
>
>>> (Note that file operations are not that slow since the FileStreams are
>>> read buffered)
>>>
>> So that's the main reason for the difference (between 482 and 206): the
>> introduction of stream buffering, not the presence or absence of
>> trailers.
>>
>
> Well, trailers made a difference too:
>
> (1 to: 5) collect: [ :run |
>   [ Object selectorsDo: [:each |
>      Object sourceCodeAt: each ] ] timeToRun ]
>
> Before trailers*: #(62 61 63 63 63)
> Before speedup**: #(95 99 99 96 96)
> Actual: #(65 64 62 62 63)
>
> *Using image version 8472 which has read buffers but not trailers.
> **All methods reverted in CompiledMethod and CompiledMethodTrailer
>

They do make a difference.
Actually, when coding this, I was more concerned about #endPC than
anything else, since it is now computed much more slowly: the trailer
decodes all of its data even when the sender is only interested in
getting #endPC.
To optimize this, I thought it might be worth decoding the data lazily
and computing only the size field in #method:.
But I'm not sure it is worth spending time on, since I haven't
measured the impact.
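
A minimal sketch of how that impact could be measured, modeled on the
narrowed benchmark quoted above. It assumes that sending #endPC to every
CompiledMethod instance is a fair stand-in for the senders in question;
no timings are claimed for it:

```smalltalk
"Time #endPC over all compiled methods; answers five timings in milliseconds."
(1 to: 5) collect: [ :run |
   [ CompiledMethod allInstancesDo: [ :each | each endPC ] ] timeToRun ]
```

Comparing the result for the current code against a variant of #method:
that decodes lazily would show whether the optimization pays off.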

My nitpick was about the workarounds that use #perform: with
pregenerated selectors. The other changes, which focus on avoiding the
creation of temporary trailer instances, are definitely worth doing.

>
> Levente
>

-- 
Best regards,
Igor Stasenko AKA sig.