EndOfStream performance cost [was EndOfStream unused]

nicolas cellier ncellier at ifrance.com
Fri Nov 9 01:18:00 UTC 2007

```
Hi Andreas

- what performance drop or gain can we expect (the trade-off)?
- would we have to rewrite anything apart from the core #next methods?

You can see that I completed your tests at
http://bugs.squeak.org/view.php?id=6755

Here I go through the rationale again, with a similar but more systematic approach.

PROCEDURE:
=========

Analyzing the residual costs with a micro-benchmark should tell us the price:
1) an == nil test costs p (as in a pastEnd test)
2) an atEnd test costs a
3) exception handling costs e when not handled and h when handled

4) the stream size is n

We get three main ways to write loops:
5) (P) a pastEnd loop costs (residual only)
n*p   before,
n*p+e after EndOfStream change.
6) (A) an atEnd test loop costs
n*a before,
n*a after.
7) (E) an exception loop
is impossible before,
costs h after.
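
"A minimal sketch of the three loop shapes, for illustration only.
 The names s, x, total and signallingNext are illustrative, not from the
 tests. signallingNext is a hypothetical stand-in for #next after the
 change, so that (E) also runs in an image where #next still answers nil."
| s x total signallingNext |

"(P) pastEnd loop: rely on #next answering nil past the end"
s := ReadStream on: (1 to: 1000) asArray.
total := 0.
[(x := s next) == nil] whileFalse: [total := total + x].

"(A) atEnd loop: test before every read"
s := ReadStream on: (1 to: 1000) asArray.
total := 0.
[s atEnd] whileFalse: [total := total + s next].

"(E) exception loop: one handled EndOfStream ends the loop"
s := ReadStream on: (1 to: 1000) asArray.
total := 0.
signallingNext := [s atEnd ifTrue: [EndOfStream signal]. s next].
[[total := total + signallingNext value] repeat]
	on: EndOfStream do: [:exc | exc return].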

RESULTS:
=======

On my machine:

[200000 timesRepeat: []] timeToRun. 148
[200000 timesRepeat: [$a == nil ifFalse: [nil]]] timeToRun. 167

[5000 timesRepeat: []] timeToRun. 3
[5000 timesRepeat: [EndOfStream signal]] timeToRun. 159

[5000 timesRepeat: []] timeToRun. 3
[5000 timesRepeat: [[EndOfStream signal] on: EndOfStream do: [:exc | exc return]]] timeToRun. 209

s := ReadStream on: (1 to: 1000) asArray.
[200000 timesRepeat: []] timeToRun. 148
[200000 timesRepeat: [s atEnd ifFalse: [nil]]] timeToRun.  223

e=0.03 h=0.04 a=0.00037 p=0.0001 (milliseconds per operation)
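
"How those four figures follow from the timings above (timeToRun answers
 milliseconds): subtract the empty-loop time, then divide by the repeat
 count. A sketch of the arithmetic only; the variable names are illustrative."
| p a e h |
p := (167 - 148) / 200000.0.	"== nil test, about 0.0001 ms"
a := (223 - 148) / 200000.0.	"atEnd test, about 0.00037 ms"
e := (159 - 3) / 5000.0.	"unhandled EndOfStream, about 0.03 ms"
h := (209 - 3) / 5000.0.	"handled EndOfStream, about 0.04 ms"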

IS IT WORTH REWRITING (A) loops ?
---------------------------------

Obviously, no hurry...
Since no exception is raised, the cost is unchanged.
A rewrite as (E) would pay off only if n*a > h, i.e. for an average n > h/a.
(ON MY MACHINE n>120)

WHAT DOES THE CHANGE COST ON (P) loops?
---------------------------------

- Introducing the change costs a penalty e on each pastEnd loop (P)
- This is negligible as soon as n >> e/p
- But it does de-optimize small pastEnd loops, as noticed by Andreas.

IS IT WORTH REWRITING (P) loops ?
---------------------------------

- rewriting (P) as (E) for small pastEnd loops is no use,
because n*p + e and h are about the same for small n
- rewriting (P) as (A) would be viable if n*p+e > n*a, i.e. n < e/(a-p)
(ON MY MACHINE n<110)
- rewriting (P) as (E) would be worthwhile for n > h/p, i.e. when h < n*p.
(ON MY MACHINE n>400)
But no hurry, we have seen that the penalty is small for such loops.
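
"The break-even sizes quoted above, recomputed from the rounded figures
 e=0.03 h=0.04 a=0.00037 p=0.0001. A sketch of the arithmetic only;
 the results are approximate, as are the measurements behind them."
| e h a p |
e := 0.03. h := 0.04. a := 0.00037. p := 0.0001.
h / a.		"about 108: (A) as (E) pays off above this size (quoted as n>120)"
e / (a - p).	"about 111: (P) as (A) pays off below this size (quoted as n<110)"
h / p.		"about 400: (P) as (E) pays off above this size (quoted as n>400)"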

CONCLUSIONS
-----------

Introducing the change potentially costs a penalty on tight pastEnd
loops over short streams.
If necessary, we might re-optimize the most critical loops a little with a
(P)->(A) rewrite, but without reaching the previous performance.

As Paolo suggested, my guess is however that:
- Most pastEnd loops are written for files. Files are long, so the
performance drop would be negligible. A rewrite would help, but there is no hurry.
- Most String processing loops use an atEnd loop (upper-level ReadStream
code does not use the == nil trick, though client code might use it in a few places).

POSSIBLE PLAN to confirm: trap EndOfStream to trace the usage of pastEnd
loops, and run as much activity as possible in Squeak...

Introducing this change does not seem worthwhile per se, except maybe for
huge files.
However - TO BE CONFIRMED - it might just as well be neutral with respect to
performance on current VMs.

In that case it would be worthwhile, since it enables Stream extensions that
have costly atEnd tests and cannot use the == nil trick.

Apart from the core #next, no rewrite of basic methods is necessary.
No hurry to do a massive rewrite of code anyway.

Cheers

Nicolas

```