EndOfStream unused

Wed Nov 7 05:29:13 UTC 2007

nicolas cellier wrote:
> Come on, I could expect better from you.
> This test is totally biased.

How so? I used collections of sizes between 5 and 100 elements. That 
doesn't seem "totally biased" for internal collections to me. If you 
have evidence that the average internal collection has 100000+ 
elements, I would like to see that evidence. Data sizes of 5-100 
elements seem quite realistic for average situations to me.
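A minimal sketch of this kind of micro-benchmark might look like the 
following. It is a hypothetical reconstruction, not the code behind the 
numbers discussed here: the sizes, the iteration count and the variable 
names are illustrative assumptions. It times the common "stream next == 
nil" loop over small collections and is meant to be run once in an 
unpatched image and once with EndOfStream installed, assuming that an 
unhandled EndOfStream still lets #next answer nil (as an unhandled 
Notification does).

```smalltalk
"Hypothetical micro-benchmark sketch; sizes and iteration count are
 illustrative. Assumes an unhandled EndOfStream still makes #next answer
 nil, as an unhandled Notification does, so the loop terminates."
| results |
results := #(5 10 20 50 100) collect: [:n |
    | data |
    data := (1 to: n) asArray.
    "Each element of results is an Association: size -> milliseconds"
    n -> [1 to: 10000 do: [:i |
        | stream |
        stream := ReadStream on: data.
        "Read until the stream answers nil at its end"
        [stream next == nil] whileFalse]] timeToRun].
results
```

Printing results gives one size -> milliseconds pair per collection 
size, which makes it easy to see at which size the per-stream signaling 
cost stops dominating.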

> You could as well write
> 
> streamClass := ReadStream. "vs. ReadStreamWithNil"
> data := (1 to: 100000) asArray.
> [1 to: 5 do:[:i|
>     stream := streamClass on: data.
>     [stream next == nil] whileFalse.
> ]] timeToRun.
> 
> And obtain different results (260 original, 278 with EndOfStream)

Knowing your experience with Squeak, these results should strike you as 
very odd. *Five* EndOfStream signals would take 18 msecs? 3+ msecs each? 
That on its own should tell you there is something wrong (even in my 
measurements EndOfStream signaling took closer to .015 msecs, which is 
still a very long time). I'm not sure what you measured, but it 
certainly has nothing to do with EndOfStream signaling.

> Now, I'm not spoiling so much.
> 
> And then:
> 
> streamClass := ReadStreamWithNil.
> data := (1 to: 100000) asArray.
> [1 to: 5 do:[:i|
>     [| aStream |
>     aStream := streamClass on: data.
>     [aStream next. true] whileTrue] on: EndOfStream do: [:exc | exc 
> return].
> ]] timeToRun.
> 
> 255 fine, now I am as fast if not faster than == nil.

But this is just making my point, which was that if you want to use 
EndOfStream efficiently you *will* have to rewrite code all over the 
place. I'm certain VW does that in places where it matters, but no code 
that has ever been written for Squeak does that today. Instead it uses 
the "stream next == nil" pattern, and that is what I compared: a 
realistic test of what such a change does to the existing code base. I 
was never claiming that you can't possibly write efficient code with 
EndOfStream, only that all code in existence today will have to be 
reviewed, which is why introducing that pattern today without 
understanding its consequences is just a terribly bad idea.

>>> This is called optimistic programming.
>>
>> And what I do is called "measuring" ;-)
> 
> Sure it's not called unfair biased argumentation?

I don't think so. In my experience most streams operate on collections 
of a few dozen elements, so testing data sizes between 5 and 100 seems 
totally realistic to me (again, if you have evidence to the contrary I'd 
like to see it). For example, if I just run a quick:

(SequenceableCollection allSubInstances collect:[:c| c size]) average

I end up with 45 elements. Now, granted, this may not be the average 
size of internal collections used for streams, but since most streams 
are transient it's hard to get an actual number for it. Still, it 
doesn't make me feel that running numbers between 5 and 100 is "totally 
biased". To the contrary.

> Of course there is a trade-off in optimistic programming. It relies on 
> Exception being rare. Your example put the cursor at opposite.

I don't know. One end-of-stream signal for every 5 to 100 elements read 
is a probability of roughly .2 to .01. Is that the "opposite of rare"?

> Now, I admit that I can degrade some short stream created in tight loops 
> and optimized with == nil tests. But you have to show some real example, 
> something more serious than above tricks.

Give me an example that you're interested in. And the above is no 
"trick"; it is a micro-benchmark. These are of limited use for comparing 
real-world usage, but they give an understanding of the baseline 
behavior one is talking about, for example: what is the actual cost of 
EndOfStream, and at what number of elements does its effect become 
negligible? That in itself is valuable information.

>>> In following mail and at http://bugs.squeak.org/view.php?id=6755 I 
>>> already noticed possible exception handling problem that caused 
>>> Marcus to retract this change. This is because EndOfStream were 
>>> declared an Error instead of a Notification.
>>
>> Have you actually *tried* it? Because you may be in for a nasty little 
>> surprise. I'm not sure if this problem is going to bite you or not but 
>> from the code it looks like it should so try it - it is just about 
>> *exactly* the kind of thing that goes wrong for "no good reason" when 
>> you change something as fundamental as this.
>>
> 
> That is wise. Of course it deserves testing! Who is saying the contrary?

It's not supposed to be "wise", it is supposed to illustrate a point. 
The point is that a low-level change like this always comes with a 
*ton* of unforeseen problems. It was very interesting for me to find 
out that adding a progress bar for downloads in Croquet would break SM 
(SqueakMap) for the very reason that SM handles Exception instead of 
Error, and so also intercepts the non-error exceptions that the progress 
display signals. Yes, that is a bug in SM, but it's just the kind of 
thing you have to expect with such low-level changes, and it is what 
makes them as risky as they are.

And just in case that wasn't clear, my point here is that even if you 
*think* you "fixed the problem" by making EndOfStream a notification, 
it turns out that in practice the dependencies (and sometimes bugs) are 
much more complex than you'd assume.

> I would not like myself that some guy do impose such potentially 
> dangerous changes in my image. He has to prove first for sure.
> I do not want to impose it now. I want it to be discussed.

But I'm discussing it ;-) I just happen to question the value of this 
particular change, considering the implications it has.

> I passed some of the tests in my image (not all, because they hang my 
> 3.10 without the change).
> No problem so far.
> 
> Why? Because I'm not that crazy, I turned the Error into a 
> Notification. Period.
> 
> Now it's still dangerous if a fool is catching Notification or even 
> better Exception!
> Or if some Exception mechanism use streaming themselves...
> Who knows...

Well, but again, that *is* my point. People *do* make mistakes, and 
changes like these will open up a whole new world of pain from 
previously hidden bugs and strangeness.

>> I don't know. I find it hard to imagine an atEnd test that could 
>> possibly be as costly as running the EH machinery. It's certainly 
>> worth measuring before conjecturing about it.
> 
> It is, because a SelectStream doing a select: operation is duplicating 
> the job calling the block once in atEnd test and another in next, AND 
> because I cannot use == nil trick in above example.

Where is that SelectStream? I'd like to have a look at it to see whether 
that complexity is really necessary in there or not.

>> As you'd like. If you are ever interested in having a serious 
>> discussion about the topic I'll be waiting here.
> 
> This was an answer to crazy.

I think you may have missed the smiley at the end of that sentence.

> 4 AM. For a really serious discussion, I now need to rest a little.

Yes, I think that's a good idea. Seriously, think about this a little 
more. I really don't think that my values are *that* biased; I was 
actually surprised myself at how terribly slow EndOfStream is. And 
micro-benchmarks measure what micro-benchmarks measure, but I would 
expect a 2x slowdown with 100 elements to show up in real applications 
(maybe "only" as 10-20%, but Squeak is slow as it stands).

> Agree that your arguments are not all wrong. But the way you push them 
> is more than unfriendly.

I apologize. I have been bitten in the past by changes like these 
(which got added without any thought about the implications), and I do 
react a little over the top when I see those proposals come up again 
(it's the "oh, no, not *again*" knee-jerk kind of reaction). The 
proposals *always* get made with the best of intentions and are never 
really thought through in terms of what they do to Squeak as a 
*system*.

This, by the way, is where I miss Dan's leadership most. He is simply 
the master of assessing system-level implications.

Cheers,
   - Andreas