[squeak-dev] Re: Faster FileStream experiments

Tue Dec 1 06:31:23 UTC 2009

Hi Nicolas -

I finally got around to looking at this stuff. A couple of comments:

* Regardless of what the long-term solution is, I could really, really 
use the performance improvements of BufferedFileStream. How can we bring 
this to a usable point?

* I'm not sure I like the subclassing of StandardFileStream - I would 
probably opt to subclass FileStream, adopt the primitives and write the 
stuff on top from scratch (this also allows us to keep a filePosition 
which is explicitly updated etc).

* It is highly likely that read performance is dramatically more 
important than write performance in most cases. It may be worthwhile to 
start with just buffering reads and have writes go unbuffered. This also 
preserves current semantics, allowing us to gradually phase in buffered 
writes where desired (i.e., using #flushAfter: aBlock). This would make 
BufferedFileStream instantly useful for our production uses.
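
A rough sketch of what "buffered reads, unbuffered writes" could look 
like, in case it helps the discussion. Everything named here is an 
assumption: ReadBufferedFileStream, its inst vars and #fillBuffer are 
hypothetical, the prim* methods are assumed to be copied over from 
StandardFileStream, the stream is assumed to be binary (buffer is a 
ByteArray), and most of the FileStream protocol is left out:

    "Hypothetical class, not existing code."
    FileStream subclass: #ReadBufferedFileStream
        instanceVariableNames:
            'fileID buffer bufferIndex bufferLimit filePosition'
        classVariableNames: ''
        poolDictionaries: ''
        category: 'Files-Experimental'

    position
        "Logical position: the tracked file pointer minus the
         bytes still unread in the buffer."
        ^filePosition - (bufferLimit - bufferIndex)

    next
        "Answer the next byte, or nil at end of file. The read
         primitive is only called when the buffer is exhausted."
        bufferIndex >= bufferLimit ifTrue: [
            self fillBuffer.
            bufferLimit = 0 ifTrue: [^nil]].
        bufferIndex := bufferIndex + 1.
        ^buffer at: bufferIndex

    fillBuffer
        "Read up to one buffer's worth of bytes and advance the
         explicitly tracked filePosition by the count read."
        bufferLimit := self primRead: fileID into: buffer
            startingAt: 1 count: buffer size.
        bufferIndex := 0.
        filePosition := filePosition + bufferLimit

    nextPut: aByte
        "Unbuffered write: seek to the logical position, write
         through, then invalidate the read buffer."
        | pos |
        pos := self position.
        self primSetPosition: fileID to: pos.
        self primWrite: fileID from: (ByteArray with: aByte)
            startingAt: 1 count: 1.
        filePosition := pos + 1.
        bufferIndex := 0.
        bufferLimit := 0.
        ^aByte

Because every write goes straight to the file and drops the read 
buffer, a reader never sees stale data, which is what keeps the current 
semantics intact. The cost is a seek plus a refill after each write, 
which should be acceptable if reads really do dominate.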

In any case, I *really* like the direction. If we can get this into a 
usable state it would allow us to replace the sources and changes files 
with buffered versions. As a result I would expect measurable speedups 
in some of the macro benchmarks and other common operations (Object 
compileAll for example).

Cheers,
   - Andreas

Nicolas Cellier wrote:
> 2009/11/28 Levente Uzonyi <leves at elte.hu>:
>> On Sat, 28 Nov 2009, Igor Stasenko wrote:
>>
>>> 2009/11/28 Eliot Miranda <eliot.miranda at gmail.com>:
>>>>
>>>> On Fri, Nov 27, 2009 at 4:40 PM, Igor Stasenko <siguctua at gmail.com>
>>>> wrote:
>>>>> 2009/11/28 Nicolas Cellier <nicolas.cellier.aka.nice at gmail.com>:
>>>>>> 2009/11/27 Eliot Miranda <eliot.miranda at gmail.com>:
>>>>>>> An approach I like is to add an endOfStreamValue inst var to Stream
>>>>>>> and
>>>>>>> answer its value when at end.  This way nil does not have to be the
>>>>>>> endOfStreamValue, for example -1 might be much more convenient for a
>>>>>>> binary
>>>>>>> stream, and streams can answer nil without confusing their clients.
>>>>>>>  atEnd
>>>>>>> can be implemented as
>>>>>>>     atEnd
>>>>>>>         ^self peek = self endOfStreamValue
>>>>>>> You can arrange to make streams raise an end-of-stream exception
>>>>>>> instead of
>>>>>>> the endOfStreamValue by using some convention on the contents of
>>>>>>> endOfStreamValue, such as if it is == to the stream itself (although I
>>>>>>> note
>>>>>>> that in the Teleplace image the exception EndOfStream is defined but
>>>>>>> not
>>>>>>> used).
>>>>>>>
>>>>>>> Of course, stream primitives get in the way of adding inst vars to
>>>>>>> stream
>>>>>>> classes ;)
>>>>>>> IMO this is a much more useful scheme than making nil the only
>>>>>>> endOfStream
>>>>>>> value.
>>>>>>>
>>>>>> Last time I proposed to have an inst var endOfStreamAction was here
>>>>>>
>>>>>>
>>>>>> http://lists.gforge.inria.fr/pipermail/pharo-project/2009-June/009536.html
>>>>>> .
>>>>>> Abusing nil value -> nil, I could even let this inst var
>>>>>> un-initialized and be backward compatible
>>>>>> (initializing with a ValueHolder on nil would do as well)
>>>>>>
>>>>> Nicolas, have you considered introducing methods which allow
>>>>> gracefully handling the end-of-stream while reading?
>>>>> Something like:
>>>>>
>>>>> nextIfAtEnd: aBlock
>>>>> and
>>>>> next: number ifAtEnd: aBlock
>>>>>
>>>>>
>>>>> then caller may choose to either write:
>>>>>
>>>>> char := stream nextIfAtEnd: [nil]
>>>>>
>>>>> or handle end of stream differently, like leaving the loop:
>>>>>
>>>>> char := stream nextIfAtEnd: [^ results]
>>>>>
>>>>> The benefit of such an approach is that code which reads the stream
>>>>> doesn't need to additionally test the stream state (atEnd) between
>>>>> #next sends, nor does it require some unique value (like nil) to be
>>>>> returned by #next when reaching the end of the stream.
>>>> IMO the block creation is too expensive for streams.  The defaultHandler
>>>> approach for an EndOfStream exception is also too expensive.  The
>>>> endOfStreamValue inst var is a nice trade-off between flexibility,
>>>> efficiency and simplicity.  You can always write
>>>>      [(value := stream next) ~~ stream endOfStreamValue] whileTrue:
>>>>         [...do stuff...]
>>>>
>>> hmm, can you elaborate, at what point you see an expensive block creation?
>>> A block closure is created once at compiling stage, and then passed as
>>> any other object by reading it
>>> from literal frame of method (and as well as , you can use 'stream
>> In this case the block is copied and initialized every time you send
>> #nextIfAtEnd:. It is only activated at the end of the stream, so most of the
>> time it is just garbage.
>>
>> Levente
>>
> 
> http://lists.squeakfoundation.org/pipermail/squeak-dev/2007-November/122512.html
> 
> Nicolas
> 
>>> nextIfAtEnd: nil' , right?). And only if it's going to be activated (by
>>> sending #value) is a corresponding block context created in order to
>>> evaluate the block. But that happens only when you reach the end of
>>> the stream.
>>>
>>> It is more expensive because of passing an extra argument, i.e. using
>>> #nextIfAtEnd: instead of #next, but not because of passing a block,
>>> IMO.
>>>
>>>>>> Nicolas
>>>>>>
>>>>>
>>>>> --
>>>>> Best regards,
>>>>> Igor Stasenko AKA sig.
>>>>>
>>>
>>> --
>>> Best regards,
>>> Igor Stasenko AKA sig.
>>>