[squeak-dev] Re: Faster FileStream experiments

Tue Dec 1 08:23:31 UTC 2009

2009/12/1 Andreas Raab <andreas.raab at gmx.de>:
> Hi Nicolas -
>
> I finally got around to looking at this stuff. A couple of comments:
>
> * Regardless of what the long-term solution is, I could really, really use
> the performance improvements of BufferedFileStream. How can we bring this to
> a usable point?
>

First, the read/write code I provided was completely bogus; I now
have a better version that passes some tests.
Meanwhile, I started to have a look at XTream and played a bit with these ideas:
- separate read and write Streams
- every ReadStream has a source, every WriteStream has a destination
- different kinds of Read/Write streams: Collection/File/Buffered/...
- a separate IOHandle for handling the basic primitives
A big part of XTream is the way it transforms Streams using blocks,
especially the most powerful transforming: [:inputStream
:outputStream |
Another point is the uniform usage of an EndOfStream exception (Incomplete).
I started to play with an endOfStreamAction alternative.
Another point is the usage of a Buffer object: this piece allows
implementing read and write streams acting on the same sequence. It is
also a key to performance...

XTream also totally changes the API (put, get etc...), but it does not
have to (or maybe it does have to be XTreme to deserve its name).

> * I'm not sure I like the subclassing of StandardFileStream - I would
> probably opt to subclass FileStream, adopt the primitives and write the
> stuff on top from scratch (this also allows us to keep a filePosition which
> is explicitly updated etc).
>

My very basic approach for short-term performance would be:
- introduce IOHandle in the image for handling primitives (only for files
at first, and without modifying StandardFileStream, just
duplicating the primitives to keep the change minimal)
- introduce a BufferedReadStream and a BufferedReadWriteStream under
PositionableStream, using this IOHandle as source
- keep the same external API, only hack a few creation methods...
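
To make the buffered read idea above concrete, here is a minimal sketch,
not the actual BufferedReadStream code. The class name
SketchBufferedReadStream, its selectors and the 4096 buffer size are
illustrative assumptions, and it subclasses Object rather than
PositionableStream to stay short. The only thing it assumes about the
source is that it answers #readInto:startingAt:count: the way
StandardFileStream does. Methods are shown in the usual
Class >> selector notation.

```smalltalk
"Hypothetical sketch: all names below are illustrative, not the real implementation."
Object subclass: #SketchBufferedReadStream
    instanceVariableNames: 'source buffer position limit'
    classVariableNames: ''
    poolDictionaries: ''
    category: 'Sketch-Streams'.

SketchBufferedReadStream class >> on: aSource
    ^self new setSource: aSource

SketchBufferedReadStream >> setSource: aSource
    "position counts the bytes already consumed from buffer; limit counts the valid bytes in it."
    source := aSource.
    buffer := ByteArray new: 4096.
    position := 0.
    limit := 0

SketchBufferedReadStream >> fillBuffer
    "Refill the whole buffer with a single request to the source.
     Answer false when the source had nothing left to give."
    limit := source readInto: buffer startingAt: 1 count: buffer size.
    position := 0.
    ^limit > 0

SketchBufferedReadStream >> atEnd
    "Only ask the source again when the buffer is exhausted."
    ^position >= limit and: [self fillBuffer not]

SketchBufferedReadStream >> next
    "Answer the next byte, or nil at end of stream (the legacy convention)."
    self atEnd ifTrue: [^nil].
    position := position + 1.
    ^buffer at: position
```

With this shape, most #next sends only touch the in-image buffer, and the
source is asked for data once per 4096 bytes instead of once per byte.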

As a second step we will have to decide what to do with
MultiByteFileStream: it is a performance bottleneck too.
For a start, I would simply wrap it around a buffered one...

> * It is highly likely that read performance is dramatically more important
> than write performance in most cases. It may be worthwhile to start with
> just buffering reads and have writes go unbuffered. This also preserves
> current semantics, allowing to gradually phase in buffered writes where
> desired (i.e., using #flushAfter: aBlock). This would make
> BufferedFileStream instantly useful for our production uses.
>
> In any case, I *really* like the direction. If we can get this into a usable
> state it would allow us to replace the sources and changes files with
> buffered versions. As a result I would expect measurable speedups in some of
> the macro benchmarks and other common operations (Object compileAll for
> example).
>

Concerning macro benchmarks, StandardFileStream reading already
performs well in the case of purely random access (upTo: is already buffered).
The gain is for more sequentially oriented algorithms. However, chances
are that a loaded package has its source laid out sequentially in the
changes file, and condenseChanges also organizes source code that way, so
Object compileAll might eventually show a difference.
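
One rough way to see what is at stake for sequential algorithms is to
time the same scan twice in a workspace: once fetching one element per
#next, once fetching chunks with #next:. This is only a sketch under
assumptions: 'example.changes' is a placeholder file name, 4096 is an
arbitrary chunk size, and the numbers will depend on the image, VM and
platform.

```smalltalk
"Hypothetical workspace sketch: 'example.changes' is a placeholder file name."
| file count chunk perElementMs chunkedMs |

"Sequential scan, one read per element."
file := StandardFileStream readOnlyFileNamed: 'example.changes'.
count := 0.
perElementMs := Time millisecondsToRun: [
    [file atEnd] whileFalse: [
        file next.
        count := count + 1]].
file close.

"Same sequential scan, fetching up to 4096 elements per read."
file := StandardFileStream readOnlyFileNamed: 'example.changes'.
count := 0.
chunkedMs := Time millisecondsToRun: [
    [file atEnd] whileFalse: [
        chunk := file next: 4096.
        count := count + chunk size]].
file close.

Transcript
    show: 'per element: ', perElementMs printString, ' ms, chunked: ', chunkedMs printString, ' ms';
    cr.
```

The chunked loop is what a buffered stream would do on behalf of its
clients, without forcing them to change from #next to #next:.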

Nicolas

> Cheers,
>  - Andreas
>
> Nicolas Cellier wrote:
>>
>> 2009/11/28 Levente Uzonyi <leves at elte.hu>:
>>>
>>> On Sat, 28 Nov 2009, Igor Stasenko wrote:
>>>
>>>> 2009/11/28 Eliot Miranda <eliot.miranda at gmail.com>:
>>>>>
>>>>> On Fri, Nov 27, 2009 at 4:40 PM, Igor Stasenko <siguctua at gmail.com>
>>>>> wrote:
>>>>>>
>>>>>> 2009/11/28 Nicolas Cellier <nicolas.cellier.aka.nice at gmail.com>:
>>>>>>>
>>>>>>> 2009/11/27 Eliot Miranda <eliot.miranda at gmail.com>:
>>>>>>>>
>>>>>>>> An approach I like is to add an endOfStreamValue inst var to Stream
>>>>>>>> and
>>>>>>>> answer its value when at end.  This way nil does not have to be the
>>>>>>>> endOfStreamValue, for example -1 might be much more convenient for a
>>>>>>>> binary
>>>>>>>> stream, and streams can answer nil without confusing their clients.
>>>>>>>>  atEnd
>>>>>>>> can be implemented as
>>>>>>>>    atEnd
>>>>>>>>        ^self peek = self endOfStreamValue
>>>>>>>> You can arrange to make streams raise an end-of-stream exception
>>>>>>>> instead of
>>>>>>>> the endOfStreamValue by using some convention on the contents of
>>>>>>>> endOfStreamValue, such as if it is == to the stream itself (although
>>>>>>>> I
>>>>>>>> note
>>>>>>>> that in the Teleplace image the exception EndOfStream is defined but
>>>>>>>> not
>>>>>>>> used).
>>>>>>>>
>>>>>>>> Of course, stream primitives get in the way of adding inst vars to
>>>>>>>> stream
>>>>>>>> classes ;)
>>>>>>>> IMO this is a much more useful scheme than making nil the only
>>>>>>>> endOfStream
>>>>>>>> value.
>>>>>>>>
>>>>>>> Last time I proposed to have an inst var endOfStreamAction was here
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> http://lists.gforge.inria.fr/pipermail/pharo-project/2009-June/009536.html
>>>>>>> .
>>>>>>> Abusing nil value -> nil, I could even let this inst var
>>>>>>> un-initialized and be backward compatible
>>>>>>> (initializing with a ValueHolder on nil would do as well)
>>>>>>>
>>>>>> Nicolas, have you considered introducing methods which allow one to
>>>>>> gracefully handle the end-of-stream while reading?
>>>>>> Something like:
>>>>>>
>>>>>> nextIfAtEnd: aBlock
>>>>>> and
>>>>>> next: number ifAtEnd: aBlock
>>>>>>
>>>>>>
>>>>>> then caller may choose to either write:
>>>>>>
>>>>>> char := stream nextIfAtEnd: [nil]
>>>>>>
>>>>>> or handle end of stream differently, like leaving the loop:
>>>>>>
>>>>>> char := stream nextIfAtEnd: [^ results]
>>>>>>
>>>>>> the benefit of such an approach is that code which reads the stream doesn't
>>>>>> need to additionally
>>>>>> test stream state (atEnd) between #next sends, nor does it
>>>>>> require some unique value (like nil) returned by #next
>>>>>> when reaching the end of the stream.
>>>>>
>>>>> IMO the block creation is too expensive for streams.  The
>>>>> defaultHandler
>>>>> approach for an EndOfStream exception is also too expensive.  The
>>>>> endOfStreamValue inst var is a nice trade-off between flexibility,
>>>>> efficiency and simplicity.  You can always write
>>>>>     [(value := stream next) ~~ stream endOfStreamValue] whileTrue:
>>>>>        [...do stuff...
>>>>>
>>>> hmm, can you elaborate, at what point you see an expensive block
>>>> creation?
>>>> A block closure is created once at compiling stage, and then passed as
>>>> any other object by reading it
>>>> from literal frame of method (and as well as , you can use 'stream
>>>
>>> In this case the block is copied and initialized every time you send
>>> #nextIfAtEnd:. It is only activated at the end of the stream, so most of
>>> the
>>> time it is just garbage.
>>>
>>> Levente
>>>
>>
>>
>> http://lists.squeakfoundation.org/pipermail/squeak-dev/2007-November/122512.html
>>
>> Nicolas
>>
>>>> nextIfAtEnd: nil' , right?). And only if it's going to be activated (by
>>>> sending #value), a corresponding block context is created in order to
>>>> evaluate the block. But it happens only when you reach the end of the
>>>> stream.
>>>>
>>>> It is more expensive because of passing extra argument, i.e. use
>>>> #nextIfAtEnd: instead of #next , but not because of passing block,
>>>> IMO.
>>>>
>>>>>>> Nicolas
>>>>>>>
>>>>>>
>>>>>> --
>>>>>> Best regards,
>>>>>> Igor Stasenko AKA sig.
>>>>>>
>>>>
>>>> --
>>>> Best regards,
>>>> Igor Stasenko AKA sig.
>>>>