2009/12/1 Andreas Raab andreas.raab@gmx.de:
Hi Nicolas -
I finally got around to looking at this stuff. A couple of comments:
- Regardless of what the long-term solution is, I could really, really use
the performance improvements of BufferedFileStream. How can we bring this to a usable point?
First, the code for read/write I provided was completely bogus; I now have a better version passing some tests. Meanwhile, I started to have a look at XTream and played a bit with these ideas:
- separate Read/Write streams: every ReadStream has a source, every WriteStream has a destination
- different kinds of Read/Write streams: Collection/File/Buffered/...
- a separate IOHandle for handling the basic primitives

A big part of XTream is the way it transforms streams using blocks, especially the most powerful transform: [:inputStream :outputStream | ...].

Another point is the uniform use of an EndOfStream exception (Incomplete). I started to play with an endOfStreamAction alternative. Another point is the use of a Buffer object: this piece allows implementing read/write streams acting on the same sequence. It is also a key to performance...

XTream also totally changes the API (put, get, etc.), but it does not have to (or maybe it does have to be XTreme to deserve its name).
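To make the block-based transform idea concrete, here is a minimal sketch of what such a two-argument transform block could look like. This is an illustration only, not XTream's actual API; the stream classes used are plain Squeak ReadStream/WriteStream:

```smalltalk
"Illustrative sketch, not XTream code: a transform is just a
two-argument block that pumps elements from an input stream to
an output stream, applying some conversion along the way."
| transform in out |
transform := [:inputStream :outputStream |
	[inputStream atEnd] whileFalse:
		[outputStream nextPut: inputStream next asUppercase]].
in := ReadStream on: 'hello'.
out := WriteStream on: String new.
transform value: in value: out.
out contents
```

Because the transform is an ordinary block, such transforms can be composed by chaining streams, which is presumably what makes this the most powerful part of the design.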
- I'm not sure I like the subclassing of StandardFileStream - I would
probably opt to subclass FileStream, adopt the primitives and write the stuff on top from scratch (this also allows us to keep a filePosition which is explicitly updated etc).
My very basic approach for short-term performance would be:
- introduce IOHandle in the image for handling primitives (only for files at first, and without modifying StandardFileStream, just duplicating it, to stay minimal)
- introduce a BufferedReadStream and a BufferedReadWriteStream under PositionableStream using this IOHandle as source
- keep the same external API, only hack a few creation methods...
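A sketch of what the core of such a BufferedReadStream might look like; the class layout and selector names (readInto:startingAt:count: on the IOHandle, the buffer/position/limit instance variables) are assumptions for illustration, not the proposed implementation:

```smalltalk
"Hypothetical BufferedReadStream>>next; sketch only.
Assumed instance variables: handle (an IOHandle), buffer (a
ByteArray), position and limit (indices into the buffer)."

next
	"Answer the next byte, refilling the buffer from the IOHandle
	only when it is exhausted, so the read primitive is invoked
	once per bufferful instead of once per element."
	position >= limit ifTrue:
		[limit := handle readInto: buffer startingAt: 1 count: buffer size.
		 position := 0.
		 limit = 0 ifTrue: [^nil]].
	position := position + 1.
	^buffer at: position
```

The point of the design is that the per-element path is pure in-image code (a comparison, an increment, and an at:), with the primitive call amortized over the buffer size.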
As a second step, we will have to decide what to do with MultiByteFileStream: it is a performance bottleneck too. For a start, I would simply wrap it around a buffered one...
- It is highly likely that read performance is dramatically more important
than write performance in most cases. It may be worthwhile to start with just buffering reads and letting writes go unbuffered. This also preserves current semantics, allowing us to gradually phase in buffered writes where desired (i.e., using #flushAfter: aBlock). This would make BufferedFileStream instantly useful for our production uses.
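The #flushAfter: idea mentioned above could be sketched roughly like this; this is an assumption about its intent, not shipped code:

```smalltalk
"Hypothetical sketch of #flushAfter:, not the actual
BufferedFileStream code. Writes issued inside aBlock accumulate
in the in-image buffer; the file is only touched when the block
completes, and #ensure: guarantees the flush happens even on an
exception or non-local return."

flushAfter: aBlock
	^aBlock ensure: [self flush]
```

This keeps unbuffered-write semantics as the default while letting performance-critical code opt into batched writes one scope at a time.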
In any case, I *really* like the direction. If we can get this into a usable state it would allow us to replace the sources and changes files with buffered versions. As a result I would expect measurable speedups in some of the macro benchmarks and other common operations (Object compileAll for example).
Concerning the macro benchmarks, StandardFileStream reading is already fast for pure random access (upTo: is already buffered). The gain is for more sequentially oriented algorithms. However, chances are that a loaded package has its source laid out sequentially in the changes file, and condenseChanges also organizes source code that way, so Object compileAll might eventually show a difference.
Nicolas
Cheers, - Andreas
Nicolas Cellier wrote:
2009/11/28 Levente Uzonyi leves@elte.hu:
On Sat, 28 Nov 2009, Igor Stasenko wrote:
2009/11/28 Eliot Miranda eliot.miranda@gmail.com:
On Fri, Nov 27, 2009 at 4:40 PM, Igor Stasenko siguctua@gmail.com wrote:
2009/11/28 Nicolas Cellier nicolas.cellier.aka.nice@gmail.com:
>
> 2009/11/27 Eliot Miranda eliot.miranda@gmail.com:
>>
>> An approach I like is to add an endOfStreamValue inst var to Stream and
>> answer its value when at end. This way nil does not have to be the
>> endOfStreamValue, for example -1 might be much more convenient for a
>> binary stream, and streams can answer nil without confusing their
>> clients. atEnd can be implemented as
>>
>>   atEnd
>>       ^self peek = self endOfStreamValue
>>
>> You can arrange to make streams raise an end-of-stream exception
>> instead of the endOfStreamValue by using some convention on the
>> contents of endOfStreamValue, such as if it is == to the stream itself
>> (although I note that in the Teleplace image the exception EndOfStream
>> is defined but not used).
>>
>> Of course, stream primitives get in the way of adding inst vars to
>> stream classes ;)
>>
>> IMO this is a much more useful scheme than making nil the only
>> endOfStream value.
>>
> Last time I proposed to have an inst var endOfStreamAction was here
>
> http://lists.gforge.inria.fr/pipermail/pharo-project/2009-June/009536.html
>
> Abusing nil value -> nil, I could even let this inst var un-initialized
> and be backward compatible (initializing with a ValueHolder on nil would
> do as well)
>
Nicolas, have you considered introducing methods which allow gracefully handling the end-of-stream while reading? Something like:
nextIfAtEnd: aBlock and next: number ifAtEnd: aBlock
then caller may choose to either write:
char := stream nextIfAtEnd: [nil]
or handle end of stream differently, like leaving the loop:
char := stream nextIfAtEnd: [^ results]
The benefit of such an approach is that code reading the stream doesn't need to additionally test the stream state (atEnd) between #next sends, nor does it require some unique value (like nil) to be returned by #next when reaching the end of the stream.
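For reference, a straightforward way #nextIfAtEnd: could be written on a stream class; this is a sketch of the obvious implementation, not code from Igor's proposal:

```smalltalk
"Hypothetical implementation sketch of #nextIfAtEnd: on a
read stream; selector taken from the proposal above, body
is an assumption."

nextIfAtEnd: aBlock
	"Answer the next element, or the value of aBlock when no
	more elements remain, letting the caller decide what the
	end of the stream means."
	^self atEnd
		ifTrue: [aBlock value]
		ifFalse: [self next]
```

Callers then use it exactly as in the examples above, e.g. `char := stream nextIfAtEnd: [nil]` or `char := stream nextIfAtEnd: [^results]`.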
IMO the block creation is too expensive for streams. The defaultHandler approach for an EndOfStream exception is also too expensive. The endOfStreamValue inst var is a nice trade-off between flexibility, efficiency and simplicity. You can always write

  [(value := stream next) ~~ stream endOfStreamValue] whileTrue:
      [...do stuff...]
Hmm, can you elaborate: at what point do you see an expensive block creation? A block closure is created once at compile time, and is then passed around like any other object by reading it from the literal frame of the method (and, as well, you can use 'stream nextIfAtEnd: nil', right?). Only if it is going to be activated (by sending #value) is a corresponding block context created to evaluate the block. But that happens only when you reach the end of the stream.
It is more expensive because of passing an extra argument, i.e. using #nextIfAtEnd: instead of #next, but not because of passing a block, IMO.

In this case the block is copied and initialized every time you send #nextIfAtEnd:. It is only activated at the end of the stream, so most of the time it is just garbage.

Levente

http://lists.squeakfoundation.org/pipermail/squeak-dev/2007-November/122512....

Nicolas
-- Best regards, Igor Stasenko AKA sig.