2009/12/1 Andreas Raab andreas.raab@gmx.de:
Hi Nicolas -
I finally got around to looking at this stuff. A couple of comments:
- Regardless of what the long-term solution is, I could really, really use
the performance improvements of BufferedFileStream. How can we bring this to a usable point?
First, the code for read/write I provided was completely bogus; I now have a better version passing some tests. Meanwhile, I started to have a look at XTream and played a bit with these ideas:
- separate Read/Write streams: every ReadStream has a source, every WriteStream has a destination
- different kinds of Read/Write streams: Collection/File/Buffered/...
- a separate IOHandle for handling the basic primitives

A big part of XTream is the way it transforms streams using blocks, especially the most powerful transform: [:inputStream :outputStream | ...].

Another point is the uniform use of an EndOfStream exception (Incomplete). I started to play with an endOfStreamAction alternative. Another point is the use of a Buffer object: this piece allows implementing read/write streams acting on the same sequence. It is also a key to performance...

XTream also totally changes the API (put, get, etc.), but it does not have to (or maybe it does have to be XTreme to deserve its name).
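To make the block-based transform idea concrete, here is a minimal sketch of what such a two-argument transform block could look like. This is an illustration only, not XTream's actual API; the stream classes used are plain Squeak ReadStream/WriteStream:

```smalltalk
"Illustrative sketch, not XTream code: a transform is just a
two-argument block that pumps elements from an input stream to
an output stream, applying some conversion along the way."
| transform in out |
transform := [:inputStream :outputStream |
	[inputStream atEnd] whileFalse:
		[outputStream nextPut: inputStream next asUppercase]].
in := ReadStream on: 'hello'.
out := WriteStream on: String new.
transform value: in value: out.
out contents
```

Because the transform is an ordinary block, such transforms can be composed by chaining streams, which is presumably what makes this the most powerful part of the design.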
- I'm not sure I like the subclassing of StandardFileStream - I would
probably opt to subclass FileStream, adopt the primitives and write the stuff on top from scratch (this also allows us to keep a filePosition which is explicitly updated etc).
My very basic approach for short-term performance would be:
- introduce IOHandle in the image for handling primitives (only for files at first, and without modifying StandardFileStream, just duplicating it, to stay minimal)
- introduce a BufferedReadStream and a BufferedReadWriteStream under PositionableStream using this IOHandle as source
- keep the same external API, only hack a few creation methods...
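A sketch of what the core of such a BufferedReadStream might look like; the class layout and selector names (readInto:startingAt:count: on the IOHandle, the buffer/position/limit instance variables) are assumptions for illustration, not the proposed implementation:

```smalltalk
"Hypothetical BufferedReadStream>>next; sketch only.
Assumed instance variables: handle (an IOHandle), buffer (a
ByteArray), position and limit (indices into the buffer)."

next
	"Answer the next byte, refilling the buffer from the IOHandle
	only when it is exhausted, so the read primitive is invoked
	once per bufferful instead of once per element."
	position >= limit ifTrue:
		[limit := handle readInto: buffer startingAt: 1 count: buffer size.
		 position := 0.
		 limit = 0 ifTrue: [^nil]].
	position := position + 1.
	^buffer at: position
```

The point of the design is that the per-element path is pure in-image code (a comparison, an increment, and an at:), with the primitive call amortized over the buffer size.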
As a second step, we will have to decide what to do with MultiByteFileStream: it is a performance bottleneck too. For a start, I would simply wrap it around a buffered one...
- It is highly likely that read performance is dramatically more important
than write performance in most cases. It may be worthwhile to start with just buffering reads and letting writes go unbuffered. This also preserves current semantics, allowing us to gradually phase in buffered writes where desired (i.e., using #flushAfter: aBlock). This would make BufferedFileStream instantly useful for our production uses.
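The #flushAfter: idea mentioned above could be sketched roughly like this; this is an assumption about its intent, not shipped code:

```smalltalk
"Hypothetical sketch of #flushAfter:, not the actual
BufferedFileStream code. Writes issued inside aBlock accumulate
in the in-image buffer; the file is only touched when the block
completes, and #ensure: guarantees the flush happens even on an
exception or non-local return."

flushAfter: aBlock
	^aBlock ensure: [self flush]
```

This keeps unbuffered-write semantics as the default while letting performance-critical code opt into batched writes one scope at a time.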
In any case, I *really* like the direction. If we can get this into a usable state it would allow us to replace the sources and changes files with buffered versions. As a result I would expect measurable speedups in some of the macro benchmarks and other common operations (Object compileAll for example).
Concerning the macro benchmarks, StandardFileStream reading is already fast for pure random access (upTo: is already buffered). The gain is for more sequentially oriented algorithms. However, chances are that a loaded package has its source laid out sequentially in the changes file, and condenseChanges also organizes source code that way, so Object compileAll might eventually show a difference.
Nicolas
Cheers, - Andreas
Nicolas Cellier wrote:
2009/11/28 Levente Uzonyi leves@elte.hu:
On Sat, 28 Nov 2009, Igor Stasenko wrote:
2009/11/28 Eliot Miranda eliot.miranda@gmail.com:
On Fri, Nov 27, 2009 at 4:40 PM, Igor Stasenko siguctua@gmail.com wrote:
2009/11/28 Nicolas Cellier nicolas.cellier.aka.nice@gmail.com:
>
> 2009/11/27 Eliot Miranda eliot.miranda@gmail.com:
>>
>> An approach I like is to add an endOfStreamValue inst var to Stream and
>> answer its value when at end. This way nil does not have to be the
>> endOfStreamValue, for example -1 might be much more convenient for a
>> binary stream, and streams can answer nil without confusing their
>> clients. atEnd can be implemented as
>>
>>   atEnd
>>       ^self peek = self endOfStreamValue
>>
>> You can arrange to make streams raise an end-of-stream exception
>> instead of the endOfStreamValue by using some convention on the
>> contents of endOfStreamValue, such as if it is == to the stream itself
>> (although I note that in the Teleplace image the exception EndOfStream
>> is defined but not used).
>>
>> Of course, stream primitives get in the way of adding inst vars to
>> stream classes ;)
>>
>> IMO this is a much more useful scheme than making nil the only
>> endOfStream value.
>>
> Last time I proposed to have an inst var endOfStreamAction was here
>
> http://lists.gforge.inria.fr/pipermail/pharo-project/2009-June/009536.html
>
> Abusing nil value -> nil, I could even let this inst var un-initialized
> and be backward compatible (initializing with a ValueHolder on nil would
> do as well)
>
Nicolas, have you considered introducing methods which allow gracefully handling the end-of-stream while reading? Something like:
nextIfAtEnd: aBlock and next: number ifAtEnd: aBlock
then caller may choose to either write:
char := stream nextIfAtEnd: [nil]
or handle end of stream differently, like leaving the loop:
char := stream nextIfAtEnd: [^ results]
The benefit of such an approach is that code reading the stream doesn't need to additionally test the stream state (atEnd) between #next sends, nor does it require some unique value (like nil) to be returned by #next when reaching the end of the stream.
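For reference, a straightforward way #nextIfAtEnd: could be written on a stream class; this is a sketch of the obvious implementation, not code from Igor's proposal:

```smalltalk
"Hypothetical implementation sketch of #nextIfAtEnd: on a
read stream; selector taken from the proposal above, body
is an assumption."

nextIfAtEnd: aBlock
	"Answer the next element, or the value of aBlock when no
	more elements remain, letting the caller decide what the
	end of the stream means."
	^self atEnd
		ifTrue: [aBlock value]
		ifFalse: [self next]
```

Callers then use it exactly as in the examples above, e.g. `char := stream nextIfAtEnd: [nil]` or `char := stream nextIfAtEnd: [^results]`.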
IMO the block creation is too expensive for streams. The defaultHandler approach for an EndOfStream exception is also too expensive. The endOfStreamValue inst var is a nice trade-off between flexibility, efficiency and simplicity. You can always write

  [(value := stream next) ~~ stream endOfStreamValue] whileTrue:
      [...do stuff...]
Hmm, can you elaborate: at what point do you see an expensive block creation? A block closure is created once at compile time, and is then passed around like any other object by reading it from the literal frame of the method (and, as well, you can use 'stream nextIfAtEnd: nil', right?). Only if it is going to be activated (by sending #value) is a corresponding block context created to evaluate the block. But that happens only when you reach the end of the stream.
It is more expensive because of passing an extra argument, i.e. using #nextIfAtEnd: instead of #next, but not because of passing a block, IMO.

In this case the block is copied and initialized every time you send #nextIfAtEnd:. It is only activated at the end of the stream, so most of the time it is just garbage.

Levente

http://lists.squeakfoundation.org/pipermail/squeak-dev/2007-November/122512....

Nicolas
-- Best regards, Igor Stasenko AKA sig.