I just gave a try to the BufferedFileStream. As usual, the code is MIT. The implementation is rough, read-only, partial (no support for the basicNext crap et al.), and untested (it certainly has bugs). Early timing experiments have shown a 5x to 7x speed-up on [stream nextLine] and [stream next] micro-benchmarks. See the class comment of the attachment.
Reminder: this bench is versus StandardFileStream. StandardFileStream is the "fast" version; CrLf and MultiByte are far worse! That still leaves some more room...
Integrating and testing a read/write version is a lot harder than this experiment, but we should really do it.
Nicolas
Hello Nicolas, thanks for taking the time to implement this idea.
Since you are going to introduce something more clever than simple-minded primitive-based file operations, I think it's worth thinking about creating separate classes for buffering/caching. Let's call it readStrategy, or writeStrategy, or cacheStrategy. The idea is to redirect all read/write/seek operations to a special layer, which, depending on the implementation, could choose whether a given operation will be just a dumb primitive call or something more clever, like read-ahead etc. Then all streams (not only file streams) could be created using a chosen strategy, depending on the user's will.
About the BufferedFileStream implementation: there is some room for improvement. The cache should remember its own starting position + size; then in #skip: you simply do self primSetPosition: fileID to: filePosition \ bufferSize, but without touching the buffer, because you can't predict what operation follows (it can be another #skip:, or truncate, or close), which would make your read-ahead redundant.
The cache should be refreshed only on a direct read request, when some data that needs to be read is outside the range covered by the cache. Let me illustrate a case which shows the suboptimal #skip: behavior:
........>........[..........<..........]........
Here, [ ] encloses the cached data, and > is the file position after a #skip: send. Then the caller wants to read bytes up to the < marker. In your case, #skip: will refresh the cache, causing part of the data which was already in the buffer to be re-read, while it is possible to reuse the already cached data and read only the bytes between > and [ ; the rest can be delivered from the cache. Also, since after the read request the file pointer will point at the < marker, we are still inside the cache and don't need to refresh it.
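A minimal sketch of what such a lazy #skip: could look like; the instance variable names (filePosition, bufferStart, buffer) are mine for illustration, not the attachment's:

```smalltalk
"Hypothetical sketch: #skip: only moves the logical position and leaves
the buffer untouched; a refill happens lazily, only when a subsequent
read falls outside the cached range."
skip: n
	filePosition := filePosition + n

atCachedPosition
	"Answer whether the logical position is still covered by the cache."
	^filePosition between: bufferStart and: bufferStart + buffer size - 1
```

With this, two consecutive #skip: sends (or a #skip: followed by #close) cost nothing but an addition.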
Yes, delegating is a very good idea. I'm quite sure other Smalltalks already do that (I did not want to be tainted, so I just kept away, reinventing my own wheel). This trial was a minimal proof of concept; it cannot decently pretend to be a clean rewrite.
Agree, my current buffer implementation is not lazy enough. It does read ahead before knowing if it's really necessary :(
If I understand it, you would avoid throwing the buffer away until you are sure it won't be reused. Not sure the use cases are worth the subtle complications; two consecutive skip: should be rare... Anyway, all these tricks had better be hidden in a private policy object indeed, otherwise the future subclasses which would inevitably flourish under BufferedFileStream (the Squeak entropy) might well break this masterpiece :)
Cheers
Nicolas
-- Best regards, Igor Stasenko AKA sig.
2009/11/18 Nicolas Cellier nicolas.cellier.aka.nice@gmail.com:
Yes, delegating is a very good idea. I'm quite sure other Smalltalks already do that (I did not want to be tainted, so I just kept away, reinventing my own wheel). This trial was a minimal proof of concept; it cannot decently pretend to be a clean rewrite.
But it has shown us the potential for improvement. Seriously, a 5x-7x speedup is not something we can just forget and throw away.
Agree, my current buffer implementation is not lazy enough. It does read ahead before knowing if it's really necessary :(
If I understand it, you would avoid throwing the buffer away until you are sure it won't be reused. Not sure the use cases are worth the subtle complications; two consecutive skip: should be rare...
Yes, it is rare and quite unlikely, but you caught my intent exactly: do not throw away the buffer unless it is deemed necessary.
Let's keep in mind that any memory operation is orders of magnitude faster than disk operations; moreover, the filesystem could be a remotely mounted drive, which adds even more latency to all file-based operations. So fighting that with a cache is a good strategy.
Anyway, all these tricks had better be hidden in a private policy object indeed, otherwise the future subclasses which would inevitably flourish under BufferedFileStream (the Squeak entropy) might well break this masterpiece :)
Right. A separate layer is for making a clean room for experiments, without needing to rewrite the whole stream class hierarchy, especially the subclasses, where things start exploding exponentially. There should be a very thin layer based on the most simple operations (read, write, seek), while the rest of the stream interface is built on top of that. So if we can identify this thin layer and make it pluggable, then we can be sure that at least this part of the stream library can be easily customized; and if this part works well, we can be sure the streams are in good shape, without needing to visit and test numerous methods in multiple (sub)classes, which is quite messy.
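As a sketch, the thin layer might amount to no more than this protocol; the class and selector names here are illustrative only, nothing like this exists yet:

```smalltalk
"Hypothetical minimal I/O strategy; everything else in the stream
library would be written against these three operations."
Object subclass: #IOStrategy
	instanceVariableNames: 'handle'
	classVariableNames: ''
	category: 'Stream-Experiments'

"IOStrategy>>readInto: aByteArray startingAt: start count: n
	answers the number of bytes actually read.
 IOStrategy>>write: aByteArray startingAt: start count: n
 IOStrategy>>position: anInteger"
```

A PrimitiveIOStrategy would map each call to a dumb primitive; a BufferedIOStrategy could add read-ahead, with no stream (sub)class knowing the difference.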
On Wed, Nov 18, 2009 at 3:10 AM, Nicolas Cellier <nicolas.cellier.aka.nice@gmail.com> wrote:
Just want to wish you every encouragement! This is *really* useful work.
2009/11/18 Eliot Miranda eliot.miranda@gmail.com:
Just want to wish you every encouragement! This is *really* useful work.
I just threw together an un-tested minimal read/write version of BufferedFileStream. Beware: I just wrote it from scratch and have not even run a single method since the read/write refactoring... So far I have rather spent my spare time commenting the implementation (see the class comment too)... in case some good souls want to analyze/try it. It should be reasonably optimized for the read-only and random read/write cases. For append-only, it might not be optimal due to useless attempts to read past the end, but that should not cost that much. For read/append, there is probably room for more efficiency too, but a major improvement vs StandardFileStream should already show up. Not sure we really need to introduce these optimizations.
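For the read/write case, the tricky part is the write-back ordering; a sketch of the shape such code could take (the names bufferDirty, bufferStart etc. are my guesses for illustration, not the attachment's actual methods):

```smalltalk
"Hypothetical sketch: a single shared buffer for reads and writes.
Before the buffer is refilled from another file region, dirty bytes
must be flushed back, otherwise written data would be lost."
fillBufferAt: aPosition
	bufferDirty ifTrue: [self flushBuffer].
	self primSetPosition: fileID to: aPosition.
	bufferedCount := self primRead: fileID into: buffer
		startingAt: 1 count: buffer size.
	bufferStart := aPosition
```

This also shows where the append-only pessimism comes from: filling at the end of the file reads zero bytes, a cheap but useless primitive call.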
The path to a cleaner/faster stream library is longer than just this little step. Besides testing, we'd have to refactor the hierarchy, insulate all instance variables, and delegate as much as possible, as Igor suggested. We'd better continue on the cleaning path and not just add another FileStream subclass, complexifying an unnecessarily complex library a bit more.
Nicolas
On 26-Nov-09, at 2:48 PM, Nicolas Cellier wrote:
I've been thinking about this too. For Filesystem, I've only implemented very basic stream functionality so far. But I do intend to develop its stream functionality further, and to go in a very different direction from the existing design. Some design elements:
- Using handles to decouple the streams from the storage they're operating on. The same stream class should be able to read or write to collections, sockets, files etc.
- Separating ReadStream from WriteStream. I find code that both reads and writes to a particular stream to be very rare in practice, and in cases where it does happen, reading and writing are separate activities and using separate streams wouldn't introduce problems. On the other hand, a lot of the complexity in the existing hierarchy stems from the mingling of read and write functionality.
- Simplified protocols. The existing stream classes have accumulated a lot of cruft that should be implemented as objects that use streams, rather than being streams themselves. Examples include fileIn, fileOut, ReferenceStream etc.
- Composition rather than inheritance. As I go about implementing string encoding, buffering, compression etc., I plan to enable the creation of stream pipelines to provide combinations of functionality. Instead of implementing BufferedUtf8DeflateFileStream, I want to create a sequence of streams like this:
WriteStream -> Utf8Encoder -> DeflateCompressor -> Buffer -> Handle
- Grow the new streams parallel to the existing ones. Rather than trying to maintain backwards compatibility, leave the old streams in place and continue to improve them while the new ones are being developed. Migration to the new streams can happen gradually. If the new streams don't attract any users, obviously I'm on the wrong track. :-)
So I've been watching your cleanup efforts with interest, particularly the buffering stuff. Keep it up!
Colin
2009/11/27 Colin Putney cputney@wiresong.ca:
- Separating ReadStream from WriteStream. I find code that both reads and writes to a particular stream to be very rare in practice, and in cases where it does happen, reading and writing are separate activities and using separate streams wouldn't introduce problems. On the other hand, a lot of the complexity in the existing hierarchy stems from the mingling of read and write functionality.
Yes, mostly the read-append stream usage for the change log... However, a buffered implementation will be difficult with separate read/write buffers in the rare case where we do need read/write capabilities: writing might trash the read buffer, so they are not independent.
- Simplified protocols. The existing stream classes have accumulated a lot of cruft that should be implemented as objects that use streams, rather than being streams themselves. Examples include fileIn, fileOut, ReferenceStream etc.
Yes, packaging and modularization of the core...
- Composition rather than inheritance. As I go about implementing string encoding, buffering, compression etc., I plan to enable the creation of stream pipelines to provide combinations of functionality. Instead of implementing BufferedUtf8DeflateFileStream, I want to create a sequence of streams like this:
WriteStream -> Utf8Encoder -> DeflateCompressor -> Buffer -> Handle
Agree again
- Grow the new streams parallel to the existing ones. Rather than trying to maintain backwards compatibility, leave the old streams in place and continue to improve them while the new ones are being developed. Migration to the new streams can happen gradually. If the new streams don't attract any users, obviously I'm on the wrong track. :-)
So I've been watching your cleanup efforts with interest, particularly the buffering stuff. Keep it up!
Obviously, it's just a piece of a larger puzzle.
Nicolas
2009/11/27 Colin Putney cputney@wiresong.ca:
- Composition rather than inheritance. As I go about implementing string encoding, buffering, compression etc., I plan to enable the creation of stream pipelines to provide combinations of functionality. Instead of implementing BufferedUtf8DeflateFileStream, I want to create a sequence of streams like this:
WriteStream -> Utf8Encoder -> DeflateCompressor -> Buffer -> Handle
+100. Just yesterday I was thinking about the same design principle: composition. I call it StreamAdaptor. It should carry a minimal set of methods providing the basic operations (read/write/seek etc.), and it should also support pipelining in the same way as you illustrated above:
Let's say that initially we create a stream which works with a file: Stream -> FileAdaptor
Then we want it to be buffered: stream adaptor: (stream adaptor beBuffered)
Stream -> BufferAdaptor -> FileAdaptor
Then we want it to be compressed:
stream adaptor: (ZipAdaptor on: stream adaptor)
Stream -> DeflateCompressor -> BufferAdaptor -> FileAdaptor
and so on..
It is easy to see that if we want to create the same structure for a socket connection, all we need is to use a socket adaptor in the chain, while the rest doesn't require any modifications.
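Under such a (purely hypothetical) protocol, building and rebuilding the chain could look like this; none of these classes exist, the selectors just follow the sketch above:

```smalltalk
"Illustrative only: hypothetical adaptor classes and selectors."
| stream |
stream := ReadStream onAdaptor: (FileAdaptor on: 'data.bin').
	"Stream -> FileAdaptor"
stream adaptor: stream adaptor beBuffered.
	"Stream -> BufferAdaptor -> FileAdaptor"
stream adaptor: (ZipAdaptor on: stream adaptor).
	"Stream -> ZipAdaptor -> BufferAdaptor -> FileAdaptor"

"For a socket, only the tail of the chain changes:"
stream := ReadStream onAdaptor: (SocketAdaptor on: aSocket).
```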
On Thu, Nov 26, 2009 at 08:56:08PM -0800, Colin Putney wrote:
I've been thinking about this too. For Filesystem, I've only implemented very basic stream functionality so far. But I do intend to develop its stream functionality further, and to go in a very different direction from the existing design. Some design elements:
- Using handles to decouple the streams from the storage they're operating on. The same stream class should be able to read or write to collections, sockets, files etc.
I implemented IOHandle for this; see http://wiki.squeak.org/squeak/996. I have not maintained it since about 2003, but the idea is straightforward. My purpose at that time was to:
* Separate the representation of external IO channels from the representation of streams and communication protocols.
* Provide a uniform representation of IO channels, similar to the unix notion of treating everything as a 'file'.
* Simplify future refactoring of Socket and FileStream.
* Provide a place for handling asynchronous IO events. Refer to the aio handling in the unix VM. Files, Sockets, and AsyncFiles could (should) use a common IO event handling mechanism (an aio event signaling a Smalltalk Semaphore).
Since that time I have added aio event handling for files (AioPlugin, see http://wiki.squeak.org/squeak/3384), which is a layer on top of Ian's aio event handling in the unix and OS X VMs that is mainly useful for handling unix pipes. But I still think that a more unified view of "handles for IO channels" is a good idea. The completely separate representation of files and sockets in Squeak still feels wrong to me, maybe just because I am accustomed to unix systems.
Dave
On 27-Nov-09, at 8:03 AM, David T. Lewis wrote:
I implemented IOHandle for this, see http://wiki.squeak.org/squeak/996. I have not maintained it since about 2003, but the idea is straightforward.
Yes. I looked into IOHandle when implementing Filesystem, but decided to go with a new (simpler, but limited) implementation that would let me explore the requirements for the stream architecture I had in mind.
My purpose at that time was to:
- Separate the representation of external IO channels from the representation of streams and communication protocols.
- Provide a uniform representation of IO channels, similar to the unix notion of treating everything as a 'file'.
- Simplify future refactoring of Socket and FileStream.
- Provide a place for handling asynchronous IO events. Refer to the aio handling in the unix VM. Files, Sockets, and AsyncFiles could (should) use a common IO event handling mechanism (an aio event signaling a Smalltalk Semaphore).
Indeed. Filesystem comes at this from the other direction, but I think we want to end up in the same place. For now I've done TSTTCPW, which is to use the primitives from the FilePlugin. But eventually I want to improve the plumbing. You've done some important work here; perhaps Filesystem can use AioPlugin at some point.
Colin
2009/11/27 Colin Putney cputney@wiresong.ca:
I wonder why level-3 stdio (FILE *: fopen, fclose, ...) was used in the file plugin rather than level 2 (int fd: open, close, ...)... Better portability?
Nicolas
On Fri, Nov 27, 2009 at 2:24 PM, Nicolas Cellier <nicolas.cellier.aka.nice@gmail.com> wrote:
I wonder why level-3 stdio (FILE *: fopen, fclose, ...) was used in the file plugin rather than level 2 (int fd: open, close, ...)... Better portability?
"Level 2" isn't really a level; it's a section of the unix manual pages. Section 2 is the system calls (which really define what unix is); section 3 is libraries. So only the stdio library in section 3 is portable across C implementations. So yes, you're right: the use of the C library's stdio facilities was chosen for portability.
An approach I like is to add an endOfStreamValue inst var to Stream and answer its value when at end. This way nil does not have to be the endOfStreamValue; for example, -1 might be much more convenient for a binary stream, and streams can answer nil without confusing their clients. atEnd can be implemented as: atEnd ^self peek = self endOfStreamValue
You can arrange to make streams raise an end-of-stream exception instead of answering the endOfStreamValue by using some convention on the contents of endOfStreamValue, such as it being == to the stream itself (although I note that in the Teleplace image the exception EndOfStream is defined but not used).
Of course, stream primitives get in the way of adding inst vars to stream classes ;)
IMO this is a much more useful scheme than making nil the only endOfStream value.
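A sketch of how #next could honor that convention; it assumes a hypothetical EndOfStream exception class and a stream keeping its elements in collection/position inst vars, which is only one possible shape:

```smalltalk
"Hypothetical sketch of the endOfStreamValue convention."
next
	self atEnd ifTrue:
		[^endOfStreamValue == self
			ifTrue: [EndOfStream signal]      "exception variant"
			ifFalse: [endOfStreamValue]].     "e.g. nil, or -1 for binary"
	^collection at: (position := position + 1)
```

A binary stream would then be created with something like endOfStreamValue: -1, and its read loops could compare against -1 without any nil checks.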
2009/11/27 Eliot Miranda eliot.miranda@gmail.com:
The last time I proposed having an inst var endOfStreamAction was here: http://lists.gforge.inria.fr/pipermail/pharo-project/2009-June/009536.html . Abusing the nil value -> nil, I could even leave this inst var un-initialized and be backward compatible (initializing with a ValueHolder on nil would do as well).
Nicolas
On Fri, Nov 27, 2009 at 2:33 PM, Eliot Miranda eliot.miranda@gmail.com wrote:
On Fri, Nov 27, 2009 at 2:24 PM, Nicolas Cellier nicolas.cellier.aka.nice@gmail.com wrote:
2009/11/27 Colin Putney cputney@wiresong.ca:
On 27-Nov-09, at 8:03 AM, David T. Lewis wrote:
I implemented IOHandle for this, see http://wiki.squeak.org/squeak/996. I have not maintained it since about 2003, but the idea is straightforward.
Yes. I looked into IOHandle when implementing Filesystem, but decided to go with a new (simpler, but limited) implementation that would let me explore the requirements for the stream architecture I had in mind.
My purpose at that time was to:
* Separate the representation of external IO channels from the representation of streams and communication protocols.
* Provide a uniform representation of IO channels similar to the unix notion of treating everything as a 'file'.
* Simplify future refactoring of Socket and FileStream.
* Provide a place for handling asynchronous IO events. Refer to the aio handling in the unix VM. Files, Sockets, and AsyncFiles could (should) use a common IO event handling mechanism (an aio event signaling a Smalltalk Semaphore).
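The separation described above might be sketched like this (all names are hypothetical; the point is that streams would hold an IOHandle instead of a raw fileID):

```smalltalk
Object subclass: #IOHandle
	instanceVariableNames: 'handle ioSemaphore'
	classVariableNames: ''
	category: 'IOHandle'.

"A file stream or socket delegates raw transfers to its handle:"
IOHandle >> readInto: aByteArray startingAt: index count: n
	"Dumb delegation to a platform primitive; subclasses could
	delegate to FilePlugin, SocketPlugin, or AsyncFile primitives."
	^self primRead: handle into: aByteArray startingAt: index count: n

IOHandle >> waitForData
	"Block until the VM's aio machinery signals the semaphore."
	ioSemaphore wait
```

This way the stream layer never sees which plugin is underneath, which is exactly what a common aio event mechanism needs.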
Indeed. Filesystem comes at this from the other direction, but I think we want to end up in the same place. For now I've done TSTTCPW, which is use the primitives from the FilePlugin. But eventually I want to improve the plumbing. You've done some important work here - perhaps Filesystem can use AioPlugin at some point.
Colin
I wonder why level 3 stdio was used (FILE * fopen, fclose ...) rather than level 2 (int fid, open, close, ...) in file plugin... Better portability ?
level 2 isn't really a level, it's a section of the unix manual pages. Section 2 is the system calls (which really define what unix is). Section 3 is libraries. So only the stdio library in section 3 is portable across C implementations. So yes, you're right, the use of the C library's stdio facilities was chosen for portability.
Nicolas
2009/11/28 Nicolas Cellier nicolas.cellier.aka.nice@gmail.com:
2009/11/27 Eliot Miranda eliot.miranda@gmail.com:
An approach I like is to add an endOfStreamValue inst var to Stream and answer its value when at end. This way nil does not have to be the endOfStreamValue; for example, -1 might be much more convenient for a binary stream, and streams can answer nil without confusing their clients. atEnd can be implemented as: atEnd ^self peek = self endOfStreamValue. You can arrange to make streams raise an end-of-stream exception instead of answering the endOfStreamValue by using some convention on the contents of endOfStreamValue, such as if it is == to the stream itself (although I note that in the Teleplace image the exception EndOfStream is defined but not used).
Of course, stream primitives get in the way of adding inst vars to stream classes ;) IMO this is a much more useful scheme than making nil the only endOfStream value.
Last time I proposed to have an inst var endOfStreamAction was here http://lists.gforge.inria.fr/pipermail/pharo-project/2009-June/009536.html . Abusing the nil value -> nil, I could even leave this inst var uninitialized and remain backward compatible (initializing with a ValueHolder on nil would do as well)
Nicolas, have you considered introducing methods which allow gracefully handling the end-of-stream while reading? Something like:
nextIfAtEnd: aBlock and next: number ifAtEnd: aBlock
then caller may choose to either write:
char := stream nextIfAtEnd: [nil]
or handle end of stream differently, like leaving the loop:
char := stream nextIfAtEnd: [^ results]
The benefit of such an approach is that code which reads the stream neither needs to test the stream state (atEnd) between #next sends, nor requires some unique value (like nil) returned by #next when reaching the end of the stream.
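Such a method is nearly a one-liner; a possible (untested) implementation on PositionableStream:

```smalltalk
nextIfAtEnd: aBlock
	"Answer the next element, or the value of aBlock at end of stream."
	^self atEnd
		ifTrue: [aBlock value]
		ifFalse: [self next]
```

The two calling patterns from the message above then work as shown: [nil] yields a sentinel, while [^results] exits the enclosing method directly from inside the read loop.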
Nicolas
On Fri, Nov 27, 2009 at 4:40 PM, Igor Stasenko siguctua@gmail.com wrote:
2009/11/28 Nicolas Cellier nicolas.cellier.aka.nice@gmail.com:
2009/11/27 Eliot Miranda eliot.miranda@gmail.com:
An approach I like is to add an endOfStreamValue inst var to Stream and answer its value when at end. This way nil does not have to be the endOfStreamValue; for example, -1 might be much more convenient for a binary stream, and streams can answer nil without confusing their clients. atEnd can be implemented as: atEnd ^self peek = self endOfStreamValue. You can arrange to make streams raise an end-of-stream exception instead of answering the endOfStreamValue by using some convention on the contents of endOfStreamValue, such as if it is == to the stream itself (although I note that in the Teleplace image the exception EndOfStream is defined but not used).
Of course, stream primitives get in the way of adding inst vars to stream classes ;) IMO this is a much more useful scheme than making nil the only endOfStream value.
Last time I proposed to have an inst var endOfStreamAction was here
http://lists.gforge.inria.fr/pipermail/pharo-project/2009-June/009536.html
. Abusing the nil value -> nil, I could even leave this inst var uninitialized and remain backward compatible (initializing with a ValueHolder on nil would do as well)
Nicolas, have you considered introducing methods which allow gracefully handling the end-of-stream while reading? Something like:
nextIfAtEnd: aBlock and next: number ifAtEnd: aBlock
then caller may choose to either write:
char := stream nextIfAtEnd: [nil]
or handle end of stream differently, like leaving the loop:
char := stream nextIfAtEnd: [^ results]
The benefit of such an approach is that code which reads the stream neither needs to test the stream state (atEnd) between #next sends, nor requires some unique value (like nil) returned by #next when reaching the end of the stream.
IMO the block creation is too expensive for streams. The defaultHandler approach for an EndOfStream exception is also too expensive. The endOfStreamValue inst var is a nice trade-off between flexibility, efficiency and simplicity. You can always write [(value := stream next) ~~ stream endOfStreamValue] whileTrue: [...do stuff...
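Written out in full, the loop idiom above looks like this (illustrative only: it assumes the proposed endOfStreamValue: setter exists, and -1 is chosen as a sentinel that cannot occur in the data):

```smalltalk
| stream value |
stream := (ReadStream on: #[1 2 3]) endOfStreamValue: -1; yourself.
[(value := stream next) ~~ stream endOfStreamValue] whileTrue:
	[Transcript show: value printString; cr]
```

No block is passed per element and no atEnd test is needed inside the loop; the cost is one identity comparison per #next.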
Nicolas
-- Best regards, Igor Stasenko AKA sig.
2009/11/28 Eliot Miranda eliot.miranda@gmail.com:
On Fri, Nov 27, 2009 at 4:40 PM, Igor Stasenko siguctua@gmail.com wrote:
2009/11/28 Nicolas Cellier nicolas.cellier.aka.nice@gmail.com:
2009/11/27 Eliot Miranda eliot.miranda@gmail.com:
An approach I like is to add an endOfStreamValue inst var to Stream and answer its value when at end. This way nil does not have to be the endOfStreamValue; for example, -1 might be much more convenient for a binary stream, and streams can answer nil without confusing their clients. atEnd can be implemented as: atEnd ^self peek = self endOfStreamValue. You can arrange to make streams raise an end-of-stream exception instead of answering the endOfStreamValue by using some convention on the contents of endOfStreamValue, such as if it is == to the stream itself (although I note that in the Teleplace image the exception EndOfStream is defined but not used).
Of course, stream primitives get in the way of adding inst vars to stream classes ;) IMO this is a much more useful scheme than making nil the only endOfStream value.
Last time I proposed to have an inst var endOfStreamAction was here
http://lists.gforge.inria.fr/pipermail/pharo-project/2009-June/009536.html . Abusing the nil value -> nil, I could even leave this inst var uninitialized and remain backward compatible (initializing with a ValueHolder on nil would do as well)
Nicolas, have you considered introducing methods which allow gracefully handling the end-of-stream while reading? Something like:
nextIfAtEnd: aBlock and next: number ifAtEnd: aBlock
then caller may choose to either write:
char := stream nextIfAtEnd: [nil]
or handle end of stream differently, like leaving the loop:
char := stream nextIfAtEnd: [^ results]
The benefit of such an approach is that code which reads the stream neither needs to test the stream state (atEnd) between #next sends, nor requires some unique value (like nil) returned by #next when reaching the end of the stream.
IMO the block creation is too expensive for streams. The defaultHandler approach for an EndOfStream exception is also too expensive. The endOfStreamValue inst var is a nice trade-off between flexibility, efficiency and simplicity. You can always write [(value := stream next) ~~ stream endOfStreamValue] whileTrue: [...do stuff...
Hmm, can you elaborate: at what point do you see an expensive block creation? A block closure is created once, at compile time, and then passed like any other object by reading it from the literal frame of the method (and, as well, you can use 'stream nextIfAtEnd: nil', right?). Only if it is going to be activated (by sending #value) is a corresponding block context created in order to evaluate the block. But that happens only when reaching the end of the stream.
It is more expensive because of passing an extra argument, i.e. using #nextIfAtEnd: instead of #next, but not because of passing a block, IMO.
Nicolas
-- Best regards, Igor Stasenko AKA sig.
On Sat, 28 Nov 2009, Igor Stasenko wrote:
2009/11/28 Eliot Miranda eliot.miranda@gmail.com:
On Fri, Nov 27, 2009 at 4:40 PM, Igor Stasenko siguctua@gmail.com wrote:
2009/11/28 Nicolas Cellier nicolas.cellier.aka.nice@gmail.com:
2009/11/27 Eliot Miranda eliot.miranda@gmail.com:
An approach I like is to add an endOfStreamValue inst var to Stream and answer its value when at end. This way nil does not have to be the endOfStreamValue; for example, -1 might be much more convenient for a binary stream, and streams can answer nil without confusing their clients. atEnd can be implemented as: atEnd ^self peek = self endOfStreamValue. You can arrange to make streams raise an end-of-stream exception instead of answering the endOfStreamValue by using some convention on the contents of endOfStreamValue, such as if it is == to the stream itself (although I note that in the Teleplace image the exception EndOfStream is defined but not used).
Of course, stream primitives get in the way of adding inst vars to stream classes ;) IMO this is a much more useful scheme than making nil the only endOfStream value.
Last time I proposed to have an inst var endOfStreamAction was here
http://lists.gforge.inria.fr/pipermail/pharo-project/2009-June/009536.html . Abusing the nil value -> nil, I could even leave this inst var uninitialized and remain backward compatible (initializing with a ValueHolder on nil would do as well)
Nicolas, have you considered introducing methods which allow gracefully handling the end-of-stream while reading? Something like:
nextIfAtEnd: aBlock and next: number ifAtEnd: aBlock
then caller may choose to either write:
char := stream nextIfAtEnd: [nil]
or handle end of stream differently, like leaving the loop:
char := stream nextIfAtEnd: [^ results]
The benefit of such an approach is that code which reads the stream neither needs to test the stream state (atEnd) between #next sends, nor requires some unique value (like nil) returned by #next when reaching the end of the stream.
IMO the block creation is too expensive for streams. The defaultHandler approach for an EndOfStream exception is also too expensive. The endOfStreamValue inst var is a nice trade-off between flexibility, efficiency and simplicity. You can always write [(value := stream next) ~~ stream endOfStreamValue] whileTrue: [...do stuff...
Hmm, can you elaborate: at what point do you see an expensive block creation? A block closure is created once, at compile time, and then passed like any other object by reading it from the literal frame of the method (and, as well, you can use 'stream
In this case the block is copied and initialized every time you send #nextIfAtEnd:. It is only activated at the end of the stream, so most of the time it is just garbage.
Levente
nextIfAtEnd: nil', right?). Only if it is going to be activated (by sending #value) is a corresponding block context created in order to evaluate the block. But that happens only when reaching the end of the stream.
It is more expensive because of passing an extra argument, i.e. using #nextIfAtEnd: instead of #next, but not because of passing a block, IMO.
Nicolas
-- Best regards, Igor Stasenko AKA sig.
-- Best regards, Igor Stasenko AKA sig.
2009/11/28 Levente Uzonyi leves@elte.hu:
On Sat, 28 Nov 2009, Igor Stasenko wrote:
2009/11/28 Eliot Miranda eliot.miranda@gmail.com:
On Fri, Nov 27, 2009 at 4:40 PM, Igor Stasenko siguctua@gmail.com wrote:
2009/11/28 Nicolas Cellier nicolas.cellier.aka.nice@gmail.com:
2009/11/27 Eliot Miranda eliot.miranda@gmail.com:
An approach I like is to add an endOfStreamValue inst var to Stream and answer its value when at end. This way nil does not have to be the endOfStreamValue; for example, -1 might be much more convenient for a binary stream, and streams can answer nil without confusing their clients. atEnd can be implemented as: atEnd ^self peek = self endOfStreamValue. You can arrange to make streams raise an end-of-stream exception instead of answering the endOfStreamValue by using some convention on the contents of endOfStreamValue, such as if it is == to the stream itself (although I note that in the Teleplace image the exception EndOfStream is defined but not used).
Of course, stream primitives get in the way of adding inst vars to stream classes ;) IMO this is a much more useful scheme than making nil the only endOfStream value.
Last time I proposed to have an inst var endOfStreamAction was here
http://lists.gforge.inria.fr/pipermail/pharo-project/2009-June/009536.html . Abusing the nil value -> nil, I could even leave this inst var uninitialized and remain backward compatible (initializing with a ValueHolder on nil would do as well)
Nicolas, have you considered introducing methods which allow gracefully handling the end-of-stream while reading? Something like:
nextIfAtEnd: aBlock and next: number ifAtEnd: aBlock
then caller may choose to either write:
char := stream nextIfAtEnd: [nil]
or handle end of stream differently, like leaving the loop:
char := stream nextIfAtEnd: [^ results]
The benefit of such an approach is that code which reads the stream neither needs to test the stream state (atEnd) between #next sends, nor requires some unique value (like nil) returned by #next when reaching the end of the stream.
IMO the block creation is too expensive for streams. The defaultHandler approach for an EndOfStream exception is also too expensive. The endOfStreamValue inst var is a nice trade-off between flexibility, efficiency and simplicity. You can always write [(value := stream next) ~~ stream endOfStreamValue] whileTrue: [...do stuff...
Hmm, can you elaborate: at what point do you see an expensive block creation? A block closure is created once, at compile time, and then passed like any other object by reading it from the literal frame of the method (and, as well, you can use 'stream
In this case the block is copied and initialized every time you send #nextIfAtEnd:. It is only activated at the end of the stream, so most of the time it is just garbage.
Levente
http://lists.squeakfoundation.org/pipermail/squeak-dev/2007-November/122512....
Nicolas
nextIfAtEnd: nil', right?). Only if it is going to be activated (by sending #value) is a corresponding block context created in order to evaluate the block. But that happens only when reaching the end of the stream.
It is more expensive because of passing an extra argument, i.e. using #nextIfAtEnd: instead of #next, but not because of passing a block, IMO.
Nicolas
-- Best regards, Igor Stasenko AKA sig.
-- Best regards, Igor Stasenko AKA sig.
Hi Nicolas -
I finally got around to looking at this stuff. A couple of comments:
* Regardless of what the long-term solution is, I could really, really use the performance improvements of BufferedFileStream. How can we bring this to a usable point?
* I'm not sure I like the subclassing of StandardFileStream - I would probably opt to subclass FileStream, adopt the primitives and write the stuff on top from scratch (this also allows us to keep a filePosition which is explicitly updated etc).
* It is highly likely that read performance is dramatically more important than write performance in most cases. It may be worthwhile to start with just buffering reads and having writes go unbuffered. This also preserves current semantics, allowing us to gradually phase in buffered writes where desired (i.e., using #flushAfter: aBlock). This would make BufferedFileStream instantly useful for our production uses.
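The #flushAfter: mentioned above might be as simple as this sketch (writesBuffered is an assumed inst var that the write path would consult to decide whether to hit the file immediately):

```smalltalk
flushAfter: aBlock
	"Buffer all writes performed inside aBlock, then flush once,
	even if the block exits non-locally."
	writesBuffered := true.
	^aBlock ensure:
		[writesBuffered := false.
		self flush]
```

Using #ensure: keeps the on-disk state consistent even when the block raises or returns non-locally, which is what makes opt-in buffered writes safe to phase in.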
In any case, I *really* like the direction. If we can get this into a usable state it would allow us to replace the sources and changes files with buffered versions. As a result I would expect measurable speedups in some of the macro benchmarks and other common operations (Object compileAll for example).
Cheers, - Andreas
Nicolas Cellier wrote:
2009/11/28 Levente Uzonyi leves@elte.hu:
On Sat, 28 Nov 2009, Igor Stasenko wrote:
2009/11/28 Eliot Miranda eliot.miranda@gmail.com:
On Fri, Nov 27, 2009 at 4:40 PM, Igor Stasenko siguctua@gmail.com wrote:
2009/11/28 Nicolas Cellier nicolas.cellier.aka.nice@gmail.com:
2009/11/27 Eliot Miranda eliot.miranda@gmail.com:
An approach I like is to add an endOfStreamValue inst var to Stream and answer its value when at end. This way nil does not have to be the endOfStreamValue; for example, -1 might be much more convenient for a binary stream, and streams can answer nil without confusing their clients. atEnd can be implemented as: atEnd ^self peek = self endOfStreamValue. You can arrange to make streams raise an end-of-stream exception instead of answering the endOfStreamValue by using some convention on the contents of endOfStreamValue, such as if it is == to the stream itself (although I note that in the Teleplace image the exception EndOfStream is defined but not used).
Of course, stream primitives get in the way of adding inst vars to stream classes ;) IMO this is a much more useful scheme than making nil the only endOfStream value.
Last time I proposed to have an inst var endOfStreamAction was here
http://lists.gforge.inria.fr/pipermail/pharo-project/2009-June/009536.html . Abusing the nil value -> nil, I could even leave this inst var uninitialized and remain backward compatible (initializing with a ValueHolder on nil would do as well)
Nicolas, have you considered introducing methods which allow gracefully handling the end-of-stream while reading? Something like:
nextIfAtEnd: aBlock and next: number ifAtEnd: aBlock
then caller may choose to either write:
char := stream nextIfAtEnd: [nil]
or handle end of stream differently, like leaving the loop:
char := stream nextIfAtEnd: [^ results]
The benefit of such an approach is that code which reads the stream neither needs to test the stream state (atEnd) between #next sends, nor requires some unique value (like nil) returned by #next when reaching the end of the stream.
IMO the block creation is too expensive for streams. The defaultHandler approach for an EndOfStream exception is also too expensive. The endOfStreamValue inst var is a nice trade-off between flexibility, efficiency and simplicity. You can always write [(value := stream next) ~~ stream endOfStreamValue] whileTrue: [...do stuff...
Hmm, can you elaborate: at what point do you see an expensive block creation? A block closure is created once, at compile time, and then passed like any other object by reading it from the literal frame of the method (and, as well, you can use 'stream
In this case the block is copied and initialized every time you send #nextIfAtEnd:. It is only activated at the end of the stream, so most of the time it is just garbage.
Levente
http://lists.squeakfoundation.org/pipermail/squeak-dev/2007-November/122512....
Nicolas
nextIfAtEnd: nil', right?). Only if it is going to be activated (by sending #value) is a corresponding block context created in order to evaluate the block. But that happens only when reaching the end of the stream.
It is more expensive because of passing an extra argument, i.e. using #nextIfAtEnd: instead of #next, but not because of passing a block, IMO.
Nicolas
-- Best regards, Igor Stasenko AKA sig.
-- Best regards, Igor Stasenko AKA sig.
2009/12/1 Andreas Raab andreas.raab@gmx.de:
Hi Nicolas -
I finally got around to looking at this stuff. A couple of comments:
- Regardless of what the long-term solution is, I could really, really use
the performance improvements of BufferedFileStream. How can we bring this to a usable point?
First, the code for read/write I provided was completely bogus; I now have a better one passing some tests. Meanwhile, I started to have a look at XTream and played a bit with these ideas:
- separate read/write Streams
- every ReadStream has a source, every WriteStream has a destination
- different kinds of Read/Write streams: Collection/File/Buffered/...
- a separate IOHandle for handling basic primitives
A big part of XTream is the way to transform Streams using blocks, especially the most powerful transformer: [:inputStream :outputStream | Another point is the uniform usage of an EndOfStream exception (Incomplete). I started to play with an endOfStreamAction alternative. Another point is the usage of a Buffer object: this piece allows implementing read/write streams acting on the same sequence. It is also a key to performance...
XTream also totally changes the API (put, get, etc.), but it does not have to (or maybe it does have to be XTreme to deserve its name).
- I'm not sure I like the subclassing of StandardFileStream - I would
probably opt to subclass FileStream, adopt the primitives and write the stuff on top from scratch (this also allows us to keep a filePosition which is explicitly updated etc).
My very basic approach for short-term performance would be:
- introduce IOHandle in the image for handling primitives (only for files at first, and without modifying StandardFileStream, just duplicating, to stay minimal)
- introduce a BufferedReadStream and a BufferedReadWriteStream under PositionableStream using this IOHandle as source
- keep the same external API, only hack a few creation methods...
In a second step we will have to decide what to do with MultiByteFileStream: it is a performance bottleneck too. For a start, I would simply wrap it around a buffered one...
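The read path of such a BufferedReadStream might look roughly like this sketch (under assumed inst vars position, bufferStart, bufferedCount, buffer and ioHandle; the primitive delegation name is illustrative):

```smalltalk
next
	"Serve one byte from the buffer; refill with a single primitive
	read only when the buffer is exhausted."
	position >= (bufferStart + bufferedCount) ifTrue: [self refillBuffer].
	bufferedCount = 0 ifTrue: [^nil].	"end of file"
	position := position + 1.
	^buffer at: position - bufferStart

refillBuffer
	bufferStart := position.
	bufferedCount := ioHandle
		readInto: buffer
		startingAt: 1
		count: buffer size
```

Most #next sends then cost only an index check and an at:, which is where the 5x-7x micro-benchmark gain comes from.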
- It is highly likely that read performance is dramatically more important
than write performance in most cases. It may be worthwhile to start with just buffering reads and have writes go unbuffered. This also preserves current semantics, allowing to gradually phase in buffered writes where desired (i.e., using #flushAfter: aBlock). This would make BufferedFileStream instantly useful for our production uses.
In any case, I *really* like the direction. If we can get this into a usable state it would allow us to replace the sources and changes files with buffered versions. As a result I would expect measurable speedups in some of the macro benchmarks and other common operations (Object compileAll for example).
Concerning macro benchmarks, StandardFileStream reading is already performant in the case of pure random access (upTo: is already buffered). The gain is for more sequentially oriented algorithms. However, chances are that a loaded package has its source laid out sequentially in the changes file; condenseChanges also organizes source code that way, so Object compileAll might eventually show a difference.
Nicolas
Cheers, - Andreas
Nicolas Cellier wrote:
2009/11/28 Levente Uzonyi leves@elte.hu:
On Sat, 28 Nov 2009, Igor Stasenko wrote:
2009/11/28 Eliot Miranda eliot.miranda@gmail.com:
On Fri, Nov 27, 2009 at 4:40 PM, Igor Stasenko siguctua@gmail.com wrote:
2009/11/28 Nicolas Cellier nicolas.cellier.aka.nice@gmail.com:
2009/11/27 Eliot Miranda eliot.miranda@gmail.com:
An approach I like is to add an endOfStreamValue inst var to Stream and answer its value when at end. This way nil does not have to be the endOfStreamValue; for example, -1 might be much more convenient for a binary stream, and streams can answer nil without confusing their clients. atEnd can be implemented as: atEnd ^self peek = self endOfStreamValue. You can arrange to make streams raise an end-of-stream exception instead of answering the endOfStreamValue by using some convention on the contents of endOfStreamValue, such as if it is == to the stream itself (although I note that in the Teleplace image the exception EndOfStream is defined but not used).
Of course, stream primitives get in the way of adding inst vars to stream classes ;) IMO this is a much more useful scheme than making nil the only endOfStream value.
Last time I proposed to have an inst var endOfStreamAction was here http://lists.gforge.inria.fr/pipermail/pharo-project/2009-June/009536.html . Abusing the nil value -> nil, I could even leave this inst var uninitialized and remain backward compatible (initializing with a ValueHolder on nil would do as well)
Nicolas, have you considered introducing methods which allow gracefully handling the end-of-stream while reading? Something like:
nextIfAtEnd: aBlock and next: number ifAtEnd: aBlock
then caller may choose to either write:
char := stream nextIfAtEnd: [nil]
or handle end of stream differently, like leaving the loop:
char := stream nextIfAtEnd: [^ results]
The benefit of such an approach is that code which reads the stream neither needs to test the stream state (atEnd) between #next sends, nor requires some unique value (like nil) returned by #next when reaching the end of the stream.
IMO the block creation is too expensive for streams. The defaultHandler approach for an EndOfStream exception is also too expensive. The endOfStreamValue inst var is a nice trade-off between flexibility, efficiency and simplicity. You can always write [(value := stream next) ~~ stream endOfStreamValue] whileTrue: [...do stuff...
Hmm, can you elaborate: at what point do you see an expensive block creation? A block closure is created once, at compile time, and then passed like any other object by reading it from the literal frame of the method (and, as well, you can use 'stream
In this case the block is copied and initialized every time you send #nextIfAtEnd:. It is only activated at the end of the stream, so most of the time it is just garbage.
Levente
http://lists.squeakfoundation.org/pipermail/squeak-dev/2007-November/122512....
Nicolas
nextIfAtEnd: nil', right?). Only if it is going to be activated (by sending #value) is a corresponding block context created in order to evaluate the block. But that happens only when reaching the end of the stream.
It is more expensive because of passing an extra argument, i.e. using #nextIfAtEnd: instead of #next, but not because of passing a block, IMO.
Nicolas
-- Best regards, Igor Stasenko AKA sig.
-- Best regards, Igor Stasenko AKA sig.
Nicolas Cellier wrote:
Concerning macro benchmarks, StandardFileStream reading is already performant in the case of pure random access (upTo: is already buffered). The gain is for more sequentially oriented algorithms. However, chances are that a loaded package has its source laid out sequentially in the changes file; condenseChanges also organizes source code that way, so Object compileAll might eventually show a difference.
Oh, it will. Here are the leaves for "Object compileAll":
**Leaves**
71.0  (1,149)  StandardFileStream primRead:into:startingAt:count:
 2.0     (32)  ByteString at:put:
 1.8     (29)  CompiledMethod flushCache
That says that if you speed up #next by a factor of 5x (which is trivial using BufferedFileStream) it'll make compileAll 2-3x faster overall. I think we'll see similar 2x speedups for other common operations on source code (recent changes, browsing versions etc).
Faster I/O can make a *huge* difference in speed for the whole system.
Cheers, - Andreas
2009/12/1 Andreas Raab andreas.raab@gmx.de:
Nicolas Cellier wrote:
Concerning macro benchmarks, StandardFileStream reading is already performant in the case of pure random access (upTo: is already buffered). The gain is for more sequentially oriented algorithms. However, chances are that a loaded package has its source laid out sequentially in the changes file; condenseChanges also organizes source code that way, so Object compileAll might eventually show a difference.
Oh, it will. Here are the leaves for "Object compileAll":
**Leaves**
71.0  (1,149)  StandardFileStream primRead:into:startingAt:count:
 2.0     (32)  ByteString at:put:
 1.8     (29)  CompiledMethod flushCache
That says that if you speed up #next by a factor of 5x (which is trivial using BufferedFileStream) it'll make compileAll 2-3x faster overall. I think we'll see similar 2x speedups for other common operations on source code (recent changes, browsing versions etc).
Faster I/O can make a *huge* difference in speed for the whole system.
Cheers, - Andreas
Oh yes, but that is MultiByteFileStream reading characters one by one... A StandardFileStream would already be much more performant.
Nicolas
2009/11/28 Levente Uzonyi leves@elte.hu:
On Sat, 28 Nov 2009, Igor Stasenko wrote:
2009/11/28 Eliot Miranda eliot.miranda@gmail.com:
On Fri, Nov 27, 2009 at 4:40 PM, Igor Stasenko siguctua@gmail.com wrote:
2009/11/28 Nicolas Cellier nicolas.cellier.aka.nice@gmail.com:
2009/11/27 Eliot Miranda eliot.miranda@gmail.com:
An approach I like is to add an endOfStreamValue inst var to Stream and answer its value when at end. This way nil does not have to be the endOfStreamValue; for example, -1 might be much more convenient for a binary stream, and streams can answer nil without confusing their clients. atEnd can be implemented as: atEnd ^self peek = self endOfStreamValue. You can arrange to make streams raise an end-of-stream exception instead of answering the endOfStreamValue by using some convention on the contents of endOfStreamValue, such as if it is == to the stream itself (although I note that in the Teleplace image the exception EndOfStream is defined but not used).
Of course, stream primitives get in the way of adding inst vars to stream classes ;) IMO this is a much more useful scheme than making nil the only endOfStream value.
Last time I proposed to have an inst var endOfStreamAction was here
http://lists.gforge.inria.fr/pipermail/pharo-project/2009-June/009536.html . Abusing the nil value -> nil, I could even leave this inst var uninitialized and remain backward compatible (initializing with a ValueHolder on nil would do as well)
Nicolas, have you considered introducing methods which allow gracefully handling the end-of-stream while reading? Something like:
nextIfAtEnd: aBlock and next: number ifAtEnd: aBlock
then caller may choose to either write:
char := stream nextIfAtEnd: [nil]
or handle end of stream differently, like leaving the loop:
char := stream nextIfAtEnd: [^ results]
The benefit of such an approach is that code which reads the stream neither needs to test the stream state (atEnd) between #next sends, nor requires some unique value (like nil) returned by #next when reaching the end of the stream.
IMO the block creation is too expensive for streams. The defaultHandler approach for an EndOfStream exception is also too expensive. The endOfStreamValue inst var is a nice trade-off between flexibility, efficiency and simplicity. You can always write [(value := stream next) ~~ stream endOfStreamValue] whileTrue: [...do stuff...
Hmm, can you elaborate: at what point do you see an expensive block creation? A block closure is created once, at compile time, and then passed like any other object by reading it from the literal frame of the method (and, as well, you can use 'stream
In this case the block is copied and initialized every time you send #nextIfAtEnd:. It is only activated at the end of the stream, so most of the time it is just garbage.
Ah, yes... I forgot about that.
Well, you can move the block out of the loop:

	| block |
	block := [ self foo ].
	[ stream nextIfAtEnd: block. ... ] repeat.

but of course, it's not always possible, and not the first thought that comes to mind when using blocks while coding.
Btw, I think this is a good field for compiler/runtime optimizations: avoiding excessive closure creation inside loops/nested blocks.
Levente
nextIfAtEnd: nil', right?). Only if it is going to be activated (by sending #value) is a corresponding block context created in order to evaluate the block. But that happens only when reaching the end of the stream.
It is more expensive because of passing an extra argument, i.e. using #nextIfAtEnd: instead of #next, but not because of passing a block, IMO.
Nicolas
-- Best regards, Igor Stasenko AKA sig.
-- Best regards, Igor Stasenko AKA sig.
On Fri, Nov 27, 2009 at 11:24:37PM +0100, Nicolas Cellier wrote:
I wonder why level 3 stdio was used (FILE * fopen, fclose ...) rather than level 2 (int fid, open, close, ...) in file plugin... Better portability ?
Nicolas
This is an interesting question, and I wonder if anyone has ever measured the difference between the two approaches.
In the Windows VM, FilePlugin uses the standard win32 interface interacting with HANDLE rather than (FILE *). The other VMs use stdio in the file plugin. Loosely speaking, the Windows VM is using "level 2", while the other VMs are working at "level 3".
Has anyone ever measured the file IO performance of an image running on the Windows VM versus a Unix or Mac VM on the same hardware? The plugin could be written in either manner on any of the platforms, so it would be interesting to know if one approach delivers better overall results than the other.
Dave
2009/12/3 David T. Lewis lewis@mail.msen.com:
Has anyone ever measured the file IO performance of an image running on the Windows VM versus a Unix or Mac VM on the same hardware? The plugin could be written in either manner on any of the platforms, so it would be interesting to know if one approach delivers better overall results than the other.
Well, it's quite a nontrivial task to compare IO performance between platforms, because there are many other things involved. One could run faster than the other, but you can't tell whether it is because of the use of different IO functions in the primitives; it could be for other reasons, like memory management, the way the OS schedules the CPU, etc.
On 3-Dec-09, at 10:12 AM, David T. Lewis wrote:
In the Windows VM, FilePlugin uses the standard win32 interface interacting with HANDLE rather than (FILE *). The other VMs use stdio in the file plugin. Loosely speaking, the Windows VM is using "level 2", while the other VMs are working at "level 3".
Performance is certainly interesting, but to me it's mostly a question of design. FILE * is a high-level API, meant for C programmers to use directly. A lot of what it provides is already implemented in the image-level code.
For example, a FileStream has a position instance variable, but the FilePlugin primitives also maintain a position, and we have to choose between keeping them in sync or ignoring the image-level position. Both of those are problematic. I'd like to see file IO primitives that are implemented in terms of pread() and pwrite() or the like.
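A sketch of what such a primitive-level interface might look like, assuming a hypothetical primRead:into:startingAt:count:atOffset: primitive modeled on pread() (it does not exist in the current FilePlugin):

```smalltalk
readInto: aByteArray startingAt: index count: n
	"Read n bytes at the image-level position, passing the offset
	 explicitly so the plugin keeps no position state of its own.
	 The position instance variable is then the single source of truth."
	| bytesRead |
	bytesRead := self primRead: fileID into: aByteArray
		startingAt: index count: n atOffset: position.
	position := position + bytesRead.
	^ bytesRead
```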
I like primitives to be as primitive as possible. :-)
Colin
"Nicolas" == Nicolas Cellier nicolas.cellier.aka.nice@gmail.com writes:
Nicolas> The path to a cleaner/faster stream library is longer than just this Nicolas> little step. Beside testing, we'd have to refactor the hierarchy, Nicolas> insulate all instance variables, and delegate as much as possible as Nicolas> Igor suggested. We'd better continue on the cleaning path and not Nicolas> just add another FileStream subclass complexifying a bit more an Nicolas> unecessarily complex library.
Michael Lucas-Smith gave a nice talk on Xtreams at the Portland Linux Users Group. The most interesting thing out of this is the notion that #atEnd is just plain wrong. For some streams, computing #atEnd is impossible. For most streams, it's just expensive. Instead, Xtreams takes the approach that #do: suffices for most people, and for those that can't, an exception when you read past end-of-stream can provide the proper exit from your loop. Then, your loop can concentrate on what happens most of the time, instead of what happens rarely.
Xtreams is under a liberal license, and is currently in the Cincom public store.
Instead of reinventing yet another stream package, we should be looking at Xtreams, I think.
(As a side effect, Xtreams has as a test a very nice PEG parsing package... so we'd get DSLs for relatively free.)
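If I recall the Xtreams protocol correctly (#get for reading, with an Incomplete exception raised on reading past the end; treat this as a sketch, not verbatim Xtreams code), such a loop looks roughly like:

```smalltalk
"The exception provides the loop exit, so the per-iteration cost is
 just #get; the rare end-of-stream case is handled out of line."
| results |
results := OrderedCollection new.
[ [ results add: stream get ] repeat ]
	on: Incomplete
	do: [ :ex | results ]
```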
On 26-Nov-09, at 9:36 PM, Randal L. Schwartz wrote:
Xtreams is under a liberal license, and is currently in the Cincom public store.
Instead of reinventing yet another stream package, we should be looking at Xtreams, I think.
Very cool. Definitely need to steal ideas from them.
...and code, perhaps? I did a bit of poking around, but couldn't find anything on the web that said what the license actually is. Can you be more specific than "liberal?"
Colin
"Colin" == Colin Putney cputney@wiresong.ca writes:
Colin> ...and code, perhaps? I did a bit of poking around, but couldn't find Colin> anything on the web that said what the license actually is. Can you be Colin> more specific than "liberal?"
MLS made it clear at the meeting that Cincom's default release model is now "open source" except for things that are business differentiating, and in fact, in particular, they would really like to see Xtreams adopted widely, so the license would have to be MIT-like for htat to happen.
I'm sure if we poked Arden or James Robertson we could get a statement of license for Xtreams available rather quickly.
On 27-Nov-09, at 8:13 AM, Randal L. Schwartz wrote:
"Colin" == Colin Putney cputney@wiresong.ca writes:
Colin> ...and code, perhaps? I did a bit of poking around, but couldn't find Colin> anything on the web that said what the license actually is. Can you be Colin> more specific than "liberal?"
MLS made it clear at the meeting that Cincom's default release model is now "open source" except for things that are business differentiating, and in fact, in particular, they would really like to see Xtreams adopted widely, so the license would have to be MIT-like for htat to happen.
I'm sure if we poked Arden or James Robertson we could get a statement of license for Xtreams available rather quickly.
I'm not going to hold my breath on that one. When Vassili wrote Announcements, I tried to get Cincom to attach an open source license to it. They loved the idea, wanted Announcements to be adopted widely, etc. Very positive, but never actually did it. Eventually, I wrote a new implementation from scratch in less time than I had already wasted dealing with Cincom.
This was a few years ago, and maybe things have changed at Cincom, but given that they haven't actually attached a license yet, I'd be very surprised if the shortest path to Xtreams-like functionality in Squeak involved the Cincom code.
Colin
Nicolas> The path to a cleaner/faster stream library is longer than just this Nicolas> little step. Beside testing, we'd have to refactor the hierarchy, Nicolas> insulate all instance variables, and delegate as much as possible as Nicolas> Igor suggested. We'd better continue on the cleaning path and not Nicolas> just add another FileStream subclass complexifying a bit more an Nicolas> unecessarily complex library.
Michael Lucas-Smith gave a nice talk on Xtreams at the Portland Linux Users Group. The most interesting thing out of this is the notion that #atEnd is just plain wrong. For some streams, computing #atEnd is impossible. For most streams, it's just expensive. Instead, Xtreams takes the approach that #do: suffices for most people, and for those that can't, an exception when you read past end-of-stream can provide the proper exit from your loop. Then, your loop can concentrate on what happens most of the time, instead of what happens rarely.
I think we need a common superclass for Stream and Collection named Iterable, where #do: is abstract and #select:, #collect:, #reject:, #count:, #detect:, etc. (and quite a lot of the messages in the enumerating category of Collection) are implemented based on #do:
Of course Stream can refine the #select:/#reject: methods to answer a FilteredStream that decorates the receiver and applies the filtering on the fly. In the same way #collect: can return a TransformedStream that decorates the receiver, etc.
Just my 2 cents.
Cheers,
-- Diego
2009/11/27 Diego Gomez Deck DiegoGomezDeck@consultar.com:
Yes, this is gst approach, and it seems a good one.
I think we need a common superclass for Streams and Collection named Iterable where #do: is abstract and #select:, #collect:, #reject:, #count:, #detect:, etc (and quiet a lot of the messages in enumerating category of Collection) are implemented based on #do:
Of course Stream can refine the #select:/#reject methods to answer a FilteredStream that decorates the receiver and apply the filtering on the fly. In the same way #collect: can return a TransformedStream that decorates the receiver, etc.
Since Stream can't reuse #select: and #collect: (or #count, and #detect: on an infinite stream is risky), they shouldn't be in the superclass. In that case, what is its purpose?
I think it is fine to give Stream the same interface as Collection. I do this, too. But they will share very little code, and so there is no need to give them a common superclass.
-Ralph Johnson
On Fri, 27-11-2009 at 06:15 -0600, Ralph Johnson wrote:
I think we need a common superclass for Streams and Collection named Iterable where #do: is abstract and #select:, #collect:, #reject:, #count:, #detect:, etc (and quiet a lot of the messages in enumerating category of Collection) are implemented based on #do:
Of course Stream can refine the #select:/#reject methods to answer a FilteredStream that decorates the receiver and apply the filtering on the fly. In the same way #collect: can return a TransformedStream that decorates the receiver, etc.
Since Stream can't reuse #select: and #collect: (or #count, and #detect: on an infinite stream is risky),
Stream and Collection are just the 2 refinements of Iterable that we're talking about in this thread, but there are a lot of classes that can benefit from Iterable as a super-class.
On the other hand, Stream has #do: (and the #atEnd/#next pair) and it's also risky for infinite streams. To push this discussion forward: is an InfiniteStream a real Stream?
they shouldn't be in the superclass. In that case, what is its purpose?
i think it is fine to give Stream the same interface as Collection. I do this, too. But they will share very little code, and so there is no need to give them a common superclass.
-Ralph Johnson
Cheers,
-- Diego
2009/11/27 Diego Gomez Deck DiegoGomezDeck@consultar.com:
#select: and #collect: are not necessarily dangerous even on an infinite stream once you see them as filters and implement them with lazy block evaluation: Stream select: aBlock should return a SelectStream (find a better name here :). Then you would use it with #next, like any other InfiniteStream.
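Such a lazy filter might be sketched like this (SelectStream, sourceStream and filterBlock are names invented for the example; end-of-stream handling is left out):

```smalltalk
SelectStream>>next
	"Pull elements from the decorated stream until one satisfies
	 the filter. On an infinite source this terminates as long as
	 matching elements keep coming."
	| element |
	[ element := sourceStream next.
	  filterBlock value: element ] whileFalse.
	^ element
```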
On Fri, 27-11-2009 at 15:22 +0100, Nicolas Cellier wrote:
#select: and #collect: are not necessarily dangerous even on infinite stream once you see them as filters and implement them with a lazy block evaluation : Stream select: aBlock should return a SelectStream (find a better name here :). Then you would use it with #next, as any other InfiniteStream.
Sure, it was my point... The only risk with InfiniteStreams is #do:
My proposal is to create an Iterable class, with default implementations of #select:, #collect:, etc. all based on #do: (just like Collection implements #size based on #do:, but most collections overwrite it with a faster version). These implementations are, at the same time, naive implementations and documentation of the expected behaviour, all written in terms of #do:.
Stream implements #select:, #collect: (and those types of messages) answering an IterableDecorator that makes the selection/collection/etc. in a lazy way.
There are also some other useful decorators to implement, like IterableComposite (a union of several iterables that can be handled like one).
The FilterIterator/CollectorIterator can also be used to select/collect lazily on collections.
For InfiniteStreams (Random, Fibonacci numbers, etc.) I propose to create a type of "Generators" that are "less" than a Stream and less than an Iterator (they have no concept of #atEnd, #do: doesn't make sense, etc.). Anyway, I'm not sure how many InfiniteStreams we have in current Squeak. I remember Random was a Stream in Smalltalk-80, but I'm not sure of its current state in Squeak.
Cheers,
-- Diego
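For instance, a naive default in Iterable might look like the following sketch, mirroring how Collection implements #size on top of #do: (Iterable and the method body are assumptions, not existing code):

```smalltalk
Iterable>>count: aBlock
	"Naive implementation in terms of #do:, doubling as documentation
	 of the expected behaviour; subclasses can override it with a
	 faster version."
	| tally |
	tally := 0.
	self do: [ :each | (aBlock value: each) ifTrue: [ tally := tally + 1 ] ].
	^ tally
```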
2009/11/27 Diego Gomez Deck DiegoGomezDeck@consultar.com:
For InfiniteStreams (Random, Fibonacci Numbers, etc) I propose to create a type of "Generators" that are "less" than a Stream and less than a Iterator (they have not concept of #atEnd, #do: doesn't make sense, etc). Anyway, I'm not sure how many InfiniteStream we have in current Squeak. I remember Random was a Stream in Smalltalk/80, but not sure the current state in Squeak.
Oh, they could have a very simple implementation: atEnd ^ false, and do: aBlock [ aBlock value: self next ] repeat. But we might want to discourage such usage as well, indeed.
Nicolas
2009/11/27 Diego Gomez Deck DiegoGomezDeck@consultar.com
I think we need a common superclass for Streams and Collection named Iterable where #do: is abstract and #select:, #collect:, #reject:, #count:, #detect:, etc (and quiet a lot of the messages in enumerating category of Collection) are implemented based on #do:
Maybe I'm wrong, but I think traits are a good (better) solution for that kind of problem: #do: can be a required method and you can implement the remaining methods in terms of #do:.
Hi Nicholas,
here are my timings from Cog. Only the ratios correspond since the source file is of a different size, my machine is different, and Cog runs at very different speeds to the interpreter. With that in mind...
t1 is nextLine over the sources file via StandardFileStream t2 is nextLine over the sources file via BufferedFileStream t3 is next over the sources file via StandardFileStream t4 is next over the sources file via BufferedFileStream
Cog: an OrderedCollection(11101 836 9626 2306)
Normalizing to the first measurement: 1.0 0.075 0.867 0.208
Your ratios are 1.0 0.206 4.827 0.678
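For the record, the normalization is just each time divided by the first measurement (11101 ms):

```smalltalk
"Each timing divided by the StandardFileStream nextLine time."
#(11101 836 9626 2306) collect: [ :t | (t / 11101) roundTo: 0.001 ]
"=> #(1.0 0.075 0.867 0.208)"
```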
I'd say BufferedFileStream is waaaaay faster :)
P.S. your timing doit revealed a bug in Cog which is why it has taken a while to respond with the results :) The doit's temp names are encoded and appended to the method as extra bytes. The JIT wasn't ignoring these extra bytes, and your doit just happened to cause the JIT to follow a null pointer mistakenly scanning these extra bytes. So thank you :)
2009/11/27 Eliot Miranda eliot.miranda@gmail.com:
Hi Nicholas, here are my timings from Cog. Only the ratios correspond since the source file is of a different size, my machine is different, and Cog runs at very different speeds to the interpreter. With that in mind... t1 is nextLine over the sources file via StandardFileStream t2 is nextLine over the sources file via BufferedFileStream t3 is next over the sources file via StandardFileStream t4 is next over the sources file via BufferedFileStream Cog: an OrderedCollection(11101 836 9626 2306) Normalizing to the first measurement: 1.0 0.075 0.867 0.208 Your ratios are 1.0 0.206 4.827 0.678
I'd say BufferedFileStream is waaaaay faster :)
Impressive. I presume every Smalltalk message send is accelerated while primitive calls remain expensive...
P.S. your timing doit revealed a bug in Cog which is why it has taken a while to respond with the results :) The doit's temp names are encoded and appended to the method as extra bytes. The JIT wasn't ignoring these extra bytes, and your doit just happened to cause the JIT to follow a null pointer mistakenly scanning these extra bytes. So thank you :)
Oh, you discovered my secret for finding bugs: (bad) luck
Nicolas
On Fri, Nov 27, 2009 at 1:49 PM, Nicolas Cellier < nicolas.cellier.aka.nice@gmail.com> wrote:
Impressive. I presume every Smalltalk message is accelerated while primitive call remain expensive...
Exactly. Or rather, the primitives which aren't implemented in machine code are even slower to invoke from machine code than in the interpreter. Machine code primitives exist for SmallInteger + - / * // \ % > >= < <= = ~=, for Float + - * / > >= < <= = ~=, for Object == at: ByteString at: and for BlockClosure value[:value:value:value:]. Once I reimplement the object representation I'll be happy to implement Object>>at:put: ByteString>>at:put: Behavior>>basicNew & Behavior>>basicNew: which should result in another significant step in performance.
Oh, you discovered my secret for finding bugs: (bad) luck
:) :)