2009/11/18 Igor Stasenko siguctua@gmail.com:
Hello Nicolas, thanks for taking the time to implement this idea.
Since you are going to introduce something more clever than simple-minded, primitive-based file operations, I think it's worth considering separate classes for buffering/caching. Let's call them readStrategy, writeStrategy or cacheStrategy. The idea is to redirect all read/write/seek operations to a special layer which, depending on its implementation, could choose whether a given operation is just a dumb primitive call or something more clever, like read-ahead. Then all streams (not only file streams) could be created with whichever strategy the user chooses.
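A minimal sketch of that delegation, written in Python rather than Smalltalk so it runs standalone. The class names (StrategyStream, DirectStrategy, ReadAheadStrategy) are hypothetical, and io.BytesIO stands in for the file primitives: the stream keeps only its position and forwards every read to whichever strategy it was created with.

```python
import io


class DirectStrategy:
    """Every read is forwarded straight to the underlying file (no caching)."""

    def __init__(self, raw):
        self.raw = raw

    def read(self, position, count):
        self.raw.seek(position)
        return self.raw.read(count)


class ReadAheadStrategy:
    """Reads at least block_size bytes at once and serves later reads from them."""

    def __init__(self, raw, block_size=4096):
        self.raw = raw
        self.block_size = block_size
        self.start = 0   # file offset of the first cached byte
        self.data = b""  # cached bytes

    def read(self, position, count):
        end = position + count
        if not (self.start <= position and end <= self.start + len(self.data)):
            # Request not fully covered by the cache: refill from `position`.
            self.raw.seek(position)
            self.data = self.raw.read(max(count, self.block_size))
            self.start = position
        offset = position - self.start
        return self.data[offset:offset + count]


class StrategyStream:
    """Keeps only a position; all I/O decisions belong to the strategy."""

    def __init__(self, strategy):
        self.strategy = strategy
        self.position = 0

    def next(self, count=1):
        chunk = self.strategy.read(self.position, count)
        self.position += len(chunk)
        return chunk

    def skip(self, delta):
        self.position += delta


# The same stream class, built with either strategy.
plain = StrategyStream(DirectStrategy(io.BytesIO(b"hello world")))
cached = StrategyStream(ReadAheadStrategy(io.BytesIO(b"hello world"), block_size=4))
```

Swapping the strategy changes the caching behaviour without touching the stream class, which is the point of the proposal.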
Yes, delegating is a very good idea. I'm fairly sure other Smalltalks already do that (I did not want to be tainted, so I kept away and reinvented my own wheel). This trial was a minimal proof of concept; it cannot decently pretend to be a clean rewrite.
About the BufferedFileStream implementation, there is some room for improvement. The cache should remember its own starting position and size; then, in #skip:, you simply do self primSetPosition: fileID to: filePosition \ bufferSize. without touching the buffer, because you can't predict which operation will follow (it could be another #skip:, a truncate or a close), and in those cases the read-ahead done by #skip: is wasted.
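To illustrate the lazy #skip: suggested above, here is a hypothetical Python sketch (not the actual BufferedFileStream code). The buffer records where it starts and how many bytes it holds; skip only moves the position, and the file is read only when next asks for a byte outside the buffer. CountingRaw is an assumed in-memory stand-in for the file that counts read calls.

```python
import io


class CountingRaw(io.BytesIO):
    """In-memory stand-in for the file primitive that counts read calls."""

    def __init__(self, data):
        super().__init__(data)
        self.reads = 0

    def read(self, size=-1):
        self.reads += 1
        return super().read(size)


class LazySkipReader:
    def __init__(self, raw, buffer_size=8):
        self.raw = raw
        self.buffer_size = buffer_size
        self.buffer_start = 0  # file offset of buffer[0]
        self.buffer = b""      # buffer size is len(self.buffer)
        self.position = 0      # logical stream position

    def skip(self, delta):
        # Just move the file pointer (the analogue of primSetPosition:to:);
        # the buffer is kept, since the next operation may be another skip,
        # a truncate or a close.
        self.position += delta
        self.raw.seek(self.position)

    def next(self):
        offset = self.position - self.buffer_start
        if not 0 <= offset < len(self.buffer):
            # Refill only now that a byte outside the buffer is really needed.
            self.raw.seek(self.position)
            self.buffer = self.raw.read(self.buffer_size)
            self.buffer_start = self.position
            offset = 0
        if offset >= len(self.buffer):
            return None  # end of file
        self.position += 1
        return self.buffer[offset]


reader = LazySkipReader(CountingRaw(b"0123456789abcdef"), buffer_size=8)
```

Any number of consecutive skip calls cost no read at all; a later next that lands inside the old buffer is served from memory.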
The cache should be refreshed only on a direct read request, when some of the data to be read lies outside the range covered by the cache. Let me illustrate a case that shows the suboptimal #skip: behavior:
........>........[..........<..........]........
Here, [ ] encloses the cached data and > is the file position after a #skip: send. The caller then wants to read bytes up to the < marker. In your implementation, #skip: refreshes the cache, so part of the data already in the buffer is read again, whereas it would be possible to read only the bytes between > and [ and deliver the rest from the cache. Also, after the read request the file pointer points at the < marker, which is still inside the cache, so there is no need to refresh it.
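A Python sketch of the read path described by the diagram, assuming a single cache window (cache_start plus the cached bytes). OverlapReader and its attributes are hypothetical names; bytes_from_file counts physical reads so the saving is visible.

```python
import io


class OverlapReader:
    def __init__(self, raw, buffer_size=16):
        self.raw = raw
        self.buffer_size = buffer_size
        self.cache_start = 0      # file offset of the first cached byte ("[")
        self.cache = b""
        self.position = 0
        self.bytes_from_file = 0  # physical bytes read, for illustration

    def _read_file(self, start, count):
        self.raw.seek(start)
        data = self.raw.read(count)
        self.bytes_from_file += len(data)
        return data

    def skip(self, delta):
        self.position += delta  # never touches the cache

    def read(self, count):
        start, end = self.position, self.position + count
        cache_end = self.cache_start + len(self.cache)  # the "]" marker
        if self.cache_start <= start and end <= cache_end:
            # Fully cached: no file access at all.
            offset = start - self.cache_start
            result = self.cache[offset:offset + count]
        elif start < self.cache_start < end <= cache_end:
            # The diagram's case: ">" before "[", "<" inside the cache.
            # Read only the gap from the file and reuse the cached bytes.
            gap = self._read_file(start, self.cache_start - start)
            result = gap + self.cache[:end - self.cache_start]
        else:
            # No usable overlap: refill the cache from the requested position.
            self.cache = self._read_file(start, max(count, self.buffer_size))
            self.cache_start = start
            result = self.cache[:count]
        self.position += len(result)
        return result


data = b"abcdefghijklmnopqrstuvwxyz0123456789"
reader = OverlapReader(io.BytesIO(data), buffer_size=16)
```

For example, after skip(20) and read(4) the cache covers offsets 20 to 35; after skip(-14), read(15) fetches only offsets 10 to 19 from the file and takes the remaining five bytes from the cache, leaving the position at 25, still inside the cache.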
Agreed, my current buffer implementation is not lazy enough. It reads ahead before knowing whether that is really necessary :(
If I understand correctly, you would avoid throwing the buffer away until you are sure it won't be reused. I'm not sure the use cases are worth the subtle complications; two consecutive #skip: sends should be rare... Anyway, all these tricks had better be hidden in a private policy object indeed; otherwise the future subclasses that will inevitably flourish under BufferedFileStream (Squeak entropy) might well break this masterpiece :)
Cheers
Nicolas
2009/11/18 Nicolas Cellier nicolas.cellier.aka.nice@gmail.com:
I just gave BufferedFileStream a try. As usual, the code is MIT. The implementation is rough, read-only, partial (no support for basicNext crap et al.) and untested (it certainly has bugs). Early timing experiments have shown a 5x to 7x speed-up on [stream nextLine] and [stream next] micro-benchmarks; see the class comment of the attachment.
Reminder: this benchmark is against StandardFileStream. StandardFileStream is the "fast" version; CrLf and MultiByte are far worse! That still leaves some more room...
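Much of a speed-up on [stream next] plausibly comes from calling the file primitive once per block instead of once per element. The following Python sketch is not the benchmark from the attachment and counts calls rather than measuring time; CallCountingFile, count_unbuffered and count_buffered are hypothetical helpers, with each read() standing in for one primitive call.

```python
import io


class CallCountingFile(io.BytesIO):
    """Stand-in for a file where every read() would be a primitive call."""

    def __init__(self, data):
        super().__init__(data)
        self.calls = 0

    def read(self, size=-1):
        self.calls += 1
        return super().read(size)


def count_unbuffered(f):
    """One read call per element, like an unbuffered `stream next` loop."""
    n = 0
    while f.read(1):
        n += 1
    return n


def count_buffered(f, buffer_size=4096):
    """One read call per block; elements are then taken from memory."""
    n = 0
    while True:
        block = f.read(buffer_size)
        if not block:
            break
        for _byte in block:
            n += 1
    return n


unbuffered = CallCountingFile(bytes(10_000))
buffered = CallCountingFile(bytes(10_000))
```

On 10,000 bytes the unbuffered loop makes 10,001 read calls (the last one hits end of file) while the buffered one makes 4 with the default 4096-byte block; how much time that saves depends on the cost of each primitive call.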
Integrating and testing a read/write version is a lot harder than this experiment, but we should really do it.
Nicolas
-- Best regards, Igor Stasenko AKA sig.