[squeak-dev] Faster FileStream experiments

Nicolas Cellier nicolas.cellier.aka.nice at gmail.com
Wed Nov 18 13:55:36 UTC 2009


2009/11/18 Igor Stasenko <siguctua at gmail.com>:
> Hello Nicolas,
> thanks for taking the time to implement this idea.
>
> Since you are going to introduce something more clever than simple-minded
> primitive-based file operations, I think it's worth thinking about
> creating separate classes for buffering/caching. Let's call it
> readStrategy, or writeStrategy, or cacheStrategy.
> The idea is to redirect all read/write/seek operations to a special
> layer, which, depending on its implementation, could choose whether a
> given operation will be just a dumb primitive call, or something more
> clever, like read-ahead etc.
> So all streams (not only file streams) could then be created with a
> chosen strategy, depending on the user's will.
>

Yes, delegating is a very good idea.
I'm quite sure other Smalltalks do that already (I did not want to be
tainted, so I just kept away, reinventing my own wheel).
This trial was a minimal proof of concept; it cannot decently pretend
to be a clean rewrite.

> About the BufferedFileStream implementation: there is some room for
> improvement. The cache should remember its own starting position + size.
> Then at #skip: you simply do
>  self primSetPosition: fileID to: filePosition \\ bufferSize.
> but do not touch the buffer, because you can't predict what operation
> follows (it can be another #skip:, or truncate, or close),
> which makes your read-ahead redundant.
>
> The cache should be refreshed only on a direct read request, when some
> data which needs to be read is outside the range covered by the cache.
> Let me illustrate a case which shows the suboptimal #skip: behavior:
>
> ........>........[..........<..........]........
>
> Here, [ ] encloses the cached data,
> and > is the file position after a #skip: send.
> Then the caller wants to read bytes up to the < marker.
> In your case, #skip: will refresh the cache, causing part of the data
> which was already in the buffer to be re-read,
> while it is possible to reuse the already cached data, reading only the
> bytes between > and [ ;
> the rest can be delivered from the cache.
> Also, since after the read request the file pointer will point at the
> < marker, we are still inside the cache and don't need to refresh it.
>

Agreed, my current buffer implementation is not lazy enough.
It does read ahead before knowing whether that is really necessary :(

If I understand correctly, you would avoid throwing the buffer away
until you are sure it won't be reused.
I'm not sure the use cases are worth the subtle complications. Two
consecutive #skip: sends should be rare...
Anyway, all these tricks had better be hidden in a private policy
object indeed, otherwise the future subclasses which will inevitably
flourish under BufferedFileStream (the Squeak entropy) might well
break this masterpiece :)

Cheers

Nicolas

>
> 2009/11/18 Nicolas Cellier <nicolas.cellier.aka.nice at gmail.com>:
>> I just gave a try to the BufferedFileStream.
>> As usual, code is MIT.
>> Implementation is rough, read-only, partial (no support for basicNext
>> crap & al), untested (certainly has bugs).
>> Early timing experiments have shown a 5x to 7x speed-up on [stream
>> nextLine] and [stream next] micro-benchmarks.
>> See the class comment of the attachment.
>>
>> Reminder: this benchmark is versus StandardFileStream.
>> StandardFileStream is the "fast" version; CrLf and MultiByte are far worse!
>> This still leaves some more room...
>>
>> Integrating and testing a read/write version is a lot harder than this
>> experiment, but we should really do it.
>>
>> Nicolas
>>
>>
>>
>>
>
>
>
> --
> Best regards,
> Igor Stasenko AKA sig.
>
>


