[squeak-dev] Faster FileStream experiments

Igor Stasenko siguctua at gmail.com
Wed Nov 18 16:01:35 UTC 2009


2009/11/18 Nicolas Cellier <nicolas.cellier.aka.nice at gmail.com>:
> 2009/11/18 Igor Stasenko <siguctua at gmail.com>:
>> Hello Nicolas,
>> thanks for taking the time to implement this idea.
>>
>> Since you are going to introduce something more clever than simple-minded
>> primitive-based file operations, I think it's worth thinking about
>> creating separate classes
>> for buffering/caching. Let's call it readStrategy, or writeStrategy, or
>> cacheStrategy.
>> The idea is to redirect all read/write/seek operations to a special layer,
>> which, depending on the implementation, could choose whether a given
>> operation is just a dumb primitive call
>> or something more clever, like read-ahead, etc.
>> So then all streams (not only file streams) could be created using
>> the chosen strategy,
>> depending on the user's wishes.
>>
>
> Yes, delegating is a very good idea.
> Quite sure other smalltalks do that already (I did not want to be
> tainted, so just kept away, reinventing my own wheel).
> This trial was a minimal proof of concept; it cannot decently pretend
> to be a clean rewrite.
>

But it showed us the potential for improvement.
Seriously, a 5x-7x speedup is not something we can just forget and
throw away.

>> About the BufferedFileStream implementation. There is some room for
>> improvement: the cache should remember its own starting position + size;
>> then at #skip: you simply do
>>  self primSetPosition: fileID to: filePosition \\ bufferSize.
>> without touching the buffer, because you can't predict what the next
>> operation will be (it can be another #skip:, or truncate, or close),
>> which would make your read-ahead redundant.
>>
>> The cache should be refreshed only on a direct read request, when some
>> of the data which needs to be read
>> is outside the range covered by the cache.
>> Let me illustrate a case which shows the suboptimal #skip: behavior:
>>
>> ........>........[..........<..........]........
>>
>> Here, [ ] encloses the cached data,
>> and > is the file position after the #skip: send.
>> Then the caller wants to read bytes up to the < marker.
>> In your case, #skip: will refresh the cache, causing part of the data
>> which was already in the buffer to be read again,
>> while it is possible to reuse the already cached data, read only the bytes
>> between > and [ ,
>> and deliver the rest from the cache.
>> Also, since after the read request the file pointer will point at the
>> < marker, we are still inside the cache and don't need to refresh it.
>>
>
> Agree, my current buffer implementation is not lazy enough.
> It does read ahead before knowing if really necessary :(
>
> If I understand it, you would avoid throwing the buffer away until you
> are sure it won't be reused.
> Not sure if the use cases are worth the subtle complications. Two
> consecutive skip: should be rare...

Yes, it is rare and quite unlikely, but you caught my intent clearly:
 - do not throw away the buffer unless it's deemed necessary.

Let's keep in mind that any memory operation is orders of magnitude
faster than a disk operation.
Moreover, a filesystem could be a remotely mounted drive, which adds even
more latency to all file-based operations.
So fighting that latency with a cache is a good strategy.
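
To make the "keep the buffer until a read really needs new data" idea
concrete, here is a minimal sketch. It is written in Python as a runnable
stand-in, not Squeak code, and it is not the BufferedFileStream from this
thread: the class name LazyReadCache, the buffer_size parameter and the
bytes_fetched counter are all hypothetical, and the backing "file" is
assumed to be any seekable binary file object that does not change while
it is being read.

```python
import io


class LazyReadCache:
    """Read-only cache that touches the file only when a read needs uncached bytes."""

    def __init__(self, raw, buffer_size=16):
        self.raw = raw                  # any seekable binary file object (assumption)
        self.buffer_size = buffer_size  # minimum number of bytes fetched on a refill
        self.position = 0               # logical stream position
        self.cache_start = 0            # file offset of the first cached byte
        self.cache = b""                # the cached bytes themselves
        self.bytes_fetched = 0          # bytes actually read from the file (for inspection)

    def skip(self, count):
        # Only the logical position moves; the cache is left alone because the
        # next operation might be another skip, a close, and so on.
        self.position = max(0, self.position + count)

    def _fetch(self, start, size):
        # The single place where the underlying file is touched.
        self.raw.seek(start)
        data = self.raw.read(size)
        self.bytes_fetched += len(data)
        return data

    def read(self, count):
        start, end = self.position, self.position + count
        cache_end = self.cache_start + len(self.cache)
        if self.cache and self.cache_start <= start and end <= cache_end:
            pass  # fully cached: no file access at all
        elif self.cache and start < self.cache_start <= end <= cache_end:
            # The case from the diagram: fetch only the gap between > and [
            # and reuse what is already cached. The cache may grow past
            # buffer_size here; a real implementation would bound it.
            gap = self._fetch(start, self.cache_start - start)
            self.cache = gap + self.cache
            self.cache_start = start
        else:
            # Nothing reusable: refill the cache starting at the read position.
            self.cache = self._fetch(start, max(count, self.buffer_size))
            self.cache_start = start
        offset = start - self.cache_start
        data = self.cache[offset:offset + count]
        self.position += len(data)
        return data
```

With a 100-byte file and a 16-byte buffer, skipping to 40 and reading 4 bytes
fetches one buffer; skipping back to 30 and reading 20 bytes then fetches only
the 10 bytes in front of the cache, and a following read inside the cached
range fetches nothing.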

> Anyway, all these tricks should better be hidden in a private policy
> Object indeed, otherwise future subclasses which would inevitably
> flourish under BufferedFileStream (the Squeak entropy) might well
> break this masterpiece :)
>

Right. A separate layer makes a clean room for experiments,
without the need to rewrite the whole stream class hierarchy,
especially the subclasses, where things start exploding exponentially.
There should be a very thin layer based on the simplest operations:
read, write, seek, while the rest of the stream interface is built on that.
So, if we can identify this thin layer and make it pluggable, then we
can be sure that at least some part of the stream library can be easily
customized. And if this part works well, we can be sure the streams are in
good shape, without needing to visit and test numerous methods in
multiple (sub)classes, which is quite messy.
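
One possible shape for such a thin pluggable layer is sketched below. It is
written in Python as a runnable stand-in, not Squeak code, and every name in
it (DirectStrategy, CountingStrategy, Stream, up_to, next_put_all) is
hypothetical. The only assumption is that the strategy exposes seek, read
and write, and that everything else in the stream is written against those
three calls.

```python
import io


class DirectStrategy:
    """Thin layer: every operation goes straight to the underlying file object."""

    def __init__(self, raw):
        self.raw = raw  # any seekable binary file object (assumption)

    def seek(self, position):
        self.raw.seek(position)

    def read(self, count):
        return self.raw.read(count)

    def write(self, data):
        return self.raw.write(data)


class CountingStrategy(DirectStrategy):
    """Hypothetical drop-in replacement that records how often the file is read."""

    def __init__(self, raw):
        super().__init__(raw)
        self.reads = 0

    def read(self, count):
        self.reads += 1
        return super().read(count)


class Stream:
    """Stream interface built only on the strategy's seek/read/write."""

    def __init__(self, strategy):
        self.strategy = strategy
        self.position = 0

    def next(self, count=1):
        # Read up to `count` bytes at the current position.
        self.strategy.seek(self.position)
        data = self.strategy.read(count)
        self.position += len(data)
        return data

    def skip(self, count):
        # Moving the position needs no strategy call at all.
        self.position = max(0, self.position + count)

    def up_to(self, delimiter):
        # Collect bytes until the one-byte delimiter (consumed, not returned) or EOF.
        result = bytearray()
        while True:
            byte = self.next(1)
            if not byte or byte == delimiter:
                return bytes(result)
            result += byte

    def next_put_all(self, data):
        # Write `data` at the current position and advance past it.
        self.strategy.seek(self.position)
        self.position += self.strategy.write(data)
```

Swapping DirectStrategy for a caching or read-ahead strategy would then leave
Stream untouched, so only the three-method layer needs testing when the
caching policy changes.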


> Cheers
>
> Nicolas
>



-- 
Best regards,
Igor Stasenko AKA sig.


