[squeak-dev] The Inbox: Files-cmm.125.mcz

Chris Muller ma.chris.m at gmail.com
Thu Jul 4 16:26:05 UTC 2013


>> Good discussion thanks Levente.  You know, after more thinking I had
>> trouble understanding what is the characteristic of the FilePlugin
>> primitives that leads to better performance in a BFS?
>
> With the current DFS implementation (you use FileDirectory >> #entries from
> FileDirectory #directoryTreeDo:entries: which creates an Array with a
> DirectoryEntry object for all files in that directory) you keep all
> DirectoryEntry instances in the memory up to the root during the iteration.
> With BFS you don't have to do that, because once you start processing a
> directory, you won't start another before you finish the current one.
> You could avoid storing the entries in memory with DFS too - by using
> FileDirectory >> #entriesDo: from Files-ul.125, but that will potentially
> expose the bottleneck present in the FilePlugin. Here's a small
> demonstration:
>
> path := '/usr/bin'.
> "Single process using the FilePlugin"
> [ (FileDirectory on: path) directoryTreeDo: [ :each | Processor yield ] ]
> timeToRun.
> "1208"
>
> "2 processes using the FilePlugin"
> [
>         (FileDirectory on: path) directoryTreeDo: [ :each |
>                 Processor yield ] ] fork.
> [ (FileDirectory on: path) directoryTreeDo: [ :each |
>                 Processor yield ] ] timeToRun.
> "56716"

Amazing.

> What's happening?
> Opening a directory and getting the nth element of it is not cheap, but
> that's what #primLookupEntryIn:index: offers. To speed up things, FilePlugin
> stores the last opened directory and the index of the last accessed file in
> it. If you ask for the next file from the same directory, then it'll reuse
> the stored directory, and everything will seem fast. But if you open another
> directory, or ask for any file other than the next one from the same
> directory, then it'll be much slower.
>
> For example asking for the files from a directory, but in reversed order
> will take O(n*n) time, where n is the number of files in the directory.

I didn't follow your BFS version of directoryTreeDo:, but I did bench
it.  Given the above, shouldn't it have performed much faster than the
DFS version?

>> The only use-case I can think of would be where the doBlock of
>> directoryTreeDo: could be short-circuited by a non-local return, such
>
> I don't see how this is related to BFS vs DFS, or my suggestion to change
> the argument of directoryTreeDo:.
>
>> as a detect operation.  But in a stats operation, it seems like the
>> same requests to FilePlugin are made in each case.  That is, every
>> subdirectory gets read just once via the #entries primitive in either
>> case BFS or DFS..
>
> Indeed, the difference is in the memory usage.

Agreed -- but I was just trying to relate BFS vs. DFS to usage of
FilePlugin.  It sounds like by DFS asking for other than the very next
file, it's tripping up the inefficiency of FilePlugin, is that right?

>> But if the whole sub-tree is going to be accessed it seems like a
>> wash, and indeed I'm getting pretty close times between my and your
>> version of directoryTreeDo::
>>
>> |dir| dir:= FileDirectory default containingDirectory containingDirectory.
>> [dir directoryTreeDo: [ : e |  ]] timeToRun
>>
>>  (BFS) "444"
>>  (DFS) "428"
>
> Did you running the benchmark on /usr/ from
> http://lists.squeakfoundation.org/pipermail/squeak-dev/2013-June/171827.html
> ?

Yes, and I got similar results as you, but that is a benchmark for the
stats operation only, not directoryTreeDo:.  I think that stats
operation which answers a 3-element Array of Array's is too-specific
to be useful.  I'm interested in directoryTreeDo: to be fast and
efficient so I can use it for stats if I want, or other things.  But I
did not get a significant difference in execution performance between
your BFS and my DFS versions.  Although we know the DFS will use more
memory, I'm puzzled why I did not get a big boost by your BFS version
of it given the inefficiency of FilePlugin..


More information about the Squeak-dev mailing list