Good discussion, thanks Levente. You know, after more thought I'm still having trouble understanding what characteristic of the FilePlugin primitives leads to better performance in a BFS.
With the current DFS implementation (#directoryTreeDo:entries: uses FileDirectory >> #entries, which creates an Array with a DirectoryEntry object for every file in that directory), you keep all DirectoryEntry instances up to the root in memory during the iteration. With BFS you don't have to do that, because once you start processing a directory, you won't start another before you finish the current one. You could avoid storing the entries in memory with DFS too, by using FileDirectory >> #entriesDo: from Files-ul.125, but that would potentially expose the bottleneck present in the FilePlugin. Here's a small demonstration:
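To make the memory difference concrete, here is a small sketch of the two access styles, assuming the #entries and #entriesDo: selectors behave as described above (the '/usr/bin' path is just an example):

```smalltalk
"With #entries, a whole Array of DirectoryEntry objects is built up front,
and in a DFS every directory on the path down from the root keeps such an
Array alive until its subtree is finished:"
(FileDirectory on: '/usr/bin') entries do: [ :entry |
	Transcript show: entry name; cr ].

"With #entriesDo: (from Files-ul.125, per the discussion above) the entries
are streamed one at a time, so no per-directory Array has to stay live -
but each entry lookup then goes through the FilePlugin primitive directly,
which is where the bottleneck below shows up:"
(FileDirectory on: '/usr/bin') entriesDo: [ :entry |
	Transcript show: entry name; cr ].
```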
path := '/usr/bin'.

"Single process using the FilePlugin"
[ (FileDirectory on: path) directoryTreeDo: [ :each | Processor yield ] ] timeToRun. "1208"

"2 processes using the FilePlugin"
[ (FileDirectory on: path) directoryTreeDo: [ :each | Processor yield ] ] fork.
[ (FileDirectory on: path) directoryTreeDo: [ :each | Processor yield ] ] timeToRun. "56716"
Amazing.
What's happening? Opening a directory and getting its nth element is not cheap, but that's what #primLookupEntryIn:index: offers. To speed things up, FilePlugin stores the last opened directory and the index of the last accessed file in it. If you ask for the next file from the same directory, it'll reuse the stored directory, and everything will seem fast. But if you open another directory, or ask for any file other than the next one from the same directory, then it'll be much slower.
For example, asking for the files of a directory in reverse order takes O(n*n) time, where n is the number of files in the directory.
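You can see this directly at the primitive level. The sketch below assumes FileDirectory class >> #lookupEntryIn:index: is available as the image-side wrapper around #primLookupEntryIn:index: (check your image's Files package; the selector may differ):

```smalltalk
| path n |
path := '/usr/bin'.
n := (FileDirectory on: path) entries size.

"Ascending indices hit FilePlugin's cached (directory, index) pair,
so each lookup continues from where the last one stopped: O(n) total."
[ 1 to: n do: [ :i |
	FileDirectory lookupEntryIn: path index: i ] ] timeToRun.

"Descending indices miss the cache on every call, so each lookup has to
rescan the directory from the beginning: O(n*n) total."
[ n to: 1 by: -1 do: [ :i |
	FileDirectory lookupEntryIn: path index: i ] ] timeToRun.
```

The two-process benchmark above is the same cache miss in disguise: each yield lets the other process overwrite the stored directory and index, so nearly every lookup falls off the fast path.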
I didn't follow your BFS version of directoryTreeDo:, but I did bench it. Given the above, shouldn't it have performed much faster than the DFS version?
The only use-case I can think of would be one where the doBlock of directoryTreeDo: could be short-circuited by a non-local return, such as a detect operation.

I don't see how this is related to BFS vs DFS, or to my suggestion to change the argument of directoryTreeDo:.

But in a stats operation, it seems like the same requests to FilePlugin are made in each case. That is, every subdirectory gets read just once via the #entries primitive, whether BFS or DFS.
Indeed, the difference is in the memory usage.
Agreed -- but I was just trying to relate BFS vs. DFS to usage of FilePlugin. It sounds like, by asking for a file other than the very next one, DFS trips the inefficiency in FilePlugin; is that right?
But if the whole sub-tree is going to be accessed, it seems like a wash, and indeed I'm getting pretty close times between my version of directoryTreeDo: and yours:
| dir |
dir := FileDirectory default containingDirectory containingDirectory.
[ dir directoryTreeDo: [ :e | ] ] timeToRun.

(BFS) "444"  (DFS) "428"
Did you run the benchmark on /usr/ from http://lists.squeakfoundation.org/pipermail/squeak-dev/2013-June/171827.html ?
Yes, and I got similar results as you, but that is a benchmark for the stats operation only, not for directoryTreeDo:. I think the stats operation, which answers a 3-element Array of Arrays, is too specific to be useful. I'm interested in directoryTreeDo: being fast and efficient, so I can use it for stats if I want, or for other things. But I did not get a significant difference in execution performance between your BFS and my DFS versions. Although we know the DFS will use more memory, I'm puzzled why I did not get a big boost from your BFS version of it, given the inefficiency of FilePlugin.