[squeak-dev] The Inbox: Files-cmm.125.mcz

Levente Uzonyi leves at elte.hu
Thu Jul 4 17:52:38 UTC 2013


On Thu, 4 Jul 2013, Chris Muller wrote:

>>> Good discussion thanks Levente.  You know, after more thinking I had
>>> trouble understanding what is the characteristic of the FilePlugin
>>> primitives that leads to better performance in a BFS?
>>
>> With the current DFS implementation (you use FileDirectory >> #entries from
>> FileDirectory #directoryTreeDo:entries: which creates an Array with a
>> DirectoryEntry object for all files in that directory) you keep all
>> DirectoryEntry instances in the memory up to the root during the iteration.
>> With BFS you don't have to do that, because once you start processing a
>> directory, you won't start another before you finish the current one.
>> You could avoid storing the entries in memory with DFS too - by using
>> FileDirectory >> #entriesDo: from Files-ul.125, but that will potentially
>> expose the bottleneck present in the FilePlugin. Here's a small
>> demonstration:
>>
>> path := '/usr/bin'.
>> "Single process using the FilePlugin"
>> [ (FileDirectory on: path) directoryTreeDo: [ :each | Processor yield ] ]
>> timeToRun.
>> "1208"
>>
>> "2 processes using the FilePlugin"
>> [
>>         (FileDirectory on: path) directoryTreeDo: [ :each |
>>                 Processor yield ] ] fork.
>> [ (FileDirectory on: path) directoryTreeDo: [ :each |
>>                 Processor yield ] ] timeToRun.
>> "56716"
>
> Amazing.
>
>> What's happening?
>> Opening a directory and getting the nth element of it is not cheap, but
>> that's what #primLookupEntryIn:index: offers. To speed up things, FilePlugin
>> stores the last opened directory and the index of the last accessed file in
>> it. If you ask for the next file from the same directory, then it'll reuse
>> the stored directory, and everything will seem fast. But if you open another
>> directory, or ask for any file other than the next one from the same
>> directory, then it'll be much slower.
>>
>> For example asking for the files from a directory, but in reversed order
>> will take O(n*n) time, where n is the number of files in the directory.
>
> I didn't follow your BFS version of directoryTreeDo:, but I did bench
> it.  Given the above, shouldn't it have performed much faster than the
> DFS version?

No, because it uses #entries, which fetches all the files from the 
directory.

>
>>> The only use-case I can think of would be where the doBlock of
>>> directoryTreeDo: could be short-circuited by a non-local return, such
>>
>> I don't see how this is related to BFS vs DFS, or my suggestion to change
>> the argument of directoryTreeDo:.
>>
>>> as a detect operation.  But in a stats operation, it seems like the
>>> same requests to FilePlugin are made in each case.  That is, every
>>> subdirectory gets read just once via the #entries primitive in either
>>> case BFS or DFS..
>>
>> Indeed, the difference is in the memory usage.
>
> Agreed -- but I was just trying to relate BFS vs. DFS to usage of
> FilePlugin.  It sounds like by DFS asking for other than the very next
> file, it's tripping up the inefficiency of FilePlugin, is that right?

Exactly.

>
>>> But if the whole sub-tree is going to be accessed it seems like a
>>> wash, and indeed I'm getting pretty close times between my and your
>>> version of directoryTreeDo::
>>>
>>> |dir| dir:= FileDirectory default containingDirectory containingDirectory.
>>> [dir directoryTreeDo: [ : e |  ]] timeToRun
>>>
>>>  (BFS) "444"
>>>  (DFS) "428"
>>
>> Did you running the benchmark on /usr/ from
>> http://lists.squeakfoundation.org/pipermail/squeak-dev/2013-June/171827.html
>> ?
>
> Yes, and I got similar results as you, but that is a benchmark for the
> stats operation only, not directoryTreeDo:.  I think that stats
> operation which answers a 3-element Array of Array's is too-specific
> to be useful.  I'm interested in directoryTreeDo: to be fast and
> efficient so I can use it for stats if I want, or other things.  But I
> did not get a significant difference in execution performance between
> your BFS and my DFS versions.  Although we know the DFS will use more
> memory, I'm puzzled why I did not get a big boost by your BFS version
> of it given the inefficiency of FilePlugin..
>

Because of the use of #entries. It fetches all the files in the current 
directory, so it avoids the bottleneck of FilePlugin.


Levente


More information about the Squeak-dev mailing list