[squeak-dev] The Inbox: Files-cmm.125.mcz

Chris Muller asqueaker at gmail.com
Sun Jun 30 17:19:53 UTC 2013


Hi Levente, I do not know what you mean by DFS and BFS.  My claim that
statsForDirectoryTree: "does one useless thing, and does it
inefficiently" is commentary about the _design_, not the
implementation, because it forces me to enumerate an entire directory
even if I don't want to.  For example, what if I want to exclude a
".temp" directory's stats from its containing directory's stats?  To
do that, I'd have to run #statsForDirectoryTree: twice.  That's not
going to be faster..

On Sat, Jun 29, 2013 at 5:15 PM, Levente Uzonyi <leves at elte.hu> wrote:
> On Fri, 28 Jun 2013, commits at source.squeak.org wrote:
>
>> A new version of Files was added to project The Inbox:
>> http://source.squeak.org/inbox/Files-cmm.125.mcz
>>
>> ==================== Summary ====================
>>
>> Name: Files-cmm.125
>> Author: cmm
>> Time: 27 June 2013, 7:24:33.37 pm
>> UUID: 0b00b2c4-3430-442b-a924-72040ff43d73
>> Ancestors: Files-fbs.124
>>
>> FileDirectory>>statsForDirectoryTree: is a prime example of FileDirectory
>> living up to its poor reputation.  It does one useless thing, and does it
>> inefficiently.  It has no senders, deservedly.
>>         Tree-walking operations like gathering stats can be written in
>> apps more generally using directoryTreeDo:.
>
>
> I wouldn't say it's inefficient. It uses BFS (instead of DFS, which is used
> by #directoryTreeDo:entries:), which is somewhat better than DFS, because of
> the way FilePlugin works: you have to iterate over the contents of the whole
> directory if you want to it to be efficient.
> If you use DFS, then the contents of all parent directories up to the root
> will have to be stored in memory during the iteration, while if you use BFS,
> then only the sibling directories (without their contents) or the siblings's
> parents will be in memory.
>
> I made two "new" implementations of #statsForDirectoryTree:. This one
> optimizes memory usage for BFS:
>
> statsForDirectoryTree2: rootedPathName
>
>         | dirs files bytes todo |
>
>         dirs := files := bytes := 0.
>         todo := OrderedCollection with: rootedPathName.
>         [ todo isEmpty ] whileFalse: [
>                 | p |
>                 p := todo removeFirst.
>                 self directoryContentsFor: p do: [ :entry |
>                         entry isDirectory
>                                 ifTrue: [
>
>                                         todo addLast: p , self
> pathNameDelimiter asString , entry name.
>                                         dirs := dirs + 1]
>                                 ifFalse: [
>
>                                         files := files + 1.
>                                         bytes := bytes + entry fileSize ] ]
> ].
>         ^{ dirs. files. bytes }
>
> This one uses #directoryTreeDo::
>
> statsForDirectoryTree3: rootedPathName
>
>         | dirs files bytes |
>
>         dirs := files := bytes := 0.
>         (FileDirectory on: rootedPathName) directoryTreeDo: [ :path |
>                 | entry |
>                 entry := path last.
>                 entry isDirectory
>                         ifTrue: [ dirs := dirs + 1 ]
>                         ifFalse: [
>
>                                 files := files + 1.
>                                 bytes := bytes + entry fileSize ] ].
>         ^{ dirs. files. bytes }
>
> Let's see the numbers:
>
> "Make sure the disk won't be accessed"
> FileDirectory default statsForDirectoryTree: '/usr/'.
> FileDirectory default statsForDirectoryTree: '/usr/'.
> "Run the benchmark"
> #(statsForDirectoryTree: statsForDirectoryTree2: statsForDirectoryTree3:)
> collect: [ :each |
>         each -> ((1 to: 5) collect: [ :run |
>                 Smalltalk garbageCollect.
>                 [ FileDirectory default perform: each with: '/usr/' ]
> timeToRun ]) ].
>
> {
>         #statsForDirectoryTree:->#(6340 6300 6430 6316 6228) .
>         #statsForDirectoryTree2:->#(5918 6072 6122 6296 6306) .
>         #statsForDirectoryTree3:->#(8576 8374 8470 8506 8228) }
>
> It's a bit noisy (i was too lazy to exit other programs), but the one using
> #directoryTreeDo: seems to be clearly slower than the existing algorithm
> using BFS.
>
>
> Levente
>
>
>>
>> =============== Diff against Files-fbs.124 ===============
>>
>> Item was changed:
>> + ----- Method: FileDirectory>>directoryTreeDo:entries: (in category
>> 'private') -----
>> - ----- Method: FileDirectory>>directoryTreeDo:entries: (in category
>> 'enumeration') -----
>>  directoryTreeDo: oneArgBlock entries: entriesCollection
>>         "Value oneArgBlock with the path (an OrderedCollection of
>> FileDirectory's) to each DirectoryEntry and the DirectoryEntry itself."
>>         self entries do:
>>                 [ : each |
>>                 entriesCollection add: each.
>>                 oneArgBlock value: entriesCollection.
>>                 each isDirectory ifTrue:
>>                         [ | subdir |
>>                         subdir := each asFileDirectory.
>>                         subdir
>>                                 directoryTreeDo: oneArgBlock
>>                                 entries: entriesCollection ].
>>                 entriesCollection removeLast ]!
>>
>> Item was removed:
>> - ----- Method: FileDirectory>>statsForDirectoryTree: (in category
>> 'enumeration') -----
>> - statsForDirectoryTree: rootedPathName
>> -       "Return the size statistics for the entire directory tree starting
>> at the given root. The result is a three element array of the form: (<number
>> of folders><number of files><total bytes in all files>). This method also
>> serves as an example of how recursively enumerate a directory tree."
>> -       "FileDirectory default statsForDirectoryTree: '\smalltalk'"
>> -
>> -       | dirs files bytes todo entries p |
>> -       dirs := files := bytes := 0.
>> -       todo := OrderedCollection with: rootedPathName.
>> -       [todo isEmpty] whileFalse: [
>> -               p := todo removeFirst.
>> -               entries := self directoryContentsFor: p.
>> -               entries do: [:entry |
>> -                       entry isDirectory
>> -                               ifTrue: [
>> -                                       todo addLast: p , self
>> pathNameDelimiter asString , entry name.
>> -                                       dirs := dirs + 1]
>> -                               ifFalse: [
>> -                                       files := files + 1.
>> -                                       bytes := bytes + entry
>> fileSize]]].
>> -       ^ Array with: dirs with: files with: bytes
>> - !
>>
>>
>>
>


More information about the Squeak-dev mailing list