[squeak-dev] The Inbox: Files-cmm.125.mcz

Chris Muller ma.chris.m at gmail.com
Sun Jun 30 19:45:03 UTC 2013


Ah, thank you!  Now I understand Levente's comments.

Levente, that is an excellent point about directoryTreeDo: traversing
depth-first.  In fact that makes me want to change it to be
breadth-first..  I'm trying to think about all the use-cases it serves
and whether that would make a difference; since we have the path,
maybe not..

On Sun, Jun 30, 2013 at 1:29 PM, Frank Shearar <frank.shearar at gmail.com> wrote:
> On 30 June 2013 18:19, Chris Muller <asqueaker at gmail.com> wrote:
>> Hi Levente, I do not know what you mean by DFS and BFS.
>
> Depth-first versus breadth-first search.
>
> frank
>
>>  My claim that
>> statsForDirectoryTree: "does one useless thing, and does it
>> inefficiently" is commentary about the _design_, not the
>> implementation, because it forces me to enumerate an entire directory
>> even if I don't want to.  For example, what if I want to exclude a
>> ".temp" directory's stats from its containing directory's stats?  To
>> do that, I'd have to run #statsForDirectoryTree: twice.  That's not
>> going to be faster..
>>
>> On Sat, Jun 29, 2013 at 5:15 PM, Levente Uzonyi <leves at elte.hu> wrote:
>>> On Fri, 28 Jun 2013, commits at source.squeak.org wrote:
>>>
>>>> A new version of Files was added to project The Inbox:
>>>> http://source.squeak.org/inbox/Files-cmm.125.mcz
>>>>
>>>> ==================== Summary ====================
>>>>
>>>> Name: Files-cmm.125
>>>> Author: cmm
>>>> Time: 27 June 2013, 7:24:33.37 pm
>>>> UUID: 0b00b2c4-3430-442b-a924-72040ff43d73
>>>> Ancestors: Files-fbs.124
>>>>
>>>> FileDirectory>>statsForDirectoryTree: is a prime example of FileDirectory
>>>> living up to its poor reputation.  It does one useless thing, and does it
>>>> inefficiently.  It has no senders, deservedly.
>>>>         Tree-walking operations like gathering stats can be written in
>>>> apps more generally using directoryTreeDo:.
>>>
>>>
>>> I wouldn't say it's inefficient. It uses BFS (instead of DFS, which is used
>>> by #directoryTreeDo:entries:), which is somewhat better than DFS, because of
>>> the way FilePlugin works: you have to iterate over the contents of the whole
>>> directory if you want to it to be efficient.
>>> If you use DFS, then the contents of all parent directories up to the root
>>> will have to be stored in memory during the iteration, while if you use BFS,
>>> then only the sibling directories (without their contents) or the siblings's
>>> parents will be in memory.
>>>
>>> I made two "new" implementations of #statsForDirectoryTree:. This one
>>> optimizes memory usage for BFS:
>>>
>>> statsForDirectoryTree2: rootedPathName
>>>
>>>         | dirs files bytes todo |
>>>
>>>         dirs := files := bytes := 0.
>>>         todo := OrderedCollection with: rootedPathName.
>>>         [ todo isEmpty ] whileFalse: [
>>>                 | p |
>>>                 p := todo removeFirst.
>>>                 self directoryContentsFor: p do: [ :entry |
>>>                         entry isDirectory
>>>                                 ifTrue: [
>>>
>>>                                         todo addLast: p , self
>>> pathNameDelimiter asString , entry name.
>>>                                         dirs := dirs + 1]
>>>                                 ifFalse: [
>>>
>>>                                         files := files + 1.
>>>                                         bytes := bytes + entry fileSize ] ]
>>> ].
>>>         ^{ dirs. files. bytes }
>>>
>>> This one uses #directoryTreeDo::
>>>
>>> statsForDirectoryTree3: rootedPathName
>>>
>>>         | dirs files bytes |
>>>
>>>         dirs := files := bytes := 0.
>>>         (FileDirectory on: rootedPathName) directoryTreeDo: [ :path |
>>>                 | entry |
>>>                 entry := path last.
>>>                 entry isDirectory
>>>                         ifTrue: [ dirs := dirs + 1 ]
>>>                         ifFalse: [
>>>
>>>                                 files := files + 1.
>>>                                 bytes := bytes + entry fileSize ] ].
>>>         ^{ dirs. files. bytes }
>>>
>>> Let's see the numbers:
>>>
>>> "Make sure the disk won't be accessed"
>>> FileDirectory default statsForDirectoryTree: '/usr/'.
>>> FileDirectory default statsForDirectoryTree: '/usr/'.
>>> "Run the benchmark"
>>> #(statsForDirectoryTree: statsForDirectoryTree2: statsForDirectoryTree3:)
>>> collect: [ :each |
>>>         each -> ((1 to: 5) collect: [ :run |
>>>                 Smalltalk garbageCollect.
>>>                 [ FileDirectory default perform: each with: '/usr/' ]
>>> timeToRun ]) ].
>>>
>>> {
>>>         #statsForDirectoryTree:->#(6340 6300 6430 6316 6228) .
>>>         #statsForDirectoryTree2:->#(5918 6072 6122 6296 6306) .
>>>         #statsForDirectoryTree3:->#(8576 8374 8470 8506 8228) }
>>>
>>> It's a bit noisy (i was too lazy to exit other programs), but the one using
>>> #directoryTreeDo: seems to be clearly slower than the existing algorithm
>>> using BFS.
>>>
>>>
>>> Levente
>>>
>>>
>>>>
>>>> =============== Diff against Files-fbs.124 ===============
>>>>
>>>> Item was changed:
>>>> + ----- Method: FileDirectory>>directoryTreeDo:entries: (in category
>>>> 'private') -----
>>>> - ----- Method: FileDirectory>>directoryTreeDo:entries: (in category
>>>> 'enumeration') -----
>>>>  directoryTreeDo: oneArgBlock entries: entriesCollection
>>>>         "Value oneArgBlock with the path (an OrderedCollection of
>>>> FileDirectory's) to each DirectoryEntry and the DirectoryEntry itself."
>>>>         self entries do:
>>>>                 [ : each |
>>>>                 entriesCollection add: each.
>>>>                 oneArgBlock value: entriesCollection.
>>>>                 each isDirectory ifTrue:
>>>>                         [ | subdir |
>>>>                         subdir := each asFileDirectory.
>>>>                         subdir
>>>>                                 directoryTreeDo: oneArgBlock
>>>>                                 entries: entriesCollection ].
>>>>                 entriesCollection removeLast ]!
>>>>
>>>> Item was removed:
>>>> - ----- Method: FileDirectory>>statsForDirectoryTree: (in category
>>>> 'enumeration') -----
>>>> - statsForDirectoryTree: rootedPathName
>>>> -       "Return the size statistics for the entire directory tree starting
>>>> at the given root. The result is a three element array of the form: (<number
>>>> of folders><number of files><total bytes in all files>). This method also
>>>> serves as an example of how recursively enumerate a directory tree."
>>>> -       "FileDirectory default statsForDirectoryTree: '\smalltalk'"
>>>> -
>>>> -       | dirs files bytes todo entries p |
>>>> -       dirs := files := bytes := 0.
>>>> -       todo := OrderedCollection with: rootedPathName.
>>>> -       [todo isEmpty] whileFalse: [
>>>> -               p := todo removeFirst.
>>>> -               entries := self directoryContentsFor: p.
>>>> -               entries do: [:entry |
>>>> -                       entry isDirectory
>>>> -                               ifTrue: [
>>>> -                                       todo addLast: p , self
>>>> pathNameDelimiter asString , entry name.
>>>> -                                       dirs := dirs + 1]
>>>> -                               ifFalse: [
>>>> -                                       files := files + 1.
>>>> -                                       bytes := bytes + entry
>>>> fileSize]]].
>>>> -       ^ Array with: dirs with: files with: bytes
>>>> - !
>>>>
>>>>
>>>>
>>>
>>


More information about the Squeak-dev mailing list