[squeak-dev] The Inbox: Files-cmm.125.mcz
Frank Shearar
frank.shearar at gmail.com
Sun Jun 30 18:29:32 UTC 2013
On 30 June 2013 18:19, Chris Muller <asqueaker at gmail.com> wrote:
> Hi Levente, I do not know what you mean by DFS and BFS.
Depth-first versus breadth-first search.
frank
> My claim that
> statsForDirectoryTree: "does one useless thing, and does it
> inefficiently" is commentary about the _design_, not the
> implementation, because it forces me to enumerate an entire directory
> even if I don't want to. For example, what if I want to exclude a
> ".temp" directory's stats from its containing directory's stats? To
> do that, I'd have to run #statsForDirectoryTree: twice. That's not
> going to be faster..
>
> On Sat, Jun 29, 2013 at 5:15 PM, Levente Uzonyi <leves at elte.hu> wrote:
>> On Fri, 28 Jun 2013, commits at source.squeak.org wrote:
>>
>>> A new version of Files was added to project The Inbox:
>>> http://source.squeak.org/inbox/Files-cmm.125.mcz
>>>
>>> ==================== Summary ====================
>>>
>>> Name: Files-cmm.125
>>> Author: cmm
>>> Time: 27 June 2013, 7:24:33.37 pm
>>> UUID: 0b00b2c4-3430-442b-a924-72040ff43d73
>>> Ancestors: Files-fbs.124
>>>
>>> FileDirectory>>statsForDirectoryTree: is a prime example of FileDirectory
>>> living up to its poor reputation. It does one useless thing, and does it
>>> inefficiently. It has no senders, deservedly.
>>> Tree-walking operations like gathering stats can be written in
>>> apps more generally using directoryTreeDo:.
>>
>>
>> I wouldn't say it's inefficient. It uses BFS (instead of DFS, which is used
>> by #directoryTreeDo:entries:), which is somewhat better than DFS, because of
>> the way FilePlugin works: you have to iterate over the contents of the whole
>> directory if you want to it to be efficient.
>> If you use DFS, then the contents of all parent directories up to the root
>> will have to be stored in memory during the iteration, while if you use BFS,
>> then only the sibling directories (without their contents) or the siblings's
>> parents will be in memory.
>>
>> I made two "new" implementations of #statsForDirectoryTree:. This one
>> optimizes memory usage for BFS:
>>
>> statsForDirectoryTree2: rootedPathName
>>
>> | dirs files bytes todo |
>>
>> dirs := files := bytes := 0.
>> todo := OrderedCollection with: rootedPathName.
>> [ todo isEmpty ] whileFalse: [
>> | p |
>> p := todo removeFirst.
>> self directoryContentsFor: p do: [ :entry |
>> entry isDirectory
>> ifTrue: [
>>
>> todo addLast: p , self
>> pathNameDelimiter asString , entry name.
>> dirs := dirs + 1]
>> ifFalse: [
>>
>> files := files + 1.
>> bytes := bytes + entry fileSize ] ]
>> ].
>> ^{ dirs. files. bytes }
>>
>> This one uses #directoryTreeDo::
>>
>> statsForDirectoryTree3: rootedPathName
>>
>> | dirs files bytes |
>>
>> dirs := files := bytes := 0.
>> (FileDirectory on: rootedPathName) directoryTreeDo: [ :path |
>> | entry |
>> entry := path last.
>> entry isDirectory
>> ifTrue: [ dirs := dirs + 1 ]
>> ifFalse: [
>>
>> files := files + 1.
>> bytes := bytes + entry fileSize ] ].
>> ^{ dirs. files. bytes }
>>
>> Let's see the numbers:
>>
>> "Make sure the disk won't be accessed"
>> FileDirectory default statsForDirectoryTree: '/usr/'.
>> FileDirectory default statsForDirectoryTree: '/usr/'.
>> "Run the benchmark"
>> #(statsForDirectoryTree: statsForDirectoryTree2: statsForDirectoryTree3:)
>> collect: [ :each |
>> each -> ((1 to: 5) collect: [ :run |
>> Smalltalk garbageCollect.
>> [ FileDirectory default perform: each with: '/usr/' ]
>> timeToRun ]) ].
>>
>> {
>> #statsForDirectoryTree:->#(6340 6300 6430 6316 6228) .
>> #statsForDirectoryTree2:->#(5918 6072 6122 6296 6306) .
>> #statsForDirectoryTree3:->#(8576 8374 8470 8506 8228) }
>>
>> It's a bit noisy (i was too lazy to exit other programs), but the one using
>> #directoryTreeDo: seems to be clearly slower than the existing algorithm
>> using BFS.
>>
>>
>> Levente
>>
>>
>>>
>>> =============== Diff against Files-fbs.124 ===============
>>>
>>> Item was changed:
>>> + ----- Method: FileDirectory>>directoryTreeDo:entries: (in category
>>> 'private') -----
>>> - ----- Method: FileDirectory>>directoryTreeDo:entries: (in category
>>> 'enumeration') -----
>>> directoryTreeDo: oneArgBlock entries: entriesCollection
>>> "Value oneArgBlock with the path (an OrderedCollection of
>>> FileDirectory's) to each DirectoryEntry and the DirectoryEntry itself."
>>> self entries do:
>>> [ : each |
>>> entriesCollection add: each.
>>> oneArgBlock value: entriesCollection.
>>> each isDirectory ifTrue:
>>> [ | subdir |
>>> subdir := each asFileDirectory.
>>> subdir
>>> directoryTreeDo: oneArgBlock
>>> entries: entriesCollection ].
>>> entriesCollection removeLast ]!
>>>
>>> Item was removed:
>>> - ----- Method: FileDirectory>>statsForDirectoryTree: (in category
>>> 'enumeration') -----
>>> - statsForDirectoryTree: rootedPathName
>>> - "Return the size statistics for the entire directory tree starting
>>> at the given root. The result is a three element array of the form: (<number
>>> of folders><number of files><total bytes in all files>). This method also
>>> serves as an example of how recursively enumerate a directory tree."
>>> - "FileDirectory default statsForDirectoryTree: '\smalltalk'"
>>> -
>>> - | dirs files bytes todo entries p |
>>> - dirs := files := bytes := 0.
>>> - todo := OrderedCollection with: rootedPathName.
>>> - [todo isEmpty] whileFalse: [
>>> - p := todo removeFirst.
>>> - entries := self directoryContentsFor: p.
>>> - entries do: [:entry |
>>> - entry isDirectory
>>> - ifTrue: [
>>> - todo addLast: p , self
>>> pathNameDelimiter asString , entry name.
>>> - dirs := dirs + 1]
>>> - ifFalse: [
>>> - files := files + 1.
>>> - bytes := bytes + entry
>>> fileSize]]].
>>> - ^ Array with: dirs with: files with: bytes
>>> - !
>>>
>>>
>>>
>>
>
More information about the Squeak-dev
mailing list
|