[squeak-dev] The Inbox: Files-cmm.125.mcz

Levente Uzonyi leves at elte.hu
Wed Jul 3 05:51:52 UTC 2013


On Sun, 30 Jun 2013, H. Hirzel wrote:

> So, what do you propose Levente? Maybe just an update of the comment
> of FileDirectory which explains this issue and gives your example
> statistics code?

Definitely not, but I'm not sure if it's worth fixing FileDirectory. Even 
though I like its simplicity (only a few classes, single class entry 
point), it currently has some bottlenecks, noise (messy code) and a 
design flaw (DirectoryEntry <-> FileDirectory).


Levente

>
> --Hannes
>
> On 6/29/13, Levente Uzonyi <leves at elte.hu> wrote:
>> On Fri, 28 Jun 2013, commits at source.squeak.org wrote:
>>
>>> A new version of Files was added to project The Inbox:
>>> http://source.squeak.org/inbox/Files-cmm.125.mcz
>>>
>>> ==================== Summary ====================
>>>
>>> Name: Files-cmm.125
>>> Author: cmm
>>> Time: 27 June 2013, 7:24:33.37 pm
>>> UUID: 0b00b2c4-3430-442b-a924-72040ff43d73
>>> Ancestors: Files-fbs.124
>>>
>>> FileDirectory>>statsForDirectoryTree: is a prime example of FileDirectory
>>> living up to its poor reputation.  It does one useless thing, and does it
>>> inefficiently.  It has no senders, deservedly.
>>> 	Tree-walking operations like gathering stats can be written in apps more
>>> generally using directoryTreeDo:.
>>
>> I wouldn't say it's inefficient. It uses BFS (instead of DFS, which is
>> used by #directoryTreeDo:entries:), which is somewhat better than DFS,
>> because of the way FilePlugin works: you have to iterate over the contents
>> of the whole directory if you want to it to be efficient.
>> If you use DFS, then the contents of all parent directories up to the
>> root will have to be stored in memory during the iteration, while if you
>> use BFS, then only the sibling directories (without their contents) or the
>> siblings's parents will be in memory.
>>
>> I made two "new" implementations of #statsForDirectoryTree:. This one
>> optimizes memory usage for BFS:
>>
>> statsForDirectoryTree2: rootedPathName
>>
>>  	| dirs files bytes todo |
>>  	dirs := files := bytes := 0.
>>  	todo := OrderedCollection with: rootedPathName.
>>  	[ todo isEmpty ] whileFalse: [
>>  		| p |
>>  		p := todo removeFirst.
>>  		self directoryContentsFor: p do: [ :entry |
>>  			entry isDirectory
>>  				ifTrue: [
>>  					todo addLast: p , self pathNameDelimiter asString , entry name.
>>  					dirs := dirs + 1]
>>  				ifFalse: [
>>  					files := files + 1.
>>  					bytes := bytes + entry fileSize ] ] ].
>>  	^{ dirs. files. bytes }
>>
>> This one uses #directoryTreeDo::
>>
>> statsForDirectoryTree3: rootedPathName
>>
>>  	| dirs files bytes |
>>  	dirs := files := bytes := 0.
>>  	(FileDirectory on: rootedPathName) directoryTreeDo: [ :path |
>>  		| entry |
>>  		entry := path last.
>>  		entry isDirectory
>>  			ifTrue: [ dirs := dirs + 1 ]
>>  			ifFalse: [
>>  				files := files + 1.
>>  				bytes := bytes + entry fileSize ] ].
>>  	^{ dirs. files. bytes }
>>
>> Let's see the numbers:
>>
>> "Make sure the disk won't be accessed"
>> FileDirectory default statsForDirectoryTree: '/usr/'.
>> FileDirectory default statsForDirectoryTree: '/usr/'.
>> "Run the benchmark"
>> #(statsForDirectoryTree: statsForDirectoryTree2: statsForDirectoryTree3:)
>> collect: [ :each |
>>  	each -> ((1 to: 5) collect: [ :run |
>>  		Smalltalk garbageCollect.
>>  		[ FileDirectory default perform: each with: '/usr/' ] timeToRun ]) ].
>>
>> {
>>  	#statsForDirectoryTree:->#(6340 6300 6430 6316 6228) .
>>  	#statsForDirectoryTree2:->#(5918 6072 6122 6296 6306) .
>>  	#statsForDirectoryTree3:->#(8576 8374 8470 8506 8228) }
>>
>> It's a bit noisy (i was too lazy to exit other programs), but the one
>> using #directoryTreeDo: seems to be clearly slower than the existing
>> algorithm using BFS.
>>
>>
>> Levente
>>
>>>
>>> =============== Diff against Files-fbs.124 ===============
>>>
>>> Item was changed:
>>> + ----- Method: FileDirectory>>directoryTreeDo:entries: (in category
>>> 'private') -----
>>> - ----- Method: FileDirectory>>directoryTreeDo:entries: (in category
>>> 'enumeration') -----
>>>  directoryTreeDo: oneArgBlock entries: entriesCollection
>>>  	"Value oneArgBlock with the path (an OrderedCollection of
>>> FileDirectory's) to each DirectoryEntry and the DirectoryEntry itself."
>>>  	self entries do:
>>>  		[ : each |
>>>  		entriesCollection add: each.
>>>  		oneArgBlock value: entriesCollection.
>>>  		each isDirectory ifTrue:
>>>  			[ | subdir |
>>>  			subdir := each asFileDirectory.
>>>  			subdir
>>>  				directoryTreeDo: oneArgBlock
>>>  				entries: entriesCollection ].
>>>  		entriesCollection removeLast ]!
>>>
>>> Item was removed:
>>> - ----- Method: FileDirectory>>statsForDirectoryTree: (in category
>>> 'enumeration') -----
>>> - statsForDirectoryTree: rootedPathName
>>> - 	"Return the size statistics for the entire directory tree starting at
>>> the given root. The result is a three element array of the form: (<number
>>> of folders><number of files><total bytes in all files>). This method also
>>> serves as an example of how recursively enumerate a directory tree."
>>> - 	"FileDirectory default statsForDirectoryTree: '\smalltalk'"
>>> -
>>> - 	| dirs files bytes todo entries p |
>>> - 	dirs := files := bytes := 0.
>>> - 	todo := OrderedCollection with: rootedPathName.
>>> - 	[todo isEmpty] whileFalse: [
>>> - 		p := todo removeFirst.
>>> - 		entries := self directoryContentsFor: p.
>>> - 		entries do: [:entry |
>>> - 			entry isDirectory
>>> - 				ifTrue: [
>>> - 					todo addLast: p , self pathNameDelimiter asString , entry name.
>>> - 					dirs := dirs + 1]
>>> - 				ifFalse: [
>>> - 					files := files + 1.
>>> - 					bytes := bytes + entry fileSize]]].
>>> - 	^ Array with: dirs with: files with: bytes
>>> - !
>>>
>>>
>>>
>>
>>
>
>


More information about the Squeak-dev mailing list