New subject: The Inbox: Files-cmm.125.mcz

30 Jun 2013

      On Fri, 28 Jun 2013, commits@source.squeak.org wrote:
...
A new version of Files was added to project The Inbox:
http://source.squeak.org/inbox/Files-cmm.125.mcz
==================== Summary ====================
Name: Files-cmm.125
Author: cmm
Time: 27 June 2013, 7:24:33.37 pm
UUID: 0b00b2c4-3430-442b-a924-72040ff43d73
Ancestors: Files-fbs.124
FileDirectory>>statsForDirectoryTree: is a prime example of FileDirectory living up to its poor reputation.  It does one useless thing, and does it inefficiently.  It has no senders, deservedly.
   Tree-walking operations like gathering stats can be written in apps more generally using directoryTreeDo:.
I wouldn't say it's inefficient. It uses BFS (instead of DFS, which is 
used by #directoryTreeDo:entries:), which is somewhat better than DFS, 
because of the way FilePlugin works: you have to iterate over the contents 
of the whole directory if you want to it to be efficient.
If you use DFS, then the contents of all parent directories up to the 
root will have to be stored in memory during the iteration, while if you 
use BFS, then only the sibling directories (without their contents) or the 
siblings's parents will be in memory.
I made two "new" implementations of #statsForDirectoryTree:. This one 
optimizes memory usage for BFS:
statsForDirectoryTree2: rootedPathName
| dirs files bytes todo |
    dirs := files := bytes := 0.
    todo := OrderedCollection with: rootedPathName.
    [ todo isEmpty ] whileFalse: [
    	| p |
    	p := todo removeFirst.
    	self directoryContentsFor: p do: [ :entry |
    		entry isDirectory
    			ifTrue: [
    				todo addLast: p , self pathNameDelimiter asString , entry name.
    				dirs := dirs + 1]
    			ifFalse: [
    				files := files + 1.
    				bytes := bytes + entry fileSize ] ] ].
    ^{ dirs. files. bytes }
This one uses #directoryTreeDo::
statsForDirectoryTree3: rootedPathName
| dirs files bytes |
    dirs := files := bytes := 0.
    (FileDirectory on: rootedPathName) directoryTreeDo: [ :path |
    	| entry |
    	entry := path last.
    	entry isDirectory
    		ifTrue: [ dirs := dirs + 1 ]
    		ifFalse: [
    			files := files + 1.
    			bytes := bytes + entry fileSize ] ].
    ^{ dirs. files. bytes }
Let's see the numbers:
"Make sure the disk won't be accessed"
FileDirectory default statsForDirectoryTree: '/usr/'.
FileDirectory default statsForDirectoryTree: '/usr/'.
"Run the benchmark"
#(statsForDirectoryTree: statsForDirectoryTree2: statsForDirectoryTree3:) collect: [ :each |
    each -> ((1 to: 5) collect: [ :run |
    	Smalltalk garbageCollect.
    	[ FileDirectory default perform: each with: '/usr/' ] timeToRun ]) ].
{
    #statsForDirectoryTree:->#(6340 6300 6430 6316 6228) .
    #statsForDirectoryTree2:->#(5918 6072 6122 6296 6306) .
    #statsForDirectoryTree3:->#(8576 8374 8470 8506 8228) }
It's a bit noisy (i was too lazy to exit other programs), but the one 
using #directoryTreeDo: seems to be clearly slower than the existing 
algorithm using BFS.
Levente
...
=============== Diff against Files-fbs.124 ===============
Item was changed:

----- Method: FileDirectory>>directoryTreeDo:entries: (in category 'private') -----

----- Method: FileDirectory>>directoryTreeDo:entries: (in category 'enumeration') -----

directoryTreeDo: oneArgBlock entries: entriesCollection
   "Value oneArgBlock with the path (an OrderedCollection of FileDirectory's) to each DirectoryEntry and the DirectoryEntry itself."
   self entries do:
   	[ : each |
   	entriesCollection add: each.
   	oneArgBlock value: entriesCollection.
   	each isDirectory ifTrue:
   		[ | subdir |
   		subdir := each asFileDirectory.
   		subdir
   			directoryTreeDo: oneArgBlock
   			entries: entriesCollection ].
   	entriesCollection removeLast ]!
Item was removed:

----- Method: FileDirectory>>statsForDirectoryTree: (in category 'enumeration') -----
statsForDirectoryTree: rootedPathName
"Return the size statistics for the entire directory tree starting at the given root. The result is a three element array of the form: (<number of folders><number of files><total bytes in all files>). This method also serves as an example of how recursively enumerate a directory tree."
"FileDirectory default statsForDirectoryTree: '\smalltalk'"

| dirs files bytes todo entries p |
dirs := files := bytes := 0.
todo := OrderedCollection with: rootedPathName.
[todo isEmpty] whileFalse: [
p := todo removeFirst.

entries := self directoryContentsFor: p.

entries do: [:entry |

	entry isDirectory

		ifTrue: [

			todo addLast: p , self pathNameDelimiter asString , entry name.

			dirs := dirs + 1]

		ifFalse: [

			files := files + 1.

			bytes := bytes + entry fileSize]]].

^ Array with: dirs with: files with: bytes
!

Re: [squeak-dev] The Inbox: Files-cmm.125.mcz

tim

tim