[squeak-dev] Faster directory enumeration?

David T. Lewis lewis at mail.msen.com
Mon Oct 17 17:56:24 UTC 2016


It is probably far too bit-rotted to be of any use now, but here is what I
came up with 15 years ago to improve this:

  http://wiki.squeak.org/squeak/2274

I did briefly look at this again a couple of years ago and put the
updates on SqueakSource. But I think I found that the directory primitives
are nowhere near as big a win now as they were 15 years ago. Nevertheless,
it may still be of some interest.

Dave

> Dear Squeakers,
>
> I want to count files with a certain extension in a folder recursively.
> Here is the code I use:
>
> | dir count runtime |
> count := 0.
> dir := FileDirectory on:
> '/Users/bernhard/Library/Mail/V4/D77E3582-7EBE-4B5A-BFE0-E30BF6AE995F/Smalltalk.mbox/Squeak.mbox'.
> runtime := Time millisecondsToRun: [
> 	dir directoryTreeDo: [:each |
> 		(each last name endsWith: '.emlx') ifTrue: [count := count + 1]]].
> {count. runtime}. #(289747 66109)
>
> As you can see it finds 289,747 files and it takes about 66 seconds. Is
> there any faster way to do this given the current VM primitives?
>
> The reason I ask is that the equivalent Python code takes between 1.5 and
> 6 seconds. :-/
>
> #!/usr/local/bin/python3
> import os
> import time
>
> path = '/Users/bernhard/Library/Mail/V4/D77E3582-7EBE-4B5A-BFE0-E30BF6AE995F/Smalltalk.mbox/Squeak.mbox'
>
> print(path)
>
> start = time.time()
> emlx = 0
> for dirpath, dirnames, filenames in os.walk(path):
>     for filename in filenames:
>         if filename.endswith('.emlx'):
>             emlx += 1
>
> runtime = time.time() - start
>
> print(emlx, runtime)
>
> It seems to have to do with an optimized os.scandir() function, described
> here: https://www.python.org/dev/peps/pep-0471/
>
> Cheers,
> Bernhard
>
>
>
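For reference, the os.scandir() approach mentioned in the quoted message can be sketched roughly as follows. This is only an illustration of the idea in PEP 471, not the code that was actually timed above. The function name count_files_with_suffix is made up for this example, and it assumes Python 3.6 or later, where os.scandir() can be used as a context manager. The directory entries carry the file type along with the name, so on most platforms no separate stat call is needed per file.

```python
import os


def count_files_with_suffix(root, suffix):
    """Count files below root whose names end with suffix.

    Hypothetical helper for illustration. It walks the tree iteratively
    with os.scandir() and does not descend into symlinked directories.
    """
    count = 0
    pending = [root]  # directories still to be scanned
    while pending:
        current = pending.pop()
        with os.scandir(current) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    # Real subdirectory: scan it later.
                    pending.append(entry.path)
                elif entry.name.endswith(suffix) and entry.is_file():
                    # Check the name first so the file-type check only
                    # runs for entries that could match.
                    count += 1
    return count


if __name__ == "__main__":
    # Counts in the current directory; substitute the real mailbox path.
    print(count_files_with_suffix(".", ".emlx"))
```

Unlike the os.walk() loop in the quoted message, this sketch skips broken symlinks, so the two counts could differ slightly on a tree that contains them.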



