[squeak-dev] Faster directory enumeration?

Levente Uzonyi leves at caesar.elte.hu
Mon Oct 17 23:30:41 UTC 2016


The whole image-side code starting from #directoryTreeDo: could use some 
optimization, but that would only make it at most 1.5x faster.
If I were you, I'd use OSProcess and execute this:

 	find directory -name '*.emlx'

It's not that nice, but it shouldn't take more than a second to find the 
files.
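
For illustration, here is a minimal sketch of the same idea in Python rather than through OSProcess: hand the traversal to find and only count the results. The function name count_with_find is hypothetical, and -print0 is an addition to the command above (it is accepted by both GNU and BSD find) so that file names containing newlines are still counted once.

```python
import subprocess


def count_with_find(directory, pattern="*.emlx"):
    """Count paths under `directory` whose name matches `pattern`.

    The directory walk is delegated to the external `find` command;
    this function only counts the matches it reports.
    """
    result = subprocess.run(
        # -print0 terminates each match with a NUL byte instead of a newline.
        ["find", directory, "-name", pattern, "-print0"],
        stdout=subprocess.PIPE,
        check=True,  # raise CalledProcessError if find exits with an error
    )
    # One NUL terminator per reported match.
    return result.stdout.count(b"\0")
```

Calling count_with_find on the mailbox directory should give the same number as the image-side loop, provided nothing other than regular files matches the pattern.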

Levente

On Mon, 17 Oct 2016, Bernhard Pieber wrote:

> Dear Squeakers,
>
> I want to count files with a certain extension in a folder recursively. Here is the code I use:
>
> | dir count runtime |
> count := 0.
> dir := FileDirectory on: '/Users/bernhard/Library/Mail/V4/D77E3582-7EBE-4B5A-BFE0-E30BF6AE995F/Smalltalk.mbox/Squeak.mbox'.
> runtime := Time millisecondsToRun: [
> 	dir directoryTreeDo: [:each |
> 		(each last name endsWith: '.emlx') ifTrue: [count := count + 1]]].
> {count. runtime}. #(289747 66109)
>
> As you can see it finds 289,747 files and it takes about 66 seconds. Is there any faster way to do this given the current VM primitives?
>
> The reason I ask is that the equivalent Python code takes between 1.5 and 6 seconds. :-/
>
> #!/usr/local/bin/python3
> import os
> import time
>
> path = '/Users/bernhard/Library/Mail/V4/D77E3582-7EBE-4B5A-BFE0-E30BF6AE995F/Smalltalk.mbox/Squeak.mbox'
>
> print(path)
>
> start = time.time()
> emlx = 0
> for dirpath, dirnames, filenames in os.walk(path):
>    for filename in filenames:
>        if filename.endswith('.emlx'):
>            emlx += 1
>
> runtime = time.time() - start
>
> print(emlx, runtime)
>
> It seems to have to do with an optimized os.scandir() function, described here: https://www.python.org/dev/peps/pep-0471/
>
> Cheers,
> Bernhard
>
>
>
>
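
The os.scandir() function mentioned in the quoted message can also be called directly. The sketch below is only an assumption about how one might write the count that way (the name count_by_suffix is hypothetical; it is not code from the thread). The point of PEP 471 is that each directory entry already carries its file type on most platforms, so deciding whether to descend usually needs no extra stat call per file.

```python
import os


def count_by_suffix(root, suffix=".emlx"):
    """Count non-directory entries under `root` whose name ends with `suffix`."""
    count = 0
    pending = [root]  # directories still to be scanned
    while pending:
        current = pending.pop()
        with os.scandir(current) as entries:
            for entry in entries:
                # is_dir() normally uses the type information returned by
                # the directory listing itself; symlinks are not followed.
                if entry.is_dir(follow_symlinks=False):
                    pending.append(entry.path)
                elif entry.name.endswith(suffix):
                    count += 1
    return count
```

Unlike os.walk(), this version treats a symbolic link to a directory as an ordinary entry, so the two counts can differ if such links have a matching name.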

