[squeak-dev] Faster directory enumeration?

Bernhard Pieber bernhard at pieber.com
Mon Oct 17 17:38:48 UTC 2016


Dear Squeakers,

I want to recursively count the files with a certain extension in a folder. Here is the code I use:

| dir count runtime |
count := 0.
dir := FileDirectory on: '/Users/bernhard/Library/Mail/V4/D77E3582-7EBE-4B5A-BFE0-E30BF6AE995F/Smalltalk.mbox/Squeak.mbox'.
runtime := Time millisecondsToRun: [
	dir directoryTreeDo: [:each | 
		(each last name endsWith: '.emlx') ifTrue: [count := count + 1]]].
{count. runtime} "print-it result: #(289747 66109)"

As you can see, it finds 289,747 files and takes about 66 seconds. Is there a faster way to do this with the current VM primitives?

The reason I ask is that the equivalent Python code takes between 1.5 and 6 seconds. :-/

#!/usr/local/bin/python3
import os
import time

path = '/Users/bernhard/Library/Mail/V4/D77E3582-7EBE-4B5A-BFE0-E30BF6AE995F/Smalltalk.mbox/Squeak.mbox'

print(path)

start = time.time()
emlx = 0
for dirpath, dirnames, filenames in os.walk(path):
    for filename in filenames:
        if filename.endswith('.emlx'):
            emlx += 1

runtime = time.time() - start

print(emlx, runtime)
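For comparison, os.walk() before Python 3.5 listed names with os.listdir() and then called os.path.isdir() on each one, which costs a stat() system call per entry. A minimal sketch of that older style of traversal follows. The function name count_with_listdir is hypothetical, and the sketch mirrors os.walk()'s defaults of not descending into symlinked directories and silently skipping unreadable ones.

```python
import os


def count_with_listdir(root, suffix=".emlx"):
    """Count non-directory entries under root whose names end with suffix.

    Hypothetical helper in the pre-3.5 os.walk() style: every name is
    passed to os.path.isdir(), which performs a stat() system call, so
    each directory entry costs one extra call.
    """
    count = 0
    stack = [root]  # explicit stack instead of recursion
    while stack:
        path = stack.pop()
        try:
            names = os.listdir(path)
        except OSError:
            continue  # skip unreadable directories, like os.walk() by default
        for name in names:
            full = os.path.join(path, name)
            if os.path.isdir(full):  # one stat() per entry
                if not os.path.islink(full):  # do not follow symlinked dirs
                    stack.append(full)
            elif name.endswith(suffix):
                count += 1
    return count
```

Timing this with time.perf_counter() on the same folder as the os.walk() loop above could give a rough idea of how much of the difference comes from the per-entry stat() calls.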

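The speed seems to come from os.scandir(), which os.walk() has used internally since Python 3.5. It returns DirEntry objects that usually already know whether each entry is a file or a directory, so far fewer stat() calls are needed. It is described here: https://www.python.org/dev/peps/pep-0471/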
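A minimal sketch of counting with os.scandir() directly, assuming Python 3.6+ (for the context-manager form of scandir()). The function name count_with_scandir is hypothetical. It mirrors os.walk()'s defaults: symlinked directories are classified as directories but not descended into, and unreadable directories are skipped. The point it shows is that DirEntry.is_dir() can usually answer from the data the directory listing already returned, so most entries need no separate stat() call.

```python
import os


def count_with_scandir(root, suffix=".emlx"):
    """Count non-directory entries under root whose names end with suffix.

    Hypothetical helper: walks the tree with os.scandir() and an
    explicit stack. DirEntry.is_dir() typically uses the file type
    reported by the directory listing, avoiding a stat() per entry.
    """
    count = 0
    stack = [root]  # explicit stack instead of recursion
    while stack:
        path = stack.pop()
        try:
            with os.scandir(path) as it:
                for entry in it:
                    if entry.is_dir():  # follows symlinks, like os.walk()
                        if not entry.is_symlink():  # but never descend into them
                            stack.append(entry.path)
                    elif entry.name.endswith(suffix):
                        count += 1
        except OSError:
            continue  # skip unreadable directories, like os.walk() by default
    return count
```

Because it uses the same classification rules as os.walk(), it should return the same count as the os.walk() loop in the message; only the bookkeeping of (dirpath, dirnames, filenames) tuples is dropped.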

Cheers,
Bernhard
