[squeak-dev] Faster directory enumeration?

David T. Lewis lewis at mail.msen.com
Mon Oct 17 20:17:32 UTC 2016


Hi Bernhard,

InterpreterPlugin is part of the VMMaker package, so you would need to be
working in an image with VMMaker loaded (maybe one of the prepared images
from Eliot's site).

I should have checked my own notes before replying. I cannot explain the
reason for this, but it seems that the readdir() primitives no longer
provided any performance benefit when I tested them a couple of years ago.

Here is what I wrote in the summary on
http://www.squeaksource.com/DirectoryPlugin:

Performance characteristics have changed significantly since Squeak circa
2003. The readdir() primitives no longer provide any benefit, but the file
testing primitives still yield a couple orders of magnitude improvement
for some functions.


So ... I guess that some additional profiling would be in order.
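One possible starting point for that profiling, sketched in Python rather than Smalltalk and not taken from DirectoryPlugin: time the directory listing (readdir() via os.scandir()) separately from one stat() call per entry. If the stat() phase dominates on a given machine, per-file metadata lookups rather than readdir() itself are the likely cost. The helper name profile_phases is hypothetical, and whether Squeak's FileDirectory primitives stat every entry is an assumption this sketch does not check.

```python
import os
import time


def profile_phases(root):
    """Time tree listing and per-entry stat() separately (hypothetical helper)."""
    # Phase 1: walk the tree using only what readdir() already reports.
    start = time.perf_counter()
    paths = []
    stack = [root]
    while stack:
        current = stack.pop()
        with os.scandir(current) as entries:
            for entry in entries:
                paths.append(entry.path)
                # Usually answered from readdir()'s d_type without a stat() call;
                # follow_symlinks=False keeps us out of symlinked directories.
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)
    list_seconds = time.perf_counter() - start

    # Phase 2: one stat() system call per entry, i.e. the extra work an
    # enumerator pays if it returns size/timestamps for every entry.
    start = time.perf_counter()
    for path in paths:
        os.stat(path, follow_symlinks=False)
    stat_seconds = time.perf_counter() - start

    return len(paths), list_seconds, stat_seconds
```

For example, profile_phases('/path/to/Squeak.mbox') returns the number of entries and the two timings in seconds; absolute numbers will vary with the file system and with OS caching between runs.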

Dave


> Hi Dave,
>
> Thanks for the answer. I guess I would need to build the latest version of
> the plugin myself, right? (I am on macOS Sierra.)
>
> I could load DirectoryPlugin. However,
> VMConstruction-Plugins-DirectoryPlugin needs InterpreterPlugin to be available.
>
> Bernhard
>
>> Am 17.10.2016 um 19:56 schrieb David T. Lewis <lewis at mail.msen.com>:
>>
>> It is probably far too bit-rotted to be of any use now, but here is what
>> I came up with 15 years ago to improve this:
>>
>>  http://wiki.squeak.org/squeak/2274
>>
>> I did briefly look at this again a couple of years ago, and put the
>> updates on SqueakSource. But I think I found that the directory
>> primitives are nowhere near as big a win now as they were 15 years ago.
>> Nevertheless it may still be of some interest.
>>
>> Dave
>>
>>> Dear Squeakers,
>>>
>>> I want to count files with a certain extension in a folder recursively.
>>> Here is the code I use:
>>>
>>> | dir count runtime |
>>> count := 0.
>>> dir := FileDirectory on: '/Users/bernhard/Library/Mail/V4/D77E3582-7EBE-4B5A-BFE0-E30BF6AE995F/Smalltalk.mbox/Squeak.mbox'.
>>> runtime := Time millisecondsToRun: [
>>> 	dir directoryTreeDo: [:each |
>>> 		(each last name endsWith: '.emlx') ifTrue: [count := count + 1]]].
>>> {count. runtime} "print-it result: #(289747 66109)"
>>>
>>> As you can see, it finds 289,747 files and takes about 66 seconds. Is
>>> there any faster way to do this with the current VM primitives?
>>>
>>> The reason I ask is that the equivalent Python code takes between 1.5
>>> and 6 seconds. :-/
>>>
>>> #!/usr/local/bin/python3
>>> import os
>>> import time
>>>
>>> path = '/Users/bernhard/Library/Mail/V4/D77E3582-7EBE-4B5A-BFE0-E30BF6AE995F/Smalltalk.mbox/Squeak.mbox'
>>>
>>> print(path)
>>>
>>> start = time.time()
>>> emlx = 0
>>> for dirpath, dirnames, filenames in os.walk(path):
>>>     for filename in filenames:
>>>         if filename.endswith('.emlx'):
>>>             emlx += 1
>>>
>>> runtime = time.time() - start
>>>
>>> print(emlx, runtime)
>>>
>>> It seems to have to do with the optimized os.scandir() function, which
>>> os.walk() has used internally since Python 3.5, described here:
>>> https://www.python.org/dev/peps/pep-0471/
>>>
>>> Cheers,
>>> Bernhard
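For comparison, here is a minimal sketch (an assumption for illustration, not code from the thread) of the kind of loop os.walk() performs on top of os.scandir(), written out with an explicit stack. The point PEP 471 makes is that DirEntry.is_dir() and DirEntry.is_file() can usually be answered from the file type readdir() already returned, so no per-file stat() call is needed. The function name count_suffix is hypothetical; it skips symbolic links entirely, so its count can differ from the os.walk() version if the tree contains symlinks.

```python
import os


def count_suffix(root, suffix=".emlx"):
    """Count regular files under root whose names end with suffix (hypothetical helper)."""
    count = 0
    stack = [root]  # directories still to be scanned
    while stack:
        with os.scandir(stack.pop()) as entries:
            for entry in entries:
                # On most POSIX systems these type checks use the d_type value
                # from readdir(), falling back to lstat() only when it is unknown.
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)
                elif entry.name.endswith(suffix) and entry.is_file(follow_symlinks=False):
                    count += 1
    return count
```

Called as count_suffix('/path/to/Squeak.mbox'), it should match the os.walk() loop's count for a tree without symlinks; checking the suffix before calling is_file() avoids even the fallback lstat() for non-matching names.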