[squeak-dev] Faster directory enumeration?

Eliot Miranda eliot.miranda at gmail.com
Mon Oct 17 21:19:24 UTC 2016


On Mon, Oct 17, 2016 at 1:17 PM, David T. Lewis <lewis at mail.msen.com> wrote:

> Hi Bernhard,
>
> InterpreterPlugin is part of the VMMaker package, so you would need to be
> working in an image with VMMaker loaded (maybe one of the prepared images
> from Eliot's site).
>

There aren't any.  There is a script in the image subdirectory of
http://www.github.com/opensmalltalk/vm which builds one; see
image/buildspurtrunkvmmakerimage.sh

> I should have checked my own notes before replying - I cannot explain the
> reason for this, but it seems that the readdir() primitives no longer
> provided any performance benefit when I tested them a couple of years ago.
>
> Here is what I wrote in the summary on
> http://www.squeaksource.com/DirectoryPlugin:
>
> Performance characteristics have changed significantly since Squeak circa
> 2003. The readdir() primitives no longer provide any benefit, but the file
> testing primitives still yield a couple orders of magnitude improvement
> for some functions.
>
>
> So ... I guess that some additional profiling would be in order.
>
> Dave
>
>
> > Hi Dave,
> >
> > Thanks for the answer. I guess I would need to build the latest version of
> > the plugin myself, right? (I am on macOS Sierra.)
> >
> > I could load DirectoryPlugin. However,
> > VMConstruction-Plugins-DirectoryPlugin needs InterpreterPlugin available.
> >
> > Bernhard
> >
> >> On 17.10.2016 at 19:56, David T. Lewis <lewis at mail.msen.com> wrote:
> >>
> >> It is probably far too bit-rotted to be of any use now, but here is what
> >> I came up with 15 years ago to improve this:
> >>
> >>  http://wiki.squeak.org/squeak/2274
> >>
> >> I did briefly look at this again a couple of years ago, and put the
> >> updates on SqueakSource. But I think I found that the directory primitives
> >> are nowhere near as big a win now as they were 15 years ago. Nevertheless
> >> it may still be of some interest.
> >>
> >> Dave
> >>
> >>> Dear Squeakers,
> >>>
> >>> I want to count files with a certain extension in a folder recursively.
> >>> Here is the code I use:
> >>>
> >>> | dir count runtime |
> >>> count := 0.
> >>> dir := FileDirectory on:
> >>> '/Users/bernhard/Library/Mail/V4/D77E3582-7EBE-4B5A-BFE0-E30BF6AE995F/Smalltalk.mbox/Squeak.mbox'.
> >>> runtime := Time millisecondsToRun: [
> >>>     dir directoryTreeDo: [:each |
> >>>             (each last name endsWith: '.emlx') ifTrue: [count := count + 1]]].
> >>> {count. runtime}. "prints #(289747 66109)"
> >>>
> >>> As you can see, it finds 289,747 files and takes about 66 seconds. Is
> >>> there any faster way to do this given the current VM primitives?
> >>>
> >>> The reason I ask is that the equivalent Python code takes between 1.5 and
> >>> 6 seconds. :-/
> >>>
> >>> #!/usr/local/bin/python3
> >>> import os
> >>> import time
> >>>
> >>> path = '/Users/bernhard/Library/Mail/V4/D77E3582-7EBE-4B5A-BFE0-E30BF6AE995F/Smalltalk.mbox/Squeak.mbox'
> >>>
> >>> print(path)
> >>>
> >>> start = time.time()
> >>> emlx = 0
> >>> for dirpath, dirnames, filenames in os.walk(path):
> >>>    for filename in filenames:
> >>>        if filename.endswith('.emlx'):
> >>>            emlx += 1
> >>>
> >>> runtime = time.time() - start
> >>>
> >>> print(emlx, runtime)
> >>>
> >>> It seems to have to do with an optimized os.scandir() function, described
> >>> here: https://www.python.org/dev/peps/pep-0471/
> >>>
> >>> Cheers,
> >>> Bernhard

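For reference, the os.scandir() approach that PEP 471 describes (linked in
Bernhard's original message) can be sketched roughly as below. This is only
an illustration, not the code anyone in this thread ran: the function name
count_files_with_suffix is made up, and it counts every non-directory entry
whose name ends with the suffix. The point is that each directory entry
usually already carries its file type, so no separate stat() call per file
is needed to decide whether to descend into it.

```python
import os


def count_files_with_suffix(root, suffix):
    """Count non-directory entries under root whose names end with suffix.

    Hypothetical helper for illustration; uses an explicit stack instead
    of recursion so deep trees do not hit the recursion limit.
    """
    count = 0
    stack = [root]
    while stack:
        current = stack.pop()
        try:
            with os.scandir(current) as entries:
                for entry in entries:
                    # is_dir() normally uses the type information returned
                    # with the directory listing, avoiding an extra stat().
                    if entry.is_dir(follow_symlinks=False):
                        stack.append(entry.path)
                    elif entry.name.endswith(suffix):
                        count += 1
        except OSError:
            # Skip directories that cannot be read (permissions, races).
            continue
    return count
```

Calling count_files_with_suffix(path, '.emlx') would play the same role as
the os.walk() loop in the quoted script; since Python 3.5, os.walk() is
itself built on os.scandir().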

-- 
_,,,^..^,,,_
best, Eliot