Zip archive performance

Cees De Groot cdegroot at gmail.com
Wed Oct 5 15:03:46 UTC 2005


Hi,

When using the functions in System-Archives to create a zip archive
over a large number of files (~1500 in our test case), performance
suffers badly (takes around 220 seconds on my box).

The problem is that Archive>>addTree:match: only passes the filename
to new archive members (creating them with #newFromFile:). Tracing
the calls, this ends up at ZipNewFileMember>>from:, which calls the
innocent-looking #directoryEntry on StandardFileStream. However, this
method triggers another scan of the directory at the OS level,
followed by a linear search over the results. So if you have a
directory with a hundred files, all hundred files are stat()'ed (or
the Win32 equivalent) once for every file added...

I worked around this by passing the directory entry along from
Archive>>addTree:match: (adding #newFromFile:entry:, etcetera). This
improves performance in our test case by a factor of 10...
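
To make the cost difference concrete, here is a rough model of the two
access patterns. It is written in Python rather than Squeak and is not
the actual System-Archives code. FakeDirectory, entry_for,
add_tree_by_name and add_tree_with_entry are made-up names for
illustration, and the counter stands in for real stat() calls.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    name: str
    size: int


class FakeDirectory:
    """Stand-in for an OS directory; counts how many entries get examined."""

    def __init__(self, names):
        self._entries = [Entry(n, len(n)) for n in names]
        self.entries_examined = 0

    def entries(self):
        # Each call models one full directory scan (one stat per file).
        self.entries_examined += len(self._entries)
        return list(self._entries)

    def entry_for(self, name):
        # Models the lookup described above: rescan, then search linearly.
        for entry in self.entries():
            if entry.name == name:
                return entry
        raise FileNotFoundError(name)


def add_tree_by_name(directory):
    # Pass only the name along, so each member looks its entry up again.
    names = [entry.name for entry in directory.entries()]
    return [directory.entry_for(name) for name in names]


def add_tree_with_entry(directory):
    # Workaround: reuse the entries already obtained from the single scan.
    return list(directory.entries())
```

With 100 files, the by-name version examines 100 + 100 * 100 = 10100
entries in this model, while the pass-the-entry version examines 100.
The real speedup depends on everything else the archive does, so the
model only shows where the quadratic term comes from.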

I'm not sure whether this is the correct fix, just wanted to report
that there's room for optimization here :)

Code used:

TimeProfileBrowser onBlock: [| zip |
    zip := ZipArchive new.
    zip addTree: self default dataDirectory match: [:each | true].
    zip writeToFileNamed: 'c:\temp\dgv.zip']