[Challenge] large files smart compare (was: Re: Squeak for I/O and Memory Intensive tasks)
Jon Hylands
jon at huv.com
Tue Jan 29 08:12:08 UTC 2002
On Tue, 29 Jan 2002 09:39:54 +0200, Yoel Jacobsen <yoel at emet.co.il> wrote:
> My Squeak tests
> ============
> The attached file is an almost empty LDAPObject class with only a fromStr:
> class method. This method parses a single LDIF entry. Look at the
> class information for the commands to create a 10K-line LDIF file and
> to parse it.
>
> Parsing 10K lines took me about 12 minutes, during which the image was
> working on this as its only task. Profiling shows the time is mostly
> spent adding to collections.
I filed your code into a 3.1 image (3828), and it took 11.8 seconds to
parse the 10,000 lines...
I'm running a 1.0 GHz P-III on Windows XP.
tinyBenchmarks results on this machine:
90,651,558 bytecodes/sec
2,825,302 sends/sec
If you change the collection you are using from Bag to OrderedCollection,
the time goes down to 6.4 seconds...
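A rough way to isolate the collection cost from the parsing is a workspace snippet along these lines. It is an untested sketch: the value strings are made-up data rather than LDIF, and it uses := instead of _ so it also files into newer images. As far as I know, Squeak's Bag keeps its elements in a Dictionary of occurrence counts, so every add: has to hash the string and probe that table, while OrderedCollection>>add: just stores into the next free slot.

```smalltalk
"Sketch: time 100,000 add: sends into a Bag and into an
 OrderedCollection. The strings are made-up test data, not LDIF."
| values bag oc bagMs ocMs |
values := (1 to: 100000) collect: [:i | 'value', i printString].
bag := Bag new.
bagMs := Time millisecondsToRun: [values do: [:each | bag add: each]].
oc := OrderedCollection new.
ocMs := Time millisecondsToRun: [values do: [:each | oc add: each]].
Transcript
	show: 'Bag: ', bagMs printString, ' ms'; cr;
	show: 'OrderedCollection: ', ocMs printString, ' ms'; cr.
```

A Bag only earns its keep when you actually need occurrencesOf: counts; for simply collecting the values of a multi-valued attribute, an OrderedCollection is the simpler structure.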
Making a few changes to get rid of unnecessary code brought the time down
to 5.2 seconds:
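fromStr: str
	"Parse one LDIF entry ('attribute: value', one pair per line) into a
	 Dictionary mapping each attribute name to an OrderedCollection of
	 its values. Assumes every line contains ': '."
	| obj dict pairsKey pairsVal point |
	obj _ self new.
	dict _ Dictionary new.
	str linesDo: [ :line |
		point _ line findString: ': '.
		pairsKey _ line copyFrom: 1 to: (point - 1).
		pairsVal _ line copyFrom: (point + 2) to: (line size).
		(dict at: pairsKey ifAbsentPut: [OrderedCollection new])
			add: pairsVal].
	^obj attrs: dict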
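To time the method without the generation commands from the class information, something like the following should work. It is only a sketch: the attribute names and the cycle of 20 attributes are invented, and it assumes the LDAPObject class from the attached file is already filed in. It writes no trailing newline, since the method above expects ': ' on every line it sees.

```smalltalk
"Sketch: build a 10,000-line string of 'name: value' lines and time
 LDAPObject class>>fromStr: on it. The attribute names are hypothetical;
 the real test data comes from the commands in the class information."
| stream input ms |
stream := WriteStream on: String new.
1 to: 10000 do: [:i |
	"Separate lines with CR, but do not end the string with one."
	i > 1 ifTrue: [stream cr].
	stream nextPutAll: 'attr', (i \\ 20) printString, ': value', i printString].
input := stream contents.
ms := Time millisecondsToRun: [LDAPObject fromStr: input].
Transcript show: 'fromStr: took ', ms printString, ' ms'; cr.
```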
If we take this version of the method and switch it back from an
OrderedCollection to a Bag, the time goes up to 9 seconds.
Later,
Jon
--------------------------------------------------------------
Jon Hylands Jon at huv.com http://www.huv.com/jon
Project: Micro Seeker (Micro Autonomous Underwater Vehicle)
http://www.huv.com