[Challenge] large files smart compare (was: Re: Squeak for I/O and Memory Intensive tasks )

Jon Hylands jon at huv.com
Tue Jan 29 08:12:08 UTC 2002


On Tue, 29 Jan 2002 09:39:54 +0200, Yoel Jacobsen <yoel at emet.co.il> wrote:

> My Squeak tests
> ============
> The attached file is an almost empty LDAPObject class with only a fromStr: 
> class method. This method parses a single LDIF entry. Look at the 
> class information for the commands to create a 10K-line LDIF file and 
> to parse it.
> 
> Parsing 10K lines took me about 12 minutes, during which the image was 
> working on this as a single task. Profiling shows the time is mostly 
> spent on adding to collections.

I filed your code into a 3.1 image (3828), and it took 11.8 seconds to
parse the 10,000 lines...

I'm running a 1.0 GHz P-III in Windows XP.

tinyBenchmarks results on this machine:

90,651,558 bytecodes/sec
2,825,302 sends/sec
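
For anyone who wants to compare their own machine, figures like these
come from Squeak's built-in micro-benchmark. Evaluating the following
in a workspace with "print it" should answer a string holding the
bytecodes/sec and sends/sec numbers:

	"Run the standard Squeak micro-benchmark and print the result."
	0 tinyBenchmarks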

If you change the collection you are using from Bag to OrderedCollection,
the time goes down to 6.4 seconds...

Making a few changes to get rid of unnecessary code brought the time down
to 5.2 seconds:

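fromStr: str 
	"Answer a new instance whose attrs hold the 'key: value' pairs of a single LDIF entry."
	| obj dict pairsKey pairsVal point |
	obj _ self new.

	dict _ Dictionary new.
	str linesDo: [ :line |
		"Split each line at the first ': ' separator."
		point _ line findString: ': '.
		pairsKey _ line copyFrom: 1 to: (point - 1).
		pairsVal _ line copyFrom: (point + 2) to: (line size).
		"Collect every value seen for a key, in file order."
		(dict at: pairsKey ifAbsentPut: [OrderedCollection new])
			add: pairsVal].

	^obj attrs: dict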

If we take this version of the method, and switch it back to using a Bag
from an OrderedCollection, the time goes up to 9 seconds.
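
A rough way to reproduce the comparison in a workspace, as a sketch
only: the test lines below are made up (10,000 lines spread over ten
hypothetical attribute names), not the original LDIF file, so the
absolute times will differ from the ones quoted above. It uses := for
assignment, which is equivalent to the _ used in the method.

	| lines dict point ocTime bagTime |
	"Hypothetical data: 10,000 'key: value' lines over 10 distinct keys."
	lines := (1 to: 10000) collect: [:i |
		'attr', (i \\ 10) printString, ': value', i printString].

	"Time the parse loop with an OrderedCollection per key."
	ocTime := Time millisecondsToRun: [
		dict := Dictionary new.
		lines do: [:line |
			point := line findString: ': '.
			(dict at: (line copyFrom: 1 to: point - 1)
				ifAbsentPut: [OrderedCollection new])
					add: (line copyFrom: point + 2 to: line size)]].

	"Same loop, but with a Bag per key."
	bagTime := Time millisecondsToRun: [
		dict := Dictionary new.
		lines do: [:line |
			point := line findString: ': '.
			(dict at: (line copyFrom: 1 to: point - 1)
				ifAbsentPut: [Bag new])
					add: (line copyFrom: point + 2 to: line size)]].

	Transcript
		show: 'OrderedCollection: ', ocTime printString, ' ms'; cr;
		show: 'Bag: ', bagTime printString, ' ms'; cr.

The likely reason for the gap is that a Bag keeps its elements in a
Dictionary of occurrence counts, so every add: has to hash and compare
the value string, while OrderedCollection add: just appends. Note that
a Bag also does not keep the values in file order.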

Later,
Jon

--------------------------------------------------------------
   Jon Hylands      Jon at huv.com      http://www.huv.com/jon

  Project: Micro Seeker (Micro Autonomous Underwater Vehicle)
           http://www.huv.com


