More File Performance Q.?

Bob Arning arning at charm.net
Thu May 16 11:17:01 UTC 2002


On 16 May 2002 00:20:22 -0500 Jimmie Houchin <jhouchin at texoma.net> wrote:
>This program opens the file and reads each line to determine if any line
>beginning with 'from' is actually a header or in the body. If in the
>body I insert a space at the beginning of the line.
>
>Due to the requirement of reading each line to operate on it I had to
>change from StandardFileStream to CrLfFileStream.
>
>Is there anything I am doing wrong in my code which is causing problems?

Let's see what MessageTally has to say...

======================================================
 - 1158 tallies, 19288 msec.

**Tree**
33.9% {6539ms} CrLfFileStream(FileStream)>>contentsOfEntireFile
  |33.9% {6539ms} CrLfFileStream>>next:
  |  19.7% {3800ms} String>>withSqueakLineEndings
  |    |11.6% {2237ms} primitives
  |    |4.4% {849ms} String(SequenceableCollection)>>copyFrom:to:
  |    |3.7% {714ms} String>>indexOfAnyOf:startingAt:ifAbsent:
  |  14.2% {2739ms} CrLfFileStream(StandardFileStream)>>next:
  |    10.4% {2006ms} CrLfFileStream(PositionableStream)>>nextInto:
  |      |10.4% {2006ms} CrLfFileStream(StandardFileStream)>>next:into:startingAt:
  |    3.7% {714ms} primitives
28.2% {5439ms} String>>beginsWith:
9.9% {1910ms} StandardFileStream>>nextPutAll:
8.9% {1717ms} String>>asLowercase
  |4.7% {907ms} String>>translateToLowercase
  |  |4.7% {907ms} String>>translateWith:
  |  |  4.1% {791ms} String>>translateFrom:to:table:
  |4.1% {791ms} String(Object)>>copy
  |  4.1% {791ms} String(SequenceableCollection)>>shallowCopy
  |    3.5% {675ms} String(SequenceableCollection)>>copyFrom:to:
7.3% {1408ms} ReadStream(PositionableStream)>>nextLine
  |6.5% {1254ms} ReadStream>>upTo:
  |  3.6% {694ms} String(SequenceableCollection)>>copyFrom:to:
  |  2.8% {540ms} String>>indexOf:startingAt:ifAbsent:
7.1% {1369ms} StandardFileStream(WriteStream)>>cr
  7.1% {1369ms} StandardFileStream>>nextPut:

**Leaves**
28.2% {5439ms} String>>beginsWith:
11.6% {2237ms} String>>withSqueakLineEndings
11.5% {2218ms} String(SequenceableCollection)>>copyFrom:to:
10.4% {2006ms} CrLfFileStream(StandardFileStream)>>next:into:startingAt:
9.9% {1910ms} StandardFileStream>>nextPutAll:
7.1% {1369ms} StandardFileStream>>nextPut:
4.1% {791ms} String>>translateFrom:to:table:
3.7% {714ms} CrLfFileStream(StandardFileStream)>>next:
3.7% {714ms} String>>indexOfAnyOf:startingAt:ifAbsent:
2.8% {540ms} String>>indexOf:startingAt:ifAbsent:

**Memory**
	old			-24,468 bytes
	young		-156,388 bytes
	used		-180,856 bytes
	free		+156,388 bytes

**GCs**
	full			3 totalling 1,892ms (10.0% uptime), avg 631.0ms
	incr		396 totalling 389ms (2.0% uptime), avg 1.0ms
	tenures		0
	root table	0 overflows
======================================================
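
For anyone who wants to produce a report like this for their own code, here is a minimal sketch. MessageTally spyOn: is the usual entry point; the class MboxFixer, the selector #convertFile: and the file name are hypothetical stand-ins for whatever code is being profiled.

======================================================
"MboxFixer, #convertFile: and 'mail.mbox' are hypothetical names,
 used only to have something to profile."
MessageTally spyOn: [MboxFixer new convertFile: 'mail.mbox']
======================================================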

First observation: we are losing about 10% to full garbage collections (3 of them, averaging 631ms each). Andreas has some code that reduces this, but I'm not sure if it is released yet.

Second: grouping the numbers to show the major parts:

-- reading - 41.2%
33.9% {6539ms} CrLfFileStream(FileStream)>>contentsOfEntireFile
7.3% {1408ms} ReadStream(PositionableStream)>>nextLine

-- finding 'from' - 37.1%
8.9% {1717ms} String>>asLowercase
28.2% {5439ms} String>>beginsWith:

-- writing - 17%
9.9% {1910ms} StandardFileStream>>nextPutAll:
7.1% {1369ms} StandardFileStream(WriteStream)>>cr

Looking at the finding part, converting to lowercase seems a bit much, so a smarter #beginsWith: is in order....

======================================================
beginsWith2: prefix
	"Answer whether the receiver begins with the given prefix string.
	The comparison is NOT case-sensitive."

	self size < prefix size ifTrue: [^ false].
	self first asLowercase == prefix first asLowercase ifFalse: [^false].
	^ (self findSubstring: prefix in: self startingAt: 1
			matchTable: CaseInsensitiveOrder) = 1
======================================================

This saves about 4 seconds (19288 msec down to 15265 msec):

======================================================
 - 918 tallies, 15265 msec.

**Tree**
41.8% {6381ms} CrLfFileStream(FileStream)>>contentsOfEntireFile
21.0% {3206ms} ReadStream(PositionableStream)>>nextLine
14.9% {2274ms} StandardFileStream>>nextPutAll:
9.4% {1435ms} StandardFileStream(WriteStream)>>cr
5.4% {824ms} String>>beginsWith2:
2.7% {412ms} primitives
2.3% {351ms} StandardFileStream>>flush
======================================================

Next, if you can avoid CrLfFileStream (for example, by handling your particular line-ending requirements in #nextLine), you can save a bit more, about 2 seconds here:

======================================================
 - 791 tallies, 13122 msec.

**Tree**
20.7% {2716ms} StandardFileStream(FileStream)>>contentsOfEntireFile
25.2% {3307ms} ReadStream(PositionableStream)>>nextLine
25.8% {3385ms} StandardFileStream>>nextPutAll:
8.0% {1050ms} StandardFileStream>>flush
7.8% {1024ms} StandardFileStream(WriteStream)>>cr
7.5% {984ms} String>>beginsWith2:
2.1% {276ms} String>>findString:
======================================================
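
As a rough illustration of handling the line endings yourself, here is one possible sketch rather than the code that was measured above. It assumes the lines end in LF or CR LF and uses #upTo: directly instead of #nextLine; the file name is hypothetical.

======================================================
"Sketch only. Assumes LF or CR LF line endings; 'mail.mbox' is a hypothetical name."
| in line |
in := ReadStream on:
	(StandardFileStream readOnlyFileNamed: 'mail.mbox') contentsOfEntireFile.
[in atEnd] whileFalse: [
	line := in upTo: Character lf.
	"Drop the CR left behind by a CR LF pair."
	(line notEmpty and: [line last = Character cr])
		ifTrue: [line := line copyFrom: 1 to: line size - 1].
	"... test the line with beginsWith2: and write it out here ..."]
======================================================

If the file is known to use a single line-ending convention, the trailing-CR check and its extra copy can be dropped.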

Further improvements may be a bit harder to find (there... that should get someone going).

Cheers,
Bob


