[Newbies] Binary file I/O performance problems

David Finlayson dfinlayson at usgs.gov
Tue Sep 2 22:38:36 UTC 2008


I've been working on my first Smalltalk program, which needs to read
and write large C structs from a binary file. I wrote two classes,
BinaryStreamReader and BinaryStreamWriter, that take a stream and can
read (or write) all of the integer and floating-point types I need
(they also handle byte-swapping if necessary). I wrote a test program
that focuses on just reading a small (for us) 123 MB data file on disk.
The program takes about 166 seconds to run, compared to 1.2 seconds for
an equivalent C version (the C version is about 140x faster).

As an example of the style of code I've written, here is the method
that reads an unsigned 32-bit integer:

uint32
	" returns the next unsigned, 32-bit integer from the binary stream "
	" see PositionableStream for original implementation."
	| n a b c d |
	isBigEndian
		ifTrue:
			[ a := stream next.
			b := stream next.
			c := stream next.
			d := stream next ]
		ifFalse:
			[ d := stream next.
			c := stream next.
			b := stream next.
			a := stream next ].
	((((a notNil and: [ b notNil ]) and: [ c notNil ])) and: [ d notNil])
		ifTrue:
			[ n := a.
			n := (n bitShift: 8) + b.
			n := (n bitShift: 8) + c.
			n := (n bitShift: 8) + d ]
		ifFalse: [ n := nil ].
	^ n

There are four calls to stream next for each integer and, sure enough,
a profile of the code (attached below) shows that most of the time is
being lost in the StandardFileStream basicNext and next methods. There
must be a better way to do this. Scaled up to operational code, I will
need to process about 40 GB of data per day. My C code currently takes
about 16 CPU hours to do this work (including number crunching). In
Squeak, just reading the data would take 3 CPU months!
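
One direction that might cut the per-byte overhead is to fetch all four
bytes with a single stream call and decode them from the resulting
ByteArray. The following is only a sketch, not a measured fix. It
assumes the stream has been put in binary mode (so that next: answers a
ByteArray) and that ByteArray>>unsignedLongAt:bigEndian: is available in
the image, as it is in stock Squeak. The stream and isBigEndian
variables are the ones from the method above.

```smalltalk
uint32
	"Sketch: answer the next unsigned 32-bit integer, or nil at end of data.
	Assumption: stream is in binary mode, so next: answers a ByteArray."
	| bytes |
	"One stream call instead of four."
	bytes := stream next: 4.
	"A short read means fewer than four bytes were left in the stream."
	bytes size < 4 ifTrue: [ ^ nil ].
	"Decode with the requested byte order; ByteArray indices start at 1."
	^ bytes unsignedLongAt: 1 bigEndian: isBigEndian
```

This still allocates a small ByteArray per call, and values too large
for a SmallInteger will still come back as LargePositiveInteger, so it
addresses only the stream next portion of the profile.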

Hopefully, someone can help me out here. The working code is available
on squeaksource.com if anyone is interested:

http://www.squeaksource.com/@CWlm_vX4hAPUzk5w/7SVjQQhp

Thanks,

David

Below is a message tally of my program:



 - 166088 tallies, 166100 msec.

**Tree**
100.0% {166100ms} SEAFileReader>>printAllBlocks
  99.9% {165934ms} ProcessedPingBlock>>readFrom:
    99.9% {165934ms} XYZAPingData>>readFrom:
      99.7% {165602ms} XYZATransducerData>>readFrom:
        95.9% {159290ms} XYZAPointData>>readFrom:
          46.4% {77070ms} BinaryStreamReader>>double
            |41.9% {69596ms} BinaryStreamReader>>uint32
            |  |28.1% {46674ms} StandardFileStream>>next
            |  |  |14.1% {23420ms} primitives
            |  |  |14.0% {23254ms} StandardFileStream>>basicNext
            |  |9.8% {16278ms} LargePositiveInteger>>+
            |  |  |6.1% {10132ms} LargePositiveInteger(Integer)>>+
            |  |  |  |3.1% {5149ms} primitives
            |  |  |  |3.0% {4983ms} SmallInteger(Number)>>negative
            |  |  |3.7% {6146ms} primitives
            |  |4.1% {6810ms} primitives
            |2.5% {4153ms} Float class(Behavior)>>new:
            |2.0% {3322ms} primitives
          13.9% {23088ms} BinaryStreamReader>>float
            |10.4% {17274ms} BinaryStreamReader>>uint32
            |  |7.0% {11627ms} StandardFileStream>>next
            |  |  |3.5% {5814ms} primitives
            |  |  |3.5% {5814ms} StandardFileStream>>basicNext
            |  |2.4% {3986ms} LargePositiveInteger>>+
            |2.2% {3654ms} Float class>>fromIEEE32Bit:
          13.7% {22756ms} BinaryStreamReader>>int32
            |7.7% {12790ms} BinaryStreamReader>>uint32
            |  |6.8% {11295ms} StandardFileStream>>next
            |  |  3.5% {5814ms} StandardFileStream>>basicNext
            |  |  3.4% {5647ms} primitives
            |5.2% {8637ms} SmallInteger>>>=
            |  4.3% {7142ms} SmallInteger(Magnitude)>>>=
            |    3.5% {5814ms} SmallInteger>><
            |      2.6% {4319ms} SmallInteger(Integer)>><
          10.7% {17773ms} BinaryStreamReader>>uint16
            |6.9% {11461ms} StandardFileStream>>next
            |  |3.5% {5814ms} StandardFileStream>>basicNext
            |  |3.3% {5481ms} primitives
            |3.8% {6312ms} primitives
          6.8% {11295ms} BinaryStreamReader>>skip:
            |5.0% {8305ms} StandardFileStream>>skip:
          3.4% {5647ms} BinaryStreamReader>>int8
            2.6% {4319ms} BinaryStreamReader>>uint8

**Leaves**
25.4% {42189ms} StandardFileStream>>basicNext
25.2% {41857ms} StandardFileStream>>next
6.0% {9966ms} BinaryStreamReader>>uint32
5.6% {9302ms} SmallInteger(Number)>>negative
4.6% {7641ms} LargePositiveInteger>>+
3.8% {6312ms} LargePositiveInteger(Integer)>>+
3.8% {6312ms} BinaryStreamReader>>uint16
3.4% {5647ms} Float class(Behavior)>>new:
2.0% {3322ms} BinaryStreamReader>>double

**Memory**
	old			+3,705,004 bytes
	young		-28,800 bytes
	used		+3,676,204 bytes
	free		+362,744 bytes

**GCs**
	full			50 totalling 2,524ms (2.0% uptime), avg 50.0ms
	incr		19959 totalling 2,794ms (2.0% uptime), avg 0.0ms
	tenures		6,041 (avg 3 GCs/tenure)
	root table	0 overflows




-- 
David Finlayson, Ph.D.
Operational Geologist

U.S. Geological Survey
Pacific Science Center
400 Natural Bridges Drive
Santa Cruz, CA 95060, USA

Tel: 831-427-4757, Fax: 831-427-4748, E-mail: dfinlayson at usgs.gov

