[Newbies] Binary file I/O performance problems
David Finlayson
dfinlayson at usgs.gov
Tue Sep 2 22:38:36 UTC 2008
I've been working on my first Smalltalk program, which needs to read
and write large C structs from a binary file. I wrote two classes,
BinaryStreamReader and BinaryStreamWriter, that take a stream and can
read (or write) all of the integer and floating point types I need
(they also handle byte-swapping if necessary). I wrote a test program
that focuses on just reading a small (for us) 123 MB data file on disk.
The program takes about 166 seconds to run compared to 1.2 seconds for
an equivalent C version (about 140x faster than the Squeak version).
As an example of the style of code I've written, here is the method
that reads an unsigned 32-bit integer:
uint32
    " returns the next unsigned, 32-bit integer from the binary stream "
    " see PositionableStream for original implementation."
    | n a b c d |
    isBigEndian
        ifTrue:
            [ a := stream next.
              b := stream next.
              c := stream next.
              d := stream next ]
        ifFalse:
            [ d := stream next.
              c := stream next.
              b := stream next.
              a := stream next ].
    (((a notNil and: [ b notNil ]) and: [ c notNil ]) and: [ d notNil ])
        ifTrue:
            [ n := a.
              n := (n bitShift: 8) + b.
              n := (n bitShift: 8) + c.
              n := (n bitShift: 8) + d ]
        ifFalse: [ n := nil ].
    ^ n
There are 4 calls to stream next for each integer and, sure enough,
a profile of the code (attached below) shows that about half of the
time is being lost in the StandardFileStream basicNext and next
methods. There must be a better way to do this. Scaled up to
operational code, I will need to process about 40 GB of data per day.
My C code currently takes about 16 CPU hours to do this work (including
number crunching). At the same 140x slowdown, the Squeak version would
need roughly 3 CPU months!
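One direction that might cut the per-byte stream overhead is to fetch all
four bytes with a single next: call and decode them from the resulting
ByteArray. The workspace snippet below is only a sketch under assumptions:
an in-memory ReadStream stands in for the binary file stream, the sample
bytes are made up for illustration, and it relies on
ByteArray>>unsignedLongAt:bigEndian: being present in the image. It has
not been timed against the real data, so any speedup is unverified.

```smalltalk
"Workspace sketch: decode one unsigned 32-bit integer from a single 4-byte read.
 The ReadStream over a ByteArray is a stand-in for the binary file stream
 (assumption); the sample bytes are illustrative only."
| stream isBigEndian bytes n |
stream := ReadStream on: #(18 52 86 120) asByteArray.    "16r12 16r34 16r56 16r78"
isBigEndian := true.

"One stream call instead of four. A short read near the end of the
 stream is expected to return fewer than 4 bytes, hence the size check."
bytes := stream next: 4.

n := bytes size = 4
    ifTrue: [ bytes unsignedLongAt: 1 bigEndian: isBigEndian ]
    ifFalse: [ nil ].

n    "305419896, i.e. 16r12345678, when isBigEndian is true"
```

Values of 2^30 or more would still end up as LargePositiveInteger
instances, so this addresses only the repeated stream next calls, not the
large-integer arithmetic that also shows up in the profile.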
Hopefully, someone can help me out here. The working code is available
on squeaksource.com if anyone is interested:
http://www.squeaksource.com/@CWlm_vX4hAPUzk5w/7SVjQQhp
Thanks,
David
Below is a message tally of my program:
- 166088 tallies, 166100 msec.
**Tree**
100.0% {166100ms} SEAFileReader>>printAllBlocks
99.9% {165934ms} ProcessedPingBlock>>readFrom:
99.9% {165934ms} XYZAPingData>>readFrom:
99.7% {165602ms} XYZATransducerData>>readFrom:
95.9% {159290ms} XYZAPointData>>readFrom:
46.4% {77070ms} BinaryStreamReader>>double
|41.9% {69596ms} BinaryStreamReader>>uint32
| |28.1% {46674ms} StandardFileStream>>next
| | |14.1% {23420ms} primitives
| | |14.0% {23254ms} StandardFileStream>>basicNext
| |9.8% {16278ms} LargePositiveInteger>>+
| | |6.1% {10132ms} LargePositiveInteger(Integer)>>+
| | | |3.1% {5149ms} primitives
| | | |3.0% {4983ms} SmallInteger(Number)>>negative
| | |3.7% {6146ms} primitives
| |4.1% {6810ms} primitives
|2.5% {4153ms} Float class(Behavior)>>new:
|2.0% {3322ms} primitives
13.9% {23088ms} BinaryStreamReader>>float
|10.4% {17274ms} BinaryStreamReader>>uint32
| |7.0% {11627ms} StandardFileStream>>next
| | |3.5% {5814ms} primitives
| | |3.5% {5814ms} StandardFileStream>>basicNext
| |2.4% {3986ms} LargePositiveInteger>>+
|2.2% {3654ms} Float class>>fromIEEE32Bit:
13.7% {22756ms} BinaryStreamReader>>int32
|7.7% {12790ms} BinaryStreamReader>>uint32
| |6.8% {11295ms} StandardFileStream>>next
| | 3.5% {5814ms} StandardFileStream>>basicNext
| | 3.4% {5647ms} primitives
|5.2% {8637ms} SmallInteger>>>=
| 4.3% {7142ms} SmallInteger(Magnitude)>>>=
| 3.5% {5814ms} SmallInteger>><
| 2.6% {4319ms} SmallInteger(Integer)>><
10.7% {17773ms} BinaryStreamReader>>uint16
|6.9% {11461ms} StandardFileStream>>next
| |3.5% {5814ms} StandardFileStream>>basicNext
| |3.3% {5481ms} primitives
|3.8% {6312ms} primitives
6.8% {11295ms} BinaryStreamReader>>skip:
|5.0% {8305ms} StandardFileStream>>skip:
3.4% {5647ms} BinaryStreamReader>>int8
2.6% {4319ms} BinaryStreamReader>>uint8
**Leaves**
25.4% {42189ms} StandardFileStream>>basicNext
25.2% {41857ms} StandardFileStream>>next
6.0% {9966ms} BinaryStreamReader>>uint32
5.6% {9302ms} SmallInteger(Number)>>negative
4.6% {7641ms} LargePositiveInteger>>+
3.8% {6312ms} LargePositiveInteger(Integer)>>+
3.8% {6312ms} BinaryStreamReader>>uint16
3.4% {5647ms} Float class(Behavior)>>new:
2.0% {3322ms} BinaryStreamReader>>double
**Memory**
old +3,705,004 bytes
young -28,800 bytes
used +3,676,204 bytes
free +362,744 bytes
**GCs**
full 50 totalling 2,524ms (2.0% uptime), avg 50.0ms
incr 19959 totalling 2,794ms (2.0% uptime), avg 0.0ms
tenures 6,041 (avg 3 GCs/tenure)
root table 0 overflows
--
David Finlayson, Ph.D.
Operational Geologist
U.S. Geological Survey
Pacific Science Center
400 Natural Bridges Drive
Santa Cruz, CA 95060, USA
Tel: 831-427-4757, Fax: 831-427-4748, E-mail: dfinlayson at usgs.gov