File Performance Q.?
John M McIntosh
johnmci at smalltalkconsulting.com
Fri May 10 21:07:02 UTC 2002
>>Squeak Tests
>>Squeak 3.2gamma4811
>>In a workspace
>>
>>Transcript show:
>>[file1 := CrLfFileStream new.
>>file2 := CrLfFileStream new.
>>file1 open: 'c:\jimmie\test\testfile.txt' forWrite: false.
>>file2 open: 'c:\jimmie\test\testfile2.txt' forWrite: true.
>>
>>[file1 atEnd] whileFalse:
>>[file2 nextPutAll: file1 nextLine.
>> file2 nextPutAll: String cr].
>>
>>file2 flush.
>>file2 close.
>>file1 close] timeToRun.
>>
>>In the Transcript:
>>304141
>>298161
>>302502
Ah, well, if you look at the handy source code, what you'll see is
that we are checking for end of file, and for cr and lf too, but
doing this by reading and writing a single character at a time. This
equates in your case to almost 15 million primitive calls. Of course
reading and writing 1 byte at a time from a file isn't very fast, and
how slow it is depends on the VM implementation to boot.
For example, on a 500 MHz pb under OS-X with mac vm 3.2.7b4, for a
1,999,213 byte text file I get 13,707 milliseconds, since BSD unix (aka
OS-X) is doing a bit of file buffering for me.
Now if I don't care about file sizes, or have more memory than any
file I want to deal with, then I could do
file2 nextPutAll: file1 contents. That takes 2,367 ms.
Mmm, which seems a bit large. Ah, CrLfFileStream, when I ask for the
contents, grabs it in a chunk, then cheerfully spins through the data
looking for CRLF and changes them to CR.
So if I use the StandardFileStream class instead, then do
file2 nextPutAll: file1 contents, that takes 552 ms.
I'm sure things have been cached by the BSD buffer system by this
point, so an arbitrary file might take longer to read.
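For anyone who wants to repeat the comparison, the StandardFileStream
variant could look roughly like this in a workspace. It is only a sketch:
it reuses the paths and the open:forWrite: calls from the quoted test,
does no CR/LF conversion, and holds the whole file in memory at once.

```smalltalk
| file1 file2 ms |
ms := [file1 := StandardFileStream new.
	file2 := StandardFileStream new.
	file1 open: 'c:\jimmie\test\testfile.txt' forWrite: false.
	file2 open: 'c:\jimmie\test\testfile2.txt' forWrite: true.
	"One large read and one large write; no line-end translation."
	file2 nextPutAll: file1 contents.
	file2 close.
	file1 close] timeToRun.
Transcript show: ms printString; cr.
```

Swapping StandardFileStream for CrLfFileStream in the two "new" lines
gives the slower case with the CRLF scan.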
If someone has some spare time, perhaps one could use exceptions to
trap end of file here instead of doing a primitive call for every
check. Just a thought.
If Squeak had buffered file io in the smalltalk layer, this type of
test would of course run faster. So I wonder what Python does; does
anyone have the source code to look at? Bet they do buffered io: large
reads, then parsing data, then buffering output, with large writes.
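The large-read, large-write idea can be approximated by hand today. The
sketch below copies the file in fixed-size chunks, so memory use stays
bounded and the number of primitive calls drops from one per character
to a few per chunk. The 65536 chunk size is an arbitrary assumption, not
a tuned value, and the code relies on next: returning a shorter string
at end of file, which is my understanding of StandardFileStream but
worth checking in your image. Like the test above, it does no CR/LF
conversion.

```smalltalk
| in out chunkSize |
chunkSize := 65536.  "assumed chunk size; not tuned"
in := StandardFileStream new.
out := StandardFileStream new.
in open: 'c:\jimmie\test\testfile.txt' forWrite: false.
out open: 'c:\jimmie\test\testfile2.txt' forWrite: true.
"One atEnd check, one read and one write per chunk, not per character."
[[in atEnd] whileFalse:
	[out nextPutAll: (in next: chunkSize)]]
	ensure: [out close. in close].
```

Wrapping the copy loop in a block and sending it timeToRun, as in the
quoted test, gives a number to compare against the figures above.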
--
===========================================================================
John M. McIntosh <johnmci at smalltalkconsulting.com> 1-800-477-2659
Corporate Smalltalk Consulting Ltd. http://www.smalltalkconsulting.com
===========================================================================