File Performance Q.?

John M McIntosh johnmci at smalltalkconsulting.com
Fri May 10 21:07:02 UTC 2002


>>Squeak Tests
>>Squeak 3.2gamma4811
>>In a workspace
>>
>>Transcript show:
>>[file1 := CrLfFileStream new.
>>file2 := CrLfFileStream new.
>>file1 open: 'c:\jimmie\test\testfile.txt' forWrite: false.
>>file2 open: 'c:\jimmie\test\testfile2.txt' forWrite: true.
>>
>>[file1 atEnd] whileFalse:
>>[file2 nextPutAll: file1 nextLine.
>>  file2 nextPutAll: String cr].
>>
>>file2 flush.
>>file2 close.
>>file1 close] timeToRun.
>>
>>In the Transcript:
>>304141
>>298161
>>302502

Ah, well, if you look at the handy source code, you'll see that we
are checking for end of file, and for CR and LF too, but doing this
by reading and writing a single character at a time. In your case
this equates to almost 15 million primitive calls. Of course, reading
and writing 1 byte at a time from a file isn't very fast, and it
depends on the VM implementation to boot.

For example, on a 500 MHz PowerBook under OS X with Mac VM 3.2.7b4,
for a 1,999,213-byte text file I get 13,707 milliseconds, since BSD
Unix (aka OS X) is doing a bit of file buffering for me.

Now, if I don't care about file sizes, or have more memory than any
file I want to deal with, then I could do
file2 nextPutAll: file1 contents. That takes 2,367 ms.

Mmm, which seems a bit large. Ah, when I ask CrLfFileStream for the
contents, it grabs them in one chunk, then cheerfully spins through
the data looking for CRLF and changes each one to CR.

So if I use the StandardFileStream class instead, then do
file2 nextPutAll: file1 contents, that takes 552 ms.
I'm sure that by this point things have been cached by the BSD buffer
system, so an arbitrary file might take longer to read.
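
For anyone who wants to repeat that last measurement, a minimal
workspace sketch might look like the following. The file names are
placeholders rather than the paths from the original test, and
forceNewFileNamed: is assumed to be available in your image. Swapping
in CrLfFileStream for StandardFileStream should reproduce the slower,
line-end-converting case.

```smalltalk
"Sketch only: whole-file copy with no CR/LF translation.
 'testfile.txt' and 'testfile2.txt' are placeholder names."
| in out ms |
in := StandardFileStream readOnlyFileNamed: 'testfile.txt'.
out := StandardFileStream forceNewFileNamed: 'testfile2.txt'.
ms := [
    "Read the entire file into memory, then write it in one go."
    [out nextPutAll: in contents]
        ensure: [in close. out close]
] timeToRun.
Transcript show: ms printString; cr.
```

Note that this holds the whole file in memory at once, so it only
suits files that comfortably fit.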

If someone has some spare time, perhaps one could use exceptions to
trap end of file here instead of doing a primitive call for every
check. Just a thought.

If Squeak had buffered file I/O in the Smalltalk layer, this type of
test would of course run faster. So I wonder what Python does; does
anyone have the source code to look at? I bet they do buffered I/O:
large reads, then passing data, then buffering output, with large
writes.
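
As a rough illustration of the large-read, large-write idea in
Smalltalk itself, here is a sketch that copies in fixed-size chunks.
The 64 KB chunk size and the file names are arbitrary assumptions, and
it relies on next: answering a shorter string when fewer characters
remain, which is worth checking in your image. It does no CR/LF
translation, so it is not a drop-in replacement for the CrLfFileStream
loop.

```smalltalk
"Sketch only: chunked copy, one primitive read and one primitive
 write per chunk instead of per character.
 File names and chunkSize are placeholders."
| in out chunkSize |
chunkSize := 65536.
in := StandardFileStream readOnlyFileNamed: 'testfile.txt'.
out := StandardFileStream forceNewFileNamed: 'testfile2.txt'.
[[in atEnd] whileFalse:
    [out nextPutAll: (in next: chunkSize)]]
        ensure: [in close. out close].
```

For a file of about 2 MB this makes roughly 31 reads and 31 writes,
plus the atEnd checks, and memory use stays bounded by the chunk size.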

-- 
--
===========================================================================
John M. McIntosh <johnmci at smalltalkconsulting.com> 1-800-477-2659
Corporate Smalltalk Consulting Ltd.  http://www.smalltalkconsulting.com
===========================================================================


