Writing a large Collection of integers to a file fast

David T. Lewis lewis at mail.msen.com
Mon Jan 28 00:00:11 UTC 2008


On Sun, Jan 27, 2008 at 02:05:35PM -0800, John M McIntosh wrote:
> 
> On Jan 27, 2008, at 9:38 AM, David T. Lewis wrote:
> 
> >Which shows that for the particular VM and image that I was using,
> >the majority of the processing time was spent in multibyte character
> >conversion and conversion of integers to strings, and less than
> >seven percent was spent in I/O primitives.
> >
> Actually if you open a FileStream you get a MultiByteFileStream  
> instance.
> 
> If the stream is binary it invokes methods on the super class  
> StandardFileStream to put a character or a collection of characters.
> 
> However if it is text then it proceeds to read or write one character
> at a time, and each one causes, yes, a discrete file I/O primitive call.
> 
> 
> So say you need a UTF8 Stream and you have 1 million characters and  
> you say
> foo nextPutAll: millionCharacterString
> 
> This causes 1 million file I/O operations, which takes a *long* time.

Quite right. But however inefficient this might be, the file I/O
primitive call is still not the bottleneck. Much of the time (over
40% in my example) is eaten up in MultiByteFileStream>>nextPut:
of which a small portion is the actual I/O primitive.

I tried this test on an AMD-64 laptop running a Linux VM, and then
with a Windows VM on the same machine. The results were about the same
for both VMs (about 4% of the run time taken up in the I/O primitives),
so for this kind of file I/O there is no significant difference in
performance between a VM that uses buffered (stdio) I/O versus a VM
that uses lower level (write to HANDLE) I/O.  Furthermore, the I/O
performance in the VM is only a small part of the overall file processing
time, despite calling a primitive for every single character processed.

For the record, the test case was:

  TimeProfileBrowser onBlock:
    [aFilename := 'foo.txt'.
    aLargeCollection := 1 to: 100000.
    aFile := CrLfFileStream fileNamed: aFilename.
    aLargeCollection do: [ :int |
      aFile nextPutAll: int printString, String cr].
    aFile close]
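
A rough sketch of how the conversion cost could be separated from the
file cost (this is not part of the measurements above, and the variable
names are only illustrative): run the same loop against an in-memory
WriteStream, so that no file primitives are involved at all, and compare
the elapsed time with the file-based run.

  | inMemory ms |
  "Same loop as the test case, but writing to memory instead of a file."
  ms := Time millisecondsToRun:
    [inMemory := WriteStream on: (String new: 1000000).
    (1 to: 100000) do: [ :int |
      inMemory nextPutAll: int printString, String cr]].
  Transcript show: 'in-memory run: ', ms printString, ' ms'; cr.

Whatever time remains in this version is spent converting integers to
strings and growing the stream, not in file I/O or character conversion.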

> In Sophie I coded a SophieMultiByteMemoryFileStream which fronts the
> real stream with a buffer the size of the stream, that way the
> Translators get/put bytes to the buffer, and at close time I flush
> the entire buffer to disk as a binary file.  Obviously this is not a
> general purpose solution since it relies on the fact that in Sophie we
> know the UTF8 files we are working with will only be a few MB in size.

That sounds like a good idea.
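
For the integer test case, the same idea might look roughly like the
sketch below. This is not the Sophie code. It assumes an image in which
StandardFileStream responds to forceNewFileNamed: and binary, and it
relies on the output being plain ASCII digits, so no UTF8 translation is
needed. It also writes bare CR line ends, where CrLfFileStream would
have converted them.

  | buffer file |
  "Build the whole text in memory; no file primitives are called here."
  buffer := WriteStream on: (String new: 1000000).
  (1 to: 100000) do: [ :int |
    int printOn: buffer.
    buffer cr].
  "Write the buffer with one call, in binary mode, bypassing the
   per-character conversion in MultiByteFileStream."
  file := StandardFileStream forceNewFileNamed: 'foo.txt'.
  file binary.
  file nextPutAll: buffer contents asByteArray.
  file close.

Like the Sophie approach, this only makes sense when the whole output
fits comfortably in memory.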

Dave
 


