Writing a large Collection of integers to a file fast
David T. Lewis
lewis at mail.msen.com
Mon Jan 28 00:00:11 UTC 2008
On Sun, Jan 27, 2008 at 02:05:35PM -0800, John M McIntosh wrote:
> On Jan 27, 2008, at 9:38 AM, David T. Lewis wrote:
> >Which shows that for the particular VM and image that I was using,
> >the majority of the processing time was spent in multibyte character
> >conversion and conversion of integers to strings, and less than
> >seven percent was spent in I/O primitives.
> Actually if you open a FileStream you get a MultiByteFileStream.
> If the stream is binary it invokes methods on the super class
> StandardFileStream to put a character or a collection of characters.
> However if it is text then it proceeds to read or write one character
> at a time, and yes, each one causes a discrete file I/O primitive call.
> So say you need a UTF8 Stream and you have 1 million characters and
> you say
> foo nextPutAll: millionCharacterString
> This causes 1 million file I/O operations, that takes a *long* time.
Quite right. But however inefficient this might be, the file I/O
primitive call is still not the bottleneck. Most of the time (over
40% in my example) is eaten up in MultiByteFileStream>>nextPut:
of which a small portion is the actual I/O primitive.
I tried this test on an AMD-64 laptop running a Linux VM, and then
with a Windows VM on the same machine. The results were about the same
for both VMs (about 4% of the run time taken up in the I/O primitives),
so for this kind of file I/O there is no significant difference in
performance between a VM that uses buffered (stdio) I/O versus a VM
that uses lower level (write to HANDLE) I/O. Furthermore, the I/O
performance in the VM is only a small part of the overall file processing
time, despite calling a primitive for every single character processed.
For the record, the test case was:
[aFilename := 'foo.txt'.
aLargeCollection := 1 to: 100000.
aFile := CrLfFileStream fileNamed: aFilename.
aLargeCollection do: [ :int |
    aFile nextPutAll: int printString, String cr]]
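One way to get a per-method breakdown like the percentages above is to wrap the test case in MessageTally spyOn:. This is only a sketch; the choice of MessageTally is an assumption, and not necessarily how the numbers quoted here were collected. The ensure: block is added just so the file is closed if something fails.

```smalltalk
"Sketch only: profile the test case with MessageTally (tool choice is an assumption)."
| aFile |
MessageTally spyOn: [
    aFile := CrLfFileStream fileNamed: 'foo.txt'.
    [(1 to: 100000) do: [:int |
        aFile nextPutAll: int printString, String cr]]
            ensure: [aFile close]].
```

The resulting report lists the share of time spent under each method, which is where entries such as MultiByteFileStream>>nextPut: and the I/O primitives would show up.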
> In Sophie I coded a SophieMultiByteMemoryFileStream which fronts the
> real stream with a buffer the size of the stream, that way the
> Translators get/put bytes to the buffer, and at close time I flush
> the entire buffer to disk as a binary file. Obviously this is not a
> general-purpose solution since it relies on the fact that in Sophie we know the
> UTF8 files we are working with will only be a few MB in size.
That sounds like a good idea.
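For the integer test case, the same idea can be sketched without any special stream class: build the whole text in memory, then hand it to the file in a single nextPutAll:. This is not the SophieMultiByteMemoryFileStream code, just a minimal illustration, and it assumes the output fits comfortably in memory. The variable names are made up for the example.

```smalltalk
"Sketch only: buffer everything in memory, then write once.
 Not Sophie's implementation; names here are illustrative."
| text file |
text := String streamContents: [:out |
    (1 to: 100000) do: [:int |
        out print: int; cr]].
file := StandardFileStream forceNewFileNamed: 'foo.txt'.
[file nextPutAll: text] ensure: [file close].
```

Note that this writes the bytes as they are, so there is no line-end conversion and no UTF8 translation. That is harmless for plain ASCII digits, but it is not a drop-in replacement for CrLfFileStream or MultiByteFileStream in general.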