[squeak-dev] FilePlugin IO performance stdio versus HANDLE (was: Adding fsync() call to the primitiveFileFlush prim ?)

David T. Lewis lewis at mail.msen.com
Mon May 23 00:24:18 UTC 2016


I think this is what I meant to suggest:

Somebody (tm) should try a new implementation of FilePlugin for the Unix VM,
and implement all of the IO in terms of e.g. read() and write() rather than
fread() and fwrite(). Then see which one works better in terms of real world
performance.
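
As a rough illustration of the two styles being compared, here is a minimal sketch in C. The helper names write_with_stdio and write_with_descriptor are hypothetical and are not part of FilePlugin or the VM sources. The sketch assumes a POSIX system and shows only the write path: one version goes through a buffered (FILE *) stream, the other issues write() calls directly on a file descriptor.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write `count` records of `size` bytes through the
 * buffered stdio layer. Returns total bytes written, or -1 on error. */
long write_with_stdio(const char *path, const char *buf, size_t size, int count)
{
    FILE *f = fopen(path, "wb");
    if (f == NULL)
        return -1;
    long total = 0;
    for (int i = 0; i < count; i++) {
        size_t n = fwrite(buf, 1, size, f);   /* usually lands in the stdio buffer */
        total += (long)n;
        if (n != size)
            break;                            /* short write: stop early */
    }
    if (fclose(f) != 0)                       /* fclose flushes the stdio buffer */
        return -1;
    return total;
}

/* Hypothetical helper: write the same records with one or more write()
 * system calls per record, with no user-space buffering.
 * Returns total bytes written, or -1 on error. */
long write_with_descriptor(const char *path, const char *buf, size_t size, int count)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    long total = 0;
    for (int i = 0; i < count; i++) {
        size_t done = 0;
        while (done < size) {                 /* write() may be partial */
            ssize_t n = write(fd, buf + done, size - done);
            if (n < 0) {
                if (errno == EINTR)
                    continue;                 /* interrupted: retry */
                close(fd);
                return -1;
            }
            done += (size_t)n;
        }
        total += (long)done;
    }
    if (close(fd) != 0)
        return -1;
    return total;
}
```

Timing both helpers over a range of record sizes (for example with clock_gettime) would give a first, synthetic comparison. It would not replace measuring a real plugin under a real Squeak workload, since the descriptor version pays one system call per record unless the caller does its own buffering.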

This new plugin would be conceptually similar to Andreas' Windows plugin,
which operates on a Windows HANDLE, similar to a Unix file descriptor.

In general, we can compare the Windows and Unix VMs to see which has the
better real-world file IO performance. But those VMs are different in many
other ways, so if we just want to know the difference between file IO written
to the stdio level versus file IO written to the descriptor/HANDLE level,
then a good way to do it would be to write such a plugin for the Unix VM
and see if it is better or worse than the current stdio implementation.

This might be a good student project, or maybe a hobby project for a Sunday
Squeaker with more free Sundays than I have at the moment.

Dave



On Sun, May 22, 2016 at 12:25:49PM -0700, Eliot Miranda wrote:
> Hi David,
> 
> > On May 21, 2016, at 9:45 PM, David T. Lewis <lewis at mail.msen.com> wrote:
> > 
> >> On Sat, May 21, 2016 at 09:47:07AM -0700, marcel.taeumel wrote:
> >> Hi Tim,
> >> 
> >> in Windows, this is called FlushFileBuffers, I guess:
> >> https://msdn.microsoft.com/en-us/library/windows/desktop/aa364439%28v=vs.85%29.aspx
> > 
> > Slightly off topic, but worth mentioning: The implementation of FilePlugin
> > for Windows operates on HANDLE references to files, which I believe are roughly
> > equivalent to file descriptors on Unix. Thus the Unix VMs are written to the
> > higher level stdio interface, and the Windows VM uses a more direct lower level
> > IO strategy. I have always wondered which of the two approaches (low level
> > HANDLE/descriptor versus higher level buffered stdio) produces better overall
> > performance for Squeak.
> > 
> > One way to answer the question would be to implement a FilePlugin for Unix
> > VMs with all of the IO done at the descriptor level. Specifically, a
> > SQFile->file would be a reference to an integer file descriptor (similar to
> > a Windows HANDLE), and the platform support code would operate against
> > file descriptors rather than (FILE *) references.
> 
> IME this depends on two things, whether the in-image implementation (StandardFileStream et al) is buffered or not, and whether the system provides proper finalization or simply post-mortem finalization.  Both interact.
> 
> If the image-level implementation is not buffered then the VM needs to provide it.  This is essentially our case; the problem being that external calls are relatively slow.  If buffered, then if finalization is performed on a post-mortem copy, close via finalization cannot flush unless the post-mortem copy is updated after every write, because it would flush stale data.
> 
> So the design we want, that we should aim towards
> - does all buffering in the image
> - uses ephemerons to finalize the actual file so that valid data is written in close via finalization.
> 
> With this approach the "FilePlugin" provides only the slimmest of interfaces to the OS's open, close, read, write and seek primitives, and as Tim has pointed out there are advantages in it providing single calls that combine seek;read and seek;write, e.g. see the current conversation about read-only file copies and the debugger (although I think my suggestion of substituteReadOnlyCopyWhile: is better).
> 
> > 
> > Doing a reimplementation of FilePlugin for Unix is probably not a huge
> > project, but I have never gotten around to trying it.
> > 
> > Has anyone else wondered about this? Which is better, the Windows VM file
> > strategy, or the Unix VM file strategy?
> > 
> > Dave
> > 
> >> 
> >> MSDN also suggests to use unbuffered I/O instead of calling such a flush
> >> function too often. What are our options to control buffered vs. unbuffered
> >> from Squeak land?
> >> 
> >> https://support.microsoft.com/en-us/kb/99794
> >> https://msdn.microsoft.com/en-us/library/windows/desktop/cc644950%28v=vs.85%29.aspx
> >> 
> >> On what media is the data stored? I think that you cannot be 100% sure to
> >> have all data written after some function call returns because some details
> >> are out of reach for user applications. Think of some USB driver that needs
> >> just two more cycles to finish writing... I am no expert there but it seems
> >> tricky to find the correct point in time to turn the power off. Regular OS
> >> shutdown seems more appropriate...
> >> 
> >> Best,
> >> Marcel
> 
> _,,,^..^,,,_ (phone)

