Since I'm fiddling with file primitives I'd better ask.
Are we missing something at the primitive layer? At this point we should be able to open, close, read, write, seek, truncate, and flush. Soooo what have we forgotten?
John M McIntosh wrote:
> Since I'm fiddling with file primitives I'd better ask.
> Are we missing something at the primitive layer? At this point we should be able to open, close, read, write, seek, truncate, and flush. Soooo what have we forgotten?
Sometimes I'd like to have functionality to read/write Mac resource forks and Finder attributes beyond the type/creator stuff, but that's pretty low-priority. It would be nice for installers or archivers, though.
Hans-Martin
John,
If you could consider adding memory mapping for systems which support it, that would be WONDERFUL. I posted, last night, the outline for doing it on Win32/CE, and the code is similar for *nix using mmap.
--- Noel
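A minimal sketch of the memory-mapping idea, written in Python rather than as a VM primitive. The helper name patch_bytes_via_mmap and the demo are assumptions made for illustration; Python's mmap module wraps mmap on *nix and the file-mapping calls on Windows, so it only hints at what a primitive might expose.

```python
import mmap
import os
import tempfile


def patch_bytes_via_mmap(path, offset, data):
    """Overwrite len(data) bytes at `offset` by storing through a memory map."""
    with open(path, "r+b") as f:
        # Length 0 maps the whole file; an empty file cannot be mapped.
        with mmap.mmap(f.fileno(), 0) as view:
            view[offset:offset + len(data)] = data
            view.flush()  # ask the OS to write the dirty pages back


def demo():
    """Create a scratch file, patch it through the map, and read it back."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"hello world")
        os.close(fd)
        patch_bytes_via_mmap(path, 6, b"WORLD")
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)
```

The point of the sketch is that no read, write, or seek call touches the patched bytes; the store goes through the mapping.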
John M McIntosh johnmci@smalltalkconsulting.com is widely believed to have written:
> Since I'm fiddling with file primitives I'd better ask.
> Are we missing something at the primitive layer? At this point we should be able to open, close, read, write, seek, truncate, and flush. Soooo what have we forgotten?
A decent way to handle file attributes such as type/creator/permissions. The lack of this is why I had to munge up a fudgy FileCopyPlugin to help the VMMaker.
tim
>> Are we missing something at the [file] primitive layer?
> A decent way to handle file attributes such as type/creator/permissions.
But how do you describe this in a way that is portable between Unix file permissions and SMB (Samba, Win32, OS/2) ACLs?
Hans-Martin Mosner asked about Resource Forks. OS/2 supports the notion of extended attributes. These are similar to Resource Forks: data "associated with" a file, and accessed as a key-value pair. The Windows Registry system also supports key-value access, although the key-space is hierarchical instead of flat.
Let me toss out a strawman proposal to collectively address these issues as follows: for each file there can be an optional key-space associated with it (a Dictionary in Smalltalk terms) containing meta-data.
On the Macintosh, this key-space would provide access to the resource fork. On OS/2, this would provide access to extended attributes. On all systems, security properties can be mapped into this meta-data key-space using reserved keys. Systems that support multiple data forks could map them using reserved keys associated with streams.
For that matter, the approach could be reused to cover the Windows Registry API, inasmuch as the hierarchical nature of the Windows Registry can be mapped as a dictionary of dictionaries, although overriding the access methods to support direct use of the hierarchical keys would be a goodness.
I realize that not everything here is a primitive, but the proposal directly addresses the security and resource/attribute issues raised for inclusion as primitives.
--- Noel
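One way to picture the strawman: a per-file dictionary with reserved keys for the system-level properties. This is a sketch only; the key names ("sys.permissions" and friends) and the function file_metadata are invented here, and it covers just the attributes that os.stat reports, not resource forks or extended attributes.

```python
import os
import stat
import tempfile

# Hypothetical reserved keys; the proposal does not name any.
KEY_PERMISSIONS = "sys.permissions"
KEY_MODIFIED = "sys.modified"
KEY_SIZE = "sys.size"


def file_metadata(path):
    """Return a dictionary view of a file's basic system attributes."""
    st = os.stat(path)
    return {
        KEY_PERMISSIONS: stat.S_IMODE(st.st_mode),  # permission bits only
        KEY_MODIFIED: st.st_mtime,
        KEY_SIZE: st.st_size,
    }


def demo():
    """Build the metadata dictionary for a three-byte scratch file."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"abc")
        os.close(fd)
        return file_metadata(path)
    finally:
        os.remove(path)
```

A platform with richer metadata would add its own keys to the same dictionary, which is what keeps the access protocol uniform.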
"Noel J. Bergman" noel@devtech.com is widely believed to have written:
>>> Are we missing something at the [file] primitive layer?
>> A decent way to handle file attributes such as type/creator/permissions.
> But how do you describe this in a way that is portable between Unix file permissions and SMB (Samba, Win32, OS/2) ACLs?
Dunno, which is why I simply asked for it instead of suggesting how to do it.
At the simplest level I would just like it to be possible to get the file's attributes (whatever may be applicable to any particular platform) in some manner (maybe just a ByteArray, so there is no expectation of user-visible semantic content) and then be able to apply them to another file (typically the copy I've just made in another place) in the most sensible manner we can come up with. I suspect that there are reasonably useful treatments of permissions we can make portable, but doubt that type/creator stuff could be dealt with cross-platform. Maybe, just maybe, some sort of MIME type mapping can be done sometimes.
tim
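For comparison, Python's standard library already takes roughly this "copy whatever attributes apply" approach: shutil.copystat transfers permission bits, timestamps, and flags where the platform supports them, without exposing their meaning. The sketch below is an illustration under that assumption; copy_with_attributes and demo are names made up here, and type/creator codes are not covered.

```python
import os
import shutil
import stat
import tempfile


def copy_with_attributes(src, dst):
    """Copy the data, then carry over whatever attributes the platform allows."""
    shutil.copyfile(src, dst)  # data only
    shutil.copystat(src, dst)  # permission bits, timestamps, flags


def demo():
    """Copy a file and report whether mode matches, plus the copy's mtime."""
    with tempfile.TemporaryDirectory() as d:
        src = os.path.join(d, "original")
        dst = os.path.join(d, "copy")
        with open(src, "wb") as f:
            f.write(b"data")
        os.chmod(src, 0o640)
        os.utime(src, (1_000_000_000, 1_000_000_000))  # fixed atime/mtime
        copy_with_attributes(src, dst)
        s, t = os.stat(src), os.stat(dst)
        same_mode = stat.S_IMODE(s.st_mode) == stat.S_IMODE(t.st_mode)
        return same_mode, int(t.st_mtime)
```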
On Friday 25 May 2001 03:30, John M McIntosh wrote:
> Since I'm fiddling with file primitives I'd better ask.
> Are we missing something at the primitive layer? At this point we should be able to open, close, read, write, seek, truncate, and flush. Soooo what have we forgotten?
Lock.
-- Robert M. Lefkowitz r0ml@alum.mit.edu 845 358 7586
OK, I do have code from C. Keith Ray to do the resource fork thing, but that is Mac-specific, so let's have Noel J. Bergman hash out the details of doing platform-specific file features.
The best suggestion is from Robert M. Lefkowitz: support for lock/unlock. I'll consider lockf(3).
John M McIntosh wrote:
> Since I'm fiddling with file primitives I'd better ask.
> Are we missing something at the primitive layer? At this point we should be able to open, close, read, write, seek, truncate, and flush. Soooo what have we forgotten?
I would second lock (this all seems too good to be true!)
I haven't looked at the latest file primitives recently, so the following is off the top of my head.
Here are the docs on the Python file object: http://www.python.org/doc/current/lib/bltin-file-objects.html (it also has isatty()).
Here are the docs on the lower-level Python file descriptor operations: http://www.python.org/doc/current/lib/os-fd-ops.html (they say dup is available on Macintosh, Unix, Windows).
Here is higher-level file system info in Python: http://www.python.org/doc/current/lib/os-file-dir.html
Some other ideas:
While lockf(3) would work (http://www.mcsr.olemiss.edu/cgi-bin/man-cgi?lockf+3), it would be nicer to lock separately from seeking. For example, an application might try for locks on several records in different places before actually starting to move around and do any writing. Depending on how the OS implements locks, there might be an avoidable overhead to seek first. This would entail passing the function the offset to lock from and the length. Note that it is not necessarily an error to lock some section past the end of the file (if you are planning to write there eventually), but it might be an error to seek there. I guess the locked section for new data will usually be contiguous with the end of the file, so lockf might suffice (as long as it doesn't try to do end-of-file error checking).
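A sketch of locking by explicit offset and length, without moving the file position. It uses Python's fcntl.lockf, which is Unix-only and is a wrapper around the fcntl() record-locking calls rather than lockf(3) itself; the helper names lock_range and unlock_range are made up for this example.

```python
import fcntl
import os
import tempfile


def lock_range(fd, start, length):
    """Take an exclusive, non-blocking lock on [start, start + length)."""
    fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB, length, start, os.SEEK_SET)


def unlock_range(fd, start, length):
    """Release a lock previously taken on the same byte range."""
    fcntl.lockf(fd, fcntl.LOCK_UN, length, start, os.SEEK_SET)


def demo():
    """Lock a range past end-of-file and show the file position is untouched."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"0123456789")
        os.lseek(fd, 2, os.SEEK_SET)
        lock_range(fd, 100, 50)  # entirely beyond the 10-byte file
        position = os.lseek(fd, 0, os.SEEK_CUR)  # "tell" without moving
        unlock_range(fd, 100, 50)
        return position
    finally:
        os.close(fd)
        os.remove(path)
```

In this sketch the range sits past the end of the file, which is the case raised above for data that is about to be appended.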
Is "tell" worthwhile as a standalone, as opposed to being always a byproduct of seek to the current position? I would think it might be.
It would be nice to get the file size without seeking to the end and doing a tell (which might actually cause a delay for a disk seek). Algorithms manipulating files that can be used from multiple processes might need to check the file size a lot.
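On systems with fstat, the size can be read from the file's metadata without touching the file position. A minimal sketch in Python, where os.fstat wraps the underlying call; the function name file_size is just for illustration.

```python
import os
import tempfile


def file_size(fd):
    """Return the size in bytes of an open file without seeking."""
    return os.fstat(fd).st_size


def demo():
    """Report the size and confirm the current position did not move."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"x" * 1234)
        os.lseek(fd, 10, os.SEEK_SET)
        size = file_size(fd)
        position = os.lseek(fd, 0, os.SEEK_CUR)
        return size, position
    finally:
        os.close(fd)
        os.remove(path)
```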
Do all platforms support both 32bit and 64bit seek positions (and by extension locks, setSize, and so on)? For a big database, doing 64bit seeks is important. I might point out IBM is predicting 400GB desktop drives in a couple years using "pixie dust" (ruthenium) http://news.cnet.com/news/0-1003-200-5976693.html and it would be a shame if Squeak was stuck at a 4GB limit for files.
Some unix types love to duplicate file handles (typically used when processes get forked -- sometimes for interprocess communication, sometimes for parallel processing on the same file). This may not apply that much in Squeak (isn't all file handling within the main Squeak thread?)
It would be nice to get the name and path of an open file (but this is not essential).
It would be nice to specify whether to wait or not for the file operation (seek, read, write) to complete. But if you do not want to wait, then there is the issue of callbacks or some status object to poll (ick).
Is it possible to force a file to be closed even if there are errors? Similarly, is it possible to check if a file is still open, or find out other metadata about it (read only, write only, shared)?
Can we tell if we are at the end of a file easily?
Can we open files in exclusive use mode or shared mode (or various types of sharing like shareDenyNone, shareDenyWrite, shareDenyRead)? Can we open files that are forced to always only append? Can we change the file sharing mode or the r/w status while we are using it (probably not)?
It's not exactly a file function, but can we easily rename and delete files across platforms?
In general, the more fine grained the file primitives and opening and reopening choices are, the more room there is for an optimization layer underneath them. This is as opposed to, say, smushing seek and tell together, or seek and lock, where the layer underneath may otherwise not know you are, say, just seeking so you can do a lock (in which case it might not need to move the read/write head). Likewise, it would be faster if the platform supported it to just test for eof instead of seeking to the current position, getting the value, seeking to the end of the file, getting that value, comparing the values, and going back to where you were at the start.
It's near certain that not every platform can do all things. For platforms that cannot support some specific functions, it would be nice to be able to query which of the primitives were implemented and to what extent. For example, a database requiring locks for transactions could check whether the platform it was running on supported them. An algorithm could check if asynchronous I/O was allowed. A program could check if 64-bit file offsets were allowed. For platforms that implement things like lock as something that also requires a seek, that information might be recorded as well. Perhaps this could just be a primitive returning a 30-bit word whose bits indicate the presence or absence of file features (although it might take more than that, such as a primitive that queries a capability by index and gets a 30-bit value back for just that capability).
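The capability word could look something like the following sketch. The flag names and bit positions are invented for illustration, and the probing (checking which Python modules and functions exist) merely stands in for whatever a VM would actually report about its platform.

```python
import enum
import os


class FileCapability(enum.IntFlag):
    """Hypothetical feature bits; names and positions are illustrative only."""
    RANGE_LOCKS = 1 << 0
    SYNC_TO_DISK = 1 << 1
    TRUNCATE = 1 << 2
    MEMORY_MAP = 1 << 3


def _module_available(name):
    """Return True if the named standard-library module can be imported."""
    try:
        __import__(name)
        return True
    except ImportError:
        return False


def probe_capabilities():
    """Build a small bitmask describing which file features are present."""
    caps = FileCapability(0)
    if _module_available("fcntl"):  # only checks the Unix locking module
        caps |= FileCapability.RANGE_LOCKS
    if hasattr(os, "fsync"):
        caps |= FileCapability.SYNC_TO_DISK
    if hasattr(os, "ftruncate"):
        caps |= FileCapability.TRUNCATE
    if _module_available("mmap"):
        caps |= FileCapability.MEMORY_MAP
    return caps
```

A caller would then test a bit before relying on a feature, for example `FileCapability.RANGE_LOCKS in probe_capabilities()` before starting a transaction that needs locks.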
-Paul Fernhout
Kurtz-Fernhout Software
Developers of custom software and educational simulations
Creators of the Garden with Insight(TM) garden simulator
http://www.kurtz-fernhout.com
On Fri, May 25, 2001 at 11:07:26PM -0400, Paul Fernhout wrote:
> John M McIntosh wrote:
> Some unix types love to duplicate file handles (typically used when processes get forked -- sometimes for interprocess communication, sometimes for parallel processing on the same file). This may not apply that much in Squeak (isn't all file handling within the main Squeak thread?)
If you need this, it's already in the unix-specific part of OSProcess. And it is quite easy to add other platform-specific things like this to Squeak as needed.
Dave
p.s. Still looking for volunteers to do the Mac and Win ports of OSProcess.
> Since I'm fiddling with file primitives I'd better ask.
> Are we missing something at the primitive layer? At this point we should be able to open, close, read, write, seek, truncate, and flush. Soooo what have we forgotten?
These tend to be pretty platform specific in details but the following are pretty important at times and some could have platform independent interfaces:
1) issue ioctls; many device files need you to use these for a range of stuff, with very platform-specific data formats
2) seek to 64-bit offsets; many OSes support this, and things like video files often need more than 32 bits of offset
3) query file/filesystem characteristics; for example, on the Windows platform you have to ask the filesystem that contains a specific filename whether it's case sensitive, as different filesystems vary
4) validation of filenames is also dependent on the filesystem, so it should be handled by a component that gets it right, generally a filesystem API
5) your list didn't include locking, both byte ranges and whole-file sharing control like read-only, exclusive access, shared read-write
6) cancel a read/write; many OSes allow you to cancel the I/O on, say, a device file which has stopped talking to you (this assumes I/O doesn't block the whole process)
7) controlling buffering; on many platforms unbuffered I/O has dramatically different performance characteristics than normal buffered I/O, and it often has requirements like reading/writing only in disk-sector-size blocks
8) physical media control; eject media, load media, and lock/unlock media are especially important for device files
9) dynamic device detection; a number of OSes allow hardware to dynamically show up and disappear, for example flash media cards or USB disk drives, and informing an application that the device configuration has changed is often pretty important to get the UI to work as expected
10) query/set arbitrary attributes as name/value pairs; normal attributes are things like modification date, but many OSes have attributes like file creator, or security attributes/backup status/compression/encryption. One could also view the normal data as just a really big attribute, so perhaps the open call should just take an additional parameter for which stream/attribute it gets connected to
- Jan
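As an illustration of item 3, where no filesystem query is available a program can probe case sensitivity empirically. This is a rough sketch, not the API-level query described above: the function name is_case_sensitive is made up here, and the answer applies only to the filesystem holding the probed directory.

```python
import os
import tempfile


def is_case_sensitive(directory):
    """Guess whether the filesystem holding `directory` distinguishes case."""
    # The mixed-case prefix guarantees that the lowercased name is different.
    with tempfile.NamedTemporaryFile(prefix="CaseProbe_", dir=directory) as probe:
        head, tail = os.path.split(probe.name)
        # If the lowercased name also resolves, the filesystem folds case.
        return not os.path.exists(os.path.join(head, tail.lower()))
```

Calling `is_case_sensitive(tempfile.gettempdir())` reports on the temporary directory's filesystem only; other mounted volumes may differ, which is exactly the point made in item 3.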
Jan Bottorff janb@pmatrix.com wrote:
>> Since I'm fiddling with file primitives I'd better ask.
>> Are we missing something at the primitive layer? At this point we should be able to open, close, read, write, seek, truncate, and flush. Soooo what have we forgotten?
Hmm, actually there are two kinds of #flush -- one that makes sure the data gets into the OS, and one that makes sure the data gets onto the disk. The latter is useful for high-reliability servers. The function is called "sync" on Unix, but I prefer "uberFlush", or maybe "flushToDisk" depending on your mood. :)
But having a real #flush at all is already a big step forward. Yay!
Lex
[ On Monday, May 28, 2001 at 11:51:43 (-0500), Lex Spoon wrote: ]
Subject: Re: pending file primitives (EH?)
> Hmm, actually there are two kinds of #flush -- one that makes sure the data gets into the OS
I.e. empties any buffers in the VM....
> and one that makes sure the data gets onto the disk. The latter is useful for high-reliability servers. The function is called "sync" on Unix, but I prefer "uberFlush", or maybe "flushToDisk" depending on your mood. :)
Using sync() is a bit of overkill, and won't work on all platforms (certainly it doesn't provide any guarantees at the time it returns!). In fact it's generally considered bad manners for any application to call sync() directly; the system should be left to its own devices for this.
However fsync() [originally from 4.2BSD] "normally" has the desired effect (though of course there are still no absolute guarantees).
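The two levels of flush can be sketched as follows, with Python's buffered file object standing in for the VM-side buffer. The function name write_durably is made up for this example; os.fsync wraps fsync() on Unix, and as noted above it still gives no absolute guarantee.

```python
import os
import tempfile


def write_durably(path, data):
    """Write data, then flush both the user-space buffer and the OS cache."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()             # first kind: hand buffered data to the OS
        os.fsync(f.fileno())  # second kind: ask the OS to commit it to disk


def demo():
    """Write a scratch file durably and read it back."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        write_durably(path, b"record 1\n")
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)
```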
To be really sure that all data is committed to stable storage as early as possible it's usually best to use the O_SYNC flag on open() [if your current target platform supports it, of course]. On some systems you can even go completely nuts and combine in O_RSYNC to be sure that the file's new access time is committed to stable storage before the read() completes! ;-)
The trick with O_SYNC is that it allows a program using stdio to keep the normal stdio buffering as appropriate and to just call fflush(), which then not only flushes its stdio buffers but also ensures that any data written has been committed to stable storage if necessary. It gives you the best of both buffering and integrity, all under the direct control of the application.
Which reminds me. An application doing fixed-record I/O with stdio will usually want to adjust the stdio buffer to be some exact multiple of the record size, especially if using O_SYNC since the last thing you want is for stdio to flush a partial record out behind your back....
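A sketch of the same arrangement with Python's buffered writer standing in for stdio: the file is opened with O_SYNC where the platform defines it, and the buffer is sized to a whole number of records. RECORD_SIZE, RECORDS_PER_BUFFER, and open_record_file are assumptions for this example, and how a given buffering layer splits oversized writes is not something the sketch guarantees.

```python
import os
import tempfile

RECORD_SIZE = 64          # hypothetical fixed record length in bytes
RECORDS_PER_BUFFER = 16   # buffer holds a whole number of records


def open_record_file(path):
    """Open for synchronous writes with a record-aligned user-space buffer."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
    flags |= getattr(os, "O_SYNC", 0)    # absent on some platforms
    flags |= getattr(os, "O_BINARY", 0)  # only defined on Windows
    fd = os.open(path, flags, 0o600)
    return os.fdopen(fd, "wb", buffering=RECORD_SIZE * RECORDS_PER_BUFFER)


def demo():
    """Write three fixed-size records and report the resulting file size."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open_record_file(path) as f:
            for i in range(3):
                f.write(bytes([i]) * RECORD_SIZE)
            f.flush()  # with O_SYNC the underlying write is synchronous
        return os.path.getsize(path)
    finally:
        os.remove(path)
```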
I don't know how this might all translate into smalltalk primitives, or whether it maps up to FileStream or not.....
squeak-dev@lists.squeakfoundation.org