pending file primitives (EH?)

Paul Fernhout pdfernhout at kurtz-fernhout.com
Sat May 26 03:07:26 UTC 2001


John M McIntosh wrote:
> 
> Since I'm fiddling with file primitives I'd better ask.
> 
> Are we missing something at the primitive layer? At this point we
> should be able to open, close, read, write, seek, truncate, and
> flush. Soooo what have we forgotten?

I would second lock (this all seems too good to be true!)

I haven't looked at the latest file primitives recently, so the
following is off the top of my head.

Here are the docs on the Python file object:
  http://www.python.org/doc/current/lib/bltin-file-objects.html
It also has isatty().
Here are the docs on the lower level Python file descriptor operations:
  http://www.python.org/doc/current/lib/os-fd-ops.html
(It says dup is available on Macintosh, Unix, Windows). 
Here is higher level file system info in Python:
  http://www.python.org/doc/current/lib/os-file-dir.html

Some other ideas:

While lockf(3) would work
  http://www.mcsr.olemiss.edu/cgi-bin/man-cgi?lockf+3
it would be nicer to lock separately from seeking. For example, an
application might try for locks on several records in different places
before actually starting to move around and do any writing. Depending on
how the OS implements locks, there might be an avoidable overhead to
seek first. This would entail passing the function the place to lock
from and the length. Note, it is not necessarily an error to lock some
section past the end of the file (if you are planning to write there
eventually) but it might be an error to seek there. I guess usually the
locked section for new data may be contiguous with the end of the file,
so lockf might suffice (as long as it doesn't try to do end-of-file
error checking).
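
As a rough illustration of locking a byte range without seeking first,
here is a sketch in Python for Unix-like systems. It relies on
fcntl.lockf(), which takes a length and a start offset; lock_range,
unlock_range and demo are made-up names for this example, not existing
Squeak primitives. The second lock in demo covers a region wholly past
the end of the file.

```python
import fcntl
import os
import tempfile


def lock_range(fd, start, length, exclusive=True, wait=True):
    """Lock `length` bytes at absolute offset `start` (hypothetical helper).

    whence=os.SEEK_SET makes `start` absolute, so the current file
    position is neither consulted nor moved.
    """
    cmd = fcntl.LOCK_EX if exclusive else fcntl.LOCK_SH
    if not wait:
        cmd |= fcntl.LOCK_NB  # raise OSError instead of blocking
    fcntl.lockf(fd, cmd, length, start, os.SEEK_SET)


def unlock_range(fd, start, length):
    """Release a lock taken with lock_range (hypothetical helper)."""
    fcntl.lockf(fd, fcntl.LOCK_UN, length, start, os.SEEK_SET)


def demo():
    with tempfile.TemporaryFile() as f:
        f.write(b"0123456789")
        f.flush()
        fd = f.fileno()
        before = os.lseek(fd, 0, os.SEEK_CUR)
        lock_range(fd, 2, 4)       # a record inside the file
        lock_range(fd, 1000, 100)  # a region past the end of the file
        after = os.lseek(fd, 0, os.SEEK_CUR)
        unlock_range(fd, 1000, 100)
        unlock_range(fd, 2, 4)
        return before, after, os.fstat(fd).st_size
```

In demo the file position is the same before and after taking the
locks, and the file does not grow when the region past the end is
locked.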

Is "tell" worthwhile as a standalone, as opposed to being always a
byproduct of seek to the current position? I would think it might be.

It would be nice to get the file size without seeking to the end and
doing a tell (which might actually cause a delay for a disk seek).
Algorithms manipulating files that can be used from multiple processes
might need to check the file size a lot.
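
A small sketch of the two queries as separate operations, in Python on
top of the OS file descriptor calls. At the POSIX level "tell" is
spelled as a seek of zero bytes from the current position, while the
size comes from fstat() and involves no seek. current_position,
file_size and demo are illustrative names only.

```python
import os
import tempfile


def current_position(fd):
    """'tell': a zero-byte relative seek, which does not move the position."""
    return os.lseek(fd, 0, os.SEEK_CUR)


def file_size(fd):
    """Size in bytes from file metadata; the position is not touched."""
    return os.fstat(fd).st_size


def demo():
    with tempfile.TemporaryFile() as f:
        f.write(b"x" * 1234)
        f.flush()
        f.seek(10)
        fd = f.fileno()
        size = file_size(fd)
        pos = current_position(fd)
        return size, pos
```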

Do all platforms support both 32bit and 64bit seek positions (and by
extension locks, setSize, and so on)? For a big database, doing 64bit
seeks is important. I might point out IBM is predicting 400GB desktop
drives in a couple years using "pixie dust" (ruthenium) 
  http://news.cnet.com/news/0-1003-200-5976693.html
and it would be a shame if Squeak was stuck at a 4GB limit for files. 
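
One way a program might probe for offsets beyond 4GB at run time is to
seek past that point in a scratch file and see whether the platform
accepts it. This is only a sketch: supports_large_offsets is a made-up
name, and a real check would probably also want to cover locks and
setSize. Seeking past the end does not by itself write any data.

```python
import os
import tempfile

FOUR_GIB = 2 ** 32


def supports_large_offsets():
    """Return True if a seek just past the 4GB mark succeeds."""
    with tempfile.TemporaryFile() as f:
        try:
            pos = os.lseek(f.fileno(), FOUR_GIB + 1, os.SEEK_SET)
        except (OSError, OverflowError):
            return False
        return pos == FOUR_GIB + 1
```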

Some unix types love to duplicate file handles (typically used when
processes get forked -- sometimes for interprocess communication,
sometimes for parallel processing on the same file). This may not apply
that much in Squeak (isn't all file handling within the main Squeak
thread?)
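
For anyone unfamiliar with duplicated handles, the short Python sketch
below shows the main property of a dup'ed descriptor: both descriptors
refer to the same open file, so they share one file position. demo is
just an illustrative name.

```python
import os
import tempfile


def demo():
    with tempfile.TemporaryFile() as f:
        f.write(b"abcdefghij")
        f.flush()
        fd = f.fileno()
        dup_fd = os.dup(fd)  # second descriptor for the same open file
        try:
            os.lseek(fd, 3, os.SEEK_SET)                   # move via the original
            shared_pos = os.lseek(dup_fd, 0, os.SEEK_CUR)  # seen via the duplicate
            data = os.read(dup_fd, 4)                      # read via the duplicate
            pos_after_read = os.lseek(fd, 0, os.SEEK_CUR)  # original moved too
        finally:
            os.close(dup_fd)
        return shared_pos, data, pos_after_read
```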

It would be nice to get the name and path of an open file (but this is
not essential).

It would be nice to specify whether to wait or not for the file
operation (seek, read, write) to complete. But if you do not want to
wait, then there is the issue of callbacks or some status object to poll
(ick).
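
The "status object to poll" idea could look something like the sketch
below, which uses a Python thread pool and its Future objects as a
stand-in for whatever a VM would actually provide. read_async and demo
are hypothetical names, and os.pread() is assumed to be available
(Unix-like systems).

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def read_async(executor, fd, length, offset):
    """Start a positioned read and return a Future to poll or wait on."""
    return executor.submit(os.pread, fd, length, offset)


def demo():
    with tempfile.TemporaryFile() as f, ThreadPoolExecutor(max_workers=1) as pool:
        f.write(b"hello, squeak")
        f.flush()
        pending = read_async(pool, f.fileno(), 6, 7)
        # A caller that does not want to wait could check pending.done()
        # from time to time; result() blocks until the read has finished.
        return pending.result(timeout=5)
```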

Is it possible to force a file to be closed even if there are errors?
Similarly, is it possible to check if a file is still open, or find out
other metadata about it (read only, write only, shared)?

Can we tell if we are at the end of a file easily?

Can we open files in exclusive use mode or shared mode (or various types
of sharing like shareDenyNone, shareDenyWrite, shareDenyRead)? Can we
open files that are forced to always only append? Can we change the file
sharing mode or the r/w status while we are using it (probably not)?
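
As a partial illustration, POSIX-style open flags cover two of these
cases, shown here through Python's os.open(). Note that O_EXCL only
means "fail if the file already exists" at creation time; it is not a
shareDenyWrite style sharing mode, which POSIX open() does not offer
directly. O_APPEND forces every write to the end of the file. The
file name and demo are made up for the example.

```python
import os
import tempfile


def demo():
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "log.dat")

        # Exclusive creation: succeeds only if the file does not exist yet.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o600)
        os.write(fd, b"first")
        os.close(fd)

        try:
            fd2 = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o600)
            os.close(fd2)
            second_create_failed = False
        except FileExistsError:
            second_create_failed = True

        # Append-only: the seek to offset 0 does not affect where writes land.
        fd = os.open(path, os.O_WRONLY | os.O_APPEND)
        os.lseek(fd, 0, os.SEEK_SET)
        os.write(fd, b"+more")
        os.close(fd)

        with open(path, "rb") as f:
            return second_create_failed, f.read()
```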

It's not exactly a file function, but can we easily rename and delete
files across platforms?
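
For comparison, Python exposes these as os.replace() and os.remove(),
which work on both Unix and Windows. A minimal sketch, with made-up
file names:

```python
import os
import tempfile


def demo():
    with tempfile.TemporaryDirectory() as d:
        old = os.path.join(d, "old.txt")
        new = os.path.join(d, "new.txt")
        with open(old, "w") as f:
            f.write("data")

        os.replace(old, new)  # rename, overwriting `new` if it exists
        renamed = os.path.exists(new) and not os.path.exists(old)

        os.remove(new)        # delete
        deleted = not os.path.exists(new)
        return renamed, deleted
```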

In general, the more fine grained the file primitives and opening and
reopening choices are, the more room there is for an optimization layer
underneath them. This is as opposed to, say, smushing seek and tell
together, or seek and lock, where the layer underneath may otherwise not
know you are, say, just seeking so you can do a lock (in which case it
might not need to move the read/write head). Likewise, it would be
faster if the platform supported it to just test for eof instead of
seeking to the current position, getting the value, seeking to the end
of the file, getting that value, comparing the values, and going back to
where you were at the start.
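
To make the eof comparison concrete, here is a sketch of both
approaches in Python over raw file descriptors. The first does the
seek dance described above; the second asks for the position once and
gets the size from fstat(), so nothing is repositioned. Both function
names and demo are invented for this example, and neither is a true
single "eof" primitive.

```python
import os
import tempfile


def at_eof_by_seeking(fd):
    """Position, seek to end, compare, then seek back to where we were."""
    here = os.lseek(fd, 0, os.SEEK_CUR)
    end = os.lseek(fd, 0, os.SEEK_END)
    os.lseek(fd, here, os.SEEK_SET)
    return here >= end


def at_eof_by_stat(fd):
    """One position query plus one metadata query; no repositioning."""
    return os.lseek(fd, 0, os.SEEK_CUR) >= os.fstat(fd).st_size


def demo():
    with tempfile.TemporaryFile() as f:
        f.write(b"12345")
        f.flush()
        fd = f.fileno()
        at_end = (at_eof_by_seeking(fd), at_eof_by_stat(fd))
        os.lseek(fd, 2, os.SEEK_SET)
        in_middle = (at_eof_by_seeking(fd), at_eof_by_stat(fd))
        return at_end, in_middle, os.lseek(fd, 0, os.SEEK_CUR)
```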

It's nearly certain that not every platform can do everything. For platforms
that cannot support some specific functions, it would be nice to be able
to query which of the primitives were implemented and to what extent.
For example a database requiring locks for transactions could check
whether the platform it was running on supported them. An algorithm
could check if asynchronous I/O was allowed. A program could check if
64bit file offsets were allowed. For platforms that implement things
like lock as something that also requires a seek, that information might be
recorded as well. Perhaps this could just be a primitive returning a
30bit word indicating by bits the presence or absence of file features
(although perhaps it would take more to do this, like perhaps a
primitive to query a capability (by index) and get a 30bit value back
for just that capability).
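
A sketch of how the bit-flag version might be used from the caller's
side, written in Python for brevity. Every name and bit assignment
here (FileCaps, LOCK, LOCK_NEEDS_SEEK, and so on) is invented for
illustration; nothing like this is defined by the existing primitives.

```python
from enum import IntFlag


class FileCaps(IntFlag):
    """Hypothetical capability bits reported by the platform."""
    LOCK = 1 << 0
    LOCK_NEEDS_SEEK = 1 << 1
    ASYNC_IO = 1 << 2
    OFFSETS_64BIT = 1 << 3
    TRUNCATE = 1 << 4


MAX_CAPS_WORD = (1 << 30) - 1  # everything must fit in a 30bit word


def has_caps(word, required):
    """True if every bit in `required` is set in the capability word."""
    return (word & required) == required


def demo():
    # Pretend the platform reported locks and 64bit offsets only.
    word = int(FileCaps.LOCK | FileCaps.OFFSETS_64BIT)
    can_run_database = has_caps(word, FileCaps.LOCK | FileCaps.OFFSETS_64BIT)
    can_do_async = has_caps(word, FileCaps.ASYNC_IO)
    return word, can_run_database, can_do_async
```

A database could make the has_caps check once at startup and refuse to
run (or fall back to something slower) if the bits it needs are
missing.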

-Paul Fernhout
Kurtz-Fernhout Software 
=========================================================
Developers of custom software and educational simulations
Creators of the Garden with Insight(TM) garden simulator
http://www.kurtz-fernhout.com




