Re: [BUG] file Truncate is it really busted? - Squeak-dev

29 Nov 2001

      woods@weird.com (Greg A. Woods) wrote:
    > Subject: Re: [BUG] file Truncate is it really busted?
    >
    > woods@weird.com (Greg A. Woods) replied to my explanation of the
    > 'f' prefixes in C file handling functions.
    > 
    > 	Well there's also the fact that the two uses of the 'f' prefix are
    		  vvvvvvvvvvvvvvvvvvvvvvv
    > 	really in fundamentally different levels of the API -- or depending on
    		  ^^^^^^^^^^^^^^^^^^^^^^^
    > 	your point of view maybe even completely different APIs.
    > 
    > Fundamentally different?

    Note I said "completely different", not "fundamentally different" ;-)

Nope.  You wrote _both_.

    > Some of them are in the kernel, and some of them
    > are in the C stdio library,

That is to SUPPORT your claim that they are at 'different levels of
the API'.

    Well on most unix and unix-like systems both the kernel system call
    functions and the stdio functions are in libc.

Half true.  Glue code for the system calls can be found in libc.
There is an observable difference.  If you use dis(1), you can find out
what the actual code of fopen() is, and you can step through it in the
debugger.  You can't do that with open().
But again, that's to SUPPORT your claim.

    If the system call interfaces had ever been supplied separately
    it might have been in a library called libsys.  Libc has become
    a mess of a whole slew of other very much unrelated APIs.

    > but from the point of view of your average UNIX
    > programmer this is a distinction without a difference.

    Perhaps that's true, though I don't really believe anyone with
    even a small amount of C programming experience could confuse
    the set of kernel interface functions that deal with file
    descriptors and the set of "higher" stdio functions that deal
    with the buffered file I/O using pointers to structures in which
    the library maintains internal state.

It's obvious you don't teach students.  Believe it.  In many ways the
stdio functions are NOT "higher"; a lot of the interesting stuff like
locking can't really be done through them.

Let's face it, it's pretty hard to argue that "dealing with files"
amounts to "completely different APIs", especially when the names
are by design very very similar.

This thread started because someone *WAS* confused by the "f" prefix,
and expected a convention from one part of the UNIX I/O library to
apply to a function that actually came from another part of the UNIX
I/O library.

    > open() and fopen()
    > are *both* in POSIX.

    Yeah, sure, but that's really got nothing whatsoever to do with their
    differences.....  POSIX defines the API for many unrelated sets of
    functions.

Yes it has.  They are both "open a file" functions and they are both
in all the standards that include "open".  POSIX defines one API that
covers many topics, and the fact that there is not a consistent naming
convention in that single API is precisely what this thread is about.

    > A UNIX look-alike could quite legitimately place
    > fopen() in the kernel and open() in a library, just like it is on my Mac
    > at home.  (Yep, Think C layered the UNIX functions on top of the stdio
    > functions, and they were layered on top of the MacOS ones.)

    Hmmm... that's not really an important distinction in a single user
    system where the hardware doesn't properly isolate the data and code of
    the system from the data and code of user programs.

Nothing in POSIX requires the hardware to isolate code and data.
Nothing at all.  I have used a "UNIX" implementation where everything
was mapped into a single address space.  This had snags, like you couldn't
save pointers into a file and load them back later, because every time a
program was run it might start at a different address.

    The implementation of a difference between lower-level "kernel"
    file access functions and a layer of buffered I/O functions in a
    single address space, especially in C or something even
    lower-level, kind of makes all these distinctions artificial.

I think that was my point.  "all these distinctions artificial" doesn't
sound much like "fundamentally different ... completely different".

    > Yes, I know.  That's pretty much what I said.  That's the PROBLEM.

    I don't see how there can be a problem so long as you keep a separate
    view in your mind of the kernel API and the stdio API, and never attempt
    to mix the two without first gaining a deep understanding of the
    potential interactions of the particular mix you contemplate.

The whole point of this thread is that the call to ftruncate() in
a version of Squeak is wrong because the author saw the "f" prefix and
thought it meant "takes a FILE* instead of an int" when in fact it
meant "takes an int instead of a char*".
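The two readings of the prefix can be put side by side.  What follows is
a minimal sketch, assuming a POSIX system; the helper names are
hypothetical and none of this is taken from the Squeak VM source.  In
fopen() and fputs() the 'f' goes with FILE*; in the truncate()/ftruncate()
pair the 'f' marks the variant that takes an int descriptor where the
other takes a path name.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create or replace `path` with `text` via stdio.
   Here the 'f' functions (fopen, fputs, fclose) all deal in FILE*. */
int write_text(const char *path, const char *text)
{
    assert(path != NULL && text != NULL);
    FILE *fp = fopen(path, "w");
    if (fp == NULL)
        return -1;
    int failed = (fputs(text, fp) == EOF);
    if (fclose(fp) == EOF)
        failed = 1;
    return failed ? -1 : 0;
}

/* Hypothetical helper: size of the named file, or -1 on error. */
off_t size_by_name(const char *path)
{
    struct stat st;
    return stat(path, &st) == 0 ? st.st_size : (off_t)-1;
}

/* No prefix: truncate() takes a path name (char *). */
int shorten_by_name(const char *path, off_t length)
{
    return truncate(path, length);
}

/* 'f' prefix: ftruncate() takes an int descriptor, NOT a FILE*. */
int shorten_by_descriptor(const char *path, off_t length)
{
    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;
    int rc = ftruncate(fd, length);
    if (close(fd) < 0)
        rc = -1;
    return rc;
}
```

Passing a FILE* to ftruncate() is a type error that a prototype-checking
compiler or lint should flag, which is one reason the lint advice at the
end of this message is worth following.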

In POSIX as it stands now, there _isn't_ any "kernel API" and there
_isn't_ any "stdio API", there is _one_ API with many functions,
including two different ways to access files, and for nontrivial
applications you have to use both of those ways, because each of them
can do something the other can't.

It seems as though having a "direct" and a "buffered" layer is something
operating system designers think we can't do without:  VMS has both
RMS and QIO, and MVS has something similar.  On the other hand, the
B6700 MCP managed without such a distinction, and arguably CMS too.

There is an important design lesson here, which is that naming matters.

    Since we're discussing this in a forum and in a context that
    relates to interfaces in an object-oriented system how about we
    simply declare that the lower level file-descriptor functions
    are in one class, and that the stdio functions are in another
    class (perhaps a richer class derived from the former one).

How about we don't?  C doesn't have classes; and the C++ interface to
POSIX doesn't do it that way.

Stdio provides buffering and formatting; descriptor I/O doesn't.
Descriptor I/O provides memory mapping, locking, truncation, and
synchronisation; stdio doesn't.  It is very hard to argue that one
is richer than the other.  This is part of the problem (the one that
actually happened, remember?)  Many programs in a UNIX environment
require *both* layers, sometimes with the same file.
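A small illustration of needing both layers on the same file, offered
as a sketch under the assumption of a POSIX system.  The function name
write_record_durably is hypothetical.  Formatting comes from stdio,
synchronisation comes from the descriptor call, and fflush() sits
between them so that the buffered bytes reach the kernel before fsync()
is asked to push them to storage.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: write one formatted record and force it out.
   Returns 0 on success, -1 on any failure. */
int write_record_durably(FILE *fp, int id, const char *name)
{
    assert(fp != NULL && name != NULL);

    /* Formatting: only stdio offers this. */
    if (fprintf(fp, "%d %s\n", id, name) < 0)
        return -1;

    /* Hand stdio's buffered bytes to the kernel. */
    if (fflush(fp) == EOF)
        return -1;

    /* Synchronisation: only the descriptor layer offers this.
       fileno() is the bridge from FILE* to int. */
    if (fsync(fileno(fp)) < 0)
        return -1;

    return 0;
}
```

Leaving out the fflush() is the kind of mistake that mixing the layers
invites: fsync() knows nothing about bytes still sitting in the stdio
buffer.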

    > The getchar(), getc(), putchar(), putc(), printf(), and scanf()
    > functions, amongst others, operate on the internal elements of a FILE
    > structure, but don't have a FILE * argument.

    Ah, now you're taking the analogy backwards and far too far.

No, I am (a) exhibiting a logical error in your argument,
and (b) pointing out that it is this very thinking "'f' prefix means
FILE* argument" that actually led to a real programming error in a
version of Squeak.

    You're perhaps confusing naming conventions between two unrelated APIs.
    There's no direct connection between the 'f' prefix and the fact that
    the function is a stdio function.

This is very insulting.  My first posting in this thread made it quite clear
that I understood the distinction.  The posting to which Woods was replying
made it even clearer that I understood stdio quite thoroughly.
I am not confusing naming conventions between two unrelated APIs.
SOMEONE *ELSE* was confused by the fact that two STRONGLY RELATED
parts of the unified POSIX API used the same prefix for opposite
purposes.

    > I'm sorry, but this is precisely why it WOULD make sense to have fftruncate(),
    > because ftruncate(fileno(fp)) just plain doesn't work.  I *think* the
    > following code will work, but of course it isn't portable to systems that
    > don't have f{un,}lockfile():

    Ah, I see -- you have some rather extravagant expectations!  :-)

I don't regard "it should work correctly" as extravagant.

    'ftruncate(fileno(fp))' does in fact work, perfectly even -- it just
    doesn't leave the FILE* pointed to by 'fp' in any kind of useful state,
    and expecting it to do so is perhaps unrealistic since you're mixing too
    many operations between different levels.

WRONG.  That's not the point at all.  The point is that a call to
ftruncate(fileno(fp)) TRUNCATES AT THE WRONG PLACE!  If the last
operation was an output, the file may be left too short.  If the last
operation was an input, the file may be left too long.
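For concreteness, here is one way a truncate-at-the-stream-position
helper might look.  This is a hedged sketch, not the code from the
earlier posting (which is not reproduced in this message), and the name
truncate_at_stream_position is hypothetical.  It assumes a POSIX system
with flockfile()/funlockfile(), ftello() and fseeko(), a seekable
stream, and POSIX rather than plain ISO C behaviour for fflush() on a
stream whose last operation was input.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: cut the file behind `fp` at the position stdio
   believes the stream is at, not at the descriptor's raw offset.
   Returns 0 on success, -1 on failure. */
int truncate_at_stream_position(FILE *fp)
{
    assert(fp != NULL);
    int rc = -1;

    flockfile(fp);                 /* keep other threads off the stream */
    if (fflush(fp) == 0) {         /* pending output goes to the kernel */
        off_t pos = ftello(fp);    /* logical position, buffer included */
        if (pos >= 0
            && ftruncate(fileno(fp), pos) == 0
            /* Reposition so that any read-ahead past the cut is dropped. */
            && fseeko(fp, pos, SEEK_SET) == 0) {
            rc = 0;
        }
    }
    funlockfile(fp);
    return rc;
}
```

The design choice is to ask stdio where the stream is (ftello) instead
of trusting the descriptor's offset, since the two disagree whenever
output is still buffered or input has been read ahead.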

    Generally speaking any experienced Unix programmer will endeavour to
    never mix stdio operations on a given file with low-level file
    descriptor I/O operations on the same file.

Endeavour, yes.  The point is (once again) that POSIX provides in its
unified API two different ways of accessing files, neither of which
provides all the operations that the other does, using inconsistent
naming conventions, and that sometimes you need to do an operation from
one set to a file that was opened using the other set, and this has in
recent days led to an actual mistake by an actual person OTHER THAN ME
in Squeak.

There is a practical issue for Squeak, and a design issue.

Practical issue:
    Run the VM through lint or lclint, and track every issue to its
    cause.  Be very very careful of functions that start with 'f'.
    Watch for mixing approaches:  UNIX systems often have more than
    one way to do threads, or more than one way to do semaphores,
    or more than one way to do memory mapping.

Design issue:
    The only thing worse than the POSIX API is the Windows API.
    We must imitate neither.

    As Smalltalk programmers, we need to be aware of common protocols,
    and avoid names like 'next' or 'size' unless our methods do what
    someone else familiar with the protocol would expect; and if we
    have something where a common name would make sense, we should not
    invent another.

    By now Squeak has about as many classes as UNIX has functions,
    so modules are coming along just in time to avoid naming issues
    like this for classes.  A great big SHOUT of THANKS to the people
    working on modules!