[VM] non-MacRoman file names

Richard A. O'Keefe ok at cs.otago.ac.nz
Thu Nov 27 03:11:43 UTC 2003


"David T. Lewis" <lewis at mail.msen.com> wrote:
	I think that this is really an attribute of the file system, rather than the
	operating system per se, but it might be reasonable to query the OS about its
	assumptions.  On unix and/or posix flavored systems, maximum file length seems
	to be given by FILENAME_MAX.
	
	  #include <stdio.h>
	  int main(void) {printf("FILENAME_MAX is %d\n", FILENAME_MAX); return 0;}
	
Ah.  That actually has nothing to do with Unix; FILENAME_MAX is part
of the C library and is required by C89 and C99.  It must be provided
on all systems.

What *POSIX* provides is the function pathconf().

    pathconf(directory, _PC_NAME_MAX)
	the length in bytes of the longest simple name that can be
	created in the given directory.
    pathconf(directory, _PC_PATH_MAX)
	the length in bytes of the longest pathname that can be used
	relative to the given directory.
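
As a rough sketch of how those two queries might be wrapped (the helper
names query_limit and report_limits are made up for illustration, not part
of POSIX): pathconf() returns -1 both on error and when there is no limit,
so errno has to be cleared first to tell the two cases apart.

```c
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper, not part of POSIX.
 * Returns whatever pathconf() reports for `name` in `dir`.
 * On a -1 return, *unlimited is 1 if the system reports no limit
 * (errno left unchanged) and 0 if the call failed (errno set). */
long query_limit(const char *dir, int name, int *unlimited)
{
    long value;

    errno = 0;                  /* lets "no limit" be told apart from "error" */
    value = pathconf(dir, name);
    *unlimited = (value == -1 && errno == 0);
    return value;
}

/* Hypothetical helper: print both limits for one directory. */
void report_limits(const char *dir)
{
    static const struct { int name; const char *label; } vars[] = {
        { _PC_NAME_MAX, "_PC_NAME_MAX" },
        { _PC_PATH_MAX, "_PC_PATH_MAX" },
    };
    size_t i;

    for (i = 0; i < sizeof vars / sizeof vars[0]; i++) {
        int unlimited;
        long value = query_limit(dir, vars[i].name, &unlimited);

        if (value != -1)
            printf("%s in %s: %ld bytes\n", vars[i].label, dir, value);
        else if (unlimited)
            printf("%s in %s: no limit\n", vars[i].label, dir);
        else
            perror(vars[i].label);
    }
}
```

Calling report_limits("/") prints the two limits for the root directory;
the answers may differ for a directory on another file system.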

	In addition, various header files make reference to PATH_MAX and related
	definitions (255 is a commonly used value).

It is in fact a sort of lower bound; for the utmost in networked file system
precision, pathconf() is the business.

	If I'm interpreting this correctly, you can think of a path as a
	string of arbitrary length delimited by path separators ($/ for
	unix), with the only hard limit being that of the FILENAME_MAX.
	Presumably the components of a path string are themselves file names,
	each of length FILENAME_MAX or less.
	
From a UNIX manual page:
     The integer constant FILENAME_MAX specifies the number of
     bytes needed to hold the longest pathname of a file allowed
     by the implementation.  If the system does not impose a
     maximum limit, this value is the recommended size for a buffer
     intended to hold a file's pathname.

So FILENAME_MAX is *not* a hard limit; there can be file systems with *no*
limit (or no practical limit) and in that case FILENAME_MAX is still
defined, it just isn't the limit.

	Other operating systems handle paths and folders quite differently, but
	it may turn out that FILENAME_MAX is the only thing worth asking the operating
	system about, and the rest could be handled on the fly.
	
Note that using FILENAME_MAX does *not* involve asking the operating system.
It's a compile-time constant.  Code compiled on system X and installed on
system Y may be wrong.  Some of these limits are configurable; some are
even changeable without rebooting.  So system X might have
FILENAME_MAX == pathconf("/", _PC_PATH_MAX) == 255, while system Y,
running the same release of the same operating system on the same kind
of hardware, might have pathconf("/", _PC_PATH_MAX) == 1024.  System X
and system Y might even be the _same_ system before and after a reconfiguration
and reboot.
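
One way to avoid baking the limit in at compile time is to size the buffer
at run time.  The following is only a sketch under some assumptions: the
names alloc_path_buffer and FALLBACK_PATH_BYTES are invented here, and the
1024-byte fallback for "no limit" or "query failed" is a guess, not
something any standard promises.

```c
#include <stdlib.h>
#include <unistd.h>

/* Assumption: size to use when pathconf() reports no limit or fails. */
#define FALLBACK_PATH_BYTES 1024

/* Hypothetical helper: allocate a buffer big enough for the longest
 * pathname usable relative to `dir`, as reported at run time.
 * Stores the buffer size in *size_out; returns NULL if malloc fails.
 * The caller frees the result. */
char *alloc_path_buffer(const char *dir, size_t *size_out)
{
    long limit = pathconf(dir, _PC_PATH_MAX);
    size_t size;
    char *buf;

    if (limit <= 0)             /* error, or no limit at all */
        limit = FALLBACK_PATH_BYTES;

    /* One extra byte in case the reported limit does not count
     * the terminating NUL. */
    size = (size_t)limit + 1;

    buf = malloc(size);
    if (buf != NULL) {
        buf[0] = '\0';
        *size_out = size;
    }
    return buf;
}
```

Because the query happens on the machine where the code runs, the same
binary picks up 255 on one system and 1024 on another.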

Practically the only thing you can be sure of is that FILENAME_MAX will
not be too big (and I'm not sure of that).

To give a specific example, I just compiled this little test program:

    #include <stdio.h>
    #include <unistd.h>

    int main(void) {
        printf("pathconf: %ld; stdio.h: %ld\n",
            pathconf("/", _PC_PATH_MAX), (long)FILENAME_MAX);
        return 0;
    }

Outputs:
    pathconf: 1024; stdio.h: 1024   (UltraSPARC, Solaris 2.9)
    pathconf: 1023; stdio.h: 255    (Alpha, OSF/1 Tru64 UNIX V5.1)

Note, and this is the scary part, that FILENAME_MAX is 255 on the
Alpha, but pathconf("/", _PC_PATH_MAX) is 1023.  	

If a compile-time bound is needed, this seems to be the safest code:

    #include <stdio.h>
    #ifdef FILENAME_MAX
    #if FILENAME_MAX > 1024
    #define FILE_NAME_BYTES_LIMIT FILENAME_MAX
    #endif
    #endif
    #ifndef FILE_NAME_BYTES_LIMIT
    #define FILE_NAME_BYTES_LIMIT 1024
    #endif

But as I said before, for real networked file system precision,
pathconf() and its sibling fpathconf() are the only game in town.
(Lessee, with UTF-8 some plausible characters can be 3 bytes; it
wouldn't be too hard to hit that limit in some scripts...)
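
To make the bytes-versus-characters point concrete, here is a small sketch.
The function names utf8_code_points and name_fits are hypothetical, and the
counter assumes its input is already valid UTF-8.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count code points in a UTF-8 string by skipping
 * continuation bytes (those of the form 10xxxxxx).
 * Assumes the input is valid UTF-8. */
size_t utf8_code_points(const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; s++)
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    return n;
}

/* Hypothetical helper: does `name` fit within a limit given in bytes?
 * A negative limit is taken to mean "no limit". */
int name_fits(const char *name, long limit_bytes)
{
    return limit_bytes < 0 || strlen(name) <= (size_t)limit_bytes;
}
```

The limit is counted in bytes, so a name of 86 three-byte characters takes
258 bytes and no longer fits a 255-byte limit, although it is only 86
characters long.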



