FileDirectory>>fileExists: (was: Re: [BUG]Unable to load BFAV, various problems )

Ned Konz ned at bike-nomad.com
Thu Apr 22 23:43:20 UTC 2004


On Thursday 22 April 2004 3:35 pm, Colin Putney wrote:
> I'm not sure I understand what the problem is.

The problem is that we don't just want to *represent* filenames; we actually 
have to manipulate them. Or we want to transport them between file systems or 
platforms, or between systems with the same kind of file system but different 
character encodings. Or we want to save them in the image and then expect to 
use them later.

These things are all trouble-prone.

Look at the kinds of things we do with filenames:

* truncate change set names
* construct sequentially named files
* copy files between directories
* construct names for new files
* construct new directories (Squeaklets, etc.) with fixed names
* map filenames to file://URLs
* map file://URLs to filenames
* map filenames between platforms (like between the canonical Zip 
  representation and the native representation)
* do logic to convert absolute names to relative ones and vice versa, which 
  requires:
  * comparing prefixes of path names
  * traversing directory hierarchies, which can be inefficient (or can 
    recurse forever) in the presence of symlinks
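To make the symlink hazard in the last item concrete, here is a minimal Python sketch (Python rather than Squeak, purely for illustration; the function name walk_without_cycles is hypothetical). It assumes the platform reports a stable (device, inode) pair through os.stat, as POSIX systems do, and uses that pair to visit each physical directory only once, so a link pointing back up the tree cannot cause endless recursion.

```python
import os
import stat


def walk_without_cycles(root):
    """Yield directory paths under root, visiting each physical directory once.

    Directories are identified by (st_dev, st_ino), so a symlink that points
    back up the tree is recognized and skipped instead of recursed into.
    """
    seen = set()
    stack = [root]
    while stack:
        path = stack.pop()
        try:
            st = os.stat(path)  # follows symlinks to the real target
        except OSError:
            continue  # dangling link, unreadable entry, offline volume, ...
        if not stat.S_ISDIR(st.st_mode):
            continue  # only directories are traversed
        key = (st.st_dev, st.st_ino)
        if key in seen:
            continue  # already visited via another path (e.g. a link loop)
        seen.add(key)
        yield path
        try:
            names = sorted(os.listdir(path))
        except OSError:
            continue  # e.g. permission denied
        for name in names:
            stack.append(os.path.join(path, name))
```

Note that this only prevents infinite loops; it does nothing about slow or removable volumes encountered along the way.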

For instance, there's no guarantee that you can copy a file from one point to 
another in a directory hierarchy without changing the filename somehow. Or 
that the rules for mapping a URL to a filename would be the same for all the 
components of the hierarchical name.

I'll give an example: you have a Mac, and you've mounted an old floppy on the 
desktop. So now you have an HFS+ file system down to the desktop, and then 
have an older file system (with different rules) from the root of the floppy 
on down.

Similarly, a Novell file system mounted on a Windows box could have a concept 
of 'extensions' that would limit your ability to have multiple periods in a 
file name, even though parent directories could have that kind of filename.

So you couldn't be assured that you could use the same filename in two 
different levels of the same file hierarchy.

> Clearly, there are some filenames/locations that will be valid on some
> platforms but not others. There's nothing we can do about that. At the
> same time, we might not be able to take full advantage of a particular
> file system's capabilities. A file system might be able to have Unicode
> filenames, but we're still going to be representing them in MacRoman
> in Squeak. (For now at least, the m17n stuff might change that.) So?

And we may have mixtures of encodings and other file name rules, as well.

> If the user tries to create or access a file with a name that's not
> valid on the current platform we throw an error. What's wrong with
> that? If the filename isn't valid, there's not going to be a file there
> anyway, right?

Not necessarily. For instance, many file systems will do (possibly 
non-reversible) re-mapping of characters (case conversions, encoding changes, 
truncations, etc.). So you get your file created but the name you created it 
with doesn't match the name you get when you enumerate the directory.
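A toy model can show why the name you create a file with may not be the name you get back when enumerating. This Python sketch uses a hypothetical RemappingVolume class with made-up rules (upper-casing plus 8.3-style truncation); real file systems apply their own, different rules.

```python
class RemappingVolume:
    """Toy in-memory volume that rewrites names the way some file systems do.

    Hypothetical rules, for illustration only: the base name is cut to
    8 characters, the extension (text after the first dot) to 3, and the
    result is upper-cased.
    """

    def __init__(self):
        self._entries = set()

    @staticmethod
    def _stored_name(name):
        base, dot, ext = name.partition(".")
        stored = base[:8] + (dot + ext[:3] if dot else "")
        return stored.upper()

    def create(self, name):
        # The caller's name is not kept; only the remapped form is stored.
        self._entries.add(self._stored_name(name))

    def listdir(self):
        return sorted(self._entries)


vol = RemappingVolume()
vol.create("ChangeSetForBFAV.cs")
print(vol.listdir())  # ['CHANGESE.CS']
vol.create("ChangeSetOther.cs")
print(vol.listdir())  # still ['CHANGESE.CS']: two names collapsed into one
```

Because two different requested names collapse into the same stored name, the mapping cannot be inverted from the directory listing alone.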

> Are you suggesting that a sequence of tokens is not sufficient to
> represent the location of a file on some filesystems? That *would* be a
> show stopper for this scheme.

I'm saying that if you throw away the context that you got those tokens from 
you can't do sensible manipulation of them later. Especially when the file 
system in question is temporarily unavailable (off-line volumes, etc.). So if 
you want to do anything but save file names and use them later on the same 
system without manipulation, you will have thrown away some of the 
information you need to do the job.

> I *think* that things like volumes and mount points can be encoded in
> the sequence of tokens. Is there a reason #('Macintosh HD' 'System
> Folder' 'Finder') couldn't be treated by the HfsFilePeer as a volume
> name followed by a folder name and a file name?

It would only be able to do this if it had the same volumes around to inspect. 
And you'd have to flag that sequence as being absolute, as well, because 
there's no way to tell whether the first string referred to a volume or a 
subdirectory of something else.

> Also note the difference between relative and absolute file references.
> There's a good chance an absolute file reference won't be valid if you
> move the image to another machine, let alone another platform. So we're
> going to have to throw FileNotFound exceptions. And if we're doing
> that, why not InvalidFilename as well?

You'd also have to record the difference between relative and absolute, which 
requires more than just a sequence of tokens.

> OTOH, it's perfectly reasonable to have a reference relative to the
> image keep working if you move both the image and the referenced file
> to another machine running a different OS. This is a common case that
> is broken now, but would work fine with the sequence-of-tokens idea.

Other than path delimiter changes, how would the sequence of tokens idea help?

We already can map path delimiters and pathnames (the VMs and Smalltalk code 
do this with varying degrees of success), but we don't always get it right.

But there are lots of cases where one of the tokens wouldn't represent a valid 
path component on another system (or even the same system on a different day 
with different volumes mounted!).

I'd recommend that we store more context than just a sequence of Strings.

At least we should be able to know:

* is this an absolute pathname?
* which of the components represent mounted volumes?
* which of the components represent directories?
* what are the rules for file naming at each level of the hierarchy (which is 
required if you're doing filename construction or copying)?
* what is the character encoding (of course, the String can store this 
information)?
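One way to keep that context is to store a small record per path component instead of a bare String. The Python sketch below is only an illustration under stated assumptions: the classes PathComponent and FilePath, the child_name_ok hook, and the eight_three rule are all hypothetical, and the 8.3 rule merely stands in for an older file system such as the one on the floppy in the earlier example.

```python
from dataclasses import dataclass
from typing import Callable, List


def any_name(name: str) -> bool:
    """Permissive default rule: accept every non-empty name."""
    return bool(name)


def eight_three(name: str) -> bool:
    """Hypothetical 8.3 rule: one optional dot, base <= 8 chars, ext <= 3."""
    base, dot, ext = name.partition(".")
    return 1 <= len(base) <= 8 and len(ext) <= 3 and "." not in ext


@dataclass
class PathComponent:
    name: str
    is_volume: bool = False      # does this component name a mounted volume?
    is_directory: bool = True    # directory, as opposed to a plain file
    # Rule for names of entries *inside* this component; it depends on the
    # file system this directory lives on, not on the path as a whole.
    child_name_ok: Callable[[str], bool] = any_name


@dataclass
class FilePath:
    components: List[PathComponent]
    absolute: bool               # cannot be inferred from the tokens alone
    encoding: str = "mac_roman"  # encoding the names were captured in

    def tokens(self) -> List[str]:
        """The bare sequence of names, i.e. what a token-only scheme keeps."""
        return [c.name for c in self.components]

    def can_create_child(self, name: str) -> bool:
        """Check a proposed name against the rules of its parent directory."""
        return self.components[-1].child_name_ok(name)


# Usage: the same name is fine on the hard disk but not on the floppy.
docs = FilePath([PathComponent("Macintosh HD", is_volume=True),
                 PathComponent("Documents")], absolute=True)
floppy = FilePath([PathComponent("Old Floppy", is_volume=True,
                                 child_name_ok=eight_three)], absolute=True)
print(docs.can_create_child("My.Long.Name.txt"))    # True
print(floppy.can_create_child("My.Long.Name.txt"))  # False
print(floppy.can_create_child("README.TXT"))        # True
```

Because each component carries the naming rule for the entries inside it, the same proposed name can be accepted at one level of the hierarchy and rejected at another, which a plain sequence of tokens cannot express.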

We need to decide:

* how good do we want to be about protecting the users from common mistakes 
and problems? One example of such an attempt is the warning about change set 
name length; another is the filename cleanup code that maps some characters 
to others.

* do we want to know anything about (possibly offline) volumes (removable, 
network, etc.)? If we do, we can prompt to have the volumes put online, or 
can take other actions to do so (e.g. mounting a network share).

* how well do we want to be able to handle links, aliases, and other such 
things?

* do we want to behave differently during traversal when we reach a 
potentially slow (remote, floppy, or CD) file system?

* what about permissions? An obvious problem with the logic in fileExists: is 
that a file can exist but not be readable. So you can see its name when you 
enumerate the directory but then fail the readability test. What about 
write-only files?
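Keeping these questions separate is straightforward in principle. As a minimal Python sketch (the function name file_status is hypothetical, and this is not how Squeak's FileDirectory works), os.path.lexists reports whether the name is present, while os.access asks whether the current process may read or write it.

```python
import os


def file_status(path):
    """Report presence and access separately instead of one yes/no answer."""
    present = os.path.lexists(path)  # True even for a dangling symlink
    return {
        "exists": present,
        # os.access follows symlinks and uses the real uid/gid of the process.
        "readable": present and os.access(path, os.R_OK),
        "writable": present and os.access(path, os.W_OK),
    }
```

For an ordinary user, a write-only file (mode 0200) would report exists and writable but not readable. A privileged user such as root may pass the readability check anyway, so this is a hint rather than a guarantee; the only reliable test is to attempt the open and handle the error.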

-- 
Ned Konz
http://bike-nomad.com
GPG key ID: BEEA7EFE


