[Q] Isn't 'file://foo/bar' asUrl supposed to give a relative FileUrl?

Mon Aug 18 14:39:35 UTC 2003

Michael Rueger <michael at squeakland.org> wrote:
> Lex Spoon wrote:
> > Bert Freudenberg <bert at isg.cs.uni-magdeburg.de> wrote:
> > 
> > 
> >>>It's a useful extension.  Such a URL is not covered in the RFC's, but it
> >>>is useful and people do it.
> >>
> >>I don't think it's useful. I'd consider it even bad and confusing (like 
> >>this discussion clearly shows) because the power of URLs stems from the 
> >>simple fact that it is *only* a well-defined string. Also, the code 
> >>becomes rather ugly, with "isAbsolute ifTrue: [...]" scattered around 
> >>the place.
> 
> A few things from the RFC are below.
> The problem that I have with the current implementation is not that it 
> is incomplete, but it does too many things. URIs are basically strings 
> without any semantic meaning in respect to the scheme. The scheme only 
> becomes "meaningful" through its interpretation as a protocol. So the 
> right thing to do IM(H)O is to have "pure" URIs like in my package and 
> dispatch any operations to the protocol counter part, be it the file 
> system or an ftp or http client.
> 

I agree that URL or URI objects should be "pure", and that's how the
hierarchy in Squeak is actually designed.  In retrospect, I agree that
relative FileURL's are handled badly.  I now think that isAbsolute
should have been handled at parse time, because there is no point in
sending such a URL across the network.  At the time, that style of URL
was more common and I thought it would be better to represent it
accurately; on further thought, it is slightly more important that the
URL's themselves remain strictly correct.

But, I also think this kind of thing should be handled in the parser,
not the protocol handler.  I don't even understand how the protocol
handler *could* handle it.  If you have a parsed URL, then you have a
scheme already and so it's too late to make a decision about what to do
with 'www.google.com' or 'foo.txt'.  If there is going to be smart
parsing, then it seems like the parser needs to do it.
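
A rough illustration of this point, in Python rather than Squeak.  The
handler table and the handler_for function are made up for the example;
only urlsplit is a real library call.  Once the string is parsed, the
scheme is the only thing a dispatcher has to go on, so scheme-less
input never reaches any protocol handler at all:

```python
from urllib.parse import urlsplit

# Hypothetical handler table keyed by scheme; the names are illustrative only.
HANDLERS = {"http": "http client", "ftp": "ftp client", "file": "file system"}

def handler_for(text):
    """Return the handler name for a URL string, or None if no scheme was parsed."""
    scheme = urlsplit(text).scheme
    return HANDLERS.get(scheme)

for text in ("http://www.google.com/", "www.google.com", "foo.txt"):
    print(repr(text), "->", handler_for(text))
```

Any guess about what 'www.google.com' or 'foo.txt' should mean has to
be made before the lookup, which is to say in the parser.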

> The problems that are evident in all these hacks throughout the URL 
> classes stem from trying to interpret non-conforming URI constructs in a 
> "smart" way.  If you actually stick to the spec, especially the part
> 
>    Indeed, the
>     base URI is necessary to define the semantics of any relative URI
>     reference; without it, a relative reference is meaningless.
> 
> none of the discussed problems are an issue.
> E.g., 'g' asUrl -> http://g/  is one of these smart things that are just 
> plain incorrect.
> 

What do you mean by incorrect?  This behavior is correct according to
the letter of the RFC's, because the input is invalid to begin with. 
It's also correct according to the spirit of being generous in what you
accept, which is a spirit that goes back to the dawn of the Internet.

For the record, I don't think that an application should use such a
malformed URL, and I do wish that a strict parser were available in
addition to the loose one.  But if there is only one parser, it seems
like it may as well be tolerant.
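
A minimal sketch of what a strict parser next to a loose one could look
like, again in Python and not the Squeak code.  The names parse_strict
and parse_tolerant, and the choice of "http" as the default scheme, are
assumptions made for the example:

```python
from urllib.parse import urlsplit

def parse_strict(text):
    """Accept only absolute URLs, i.e. ones that carry their own scheme."""
    parts = urlsplit(text)
    if not parts.scheme:
        raise ValueError("not an absolute URL: %r" % text)
    return parts

def parse_tolerant(text, default_scheme="http"):
    """Guess a scheme for scheme-less input instead of rejecting it."""
    parts = urlsplit(text)
    if not parts.scheme:
        # Assumption: treat the whole input as a host name under the default scheme.
        parts = urlsplit("%s://%s" % (default_scheme, text))
    return parts
```

With these, parse_tolerant('g') yields a URL with scheme 'http' and host
'g', while parse_strict('g') raises an error.  An application that wants
to reject malformed input picks the strict entry point; one that takes
whatever a user typed picks the tolerant one.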

Finally, nothing in the URL hierarchy that has the 'ls' initials on it
is a "hack".  I expected URL's to be a cornerstone of a WWW-friendly
Squeak, and thus I tried to be careful with both the design and the
implementation.  If the code is really so horrendous, please be gentle
in your depictions.  :)

I want Squeak to have the best code possible, and I promise I'm not just
being stubborn.  I am worried that carefully written code might be
getting tossed wholesale simply because a rewrite seems so much more
exciting.  When I see the word "hack" being thrown around, then that
worry becomes an outright suspicion.  Let's play nice and try to make
Squeak as good as possible.  Surely we should prefer carefully written
code that has stood 5 years of usage, until we have something specific
to improve on.  The problems being discussed are small and easy to fix;
is it worth it to swap them for outright bugs we'll have to track down?

At this point, I think the best steps forward would be:

	1. Modify FileURL so that isAbsolute is handled at parse time.

	2. Make it possible to have both strict and tolerant parsers.  In
particular, there are three cases to look at:  'www.google.com',
'file:foo.txt', and 'file:/foo.txt'. 

	3. Make FileURL a subclass of HierarchicalURL, so that it parses and
prints hostnames properly.  This is likely to cause bugs.  :(

Oh, and:

	4. Clean up FileURL in general.  In particular, its absoluteFromText:
should probably not exist, and the pathForBlah methods need some
examination.
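
For step 1, a rough sketch of resolving a relative file URL while
parsing, so that nothing downstream needs an isAbsolute check.  This is
Python, not the FileURL code; the absolute_file function and the
base_dir default are hypothetical and stand in for whatever base the
parser would be given:

```python
import posixpath
from urllib.parse import urlsplit

def absolute_file(text, base_dir="/home/squeak"):
    """Parse a file: URL and return (host, absolute path).

    base_dir is an assumed base directory used only for relative paths.
    """
    parts = urlsplit(text)
    if parts.scheme != "file":
        raise ValueError("not a file URL: %r" % text)
    path = parts.path
    if not path.startswith("/"):
        # Relative form such as 'file:foo.txt': resolve it right here.
        path = posixpath.join(base_dir, path)
    return parts.netloc, posixpath.normpath(path)

print(absolute_file("file:foo.txt"))
print(absolute_file("file:/foo.txt"))
print(absolute_file("file://foo/bar"))
```

In this sketch 'file:foo.txt' is resolved against the base directory,
'file:/foo.txt' is already absolute, and 'file://foo/bar' comes out as
host 'foo' with the absolute path '/bar'.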

I'll take a stab at most of these as time allows, barring objections of
course.

Lex