[squeak-dev] The Inbox: WebClient-Core-ct.126.mcz

Tom Beckmann tomjonabc at gmail.com
Mon Oct 26 07:39:58 UTC 2020


Hey everyone,

to sum up my understanding: the goal of this change is to replace this
pattern

WebClient new
    httpGet: 'https://api.github.com/repos/myuser/myprivaterepo/zipball/master'
    do: [:req |
        req headerAt: 'Authorization'
            put: 'Basic ', 'myuser:mypasswd' base64Encoded]

to

WebClient new
    preAuthenticateMethod: #basic;
    user: 'myuser' password: 'mypasswd';
    httpGet: 'https://api.github.com/repos/myuser/myprivaterepo/zipball/master'

I dare say the need for this pattern cannot be disputed, since a number of
services, such as GitHub, require the user to authenticate in this way.
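
For readers less familiar with the header itself: the value is just the
string "Basic " followed by the base64 encoding of "user:password". A minimal
sketch in Python (not Squeak code; the function name is made up for
illustration, and the credentials are the placeholders from above):

```python
import base64


def basic_authorization(user, password):
    """Build the value of an Authorization header for HTTP Basic auth."""
    # Join user and password with a colon, then base64-encode the bytes.
    credentials = f"{user}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


header = basic_authorization("myuser", "mypasswd")
print(header)
```

Note that base64 is an encoding, not encryption: anyone who sees the header
can recover the password, which is why it matters when it gets sent.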

So the question boils down to whether we want to explicitly support the
pattern or require users to have some low-level understanding of the
Authorization header to be able to produce it themselves.
Most HTTP client libraries that I looked at (httpie, curl, axios, Zinc) do
provide this convenience to the user, typically making it even "easier" by
eagerly assuming Basic auth as soon as a username and password are provided.
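
To make the two behaviours discussed in this thread concrete, here is a rough
sketch in Python (again not Squeak, and not how WebClient is implemented).
Everything in it is hypothetical: `send` stands in for a transport,
`make_fake_server` simulates a server with one protected URL, and `fetch`
either sends credentials up front (`preauth=True`) or only after a 401
challenge:

```python
import base64


def basic_header(user, password):
    token = base64.b64encode(f"{user}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


def fetch(send, url, user, password, preauth=False):
    """send(url, headers) -> (status, response_headers); hypothetical transport."""
    headers = {}
    if preauth:
        # Opt-in: credentials go out with the very first request.
        headers["Authorization"] = basic_header(user, password)
    status, response_headers = send(url, headers)
    challenge = response_headers.get("WWW-Authenticate", "")
    if status == 401 and not preauth and challenge.lower().startswith("basic"):
        # Challenge-response: retry once, now with credentials.
        headers["Authorization"] = basic_header(user, password)
        status, response_headers = send(url, headers)
    return status


def make_fake_server(protected, log):
    """Simulated server: 401 on protected URLs unless a header is present."""
    def send(url, headers):
        log.append((url, "Authorization" in headers))
        if url in protected and "Authorization" not in headers:
            return 401, {"WWW-Authenticate": 'Basic realm="demo"'}
        return 200, {}
    return send


log = []
send = make_fake_server({"/private"}, log)
fetch(send, "/public", "user", "pw")                 # one request, no credentials
fetch(send, "/private", "user", "pw")                # two requests, second with credentials
fetch(send, "/public", "user", "pw", preauth=True)   # one request, credentials sent anyway
print(log)
```

The log shows the trade-off: the challenge-response path costs one extra
request on protected resources, while the opt-in path sends credentials even
where the server would not have asked for them.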

Now for my personal take: I think Christoph's proposed change is minimally
invasive (the diff is still not perfectly cleaned up, potentially making it
appear larger than it is) and adds value for the user for a (common?)
pattern, while making it very explicit what happens with their credentials
and when. So it's a +0.5 from me, and it would be a +1 if we could get a
"perfectly clean" diff for reviewing the code (I only looked at it from the
conceptual POV for now). Maybe I missed an update that cleaned up the diff,
but I couldn't find a "perfect" one in the inbox thus far :)

Best,
Tom

On Sun, Oct 25, 2020 at 11:00 PM Thiede, Christoph <
Christoph.Thiede at student.hpi.uni-potsdam.de> wrote:

> Hi Tobias,
>
>
> > The API should stick to the HTTP RFCs and use 401 to say: You need
> > authentication.
>
>
> Hm, but this would be an information leak on the server-side. :-)
>
>
> > Sadly you omitted the anyauth stuff that actually works how
> > Authentication in HTTP is spec'ed.
>
> Sorry about that. Still, even in curl, anyauth is an opt-in feature, not
> an opt-out. So my proposal for adding #preAuthenticationMethod as
> an opt-in feature would be equivalent to adding an #anyAuth property as an
> opt-out. Why should WebClient be less powerful than curl? I see it can be
> abused, but Squeak already contains a lot of dangerous protocols that still
> can be useful in particular situations. Just insert a warning into
> the method comment and it'll be OK I think.
>
> > Except when you use "https://api.github.com/authorizations" first,
> > which 401s.
>
> True; but still, this would require either a change in the design or an
> edge-case implementation because the WebClient connection logic is not
> aware of whether GitHub or BitBucket or whatever else should be contacted.
>
> > Otherwise, encode it in the URL?
>
> Do you mean like http://username:password@www.example.com? At the moment,
> WebClient is not treating this differently than WebClient >> #username and
> #password. Is this the correct behavior? curl, again, uses preauth in this
> situation, and Pharo does this, too. Unfortunately, I could not find a
> clear answer to this question in RFC1738 ...
>
> > Don't rely on just my "judgement" ;)
>
> Your arguments are highly appreciated! I'm just trying to figure out your
> motivations ... Yepp, some >=3rd opinions would be a good thing. :-)
>
> Best,
> Christoph
>
> ------------------------------
> *From:* Squeak-dev <squeak-dev-bounces at lists.squeakfoundation.org> on
> behalf of Tobias Pape <Das.Linux at gmx.de>
> *Sent:* Sunday, 25 October 2020 22:19:49
> *To:* The general-purpose Squeak developers list
> *Subject:* Re: [squeak-dev] The Inbox: WebClient-Core-ct.126.mcz
>
> Hi
>
> > On 25.10.2020, at 22:03, Thiede, Christoph <
> Christoph.Thiede at student.hpi.uni-potsdam.de> wrote:
> >
> > Hi Tobias,
> >
> > sorry for the long delay. I was on holiday for a few days and did not
> > manage to return to this interesting topic earlier ...
> >
> >
> > > No, that's wrong. Curl will only send auth data if you provide it.
> >
> > > Doing "curl -u user:pw" is the same as using WebClient httpGet:do: and
> > > adding the Authorization header manually
> >
> > > The point is, you instruct Curl to _provide credentials
> > > unconditionally_.
> >
> > I think this is exactly the point. "curl -u user:pw" reads as "Download
> > a resource, 'specify[ing] the user name and password to use for server
> > authentication'" (quoted from the man page).
>
> Yes, that means _unconditionally_, even if the source does not need it.
> This is called an information leak.
>
>
> > If I do
> > WebClient httpDo: [:client | client username: 'user'; password: 'pw';
> >     get: 'https://example.com/rest/whoami'],
> > this reads exactly the same for me. Otherwise, #username: and #password:
> > might better be renamed to
> > #optionalUsername/#lazyUsername/#usernameIfAsked etc.
> >
>
> Nope.
>
> > Apart from this, I have tested the behavior for Pharo, too, where the
> > default web client is ZnClient: And it works like curl, too, rather than
> > like our WebClient, i.e. the following works without any extra low-level
> > instructions:
> >
> > ZnClient new
> >     url: 'https://api.github.com/repos/LinqLover/openHAB-configuration/zipball/master';
> >     username: 'LinqLover' password: 'mytoken';
> >     downloadTo: 'foo.zip'
> >
> > > So you always know beforehand which resources need authentication?
> > > Neat, I don't :D
> >
> > I suppose we have different use cases in mind: You are thinking of
> > a generic browser application while I am thinking of a specific API client
> > implementation. Is this correct?
>
> No. The API should stick to the HTTP RFCs and use 401 to say: You need
> authentication.
>
> > If I'm developing a REST API client, I do have to know whether a
> > resource requires authentication or whether it doesn't. This is usually
> > specified in the API documentation. Why add another level of uncertainty by
> > using this trial-and-error strategy?
>
> Because preemptive auth is wrong.
>
> Sadly you omitted the anyauth stuff that actually works how Authentication
> in HTTP is spec'ed.
> Yes, it is "one request more". Yes, it is the right thing to do.
>
> Just because it is convenient and just because people are doing it, it
> does not mean it is good.
> In fact, the whole "curl as API-consumer" is fine, but sticking "-u" to
> each and every request is a security nightmare just second to "curl ... |
> sudo bash".
>
> >
> >
> > In the context of my Metacello PR, the problem is that following your
> > approach of specifying the Authorization header would mess up all the
> > different layers of abstraction that are not aware of web client
> > implementations and headers but only of a URL and a username/password pair.
> > I had hoped that I could pass a constant string 'Basic' to the
> > Authorization header for all cases where the WebClient is invoked, but
> > unfortunately, GitHub does not understand this either; the header must
> > contain the password even in the first run.
>
> Except when you use  "https://api.github.com/authorizations" first, which
> 401s.
>
>
> > It would lead to a fair amount of unsightly spaghetti code if I had to
> > implement an edge case for GitHub in the relevant method
> > (MetacelloSqueakPlatform class >> #downloadZipArchive:to:username:pass:);
> > for this reason, I would find it really helpful to turn on preAuth at this
> > place.
>
> > Do you dislike this feature in general, even when turned off by default?
>
> Yes. WebClient is not just an API-consumer. It ought to be safe.
> Otherwise, encode it in the URL?
>
> > I'm not even asking to make this an opt-out feature; this inbox
> > version only implements it as an opt-in.
>
> I don't know.
>
> I think there has been too little input from others here.
> Don't rely on just my "judgement" ;)
>
> Best regards
>         -Tobias
>
>
> >
> > Best,
> > Christoph
> >
> > From: Squeak-dev <squeak-dev-bounces at lists.squeakfoundation.org> on
> > behalf of Tobias Pape <Das.Linux at gmx.de>
> > Sent: Tuesday, 13 October 2020 10:04:23
> > To: The general-purpose Squeak developers list
> > Subject: Re: [squeak-dev] The Inbox: WebClient-Core-ct.126.mcz
> >
> > Hi
> >
> > > On 12.10.2020, at 23:42, Thiede, Christoph <
> Christoph.Thiede at student.hpi.uni-potsdam.de> wrote:
> > >
> > > Hi Tobias,
> > >
> > > okay, I see this authorization pattern now, so you mentioned two ways
> > > to work around the current limitations:
> > > First, by GETting https://api.github.com/authorizations before, or
> > > second, by passing the Authorization header manually.
> > > Is this correct?
> >
> > Yes. However, the second one is the one GitHub "recommends"
> >
> >
> > >
> > > However, both of these approaches lack the simplicity typical of REST:
> > > "making a single HTTP request without dealing with complex call
> > > protocols or low-level connectivity code". To give an illustration of
> > > my use case, please see this PR on Metacello:
> > > https://github.com/Metacello/metacello/pull/534
> > > IMHO it would be a shame if you could not access a popular REST API
> > > like api.github.com in Squeak using a single message send to the
> > > WebClient/WebSocket class.
> >
> > There is no such thing as simplicity when a REST-based resource provider
> > supports both authenticated and unauthenticated access.
> > If you cannot know beforehand, no single-request stuff is gonna help. No
> > dice.
> >
> >
> > >
> > > > > Why not?
> > > >
> > > > It leaks credentials unnecessarily.
> > >
> > > Ah, good point! But this pattern (EAFP for web connections) is not
> > > really state of the art, is it? As mentioned, curl, for example, sends
> > > the authentication data in the very first request, and curl is a tool I
> > > would tend to *call* state of the art.
> >
> > No, that's wrong. Curl will only send auth data if you provide it.
> >
> > Doing "curl -u user:pw" is the same as using WebClient httpGet:do: and
> > adding the Authorization header manually
> >
> >
> > The sequence is split manually:
> > ```
> > $ curl https://api.github.com/repos/krono/debug/zipball/master
> > {
> >   "message": "Not Found",
> >   "documentation_url": "https://docs.github.com/rest/reference/repos#download-a-repository-archive"
> > }
> > # Well, I'm left to guess. Maybe exists, maybe not.
> > $ curl -u krono https://api.github.com/repos/krono/debug/zipball/master
> >
> > ```
> > (In this case, I can't even show what's going on as I use 2FA, which
> > makes single-request REST _never_ work on private repos.)
> >
> > The point is, you instruct Curl to _provide credentials unconditionally_.
> > The "heavy lifting" of finding out when to do that is not done by curl
> > but by users of curl.
> >
> > Look:
> >
> > ```
> > $ curl http://www.hpi.uni-potsdam.de/hirschfeld/squeaksource/xpaware/
> > $
> > # Well, no response?
> > $ curl -v http://www.hpi.uni-potsdam.de/hirschfeld/squeaksource/xpaware/
> > *   Trying 2001:638:807:204::8d59:e178...
> > * TCP_NODELAY set
> > * Connected to www.hpi.uni-potsdam.de (2001:638:807:204::8d59:e178) port 80 (#0)
> > > GET /hirschfeld/squeaksource/xpaware/ HTTP/1.1
> > > Host: www.hpi.uni-potsdam.de
> > > User-Agent: curl/7.54.0
> > > Accept: */*
> > >
> > < HTTP/1.1 401
> > < Date: Tue, 13 Oct 2020 07:43:04 GMT
> > < Server: nginx/1.14.2
> > < Content-Length: 0
> > < WWW-Authenticate: Basic realm="SwaSource - XP aware"
> > <
> > * Connection #0 to host www.hpi.uni-potsdam.de left intact
> > ```
> >
> > That's the 401 we're looking for. We have found that the resource needs
> > authentication.
> >
> > Sidenote: Curl can do the roundtrip (man curl, search anyauth):
> >
> > ```
> > $ curl -u topa --anyauth -v http://www.hpi.uni-potsdam.de/hirschfeld/squeaksource/xpaware/
> > Enter host password for user 'topa':
> > *   Trying 2001:638:807:204::8d59:e178...
> > * TCP_NODELAY set
> > * Connected to www.hpi.uni-potsdam.de (2001:638:807:204::8d59:e178) port 80 (#0)
> > > GET /hirschfeld/squeaksource/xpaware/ HTTP/1.1
> > > Host: www.hpi.uni-potsdam.de
> > > User-Agent: curl/7.54.0
> > > Accept: */*
> > >
> > < HTTP/1.1 401
> > < Date: Tue, 13 Oct 2020 07:46:05 GMT
> > < Server: nginx/1.14.2
> > < Content-Length: 0
> > < WWW-Authenticate: Basic realm="SwaSource - XP aware"
> > <
> > * Connection #0 to host www.hpi.uni-potsdam.de left intact
> > * Issue another request to this URL: 'http://www.hpi.uni-potsdam.de/hirschfeld/squeaksource/xpaware/'
> > * Found bundle for host www.hpi.uni-potsdam.de: 0x7fb8c8c0b1b0 [can pipeline]
> > * Re-using existing connection! (#0) with host www.hpi.uni-potsdam.de
> > * Connected to www.hpi.uni-potsdam.de (2001:638:807:204::8d59:e178) port 80 (#0)
> > * Server auth using Basic with user 'topa'
> > > GET /hirschfeld/squeaksource/xpaware/ HTTP/1.1
> > > Host: www.hpi.uni-potsdam.de
> > > Authorization: Basic *******************
> > > User-Agent: curl/7.54.0
> > > Accept: */*
> > >
> > < HTTP/1.1 200
> > < Date: Tue, 13 Oct 2020 07:46:05 GMT
> > < Server: nginx/1.14.2
> > < Content-Type: text/html
> > < Content-Length: 15131
> > < Vary: Accept-Encoding
> > ```
> >
> > And in this case it does _not_ send auth in the first request but only
> > in the second.
> >
> > Sidenote2: If the first request comes back 200, no second one is issued,
> > no credentials leak:
> >
> > ```
> > $ curl -u topa --anyauth -v http://www.hpi.uni-potsdam.de/hirschfeld/squeaksource/xpforums/
> > Enter host password for user 'topa':
> > *   Trying 2001:638:807:204::8d59:e178...
> > * TCP_NODELAY set
> > * Connected to www.hpi.uni-potsdam.de (2001:638:807:204::8d59:e178) port 80 (#0)
> > > GET /hirschfeld/squeaksource/xpforums/ HTTP/1.1
> > > Host: www.hpi.uni-potsdam.de
> > > User-Agent: curl/7.54.0
> > > Accept: */*
> > >
> > < HTTP/1.1 200
> > < Date: Tue, 13 Oct 2020 07:46:56 GMT
> > < Server: nginx/1.14.2
> > < Content-Type: text/html
> > < Content-Length: 75860
> > < Vary: Accept-Encoding
> > ```
> >
> >
> >
> >
> >
> > > And speed is another point, given the fact that internet connections
> > > in Squeak are really slow ...
> > > Why do you call this behavior a leak? The application developer will
> > > not pass authentication data to the web client unless they expect the
> > > server to consume these data anyway.
> >
> > So you always know beforehand which resources need authentication?
> > Neat, I don't :D
> >
> > > If you deem it necessary, we could turn off the pre-authentication as
> > > soon as the client was redirected to another server ...
> >
> > What happens here is that we're bending over backwards because GitHub
> > decided to be a bad player.
> >
> > I mean, on most sites you visit in browsers, no auth data is sent
> > _unless_ you are asked to (redirect to a login) or you _explicitly_ click
> > on a login link.
> >
> > If you want preemptive auth, do it with WebClient httpGet:do:.
> >
> >
> >
> > >
> > > > I understand that the method is maybe not the most common style, but
> > > > I think that functional changes should in such cases not be mixed with
> > > > style changes.
> > >
> > > Alright, please see WebClient-Core-ct.128. But maybe we should
> > > consider using prettyDiff for the mailing list notifications as a default?
> > > Just an idea.
> >
> > I personally find prettydiffs useless, but that's just me.
> >
> > Best regards
> >         -Tobias
>
>
>
>
>