A couple of weeks ago I had a problem loading something via SqueakMap that resulted in a 504 error. Chris M quite rightly pointed out that responding to a timeout with an immediate retry might not be the best thing (referencing some code I published to try to handle this problem); looking at the error more closely I finally noticed that a 504 is a *gateway* timeout rather than anything that seems likely to be a problem at the SM or MC repository server. Indeed the error came back much quicker than the 45-second timeout that we seem to have set for our http connections.
I'm a long way from being an expert in the area of connecting to servers via gateways and what their timeouts might be etc. so excuse stupid-question syndrome - I know this isn't Quora where stupid-question is the order of the day. Am I right in thinking that a 504 error means that some *intermediate* server timed out according to some setting in its internal config? Am I right in imagining that we can't normally affect that timeout?
If I have any reasonable grasp on this then we should probably detect the 504 (in part by explicitly using a WebClient and its error handling rather than the slightly wonky HTTPSocket facade we have currently) and retry the connection? Any other error or a timeout at *our* end would still be best handled as an error.
Except of course a 418 which has well defined error handling...
tim -- tim Rowledge; tim@rowledge.org; http://www.rowledge.org/tim You forgot to do your backup 16 days ago. Tomorrow you'll need that version.
Hi
On 27.01.2019, at 02:53, tim Rowledge tim@rowledge.org wrote:
A couple of weeks ago I had a problem loading something via SqueakMap that resulted in a 504 error. Chris M quite rightly pointed out that responding to a timeout with an immediate retry might not be the best thing (referencing some code I published to try to handle this problem); looking at the error more closely I finally noticed that a 504 is a *gateway* timeout rather than anything that seems likely to be a problem at the SM or MC repository server. Indeed the error came back much quicker than the 45-second timeout that we seem to have set for our http connections.
I'm a long way from being an expert in the area of connecting to servers via gateways and what their timeouts might be etc. so excuse stupid-question syndrome - I know this isn't Quora where stupid-question is the order of the day. Am I right in thinking that a 504 error means that some *intermediate* server timed out according to some setting in its internal config? Am I right in imagining that we can't normally affect that timeout?
Well, we can.
What happens here:
- All our websites, including all HTTP services, such as the Map, arrive together at squeak.org, aka alan.box.squeak.org
That is an nginx server. And also the server that eventually spits out the 504.
- alan then sees we want a connection to the Map, and does an HTTP request to ted.box.squeak.org (=> alan is a _reverse proxy_)
and upon response gets us that back.
- if ted fails to respond in 60s, alan gives a 504.
Simple as that. This limits the possibility that we wait too long (i.e. >60s) on ted.
Elephant in the room: why not talk to ted directly? The nginx on alan is configured as hardened as I thought best, and actually can handle a multitude of requests much better than our Squeak-based "application servers". This distinction between reverse proxy and application server is btw quite standard and enables some things. For example:
We can tune a lot of things on alan with regard to how it should handle things. The simplest being:
- we can tune the timeout: https://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_read_timeout
that's where the 60s come from, and we could simply crank it up.
- HOWEVER: this could mean we eventually run into other timeouts, for example on the server or even in TCP or so.
- so increasing this just like that _may_ help or _may_ make the Map useless altogether, so please be careful y'all :)
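For illustration, a minimal sketch of what cranking that up could look like (the directive names are real nginx ones; the values and the comments are only an example, not our actual config):

    # sketch only: give ted more time before alan answers 504
    proxy_connect_timeout 30s;   # time allowed for opening the TCP connection to ted
    proxy_send_timeout    90s;   # longest allowed pause between two writes to ted
    proxy_read_timeout   120s;   # longest allowed pause between two reads from ted -- the 504 knob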
but also:
- we can cache: https://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_cache
- we could make alan not even ask ted when we know the answer already.
- Attention: we need a lot of information on what is stable and what not to do this.
- (it's tempting to try, tho)
- (we probably want that for squeaksource/source.squeak for the MCZ requests. but we lose the download statistics then…)
- Note: a lot of time is probably spent by ted generating HTTP and by alan parsing HTTP. Using Fcgi, for example, reduces that, and is supported by both nginx (https://nginx.org/en/docs/http/ngx_http_fastcgi_module.html) and GemStone, but I don't know whether we already have one in Squeak.
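Again purely as a sketch of the caching idea (real nginx directives, but the zone name, sizes, and paths are invented for this example):

    # sketch: a small cache on alan for answers from ted
    proxy_cache_path /var/cache/nginx/map keys_zone=mapcache:10m max_size=100m inactive=10m;

    location / {
        proxy_pass http://ted.box.squeak.org;
        proxy_cache mapcache;
        proxy_cache_valid 200 1m;             # keep good answers for a minute
        proxy_cache_use_stale error timeout;  # hand out a stale copy if ted times out
    }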
If I have any reasonable grasp on this then we should probably detect the 504 (in part by explicitly using a WebClient and its error handling rather than the slightly wonky HTTPSocket facade we have currently) and retry the connection? Any other error or a timeout at *our* end would still be best handled as an error.
All 500-ish codes essentially say "the server is to blame" and the client can do nothing about that. I don't think that 504 is meaningfully better handled than 503 or 502 in the WebClient. I think it's OK to pass that through.
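If someone wanted to special-case it on the client anyway, a minimal sketch could look like this, assuming the stock WebClient package's class-side httpGet: and WebResponse>>code; the URL and the retry policy are invented for illustration:

    | attempts response |
    attempts := 0.
    [attempts := attempts + 1.
     response := WebClient httpGet: 'http://map.squeak.org/'.  "example URL"
     response code = 504 and: [attempts < 3]]
        whileTrue: ["the gateway gave up upstream; back off a little, then retry"
            (Delay forSeconds: 5 * attempts) wait].
    (response code between: 200 and: 299)
        ifFalse: [self error: 'HTTP error ', response code printString].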
Except of course a 418 which has well defined error handling...
At least not 451…
Best regards -Tobias
On Sun, 27 Jan 2019, Tobias Pape wrote:
We can tune a lot of things on alan with regard to how it should handle things. The simplest being:
- we can tune the timeout: https://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_read_timeout
that's where the 60s come from, and we could simply crank it up.
- HOWEVER: this could mean we eventually run into other timeouts, for example on the server or even in TCP or so.
- so increasing this just like that _may_ help or _may_ make the Map useless altogether, so please be careful y'all :)
Tim reported shorter than 45s timeouts, so it is very likely an issue with the SqueakMap image.
but also:
- we can cache: https://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_cache
- we could make alan not even ask ted when we know the answer already.
- Attention: we need a lot of information on what is stable and what not to do this.
- (it's tempting to try, tho)
- (we probably want that for squeaksource/source.squeak for the MCZ requests. but we lose the download statistics then…)
If squeaksource/mc used ETags, then the squeaksource image could simply return 304 and let nginx serve the cached mczs while keeping the statistics updated. That would also let us save bandwidth by not downloading files already sitting in the client's package cache. We could also use nginx to serve files instead of the image, but then the image would have to know that it's sitting behind nginx.
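As a sketch of that idea only; the handler and all its helper selectors below are invented, not actual SqueakSource code, and nginx would additionally need proxy_cache_revalidate switched on so that it revalidates with If-None-Match:

    handleMczRequest: aRequest for: aVersion
        "Hypothetical server-side handler: the version's UUID doubles as the ETag."
        | etag |
        etag := aVersion uuid asString.
        self recordDownloadOf: aVersion.    "statistics still get counted"
        (aRequest headerAt: 'If-None-Match') = etag
            ifTrue: [^ self respondNotModifiedWithETag: etag].   "304: nginx serves its cached copy"
        ^ self respondContentsOf: aVersion withETag: etag        "full body plus ETag header"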
- Note: a lot of time is probably spent by ted generating HTTP and by alan parsing HTTP. Using Fcgi, for example, reduces that, and is supported by both nginx (https://nginx.org/en/docs/http/ngx_http_fastcgi_module.html) and GemStone, but I don't know whether we already have one in Squeak.
I'm 99% sure http overhead is negligible.
Levente
Hi
On 27.01.2019, at 18:50, Levente Uzonyi leves@caesar.elte.hu wrote:
Tim reported shorter than 45s timeouts, so it is very likely an issue with the SqueakMap image.
But then we wouldn't have 504. 504 is explicitly: upstream timed out.
What we have is:
/etc/nginx/conf.d/proxy.conf
### proxy-timeouts ###
proxy_connect_timeout 30;
proxy_send_timeout 90;
proxy_read_timeout 90;
And _not_ being able to connect could also mean 504.
But _that_ in turn means that map is so overloaded it cannot take new connections, and that would be a bummer.
If squeaksource/mc used ETags, then the squeaksource image could simply return 304 and let nginx serve the cached mczs while keeping the statistics updated.
I think I had something like that already in SqueakSource3
That would also let us save bandwidth by not downloading files already sitting in the client's package cache. We could also use nginx to serve files instead of the image, but then the image would have to know that it's sitting behind nginx.
You can do something like that with nginx (_and_ notify the server). That would be around 20 lines of nginx config and 50 lines in SqueakSource3.
I'm 99% sure http overhead is negligible.
probably. but I don't know.
Best regards -Tobias
Hi guys,
What happens here:
- All our websites, including all HTTP services, such as the Map, arrive together at squeak.org, aka alan.box.squeak.org
That is an nginx server. And also the server that eventually spits out the 504.
- alan then sees we want a connection to the Map, and does an HTTP request to ted.box.squeak.org (=> alan is a _reverse proxy_)
and upon response gets us that back.
Thanks for the great explanation! I want to learn more about admin'ing, so it's great to have this in-context example of a reverse proxy, thanks for setting that up!
- if ted fails to respond in 60s, alan gives a 504.
60s seems like an ideally balanced timeout setting -- the longest any possible request should be expected to wait ... and yet clients can still shorten it to 45s or 30s if they want a shorter timeout.
Tim reported shorter than 45s timeouts, so it is very likely an issue with the SqueakMap image.
Yes, the SqueakMap server image is one part of the dynamic, but I think another is a bug in the trunk image. I think the reason Tim is not seeing 45 seconds before error is that the timeout setting of the high-up client is not being passed all the way down to the lowest-level layers -- e.g., from HTTPSocket --> WebClient --> SocketStream --> Socket. By the time it gets down to Socket which does the actual work, it's operating on its own 30-second timeout.
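A minimal sketch of the missing plumbing, assuming stock Squeak's Socket and SocketStream API (the 45 stands in for whatever the high-level client was configured with; the surrounding code is invented for illustration):

    | socket stream |
    socket := Socket newTCP.
    socket connectNonBlockingTo: (NetNameResolver addressForName: 'map.squeak.org') port: 80.
    socket waitForConnectionFor: 45.   "hand the caller's timeout down instead of the 30s default"
    stream := SocketStream on: socket.
    stream timeout: 45.                "...and give it to the stream layer as well"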
It is a fixed amount of time, I *think* still between 30 and 45 seconds, that it takes the SqueakMap server to save its model after an update (e.g., adding a Release, etc.). It's so long because the server is running on a very old 3.x image on an interpreter VM. It's running an HttpView2 app which doesn't even compile in modern Squeak. That's why it hasn't been brought forward yet, but I am working on a new API service to replace it with the eventual goal of SqueakMap being an "App Store" experience, and it will not suffer timeouts.
If squeaksource/mc used ETags, then the squeaksource image could simply return 304 and let nginx serve the cached mczs while keeping the statistics updated.
Tim's email was about SqueakMap, not SqueakSource. SqueakSource serves the mcz's straight off the hard-drive platter. We don't need to trade away download statistics to save a few ms on an mcz request.
That would also let us save bandwidth by not downloading files already sitting in the client's package cache.
How so? Isn't the package-cache checked before hitting the server at all? It certainly should be.
Best, Chris
On Sun, 27 Jan 2019, Chris Muller wrote:
Tim reported shorter than 45s timeouts, so it is very likely an issue with the SqueakMap image.
Yes, the SqueakMap server image is one part of the dynamic, but I think another is a bug in the trunk image. I think the reason Tim is not seeing 45 seconds before error is that the timeout setting of the high-up client is not being passed all the way down to the lowest-level layers -- e.g., from HTTPSocket --> WebClient --> SocketStream --> Socket. By the time it gets down to Socket which does the actual work, it's operating on its own 30-second timeout.
I would expect subsecond response times. 30 seconds is just unacceptably long.
Tim's email was about SqueakMap, not SqueakSource. SqueakSource
That part of the thread changed direction. It happens sometimes.
serves the mcz's straight off the hard-drive platter. We don't need to trade away download statistics to save a few ms on an mcz request.
Download statistics would stay the same despite being flawed (e.g. you'll download everything multiple times even if those files are sitting in your package cache). You would save seconds, not milliseconds by not downloading files again.
That would also let us save bandwidth by not downloading files already sitting in the client's package cache.
How so? Isn't the package-cache checked before hitting the server at all? It certainly should be.
No, it's not. Currently that's not possible, because different files can have the same name. And currently we have no way to tell them apart.
Levente
Hi Levente,
I would expect subsecond response times. 30 seconds is just unacceptably long.
Well, it depends on whether, for example, you're in the middle of Antarctica with a slow internet connection or in an office with a fast connection. A 30-second timeout is just the maximum amount of time the client will wait for the entire process before presenting a debugger, that's all it can do.
Download statistics would stay the same despite being flawed (e.g. you'll download everything multiple times even if those files are sitting in your package cache).
Not if we fix the package-cache (more about this, below).
You would save seconds, not milliseconds by not downloading files again.
IIUC, you're saying we would save one hop in the "download" -- instead of client <--> alan <--> andreas, it would just be client <--> alan. Is that right?
I don't know what the speed between alan <---> andreas is, but I doubt it's much slower than client <---> alan in most cases, so the savings would seem to be minimal..?
No, it's not. Currently that's not possible, because different files can have the same name. And currently we have no way to tell them apart.
No. No two MCZ's may have the same name, certainly not within the same repository, because MCRepository cannot support that. So maybe we need project subdirectories under package-cache to properly simulate each cached Repository. I had no idea we were neutering 90% of the benefits of our package-cache because of this too, and just sitting here, I can't help but wonder whether this is why MCProxy doesn't work properly either!
The primary purpose of a cache is to *check it first* to speed up access to something, right? What you say about package-cache sounds really bad; we should fix that, not surrender to it.
- Chris
No, it's not. Currently that's not possible, because different files can have the same name. And currently we have no way to tell them apart.
Even still, we could check the package-cache first, open up the one with that name and see if it's the correct UUID...
On Sun, 27 Jan 2019, Chris Muller wrote:
Even still, we could check the package-cache first, open up the one with that name and see if it's the correct UUID...
UUIDs may work, but hashes have the advantage that the tools don't have to know about the internals of the packages. Also, I think mcds and mcms don't have UUIDs, but hashes would work with those too.
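As a sketch of what hashing a cached file could look like in stock Squeak (SecureHashAlgorithm is Squeak's SHA-1; the file name is only an example):

    | file hash |
    file := FileStream readOnlyFileNamed: 'package-cache/Kernel-abc.1234.mcz'.
    [file binary.
     hash := (SecureHashAlgorithm new hashMessage: file contentsOfEntireFile)
                printString: 16]
        ensure: [file close].
    "hash can now be compared against whatever hash the repository listing advertises"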
Levente
On Sun, 27 Jan 2019, Chris Muller wrote:
Well, it depends on whether, for example, you're in the middle of Antarctica with a slow internet connection or in an office with a fast connection. A 30-second timeout is just the maximum amount of time the client will wait for the entire process before presenting a debugger, that's all it can do.
We can be sure that Tim should get subsecond response times instead of timeouts after 30 seconds.
IIUC, you're saying we would save one hop in the "download" -- instead of client <--> alan <--> andreas, it would just be client <--> alan. Is that right?
No. If the client doesn't have the mcz in the package cache but nginx has it in its cache, then we save the transfer of data between alan and andreas. The file doesn't have to be read from the disk either. If the client does have the mcz, then we save the complete file transfer.
I don't know what the speed between alan <---> andreas is, but I doubt it's much slower than client <---> alan in most cases, so the savings would seem to be minimal..?
The image wouldn't have to open a file, read its content from the disk and send that through a socket. Nginx does that thing magnitudes faster than Squeak.
No. No two MCZ's may have the same name, certainly not within the same repository, because MCRepository cannot support that. So maybe
Not at the same time, but it's possible, and it just happened recently with Chronology-ul.21. It is perfectly possible that a client has a version in its package cache with the same name as a different version on the server.
we need project subdirectories under package-cache to properly simulate each cached Repository. I had no idea we were neutering 90% of the benefits of our package-cache because of this too, and just sitting here, I can't help but wonder whether this is why MCProxy doesn't work properly either!
The primary purpose of a cache is to *check it first* to speed up access to something, right? What you say about package-cache sounds
I don't know. It wasn't me who designed it. :)
really bad; we should fix that, not surrender to it.
Yes, that should be fixed, but it needs changes on the server side. What I always had in mind was to extend the repository listing with hashes/uuids so that the client could figure out if it needs to download a specific version. But care must be taken not to break the code for non-ss repositories (e.g. simple directory listings).
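Purely to make that concrete: an invented listing format and client-side check; every selector below is hypothetical, nothing like it exists yet:

    "Imagined listing entry: 'Kernel-abc.1234.mcz' -> 'a9993e36...' (a content hash)."
    repository versionNamesAndHashes do: [:entry |
        (packageCache hasVersionNamed: entry key withHash: entry value)
            ifTrue: ["already local, skip the download"]
            ifFalse: [repository downloadVersionNamed: entry key]].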
Levente
Hi,
We can be sure that Tim should get subsecond response times instead of timeouts after 30 seconds.
Right, but timeout settings are a necessary tool sometimes; my point was that we should fix client code in trunk to make timeouts work properly.
Incidentally, 99% of SqueakMap requests ARE subsecond -- just go to map.squeak.org and click around and see. For the remaining 1% that aren't, the issue is known and we're working on a new server to fix that.
It is a fixed amount of time, I *think* still between 30 and 45 seconds, that it takes the SqueakMap server to save its model after an
and so if in the meantime it can simply be made to wait 45s instead of 30s, then current SqueakMap will only be that occasional delay at worst, instead of the annoying debugger we currently get.
No. If the client doesn't have the mcz in the package cache but nginx has it in its cache, then we save the transfer of data between alan and andreas.
Are alan and andreas co-located?
The file doesn't have to be read from the disk either.
I assume you mean "read from disk" on alan? What about after it's cached so many mcz's in RAM that it's paging out to swap file? To me, wasting precious RAM (of any server) to cache old MCZ file contents that no one will ever download (because they become old very quickly) feels wasteful. Dragster cars are wasteful too, but yes, they are "faster"... on a dragstrip. :) I guess there'd have to be some kind of application-specific smart management of the cache...
Levente, what about the trunk directory listing, can it cache that? That is the _#1 thing_ source.squeak.org is accessing and sending back over, and over, and over again -- every time that MC progress box says, "Updating [repository name]".
The image wouldn't have to open a file, read its content from the disk and send that through a socket.
By "the image" I assume you mean the SqueakSource server image. But opening the file takes very little time. Original web-sites were .html files, remember how fast those were? Plus, filesystems "cache" file contents into their own internal caches anyway...
Yes, it still has to return back through alan but I assume alan does not wait for a "full download" received from andreas before it's already piping back to the Squeak client. If true, then it seems like it only amounts to saving one hop, which would hardly be noticeable over what we have now.
Nginx does that thing magnitudes faster than Squeak.
The UX would not be magnitudes faster though, right?
Not at the same time, but it's possible, and it just happened recently with Chronology-ul.21. It is perfectly possible that a client has a version in its package cache with the same name as a different version on the server.
But we don't want to restrict what's possible in our software design because of that. That situation is already a headache anyway. The same name can theoretically come only from the same person (if we ensure unique initials) and so this is avoidable / fixable by resaving one of them under a different name...
The primary purpose of a cache is to *check it first* to speed up access to something, right? What you say about package-cache sounds
I don't know. It wasn't me who designed it. :)
I meant ANY "cache".
https://en.wikipedia.org/wiki/Cache_(computing)
For Monticello, package-cache's other use-case is when an authentication issue occurs when trying to save to an HTTP repository. At that point the Version object with the new ancestry was already constructed in memory, so rather than worry about trying to "undo" all that, it was simpler and better to save it to the package-cache, persist it safely so the client can simply move forward from there (get access to the HTTP repository and copy it over, or whatever).
- Chris
I'm really pleased some competent people are thinking about this; it means I can stop worrying about something outside my main thrust!
Generally I prefer things to time out very quickly if they are going to time out at all - I was startled to see that the default timeout appears to be 45 seconds. This is especially the case if the thing potentially timing out is blocking any other actions I might want to be getting on with; it used to be a *real* annoyance with some RISC OS applications blocking the entire OS through poor design. Some better user feedback about the progress would help in a lot of cases. After all, if you have some indication that stuff is actually being done for you it is less annoying. It's a pity there isn't a class of HTTP 'error' message that says "I'm working on it, busy right now, check again in X seconds" or "we're sorry, all our sockets are busy. Please stay online and we'll get to you soon" etc.
I am interested in what error responses we might sensibly handle and how. Some examples that document helpful behaviour would be nice to add so that future authors have some guidance in doing smart things.
tim -- tim Rowledge; tim@rowledge.org; http://www.rowledge.org/tim Oxymorons: Sweet sorrow
Hi Tim
On 28.01.2019, at 02:23, tim Rowledge tim@rowledge.org wrote:
Generally I prefer things to time out very quickly if they are going to time out at all - I was startled to see that the default timeout appears to be 45 seconds. This is especially the case if the thing potentially timing out is blocking any other actions I might want to be getting on with; it used to be a *real* annoyance with some RISC OS applications blocking the entire OS through poor design. Some better user feedback about the progress would help in a lot of cases. After all, if you have some indication that stuff is actually being done for you it is less annoying. It's a pity there isn't a class of HTTP 'error' message that says "I'm working on it, busy right now, check again in X seconds" or "we're sorry, all our sockets are busy. Please stay online and we'll get to you soon" etc.
Yeah, quick timeouts would be great, but we can't have them, somehow. One of the problems is latency that accumulates over the multiple hops your packets take. Also, things like TCP timeouts have to be accounted for. This is all nontrivial now. I mean, in the US or all over Europe, we could probably get away with 10s timeouts max. But then things get complicated for African or South American users, where the whole network latency can accumulate around that number PLUS overhead the sending app and receiving clients incur…
¯\_(ツ)_/¯
On Sun, 27 Jan 2019, Chris Muller wrote:
Right, but timeout settings are a necessary tool sometimes; my point was that we should fix client code in trunk to make timeouts work properly.
Incidentally, 99% of SqueakMap requests ARE subsecond -- just go to map.squeak.org and click around and see. For the remaining 1% that aren't, the issue is known and we're working on a new server to fix that.
Great! That was my point: the image needs to be fixed.
and so if in the meantime it can simply be made to wait 45s instead of 30s, then current SqueakMap will only be that occasional delay at worst, instead of the annoying debugger we currently get.
I don't see why that would make a difference: the user would get a debugger anyway, but only 15 seconds later.
Are alan and andreas co-located?
They are cloud servers in the same data center.
I assume you mean "read from disk" on alan? What about after it's cached so many mcz's in RAM that it's paging out to swap file? To me, wasting precious RAM (of any server) to cache old MCZ file contents that no one will ever download (because they become old very quickly) feels wasteful. Dragster cars are wasteful too, but yes, they are "faster"... on a dragstrip. :) I guess there'd have to be some kind of application-specific smart management of the cache...
Nginx's proxy_cache can handle that all automatically. Also, we don't need a large cache. A small, memory-only cache would do it.
Levente, what about the trunk directory listing, can it cache that?
Sure.
That is the _#1 thing_ source.squeak.org is accessing and sending back over, and over, and over again -- every time that MC progress box says, "Updating [repository name]".
Right, unless you update an older image.
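As a sketch of that, with real nginx directives but an invented location and values (andreas being the squeaksource box, per this thread; the keys_zone would be declared as in the earlier caching sketch):

    # sketch: cache the repository listing briefly, so the constant
    # "Updating [repository name]" round trips don't all hit the image
    location /trunk/ {
        proxy_pass http://andreas.box.squeak.org;
        proxy_cache mapcache;        # some small keys_zone declared elsewhere
        proxy_cache_valid 200 30s;   # a fresh commit becomes visible within 30s
    }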
By "the image" I assume you mean the SqueakSource server image. But opening the file takes very little time. Original web-sites were .html files, remember how fast those were? Plus, filesystems "cache" file contents into their own internal caches anyway...
Each file uses one external semaphore, each socket uses three. If you use a default image, there can be no more than 256 external semaphores, which is ridiculous for a server, and it'll just grind to a halt when some load arrives. Every time the external semaphore table is full, a GC is triggered to try to clear it up via the finalization process. Reading a file into memory is slow, writing it to a socket is slow. (Compared to nginx which uses sendfile to let the kernel handle that). And Squeak can only use a single process to handle everything.
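For what it's worth, a sufficiently recent trunk image can grow that table at startup; a one-line sketch (Smalltalk maxExternalSemaphores: exists in current Squeak, though perhaps not in the old server images, and 8192 is just an example figure):

    "run once at server startup, before any load arrives"
    Smalltalk maxExternalSemaphores: 8192.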
Yes, it still has to return back through alan but I assume alan does not wait for a "full download" received from andreas before it's already piping back to the Squeak client. If true, then it seems like it only amounts to saving one hop, which would hardly be noticeable over what we have now.
The goal of caching is not about saving a hop, but to avoid handling files in Squeak.
The UX would not be magnitudes faster though, right?
Directly, by letting nginx serve the file, no, but the server image would be less likely to get stalled (return 5xx responses). But the caching scheme I described in this thread would make the UX a lot quicker too, because data would not have to be transferred when you already have it.
But we don't want to restrict what's possible in our software design because of that. That situation is already a headache anyway. The same name can theoretically come only from the same person (if we ensure unique initials) and so this is avoidable / fixable by resaving one of them under a different name...
It wasn't me who created the duplicate. If your suggestion had been in place, some images out there, including mine, would have been broken by the update process.
I meant ANY "cache".
It still depends on the purpose of the cache. It's possible that package-cache is just a misnomer or it was just a plan to use it as a cache which hasn't happened yet.
For Monticello, package-cache's other use-case is when an authentication issue occurs when trying to save to an HTTP repository. At that point the Version object with the new ancestry was already constructed in memory, so rather than worry about trying to "undo" all that, it was simpler and better to save it to the package-cache, persist it safely so the client can simply move forward from there (get access to the HTTP repository and copy it over, or whatever).
The package-cache is also handy as a default repository and as offline storage.
Levente
Hi Levente,
On Jan 27, 2019, at 5:40 PM, Levente Uzonyi leves@caesar.elte.hu wrote:
On Sun, 27 Jan 2019, Chris Muller wrote:
Hi,
Yes, the SqueakMap server image is one part of the dynamic, but I think another is a bug in the trunk image. I think the reason Tim is not seeing 45 seconds before error is because the timeout setting of the high-up client is not being passed all the way down to the lowest-level layers -- e.g., from HTTPSocket --> WebClient --> SocketStream --> Socket. By the time it gets down to Socket which does the actual work, it's operating on its own 30 second timeout.
I would expect subsecond response times. 30 seconds is just unacceptably long.
Well, it depends on whether, for example, you're in the middle of Antarctica with a slow internet connection or in an office with a fast connection. A 30 second timeout is just the maximum amount of time the client will wait for the entire process before presenting a debugger; that's all it can do.
We can be sure that Tim should get subsecond response times instead of timeouts after 30 seconds.
Right, but timeout settings are a necessary tool sometimes, my point was that we should fix client code in trunk to make timeouts work properly.
Incidentally, 99% of SqueakMap requests ARE subsecond -- just go to map.squeak.org and click around and see. For the remaining 1% that aren't, the issue is known and we're working on a new server to fix that.
Great! That was my point: the image needs to be fixed.
It is a fixed amount of time, I *think* still between 30 and 45 seconds, that it takes the SqueakMap server to save its model after an
and so if in the meantime it can simply be made to wait 45s instead of 30s, then current SqueakMap will only be that occasional delay at worst, instead of the annoying debugger we currently get.
I don't see why that would make a difference: the user would get a debugger anyway, but only 15 seconds later.
You would save seconds, not milliseconds by not downloading files again.
IIUC, you're saying we would save one hop in the "download" -- instead of client <--> alan <--> andreas, it would just be client <--> alan. Is that right?
No. If the client doesn't have the mcz in the package cache but nginx has it in its cache, then we save the transfer of data between alan and andreas.
Are alan and andreas co-located?
They are cloud servers in the same data center.
The file doesn't have to be read from the disk either.
I assume you mean "read from disk" on alan? What about after it's cached so many mcz's in RAM that it's paging out to swap file? To me, wasting precious RAM (of any server) to cache old MCZ file contents that no one will ever download (because they become old very quickly) feels wasteful. Dragster cars are wasteful too, but yes, they are "faster"... on a dragstrip. :) I guess there'd have to be some kind of application-specific smart management of the cache...
Nginx's proxy_cache can handle that all automatically. Also, we don't need a large cache. A small, memory-only cache would do it.
Levente, what about the trunk directory listing, can it cache that?
Sure.
That is the _#1 thing_ source.squeak.org is accessing and sending back over, and over, and over again -- every time that MC progress box that says, "Updating [repository name]".
Right, unless you update an older image.
If the client does have the mcz, then we save the complete file transfer.
I don't know what the speed between alan <---> andreas is, but I doubt it's much slower than client <---> alan in most cases, so the savings would seem to be minimal..?
The image wouldn't have to open a file, read its content from the disk and send that through a socket.
By "the image" I assume you mean the SqueakSource server image. But opening the file takes very little time. Original web-sites were .html files, remember how fast those were? Plus, filesystems "cache" file contents into their own internal caches anyway...
Each file uses one external semaphore, each socket uses three. If you use a default image, there can be no more than 256 external semaphores, which is ridiculous for a server, and it'll just grind to a halt when some load arrives. Every time the external semaphore table is full, a GC is triggered to try to clear it up via the finalization process. Reading a file into memory is slow, writing it to a socket is slow. (Compared to nginx, which uses sendfile to let the kernel handle that.) And Squeak can only use a single process to handle everything.
That’s configurable. Alas, because writing lock-free table growth is not easy, the external semaphore table doesn’t grow automatically. But the VM does allow its size to be specified in a value cached in the image header and read at startup (IIRC). So we could easily have a 4K entry external semaphore table.
I know that it's mostly a cost and reconfiguration thing, but has there been any thought to maybe making multiple servers? With the front end doing a round robin to distribute the load? I’m saying this without knowing what kind of loads the server is experiencing, or whether there are log files that record the activity.
Sent from my iPhone
Hi
On 28.01.2019, at 03:56, John Pfersich via Squeak-dev squeak-dev@lists.squeakfoundation.org wrote:
I know that it's mostly a cost and reconfiguration thing, but has there been any thought to maybe making multiple servers? With the front end doing a round robin to distribute the load? I’m saying this without knowing what kind of loads the server is experiencing, or whether there are log files that record the activity.
That's what squeaksource3 on gemstone is doing, actually. Three FCGI-servers (gems) serving the web interface and files, if necessary. Also, one extra gem to collect stale seaside sessions and one extra gem to do async things like update statistics and send out emails. (The Squeak version of that uses Squeak processes for that and is probably not as resilient…)
Best regards -Tobias
On 28.01.2019, at 03:56, John Pfersich via Squeak-dev squeak-dev@lists.squeakfoundation.org wrote:
I know that it's mostly a cost and reconfiguration thing, but has there been any thought to maybe making multiple servers? With the front end doing a round robin to distribute the load? I’m saying this without knowing what kind of loads the server is experiencing, or whether there are log files that record the activity.
That's what squeaksource3 on gemstone is doing, actually. Three FCGI-servers (gems) serving the web interface and files, if necessary. Also, one extra gem to collect stale seaside sessions and one extra gem to do async things like update statistics and send out emails. (The Squeak version of that uses Squeak processes for that and is probably not as resilient…)
And yet, when I just went to:
http://ss3.gemtalksystems.com/ss/Projects
and entered "squeaksour" in the Search field to look for your referenced packages, my web browser sat there spinning for AT LEAST 30 seconds (felt like more, but I didn't time it) before returning with the result list. This is one of the issues I've had with squeaksource3 since the beginning. The same first-search on squeaksource.com hosted on single-threaded Squeak 3.x image running on interpreter VM is more than an order-of-magnitude faster.
My point being, I agree that "dragster tech" can be fun, but if the extra complexity it brings sinks the UX, it was all only just masturbation. The real challenge is not to use nginx, but to actually provide a UX that is better-enough to pay for the extra cost and complexity.
Best, Chris
On 29.01.2019, at 01:08, Chris Muller asqueaker@gmail.com wrote:
On 28.01.2019, at 03:56, John Pfersich via Squeak-dev squeak-dev@lists.squeakfoundation.org wrote:
I know that it's mostly a cost and reconfiguration thing, but has there been any thought to maybe making multiple servers? With the front end doing a round robin to distribute the load? I’m saying this without knowing what kind of loads the server is experiencing, or whether there are log files that record the activity.
That's what squeaksource3 on gemstone is doing, actually. Three FCGI-servers (gems) serving the web interface and files, if necessary. Also, one extra gem to collect stale seaside sessions and one extra gem to do async things like update statistics and send out emails. (The Squeak version of that uses Squeak processes for that and is probably not as resilient…)
And yet, when I just went to:
http://ss3.gemtalksystems.com/ss/Projects
and entered "squeaksour" in the Search field to look for your referenced packages, my web browser sat there spinning for AT LEAST 30 seconds (felt like more, but I didn't time it) before returning with the result list. This is one of the issues I've had with squeaksource3 since the beginning. The same first-search on squeaksource.com hosted on single-threaded Squeak 3.x image running on interpreter VM is more than an order-of-magnitude faster.
"Oh hey Tobias, you said you made the mc directory listing fast. Look how slow the search on that site is ooooo"
Really?
I know that the search on that instance is slow. That's one of the several reasons why I/Dale actually want to upgrade the instance to the latest code base. If someone's inclined, we have that all tracked (https://github.com/krono/squeaksource3/issues/, especially https://github.com/krono/squeaksource3/issues/64 and such)
The directory listing is fast. Like these ones: http://ss3.gemtalksystems.com/ss/STON/ https://sdk.krestianstvo.org/sdk/croquet/
My point being, I agree that "dragster tech" can be fun, but if the extra complexity it brings sinks the UX, it was all only just masturbation. The real challenge is not to use nginx, but to actually provide a UX that is better-enough to pay for the extra cost and complexity.
"dragster tech"? are you kidding? "extra complexity"? Its called divided responsibilities and industry best practices.
But you know what? Ask Levente to put map.squeak.org out there on port 80. See how fast it dies.
Best of luck.
The instance with the latest code base is btw: https://sdk.krestianstvo.org/sdk, ss3 at gemtalksystems is arguably the oldest one.
But still.
I have no part in this belly dance anymore.
Good day. -Tobias
*plonk*
Hi Tobias,
Given that we're in agreement, this hostility is unnecessary and puzzling. Lighten up.
I know that it's mostly a cost and reconfiguration thing, but has there been any thought to maybe making multiple servers? With the front end doing a round robin to distribute the load? I’m saying this without knowing what kind of loads the server is experiencing, or whether there are log files that record the activity.
That's what squeaksource3 on gemstone is doing, actually. Three FCGI-servers (gems) serving the web interface and files, if necessary. Also, one extra gem to collect stale seaside sessions and one extra gem to do async things like update statistics and send out emails.
Paul mentions a "cost and configuration" trade-off, which you acknowledge you implemented as "Three FCGI servers" (what I refer to as a "dragster") for squeaksource3. But then this...
(The Squeak version of that uses Squeak processes for that and is probably not as resilient…)
... made an opportunity to draw a distinction between "resilience" and UX, which is _important_ too!
And yet, when I just went to:
http://ss3.gemtalksystems.com/ss/Projects
and entered "squeaksour" in the Search field to look for your referenced packages, my web browser sat there spinning for AT LEAST 30 seconds (felt like more, but I didn't time it) before returning with the result list. This is one of the issues I've had with squeaksource3 since the beginning. The same first-search on squeaksource.com hosted on single-threaded Squeak 3.x image running on interpreter VM is more than an order-of-magnitude faster.
"Oh hey Tobias, you said you made the mc directory listing fast. Look how slow the search on that site is ooooo"
Really?
I know that the search on that instance is slow.
Then you shouldn't be offended. We're discussing server implementations.
That's one of the several reasons why I/Dale actually want to upgrade the instance to the latest code base. If someone's inclined, we have that all tracked (https://github.com/krono/squeaksource3/issues/, especially https://github.com/krono/squeaksource3/issues/64 and such)
The directory listing is fast. Like these ones: http://ss3.gemtalksystems.com/ss/STON/ https://sdk.krestianstvo.org/sdk/croquet/
My point being, I agree that "dragster tech" can be fun, but if the extra complexity it brings sinks the UX, it was all only just masturbation. The real challenge is not to use nginx, but to actually provide a UX that is better-enough to pay for the extra cost and complexity.
"dragster tech"? are you kidding? "extra complexity"? Its called divided responsibilities and industry best practices.
Right. But you ignored my point about UX.
But you know what? Ask Levente to put map.squeak.org out there on port 80. See how fast it dies.
Best of luck.
You don't have to be so patronizing. You made this point very strongly earlier, and it is something I was already aware of, but not experienced in, so I'm glad to learn from you and Levente about the best tools and practices to have a resilient Squeak-based server app on the net.
The instance with the latest code base is btw: https://sdk.krestianstvo.org/sdk, ss3 at gemtalksystems is arguably the oldest one.
But still.
Still what? Chill out.
Regards, Chris
I have no part in this belly dance anymore.
Good day. -Tobias
*plonk*
Leaving aside the arguments that seem to have sprung up, I'd appreciate thoughts on - a) would it be better for me to rescind the socket retries I added a couple of weeks ago? b) are there any error cases it *is* worth catching and what handling would be beneficial?
tim -- tim Rowledge; tim@rowledge.org; http://www.rowledge.org/tim You can't make a program without broken egos.
On Wed, 30 Jan 2019, tim Rowledge wrote:
Leaving aside the arguments that seem to have sprung up, I'd appreciate thoughts on - a) would it be better for me to rescind the socket retries I added a couple of weeks ago?
Based on Chris's explanation about the 504 responses (data serialization), I don't think those retries would affect the image in any way, since it is not responding during the operation. I think it would be worth adding delays, though. I usually use exponential backoff with an upper limit (e.g. 10 seconds).
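For illustration, a minimal sketch of that policy in Squeak, assuming WebClient's class-side httpGet: and a stand-in URL; it retries only on 5xx responses, doubling the delay up to the 10 second cap:

	| response attempt delay |
	attempt := 0.
	delay := 1. "seconds; doubled after each retry, capped at 10"
	[response := WebClient httpGet: 'http://map.squeak.org/'. "stand-in URL"
	response code >= 500 and: [attempt < 5]] whileTrue: [
		attempt := attempt + 1.
		(Delay forSeconds: delay) wait.
		delay := (delay * 2) min: 10].
	response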
b) are there any error cases it *is* worth catching and what handling would be beneficial?
Maybe. 4xx responses normally mean the client is doing something wrong. 5xx responses mean there's something wrong with the server.
Levente
tim
tim Rowledge; tim@rowledge.org; http://www.rowledge.org/tim You can't make a program without broken egos.
Hi,
On 28.01.2019, at 03:45, Eliot Miranda eliot.miranda@gmail.com wrote:
[…]
By "the image" I assume you mean the SqueakSource server image. But opening the file takes very little time. Original web-sites were .html files, remember how fast those were? Plus, filesystems "cache" file contents into their own internal caches anyway...
Each file uses one external semaphore, each socket uses three. If you use a default image, there can be no more than 256 external semaphores, which is ridiculous for a server, and it'll just grind to a halt when some load arrives. Every time the external semaphore table is full, a GC is triggered to try to clear it up via the finalization process. Reading a file into memory is slow, writing it to a socket is slow. (Compared to nginx, which uses sendfile to let the kernel handle that.) And Squeak can only use a single process to handle everything.
That’s configurable. Alas, because writing lock-free table growth is not easy, the external semaphore table doesn’t grow automatically. But the VM does allow its size to be specified in a value cached in the image header and read at startup (IIRC). So we could easily have a 4K entry external semaphore table.
[…]
Eliot, can you give an example invocation so we can add that to the server? -t
On Mon, 28 Jan 2019, Tobias Pape wrote:
Eliot, can you give an example invocation so we can add that to the server?
Not Eliot, but you can set it from the image and it's saved in the header. However, you can't set it at just any time, because there is a very slim chance of signals being lost during the table update. I have this in all my built images:
Smalltalk maxExternalSemaphores: 8192.
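	"Grows the external semaphore table from its 256-entry default; once the image is saved, the new size is kept in the image header and restored at startup."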
I guess we should bump the number for the next release so that others don't get bitten by this.
Levente
Hi Tobias,
On Jan 28, 2019, at 12:04 AM, Tobias Pape Das.Linux@gmx.de wrote:
Eliot, can you give an example invocation so we can add that to the server?
I'm on my phone in the car right now so I can’t confirm, but IIRC it is an image save thing. You specify the size via a vmParameterAt:put: send which updates the size, but if you save the image that size is remembered in the image file header. I need to check, but the info should be in the comments for vmParameterAt:[put:]
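A hedged sketch of what that invocation could look like, assuming the external semaphore table size is VM parameter 49 (the vmParameterAt: comment should confirm this); the Smalltalk maxExternalSemaphores: line quoted earlier wraps the same mechanism:

	"Read the current external semaphore table size (believed to be parameter 49)."
	Smalltalk vmParameterAt: 49.
	"Set a 4K entry table, then save so the size lands in the image header."
	Smalltalk vmParameterAt: 49 put: 4096.
	Smalltalk saveSession.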
Whew! :)
Yes, the SqueakMap server image is one part of the dynamic, but I think another is a bug in the trunk image. I think the reason Tim is not seeing 45 seconds before error is because the timeout setting of the high-up client is not being passed all the way down to the lowest-level layers -- e.g., from HTTPSocket --> WebClient --> SocketStream --> Socket. By the time it gets down to Socket which does the actual work, it's operating on its own 30 second timeout.
I would expect subsecond response times. 30 seconds is just unacceptably long.
Well, it depends on whether, for example, you're in the middle of Antarctica with a slow internet connection or in an office with a fast connection. A 30 second timeout is just the maximum amount of time the client will wait for the entire process before presenting a debugger; that's all it can do.
We can be sure that Tim should get subsecond response times instead of timeouts after 30 seconds.
Right, but timeout settings are a necessary tool sometimes, my point was that we should fix client code in trunk to make timeouts work properly.
Incidentally, 99% of SqueakMap requests ARE subsecond -- just go to map.squeak.org and click around and see. For the remaining 1% that aren't, the issue is known and we're working on a new server to fix that.
Great! That was my point: the image needs to be fixed.
But, you're referring to the server image as "the image needs to be fixed", which I've already conceded, whereas I'm referring to the client image -- our trunk image -- as also needing the suspected bug(s) with WebClient (et al) fixed.
It is a fixed amount of time, I *think* still between 30 and 45 seconds, that it takes the SqueakMap server to save its model after an
and so if in the meantime it can simply be made to wait 45s instead of 30s, then current SqueakMap will only be that occasional delay at worst, instead of the annoying debugger we currently get.
I don't see why that would make a difference: the user would get a debugger anyway, but only 15 seconds later.
No! :) As I said:
It is a fixed amount of time, I *think* still between 30 and 45 seconds, that it takes the SqueakMap server to save its model
So they would get a response < 15s later, not a debugger.
The server needs the same amount of time to save whenever it happens -- it's very predictable -- and right now, to avoid a debugger, the Squeak trunk image simply needs to be fixed to honor the 45s timeout instead of ignoring it and always defaulting to 30.
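For illustration, a sketch of the fix from the calling side, assuming WebClient's #timeout: setter is the value that should (but, per the suspected bug, currently may not) be passed down through SocketStream to Socket:

	| client response |
	client := WebClient new.
	client timeout: 45. "seconds; intended to override Socket's own 30s default"
	response := client httpGet: 'http://map.squeak.org/'. "stand-in URL"
	response code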
Are alan and andreas co-located?
They are cloud servers in the same data center.
The file doesn't have to be read from the disk either.
I assume you mean "read from disk" on alan? What about after it's cached so many mcz's in RAM that it's paging out to swap file? To me, wasting precious RAM (of any server) to cache old MCZ file contents that no one will ever download (because they become old very quickly) feels wasteful. Dragster cars are wasteful too, but yes, they are "faster"... on a dragstrip. :) I guess there'd have to be some kind of application-specific smart management of the cache...
Nginx's proxy_cache can handle that all automatically. Also, we don't need a large cache. A small, memory-only cache would do it.
How "small" could it be and still contain all the MCZ's you want to use to update an an "old" image?
Levente, what about the trunk directory listing, can it cache that?
Sure.
That is the _#1 thing_ source.squeak.org is accessing and sending back over, and over, and over again -- every time that MC progress box that says, "Updating [repository name]".
Right, unless you update an older image.
System resources should not be allocated to optimizing "build" and "initialize" use-cases. Those UC's are one-offs run by developers, typically even in the background.
System resources should be optimized around actual **end-users interacting with UIs**...
If the client does have the mcz, then we save the complete file transfer.
I don't know what the speed between alan <---> andreas is, but I doubt it's much slower than client <---> alan in most cases, so the savings would seem to be minimal..?
The image wouldn't have to open a file, read its content from the disk and send that through a socket.
By "the image" I assume you mean the SqueakSource server image. But opening the file takes very little time. Original web-sites were .html files, remember how fast those were? Plus, filesystems "cache" file contents into their own internal caches anyway...
Each file uses one external semaphore, each socket uses three. If you use a default image, there can be no more than 256 external semaphores which is ridiculous for a server,
So that's (256 / 4 = 64) concurrent requests for an MCZ before it is full? Probably enough for our small community, but you also said that's just a default we can increase? Something I'd like to know if I need for Magma too; where can I find this setting?
and it'll just grind to a halt when some load arrives. Every time the external semaphore table is full, a GC is triggered to try to clear it up via the finalization process. Reading a file into memory is slow, writing it to a socket is slow. (Compared to nginx, which uses sendfile to let the kernel handle that.) And Squeak can only use a single process to handle everything.
To me, it comes back to UX. If we ever get enough load for that to be an issue, it might be worth looking into.
Yes, it still has to return back through alan, but I assume alan does not wait for a "full download" received from andreas before it's already piping back to the Squeak client. If true, then it seems like it only amounts to saving one hop, which would hardly be noticeable over what we have now.
The goal of caching is not about saving a hop, but to avoid handling files in Squeak.
Nginx does that thing magnitudes faster than Squeak.
The UX would not be magnitudes faster though, right?
Directly by letting nginx serve the file, no, but the server image would be less likely to get stalled (return 5xx responses).
SqueakMap and SqueakSource.com are still on old code, with plans for upgrading, but are you still getting 5xx's on source.squeak.org?
But the caching scheme I described in this thread would make the UX a lot quicker too, because data would not have to be transferred when you already have it.
I assume you mean "data would not have to be transferred" from andreas to alan... from within the same data center..! :)
That would also let us save bandwidth by not downloading files already sitting in the client's package cache.
How so? Isn't the package-cache checked before hitting the server at all? It certainly should be.
No, it's not. Currently that's not possible, because different files can have the same name. And currently we have no way to tell them apart.
No. No two MCZ's may have the same name, certainly not within the same repository, because MCRepository cannot support that. So maybe
Not at the same time, but it's possible, and it just happened recently with Chronology-ul.21. It is perfectly possible that a client has a version in its package cache with the same name as a different version on the server.
But we don't want to restrict what's possible in our software design because of that. That situation is already a headache anyway. The same name can theoretically come only from the same person (if we ensure unique initials), and so this is avoidable / fixable by resaving one of them under a different name...
It wasn't me who created the duplicate. If your suggestion had been in place, some images out there, including mine, would have been broken by the update process.
I don't think so, since I said it would open up the .mcz in package-cache and verify the UUID.
I guess I don't know what you mean -- I see only one Chronology-ul.21 in the ancestry currently anyway..
we need project subdirectories under package-cache to properly simulate each cached Repository. I had no idea we were neutering 90% of the benefits of our package-cache because of this too, and just sitting here, I can't help wonder whether this is why MCProxy doesn't work properly either!
The primary purpose of a cache is to *check it first* to speed up access to something, right? What you say about package-cache sounds
I don't know. It wasn't me who designed it. :)
I meant ANY "cache".
It still depends on the purpose of the cache. It's possible that package-cache is just a misnomer or it was just a plan to use it as a cache which hasn't happened yet.
For Monticello, package-cache's other use-case is when an authentication issue occurs when trying to save to a HTTP repository. At that point the Version object with the new ancestry was already constructed in memory, so rather than worry about trying to "undo" all that, it was simpler and better to save it to the package-cache, persisting it safely so the client can simply move forward from there (regain access to the HTTP repository and copy it over, or whatever).
The package-cache is also handy as a default repository and as an offline storage.
I'm sure you would agree it's better for client images to check their local package-cache first before hitting nginx.
- Chris
On 28.01.2019, at 04:04, Chris Muller ma.chris.m@gmail.com wrote:
If the client does have the mcz, then we save the complete file transfer.
I don't know what the speed between alan <---> andreas is, but I doubt it's much slower than client <---> alan in most cases, so the savings would seem to be minimal..?
The image wouldn't have to open a file, read its content from the disk and send that through a socket.
By "the image" I assume you mean the SqueakSource server image. But opening the file takes very little time. Original web-sites were .html files, remember how fast those were? Plus, filesystems "cache" file contents into their own internal caches anyway...
Each file uses one external semaphore, each socket uses three. If you use a default image, there can be no more than 256 external semaphores which is ridiculous for a server,
So that's (256 / 4 = 64) concurrent requests for an MCZ before it is full? Probably enough for our small community, but you also said that's just a default we can increase? Something I'd like to know if I need for Magma too; where can I find this setting?
Are you aware that a lot of requests can happen with Travis-CI builds requesting such things?
Also, you should deduct several semaphores for the sources and changes files, and connections to Magma, and the Squeak debug log file, and …
-t
On 28.01.2019, at 04:04, Chris Muller ma.chris.m@gmail.com wrote:
Directly by letting nginx serve the file, no, but the server image would be less likely to get stalled (return 5xx responses).
SqueakMap and SqueakSource.com are still on old code, with plans for upgrading, but are you still getting 5xx's on source.squeak.org?
Tim is; that's what started this thread (see subject line). -t
On 28.01.2019, at 04:04, Chris Muller ma.chris.m@gmail.com wrote:
But the caching scheme I described in this thread would make the UX a lot quicker too, because data would not have to be transferred when you already have it.
I assume you mean "data would not have to be transferred" from andreas to alan... from within the same data center..! :)
No, that is not the point. When nginx knows the data that the image wants, and the image says so, it can just answer 304 Not Modified and not send anything. That's extremely handy. Also, we know that the data center is not the limiting factor here but (server-side) Squeak is.
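For illustration, the wire exchange being described here (the package name and validator value are made up); note the 304 carries headers only, no body:

	GET /trunk/Kernel-xyz.1234.mcz HTTP/1.1
	Host: source.squeak.org
	If-None-Match: "5d9b02fa"

	HTTP/1.1 304 Not Modified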
-t
Hi Chris,
On Sun, 27 Jan 2019, Chris Muller wrote:
Whew! :)
Yes, the SqueakMap server image is one part of the dynamic, but I think another is a bug in the trunk image. I think the reason Tim is not seeing 45 seconds before error is because the timeout setting of the high-up client is not being passed all the way down to the lowest-level layers -- e.g., from HTTPSocket --> WebClient --> SocketStream --> Socket. By the time it gets down to Socket which does the actual work, it's operating on its own 30 second timeout.
I would expect subsecond response times. 30 seconds is just unacceptably long.
Well, it depends on whether, for example, you're in the middle of Antarctica with a slow internet connection or in an office with a fast connection. A 30 second timeout is just the maximum amount of time the client will wait for the entire process before presenting a debugger; that's all it can do.
We can be sure that Tim should get subsecond response times instead of timeouts after 30 seconds.
Right, but timeout settings are a necessary tool sometimes, my point was that we should fix client code in trunk to make timeouts work properly.
Incidentally, 99% of SqueakMap requests ARE subsecond -- just go to map.squeak.org and click around and see. For the remaining 1% that aren't, the issue is known and we're working on a new server to fix that.
Great! That was my point: the image needs to be fixed.
But, you're referring to the server image as "the image needs to be fixed", which I've already conceded, whereas I'm referring to the client image -- our trunk image -- as also needing the suspected bug(s) with WebClient (et al) fixed.
I don't think anything related to the server timeouts needs to be fixed there. Sure, more user friendly error messages could be useful, but in relation to Tim's problem at the network level there's nothing wrong in the client.
It is a fixed amount of time, I *think* still between 30 and 45 seconds, that it takes the SqueakMap server to save its model after an
and so if in the meantime it can simply be made to wait 45s instead of 30s, then current SqueakMap will only be that occasional delay at worst, instead of the annoying debugger we currently get.
I don't see why that would make a difference: the user would get a debugger anyway, but only 15 seconds later.
No! :) As I said:
It is a fixed amount of time, I *think* still between 30 and 45 seconds, that it takes the SqueakMap server to save its model
So they would get a response < 15s later, not a debugger.
Provided the image is able to answer before 45 seconds, which is very likely not the case here.
The server needs the same amount of time to save whenever it happens -- it's very predictable -- and right now, to avoid a debugger, the Squeak trunk image simply needs to be fixed to honor the 45s timeout instead of ignoring it and always defaulting to 30.
Uh. Are you saying it's because the image is being saved? I never liked that idea of image-based persistence, and this is a good reason why.
Are alan and andreas co-located?
They are cloud servers in the same data center.
The file doesn't have to be read from the disk either.
I assume you mean "read from disk" on alan? What about after it's cached so many mcz's in RAM that it's paging out to swap file? To me, wasting precious RAM (of any server) to cache old MCZ file contents that no one will ever download (because they become old very quickly) feels wasteful. Dragster cars are wasteful too, but yes, they are "faster"... on a dragstrip. :) I guess there'd have to be some kind of application-specific smart management of the cache...
Nginx's proxy_cache can handle that all automatically. Also, we don't need a large cache. A small, memory-only cache would do it.
How "small" could it be and still contain all the MCZ's you want to use to update an an "old" image?
In my definition of old, it is at most back to the last release. In today's terms of small: less than 1 GB. But in this case I presume less than 100 MB would be enough.
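For illustration, a hedged sketch of the nginx side of that (directive values, the cache path, the /trunk/ location, and the upstream name are hypothetical, not alan's actual config; putting the cache path on tmpfs would keep it effectively memory-only):

	# ~100 MB cache for MCZs; the keys_zone holds the cache keys in RAM.
	proxy_cache_path /var/cache/nginx/mcz keys_zone=mcz:10m max_size=100m inactive=7d;

	location /trunk/ {
		proxy_pass http://andreas;    # the Squeak application server
		proxy_cache mcz;
		proxy_cache_valid 200 7d;     # a published MCZ never changes
	}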
Levente, what about the trunk directory listing, can it cache that?
Sure.
That is the _#1 thing_ source.squeak.org is accessing and sending back over, and over, and over again -- every time that MC progress box that says, "Updating [repository name]".
Right, unless you update an older image.
System resources should not be allocated to optimizing "build" and "initialize" use-cases. Those UC's are one-offs run by developers, typically even in the background.
System resources should be optimized around actual **end-users interacting with UIs**...
Indeed. Not that there's a shortage of resources.
If the client does have the mcz, then we save the complete file transfer.
I don't know what the speed between alan <---> andreas is, but I doubt it's much slower than client <---> alan in most cases, so the savings would seem to be minimal..?
The image wouldn't have to open a file, read its content from the disk and send that through a socket.
By "the image" I assume you mean the SqueakSource server image. But opening the file takes very little time. Original web-sites were .html files, remember how fast those were? Plus, filesystems "cache" file contents into their own internal caches anyway...
Each file uses one external semaphore, each socket uses three. If you use a default image, there can be no more than 256 external semaphores which is ridiculous for a server,
So that's (256 / 4 = 64) concurrent requests for an MCZ before it is full? Probably enough for our small community, but you also said that's just a default we can increase? Something I'd like to know if I need for Magma too; where can I find this setting?
As Tobias wrote, you'll get bitten by this quickly. It's not enough for any server facing the internet where non-friendly actors are common.
and it'll just grind to a halt when some load arrives. Every time the external semaphore table is full, a GC is triggered to try to clear it up via the finalization process. Reading a file into memory is slow, writing it to a socket is slow. (Compared to nginx, which uses sendfile to let the kernel handle that.) And Squeak can only use a single process to handle everything.
To me, it comes back to UX. If we ever get enough load for that to be an issue, it might be worth looking into.
It's basic stuff. Without this a single web crawler can render your image unusable.
Yes, it still has to return back through alan, but I assume alan does not wait for a "full download" received from andreas before it's already piping back to the Squeak client. If true, then it seems like it only amounts to saving one hop, which would hardly be noticeable over what we have now.
The goal of caching is not about saving a hop, but to avoid handling files in Squeak.
Nginx does that thing magnitudes faster than Squeak.
The UX would not be magnitudes faster though, right?
Directly by letting nginx serve the file, no, but the server image would be less likely to get stalled (return 5xx responses).
SqueakMap and SqueakSource.com are still on old code, with plans for upgrading, but are you still getting 5xx's on source.squeak.org?
I never said I had problems. Tim had them with SqueakMap. As I mentioned before, the discussion changed direction.
But the caching scheme I described in this thread would make the UX a lot quicker too, because data would not have to be transferred when you already have it.
I assume you mean "data would not have to be transferred" from andreas to alan... from within the same data center..! :)
I understand your confusion. There are at least 3 suggestions described in this thread to remedy the situation. All with different effects.
That would also let us save bandwidth by not downloading files already sitting in the client's package cache.
How so? Isn't the package-cache checked before hitting the server at all? It certainly should be.
No, it's not. Currently that's not possible, because different files can have the same name. And currently we have no way to tell them apart.
No. No two MCZ's may have the same name, certainly not within the same repository, because MCRepository cannot support that. So maybe
Not at the same time, but it's possible, and it just happened recently with Chronology-ul.21. It is perfectly possible that a client has a version in its package cache with the same name as a different version on the server.
But we don't want to restrict what's possible in our software design because of that. That situation is already a headache anyway. The same name can theoretically come only from the same person (if we ensure unique initials), and so this is avoidable / fixable by resaving one of them under a different name...
It wasn't me who created the duplicate. If your suggestion had been in place, some images out there, including mine, would have been broken by the update process.
I don't think so, since I said it would open up the .mcz in package-cache and verify the UUID.
What is the UUID of an mcd?
I guess I don't know what you mean -- I see only one Chronology-ul.21 in the ancestry currently anyway..
Never said it was in the ancestry. In the Trunk there is:
Name: Chronology-Core-ul.21
Author: dtl
Time: 4 January 2019, 1:17:39.848442 pm
UUID: 5d9b02fa-8e37-4678-adda-f302163732a1
In the Treated Inbox there is:
Name: Chronology-Core-ul.21
Author: ul
Time: 26 December 2018, 1:48:40.220196 am
UUID: 2e6f6ce2-d0ec-41a0-b27c-88c642e5afc9
we need project subdirectories under package-cache to properly simulate each cached Repository. I had no idea we were neutering 90% of the benefits of our package-cache because of this too, and just sitting here, I can't help wonder whether this is why MCProxy doesn't work properly either!
The primary purpose of a cache is to *check it first* to speed up access to something, right? What you say about package-cache sounds
I don't know. It wasn't me who designed it. :)
I meant ANY "cache".
It still depends on the purpose of the cache. It's possible that package-cache is just a misnomer or it was just a plan to use it as a cache which hasn't happened yet.
For Monticello, package-cache's other use-case is when an authentication issue occurs when trying to save to a HTTP repository. At that point the Version object with the new ancestry was already constructed in memory, so rather than worry about trying to "undo" all that, it was simpler and better to save it to the package-cache, persisting it safely so the client can simply move forward from there (regain access to the HTTP repository and copy it over, or whatever).
The package-cache is also handy as a default repository and as an offline storage.
I'm sure you would agree it's better for client images to check their local package-cache first before hitting nginx.
Sure, but that can only be possible if the server sends more information about the package the client should download (e.g. the UUID or some hash). Without that the client would assume that it has the right version when it doesn't and failure is unavoidable. (as I described above in relation to Chronology-Core-ul.21). And, as I previously wrote, that would change the way the statistics are handled on SS.
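For illustration, a sketch of the client-side check such a listing would enable; serverUuid and aVersionInfo stand in for data from the (not yet existing) extended listing, while MCCacheRepository and versionWithInfo:ifAbsent: are existing Monticello API:

	| cached |
	cached := MCCacheRepository default
		versionWithInfo: aVersionInfo
		ifAbsent: [nil].
	(cached notNil and: [cached info id = serverUuid])
		ifTrue: ["identical version already on disk: skip the download"]
		ifFalse: ["cache miss or name collision: fetch from the server"]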
Levente
- Chris
Hi again,
Yes, the SqueakMap server image is one part of the dynamic, but I think another is a bug in the trunk image. I think the reason Tim is not seeing 45 seconds before error is because the timeout setting of the high-up client is not being passed all the way down to the lowest-level layers -- e.g., from HTTPSocket --> WebClient --> SocketStream --> Socket. By the time it gets down to Socket which does the actual work, it's operating on its own 30 second timeout.
I would expect subsecond response times. 30 seconds is just unacceptably long.
Well, it depends on whether, for example, you're in the middle of Antarctica with a slow internet connection or in an office with a fast connection. A 30 second timeout is just the maximum amount of time the client will wait for the entire process before presenting a debugger; that's all it can do.
We can be sure that Tim should get subsecond response times instead of timeouts after 30 seconds.
Right, but timeout settings are a necessary tool sometimes, my point was that we should fix client code in trunk to make timeouts work properly.
Incidentally, 99% of SqueakMap requests ARE subsecond -- just go to map.squeak.org and click around and see. For the remaining 1% that aren't, the issue is known and we're working on a new server to fix that.
Great! That was my point: the image needs to be fixed.
But, you're referring to the server image as "the image needs to be fixed", which I've already conceded, whereas I'm referring to the client image -- our trunk image -- as also needing the suspected bug(s) with WebClient (et al) fixed.
I don't think anything related to the server timeouts needs to be fixed there. Sure, more user friendly error messages could be useful, but in relation to Tim's problem at the network level there's nothing wrong in the client.
I'm not sure if you're saying that timeout setting currently works correctly in trunk, or that it doesn't _need_ to work correctly. Hopefully the former...
It is a fixed amount of time, I *think* still between 30 and 45 seconds, that it takes the SqueakMap server to save its model after an
and so if in the meantime it can simply be made to wait 45s instead of 30s, then current SqueakMap will only be that occasional delay at worst, instead of the annoying debugger we currently get.
I don't see why that would make a difference: the user would get a debugger anyway, but only 15 seconds later.
No! :) As I said:
It is a fixed amount of time, I *think* still between 30 and 45 seconds, that it takes the SqueakMap server to save its model
So they would get a response < 15s later, not a debugger.
Provided the image is able to answer before 45 seconds, which is very likely not the case here.
My assertions are based on my experience and observations working on the SqueakMap server image, and being the admin of that server image since 2011...
The server needs the same amount of time to save whenever it happens -- it's very predictable -- and right now, to avoid a debugger, the Squeak trunk image simply needs to be fixed to honor the 45s timeout instead of ignoring it and always defaulting to 30.
... the serialized size of the model in 2011 was 715K, today it's 777K. Do you have any reason to think the save times _shouldn't_ be consistent?
I suggest you download the SqueakMap server image and check it out for yourself. I actually can't even remember which server its on, but I could send you a copy of it.
Uh. Are you saying it's because the image is being saved? I never liked that idea of image-based persistence, and this is a good reason why.
No. The SqueakMap server image saves the SMSqueakMap object to a file using ReferenceStream. See the files in your Squeak directory /sm/map.[nnnn].gz. The server created those files, and you downloaded a copy of them when the SqueakMap "Update" button was clicked.
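(For anyone following along, that is the classic DataStream/ReferenceStream idiom; the file name below is illustrative, and SMSqueakMap default is assumed to answer the singleton map object:)

    | stream map |
    map := SMSqueakMap default. "the model object being snapshotted"
    stream := ReferenceStream fileNamed: 'sm/map.0001'.
    [stream nextPut: map] ensure: [stream close].
    "and reading a snapshot back:"
    stream := ReferenceStream fileNamed: 'sm/map.0001'.
    map := [stream next] ensure: [stream close]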
Are alan and andreas co-located?
They are cloud servers in the same data center.
The file doesn't have to be read from the disk either.
I assume you mean "read from disk" on alan? What about after it's cached so many mcz's in RAM that it's paging out to a swap file? To me, wasting precious RAM (of any server) to cache old MCZ file contents that no one will ever download (because they become old very quickly) feels wasteful. Dragster cars are wasteful too, but yes, they are "faster"... on a dragstrip. :) I guess there'd have to be some kind of application-specific smart management of the cache...
Nginx's proxy_cache can handle that all automatically. Also, we don't need a large cache. A small, memory-only cache would do it.
How "small" could it be and still contain all the MCZ's you want to use to update an an "old" image?
In my definition of old, it is at most back to the last release. In today's terms of small: less than 1 GB. But in this case I presume less than 100 MB would be enough.
I never doubted you that nginx is faster, only that it would provide any noticeable difference for most UC's...
Levente, what about the trunk directory listing, can it cache that?
Sure.
... except if we did cache this. I think this would take a meaningful amount of load off the server and infrastructure.
By "the image" I assume you mean the SqueakSource server image. But opening the file takes very little time. Original web-sites were .html files, remember how fast those were? Plus, filesystems "cache" file contents into their own internal caches anyway...
Each file uses one external semaphore, each socket uses three. If you use a default image, there can be no more than 256 external semaphores, which is ridiculous for a server,
So that's (256 / 4 = 64) concurrent requests for an MCZ before the table is full? Probably enough for our small community, but you also said that's just a default we can increase? Something I'd like to know if I need it for Magma too; where can I find this setting?
As Tobias wrote, you'll get bitten by this quickly. It's not enough for any server facing the internet where non-friendly actors are common.
Why haven't we then? Are we being shielded by alan?
and it'll just grind to a halt when some load arrives. Every time the external semaphore table is full, a GC is triggered to try to clear it up via the finalization process. Reading a file into memory is slow; writing it to a socket is slow. (Compared to nginx, which uses sendfile to let the kernel handle that.) And Squeak can only use a single process to handle everything.
To me, it comes back to UX. If we ever get enough load for that to be an issue, it might be worth looking into.
It's basic stuff. Without this a single web crawler can render your image unusable.
Sounds like something that's easy to increase, although wouldn't that just make the server vulnerable to a larger version of the same attack? So I've got to think some of the strategy must depend on _how_ the server responds to invalid requests, too...
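(If it helps: recent Squeak trunk images expose that table size through SmalltalkImage, backed by VM parameter 49 -- worth verifying against your particular VM, but roughly:)

    Smalltalk maxExternalSemaphores. "answer the current table size"
    Smalltalk maxExternalSemaphores: 8192. "grow it before taking serious socket/file load"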
Yes, it still has to return back through alan, but I assume alan does not wait for a "full download" received from andreas before it's already piping back to the Squeak client. If true, then it seems like it only amounts to saving one hop, which would hardly be noticeable over what we have now.
The goal of caching is not about saving a hop, but to avoid handling files in Squeak.
Nginx does that thing magnitudes faster than Squeak.
The UX would not be magnitudes faster though, right?
Directly, by letting nginx serve the file, no, but the server image would be less likely to get stalled (return 5xx responses).
SqueakMap and SqueakSource.com are still old, with plans for upgrading, but are you still getting 5xx's on source.squeak.org?
I never said I had problems. Tim had them with SqueakMap. As I mentioned before, the discussion changed direction.
It's been an enlightening discussion in any case. :)
But the caching scheme I described in this thread would make the UX a lot quicker too, because data would not have to be transferred when you already have it.
I assume you mean "data would not have to be transferred" from andreas to alan... from within the same data center..! :)
I understand your confusion. There are at least 3 suggestions described in this thread to remedy the situation. All with different effects.
Okay, maybe you meant that for a client requesting from alan, alan would check a timestamp on the header of the request sent from the client, and quickly send back a code saying, "you already got it."
But my point was that client should check the local package-cache first anyway.
>>> That would also let us save bandwidth by not downloading files already
>>> sitting in the client's package cache.
>>
>> How so? Isn't the package-cache checked before hitting the server at
>> all? It certainly should be.
>
> No, it's not. Currently that's not possible, because different files can
> have the same name. And currently we have no way to tell them apart.
No. No two MCZ's may have the same name, certainly not within the same repository, because MCRepository cannot support that. So maybe
Not at the same time, but it's possible, and it just happened recently with Chronology-ul.21. It is perfectly possible that a client has a version in its package cache with the same name as a different version on the server.
But we don't want to restrict what's possible in our software design because of that. That situation is already a headache anyway. The same name can theoretically come only from the same person (if we ensure unique initials), and so this is avoidable / fixable by resaving one of them under a different name...
It wasn't me who created the duplicate. If your suggestion had been in place, some images out there, including mine, would have been broken by the update process.
I don't think so, since I said it would open up the .mcz in package-cache and verify the UUID.
What is the UUID of an mcd?
mcd's are the same as mcz's except with fewer MCDefinitions inside. I assume you mean mcm here, which was not part of any of this discussion so far. Still, I don't see any issues. Dup names are simply not supported, period.
I guess I don't know what you mean -- I see only one Chronology-ul.21 in the ancestry currently anyway..
Never said it was in the ancestry. In the Trunk there is:
Name: Chronology-Core-ul.21
Author: dtl
Time: 4 January 2019, 1:17:39.848442 pm
UUID: 5d9b02fa-8e37-4678-adda-f302163732a1
In the Treated Inbox there is:
Name: Chronology-Core-ul.21
Author: ul
Time: 26 December 2018, 1:48:40.220196 am
UUID: 2e6f6ce2-d0ec-41a0-b27c-88c642e5afc9
Okay. Are you the author of both? This is something you yourself need to guard against doing, but as it's in Treated anyway, I don't really see any pertinent impact in the case of Chronology-Core-ul.21.
I'm sure you would agree it's better for client images to check their local package-cache first before hitting nginx.
Sure, but that can only be possible if the server sends more information about the package the client should download (e.g. the UUID or some hash). Without that, the client would assume that it has the right version when it doesn't, and failure is unavoidable (as I described above in relation to Chronology-Core-ul.21).
I think I would need a more detailed and/or concrete example of what you mean, because I'm not understanding the validity of your assertion that it isn't possible without hitting the server. What Use Case are you talking about? I was talking about the UC of "Diffing two Versions". The UUID's are all in the mcz's / mcd's.
So, for example, if I wanted to diff a selected Chronology-Core-ul.21 with its ancestor the process would be:
- client identifies the ancestor of a selected Chronology-Core-ul.21. Let's just say it's Chronology-Core-ul.20, with id 'abc-xyz-123'.
- client looks in package-cache, finds a Chronology-Core-ul.20.mcz file, opens it up and checks the UUID.
- if it's 'abc-xyz-123', then it uses it.
The server was never hit at all, so it doesn't need to "send more information"... As I said, duplicate names are something to be avoided in the first place; we shouldn't restrict the potential of the tools because of the possibility of a duplicate-named Version.
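(A sketch of that check, assuming MCCacheRepository's #versionFromFileNamed: and MCVersionInfo>>#id; the id string is the hypothetical 'abc-xyz-123' from the example above:)

    | cached |
    cached := [MCCacheRepository default versionFromFileNamed: 'Chronology-Core-ul.20.mcz']
        on: Error do: [:e | nil].
    (cached notNil and: [cached info id asString = 'abc-xyz-123'])
        ifTrue: ["use the cached copy for the diff; the server is never hit"]
        ifFalse: ["fall back to downloading from the repository"]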
Best, Chris
On Mon, Jan 28, 2019 at 06:02:05PM -0600, Chris Muller wrote:
I guess I don't know what you mean -- I see only one Chronology-ul.21 in the ancestry currently anyway..
Never said it was in the ancestry. In the Trunk there is:
Name: Chronology-Core-ul.21
Author: dtl
Time: 4 January 2019, 1:17:39.848442 pm
UUID: 5d9b02fa-8e37-4678-adda-f302163732a1
In the Treated Inbox there is:
Name: Chronology-Core-ul.21
Author: ul
Time: 26 December 2018, 1:48:40.220196 am
UUID: 2e6f6ce2-d0ec-41a0-b27c-88c642e5afc9
Okay. Are you the author of both? This is something you yourself need to guard against doing, but as it's in Treated anyway, I don't really see any pertinent impact in the case of Chronology-Core-ul.21.
This is because of the following commit that I made to trunk:
Name: Chronology-Core-ul.21
Author: dtl
Time: 4 January 2019, 1:17:39.848442 pm
UUID: 5d9b02fa-8e37-4678-adda-f302163732a1
Ancestors: Chronology-Core-dtl.20
From Chronology-Core-ul.21 from inbox, and resaved to ensure that version history exactly matches that of trunk. Updated by dtl and saved with original author initials.
This was a case in which I intentionally re-wrote the version history in order to make the trunk update stream appear to be clean even though it actually contained a merge of a long series of changes that had been developed and maintained elsewhere (http://www.squeaksource.com/UTCDateAndTime).
Regardless of whether you think that "cleaning" the version history is a good thing to do, there is no way that a local package-cache based on file names can be expected to figure out the resulting confusion.
Dave
What Levente says. He's right here :)
On 28.01.2019, at 02:40, Levente Uzonyi leves@caesar.elte.hu wrote:
On Sun, 27 Jan 2019, Chris Muller wrote:
Hi,
Yes, the SqueakMap server image is one part of the dynamic, but I think another is a bug in the trunk image. I think the reason Tim is not seeing 45 seconds before error is because the timeout setting of the high-up client is not being passed all the way down to the lowest-level layers -- e.g., from HTTPSocket --> WebClient --> SocketStream --> Socket. By the time it gets down to Socket which does the actual work, it's operating on its own 30 second timeout.
I would expect subsecond response times. 30 seconds is just unacceptably long.
Well, it depends on whether, for example, you're in the middle of Antarctica with a slow internet connection or in an office with a fast one. A 30 second timeout is just the maximum amount of time the client will wait for the entire process before presenting a debugger; that's all it can do.
We can be sure that Tim should get subsecond response times instead of timeouts after 30 seconds.
Right, but timeout settings are a necessary tool sometimes, my point was that we should fix client code in trunk to make timeouts work properly.
Incidentally, 99% of SqueakMap requests ARE subsecond -- just go to map.squeak.org and click around and see. For the remaining 1% that aren't, the issue is known and we're working on a new server to fix that.
Great! That was my point: the image needs to be fixed.
It is a fixed amount of time, I *think* still between 30 and 45 seconds, that it takes the SqueakMap server to save its model after an
and so if in the meantime it can simply be made to wait 45s instead of 30s, then current SqueakMap will cost only that occasional delay at worst, instead of the annoying debugger we currently get.
I don't see why that would make a difference: the user would get a debugger anyway, but only 15 seconds later.
You would save seconds, not milliseconds by not downloading files again.
IIUC, you're saying we would save one hop in the "download" -- instead of client <--> alan <--> andreas, it would just be client <--> alan. Is that right?
No. If the client doesn't have the mcz in the package cache but nginx has it in its cache, then we save the transfer of data between alan and andreas.
Are alan and andreas co-located?
They are cloud servers in the same data center.
The file doesn't have to be read from the disk either.
I assume you mean "read from disk" on alan? What about after it's cached so many mcz's in RAM that it's paging out to a swap file? To me, wasting precious RAM (of any server) to cache old MCZ file contents that no one will ever download (because they become old very quickly) feels wasteful. Dragster cars are wasteful too, but yes, they are "faster"... on a dragstrip. :) I guess there'd have to be some kind of application-specific smart management of the cache...
Nginx's proxy_cache can handle that all automatically. Also, we don't need a large cache. A small, memory-only cache would do it.
Levente, what about the trunk directory listing, can it cache that?
Sure.
That is the _#1 thing_ source.squeak.org is accessing and sending back over, and over, and over again -- every time that MC progress box that says, "Updating [repository name]".
Right, unless you update an older image.
If the client does have the mcz, then we save the complete file transfer.
I don't know what the speed between alan <---> andreas is, but I doubt it's much slower than client <---> alan in most cases, so the savings would seem to be minimal..?
The image wouldn't have to open a file, read its content from the disk and send that through a socket.
By "the image" I assume you mean the SqueakSource server image. But opening the file takes very little time. Original web-sites were .html files, remember how fast those were? Plus, filesystems "cache" file contents into their own internal caches anyway...
Each file uses one external semaphore, each socket uses three. If you use a default image, there can be no more than 256 external semaphores, which is ridiculous for a server, and it'll just grind to a halt when some load arrives. Every time the external semaphore table is full, a GC is triggered to try to clear it up via the finalization process. Reading a file into memory is slow; writing it to a socket is slow. (Compared to nginx, which uses sendfile to let the kernel handle that.) And Squeak can only use a single process to handle everything.
Yes, it still has to return back through alan, but I assume alan does not wait for a "full download" received from andreas before it's already piping back to the Squeak client. If true, then it seems like it only amounts to saving one hop, which would hardly be noticeable over what we have now.
The goal of caching is not about saving a hop, but to avoid handling files in Squeak.
Nginx does that thing magnitudes faster than Squeak.
The UX would not be magnitudes faster though, right?
Directly, by letting nginx serve the file, no, but the server image would be less likely to get stalled (return 5xx responses). But the caching scheme I described in this thread would make the UX a lot quicker too, because data would not have to be transferred when you already have it.
> That would also let us save bandwidth by not downloading files already
> sitting in the client's package cache.
How so? Isn't the package-cache checked before hitting the server at all? It certainly should be.
No, it's not. Currently that's not possible, because different files can have the same name. And currently we have no way to tell them apart.
No. No two MCZ's may have the same name, certainly not within the same repository, because MCRepository cannot support that. So maybe
Not at the same time, but it's possible, and it just happened recently with Chronology-ul.21. It is perfectly possible that a client has a version in its package cache with the same name as a different version on the server.
But we don't want to restrict what's possible in our software design because of that. That situation is already a headache anyway. The same name can theoretically come only from the same person (if we ensure unique initials), and so this is avoidable / fixable by resaving one of them under a different name...
It wasn't me who created the duplicate. If your suggestion had been in place, some images out there, including mine, would have been broken by the update process.
we need project subdirectories under package-cache to properly simulate each cached Repository. I had no idea we were neutering 90% of the benefits of our package-cache because of this too, and just sitting here, I can't help wondering whether this is why MCProxy doesn't work properly either!
The primary purpose of a cache is to *check it first* to speed up access to something, right? What you say about package-cache sounds
I don't know. It wasn't me who designed it. :)
I meant ANY "cache".
It still depends on the purpose of the cache. It's possible that package-cache is just a misnomer or it was just a plan to use it as a cache which hasn't happened yet.
For Monticello, package-cache's other use-case is when an authentication issue occurs while trying to save to an HTTP repository. At that point the Version object with the new ancestry has already been constructed in memory, so rather than worry about trying to "undo" all that, it was simpler and better to save it to the package-cache -- persist it safely so the client can simply move forward from there (get access to the HTTP repository and copy it over, or whatever).
The package-cache is also handy as a default repository and as an offline storage.
Levente
- Chris
really bad; we should fix that, not surrender to it.
Yes, that should be fixed, but it needs changes on the server side. What I always had in mind was to extend the repository listing with hashes/uuids so that the client could figure out if it needs to download a specific version. But care must be taken not to break the code for non-ss repositories (e.g. simple directory listings).
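For illustration only, such an extended listing entry might pair each file name with its version UUID, e.g.:

    Chronology-Core-ul.21.mcz 5d9b02fa-8e37-4678-adda-f302163732a1

(a made-up format; clients unaware of the extra column could ignore everything after the file name, which is one way to avoid breaking plain directory listings.)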
Levente
- Chris
On 28.01.2019, at 01:39, Chris Muller ma.chris.m@gmail.com wrote:
Hi,
Yes, the SqueakMap server image is one part of the dynamic, but I think another is a bug in the trunk image. I think the reason Tim is not seeing 45 seconds before error is because the timeout setting of the high-up client is not being passed all the way down to the lowest-level layers -- e.g., from HTTPSocket --> WebClient --> SocketStream --> Socket. By the time it gets down to Socket which does the actual work, it's operating on its own 30 second timeout.
I would expect subsecond response times. 30 seconds is just unacceptably long.
Well, it depends on whether, for example, you're in the middle of Antarctica with a slow internet connection or in an office with a fast one. A 30 second timeout is just the maximum amount of time the client will wait for the entire process before presenting a debugger; that's all it can do.
We can be sure that Tim should get subsecond response times instead of timeouts after 30 seconds.
Right, but timeout settings are a necessary tool sometimes, my point was that we should fix client code in trunk to make timeouts work properly.
Incidentally, 99% of SqueakMap requests ARE subsecond -- just go to map.squeak.org and click around and see. For the remaining 1% that aren't, the issue is known and we're working on a new server to fix that.
It is a fixed amount of time, I *think* still between 30 and 45 seconds, that it takes the SqueakMap server to save its model after an
and so if in the meantime it can simply be made to wait 45s instead of 30s, then current SqueakMap will cost only that occasional delay at worst, instead of the annoying debugger we currently get.
You would save seconds, not milliseconds by not downloading files again.
IIUC, you're saying we would save one hop in the "download" -- instead of client <--> alan <--> andreas, it would just be client <--> alan. Is that right?
No. If the client doesn't have the mcz in the package cache but nginx has it in its cache, then we save the transfer of data between alan and andreas.
Are alan and andreas co-located?
They're VMs on Rackspace. The slowest bandwidth Rackspace has is 200 MBit/s, the fastest 2 GBit/s; I forgot which we have. The network is not the limiting factor here, Squeak is.
The file doesn't have to be read from the disk either.
I assume you mean "read from disk" on alan? What about after it's cached so many mcz's in RAM that it's paging out to a swap file? To me, wasting precious RAM (of any server) to cache old MCZ file contents that no one will ever download (because they become old very quickly) feels wasteful. Dragster cars are wasteful too, but yes, they are "faster"... on a dragstrip. :) I guess there'd have to be some kind of application-specific smart management of the cache...
Levente, what about the trunk directory listing, can it cache that? That is the _#1 thing_ source.squeak.org is accessing and sending back over, and over, and over again -- every time that MC progress box that says, "Updating [repository name]".
If the client does have the mcz, then we save the complete file transfer.
I don't know what the speed between alan <---> andreas is, but I doubt it's much slower than client <---> alan in most cases, so the savings would seem to be minimal..?
The image wouldn't have to open a file, read its content from the disk and send that through a socket.
By "the image" I assume you mean the SqueakSource server image. But opening the file takes very little time. Original web-sites were .html files, remember how fast those were? Plus, filesystems "cache" file contents into their own internal caches anyway...
Yes, it still has to return back through alan, but I assume alan does not wait for a "full download" received from andreas before it's already piping back to the Squeak client. If true, then it seems like it only amounts to saving one hop, which would hardly be noticeable over what we have now.
Nginx does that thing magnitudes faster than Squeak.
The UX would not be magnitudes faster though, right?
That would also let us save bandwidth by not downloading files already sitting in the client's package cache.
How so? Isn't the package-cache checked before hitting the server at all? It certainly should be.
No, it's not. Currently that's not possible, because different files can have the same name. And currently we have no way to tell them apart.
No. No two MCZ's may have the same name, certainly not within the same repository, because MCRepository cannot support that. So maybe
Not at the same time, but it's possible, and it just happened recently with Chronology-ul.21. It is perfectly possible that a client has a version in its package cache with the same name as a different version on the server.
But we don't want to restrict what's possible in our software design because of that. That situation is already a headache anyway. The same name can theoretically come only from the same person (if we ensure unique initials), and so this is avoidable / fixable by resaving one of them under a different name...
we need project subdirectories under package-cache to properly simulate each cached Repository. I had no idea we were neutering 90% of the benefits of our package-cache because of this too, and just sitting here, I can't help wondering whether this is why MCProxy doesn't work properly either!
The primary purpose of a cache is to *check it first* to speed up access to something, right? What you say about package-cache sounds
I don't know. It wasn't me who designed it. :)
I meant ANY "cache".
https://en.wikipedia.org/wiki/Cache_(computing)
For Monticello, package-cache's other use-case is when an authentication issue occurs while trying to save to an HTTP repository. At that point the Version object with the new ancestry has already been constructed in memory, so rather than worry about trying to "undo" all that, it was simpler and better to save it to the package-cache -- persist it safely so the client can simply move forward from there (get access to the HTTP repository and copy it over, or whatever).
- Chris
really bad; we should fix that, not surrender to it.
Yes, that should be fixed, but it needs changes on the server side. What I always had in mind was to extend the repository listing with hashes/uuids so that the client could figure out if it needs to download a specific version. But care must be taken not to break the code for non-ss repositories (e.g. simple directory listings).
Levente
- Chris
On 28.01.2019, at 01:39, Chris Muller ma.chris.m@gmail.com wrote:
Levente, what about the trunk directory listing, can it cache that? That is the _#1 thing_ source.squeak.org is accessing and sending back over, and over, and over again -- every time that MC progress box that says, "Updating [repository name]".
I sped that up in squeaksource3 by caching the listing on squeak and pushing that as txt. (see SqueakSource-Caching-Core-topa.2 for the model side and SqueakSource-Core-topa.104.mcz for the view part, especially SSRawDirectoryListing and
SSUrlFilter>>rawListingOfProject:
rawListingOfProject: projectName
    <get>
    <path: '/{projectName}/?format=raw'>
    <produces: 'text/plain'>
    self projectNamed: projectName do: [ :project |
        (self isAllowed: SSAccessPolicy read in: project)
            ifFalse: [ self authResponseFor: project ]
            ifTrue: [ self requestContext respond: [ :response |
                response nextPutAll: project rawDirectoryListing ] ] ]
in http://www.squeaksource.com/squeaksource3.html )
TL;DR: serve the directory listing as plain list of file names when ?format=raw is requested. Only invalidate this listing when a version is added or removed.
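(A hedged client-side sketch of fetching that raw listing, assuming WebClient's class-side convenience and a project URL of this shape:)

    | listing |
    listing := (WebClient httpGet: 'http://www.squeaksource.com/squeaksource3/?format=raw') content.
    listing lines do: [:fileName | Transcript show: fileName; cr]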
-t
Levente, what about the trunk directory listing, can it cache that? That is the _#1 thing_ source.squeak.org is accessing and sending back over, and over, and over again -- every time that MC progress box that says, "Updating [repository name]".
I sped that up in squeaksource3 by caching the listing on squeak and pushing that as txt.
Interesting idea, but if you found yourself needing to cache at that level, that would seem to be the time to delegate that sort of work to nginx, wouldn't it?
- Chris
SSUrlFilter>>rawListingOfProject:
rawListingOfProject: projectName
    <get>
    <path: '/{projectName}/?format=raw'>
    <produces: 'text/plain'>
    self projectNamed: projectName do: [ :project |
        (self isAllowed: SSAccessPolicy read in: project)
            ifFalse: [ self authResponseFor: project ]
            ifTrue: [ self requestContext respond: [ :response |
                response nextPutAll: project rawDirectoryListing ] ] ]
in http://www.squeaksource.com/squeaksource3.html )
TL;DR: serve the directory listing as plain list of file names when ?format=raw is requested. Only invalidate this listing when a version is added or removed.
-t
On 29.01.2019, at 01:06, Chris Muller asqueaker@gmail.com wrote:
Levente, what about the trunk directory listing, can it cache that? That is the _#1 thing_ source.squeak.org is accessing and sending back over, and over, and over again -- every time that MC progress box that says, "Updating [repository name]".
I sped that up in squeaksource3 by caching the listing on squeak and pushing that as txt.
Interesting idea, but if you found yourself needing to cache at that level, that would seem to be the time to delegate that sort of work to nginx, wouldn't it?
Funny, I had that in 2010 with our local squeaksource, when the SqueakSource based on the original sources died every day. We used an Apache at that time to serve the mcz's. It was fast. But it also sucked, as ACLs were hard to enforce and there was no manageability whatsoever.
That's why I got on board with SqueakSource3 in the first place.
-t
- Chris
SSUrlFilter>>rawListingOfProject:
rawListingOfProject: projectName
    <get>
    <path: '/{projectName}/?format=raw'>
    <produces: 'text/plain'>
    self projectNamed: projectName do: [ :project |
        (self isAllowed: SSAccessPolicy read in: project)
            ifFalse: [ self authResponseFor: project ]
            ifTrue: [ self requestContext respond: [ :response |
                response nextPutAll: project rawDirectoryListing ] ] ]
in http://www.squeaksource.com/squeaksource3.html )
TL;DR: serve the directory listing as plain list of file names when ?format=raw is requested. Only invalidate this listing when a version is added or removed.
-t
On 28.01.2019, at 01:39, Chris Muller ma.chris.m@gmail.com wrote:
By "the image" I assume you mean the SqueakSource server image. But opening the file takes very little time. Original web-sites were .html files, remember how fast those were? Plus, filesystems "cache" file contents into their own internal caches anyway...
Yes, it still has to return back through alan, but I assume alan does not wait for a "full download" received from andreas before it's already piping back to the Squeak client. If true, then it seems like it only amounts to saving one hop, which would hardly be noticeable over what we have now.
No, the difference is actually orders of magnitude. Believe us. W.r.t. file handling nginx is blazing fast and squeak a snail. -t
On Jan 27, 2019, at 11:57 PM, Tobias Pape Das.Linux@gmx.de wrote:
On 28.01.2019, at 01:39, Chris Muller ma.chris.m@gmail.com wrote:
By "the image" I assume you mean the SqueakSource server image. But opening the file takes very little time. Original web-sites were .html files, remember how fast those were? Plus, filesystems "cache" file contents into their own internal caches anyway...
Yes, it still has to return back through alan, but I assume alan does not wait for a "full download" received from andreas before it's already piping back to the Squeak client. If true, then it seems like it only amounts to saving one hop, which would hardly be noticeable over what we have now.
No, the difference is actually orders of magnitude. Believe us. W.r.t. file handling nginx is blazing fast and squeak a snail.
And this is an issue in commercial engagements where we might be trying to displace VW. The file implementation sucks. We need buffered i/o by default and we need good finalization (via ephemerons). And right now there's a thread on Pharo about the I/O slowdowns from 6 to 7 in writing to the changes file because of setToEnd and maybe reopen. In any case the issue is that our file system support is cheap and cheerful (= kind of crappy), and being conscious of performance here is important and potentially healthy for the community.
If you want to have impact, here is a place to put effort.
-t
Hi,
On 29.01.2019, at 02:01, Eliot Miranda eliot.miranda@gmail.com wrote:
On Jan 27, 2019, at 11:57 PM, Tobias Pape Das.Linux@gmx.de wrote:
On 28.01.2019, at 01:39, Chris Muller ma.chris.m@gmail.com wrote:
By "the image" I assume you mean the SqueakSource server image. But opening the file takes very little time. Original web-sites were .html files, remember how fast those were? Plus, filesystems "cache" file contents into their own internal caches anyway...
Yes, it still has to return back through alan, but I assume alan does not wait for a "full download" received from andreas before it's already piping back to the Squeak client. If true, then it seems like it only amounts to saving one hop, which would hardly be noticeable over what we have now.
No, the difference is actually orders of magnitude. Believe us. W.r.t. file handling nginx is blazing fast and squeak a snail.
And this is an issue in commercial engagements where we might be trying to displace VW. The file implementation sucks. We need buffered i/o by default and we need good finalization (via ephemerons). And right now there's a thread on Pharo about the I/O slowdowns from 6 to 7 in writing to the changes file because of setToEnd and maybe reopen. In any case the issue is that our file system support is cheap and cheerful (= kind of crappy), and being conscious of performance here is important and potentially healthy for the community.
If you want to have impact, here is a place to put effort.
Thanks for that insight! :) -t
On 27.01.2019, at 23:18, Chris Muller ma.chris.m@gmail.com wrote:
Hi Levente,
Yes, the SqueakMap server image is one part of the dynamic, but I think another is a bug in the trunk image. I think the reason Tim is not seeing 45 seconds before error is because the timeout setting of the high-up client is not being passed all the way down to the lowest-level layers -- e.g., from HTTPSocket --> WebClient --> SocketStream --> Socket. By the time it gets down to Socket which does the actual work, it's operating on its own 30 second timeout.
I would expect subsecond response times. 30 seconds is just unacceptably long.
Well, it depends on whether, for example, you're in the middle of Antarctica with a slow internet connection or in an office with a fast one. A 30 second timeout is just the maximum amount of time the client will wait for the entire process before presenting a debugger; that's all it can do.
It is a fixed amount of time, I *think* still between 30 and 45 seconds, that it takes the SqueakMap server to save its model after an update (e.g., adding a Release, etc.). It's so long because the server is running on a very old 3.x image on an interpreter VM. It's running an HttpView2 app which doesn't even compile in modern Squeak. That's why it hasn't been brought forward yet, but I am working on a new API service to replace it, with the eventual goal of SqueakMap being an "App Store" experience, and it will not suffer timeouts.
but also:
- we can cache: https://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_cache
- we could make alan not even ask ted when we know the answer already.
- Attention: we need a lot of information on what is stable and what is not, to do this.
- (it's tempting to try, tho)
- (we probably want that for squeaksource/source.squeak for the MCZ requests. but we lose the download statistics then…)
If squeaksource/mc used ETags, then the squeaksource image could simply return 304 and let nginx serve the cached mczs while keeping the statistics updated.
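(Sketching the client half of that idea -- assuming WebClient's #httpGet:do: request hook and #headerAt:put: behave as shown; the file name and ETag value are illustrative:)

    | client response |
    client := WebClient new.
    response := client
        httpGet: 'http://source.squeak.org/trunk/Kernel-abc.123.mcz'
        do: [:request | request headerAt: 'If-None-Match' put: '"5d9b02fa"'].
    response code = 304
        ifTrue: ["nothing to transfer; the cached copy is still valid, and the image still counted the hit"]
        ifFalse: ["the full body arrived in: response content"]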
Tim's email was about SqueakMap, not SqueakSource. SqueakSource
That part of the thread changed direction. It happens sometimes.
serves the mcz's straight off the hard-drive platter. We don't need to trade away download statistics to save a few ms on a mcz request.
Download statistics would stay the same despite being flawed (e.g. you'll download everything multiple times even if those files are sitting in your package cache).
Not if we fix the package-cache (more about this, below).
You would save seconds, not milliseconds by not downloading files again.
IIUC, you're saying we would save one hop in the "download" -- instead of client <--> alan <--> andreas, it would just be client <--> alan. Is that right?
Yes.
I don't know what the speed between alan <---> andreas is, but I doubt it's much slower than client <---> alan in most cases, so the savings would seem to be minimal..?
No. It is not about bandwidth. Nginx is much faster at serving files than (a) squeak/seaside/squeaksource is, and (b) there is no network/bookkeeping/request-handling involved when nginx just serves files. And even if there is (e.g., with x-accel-redirect), nginx is just plain faster.
That would also let us save bandwidth by not downloading files already sitting in the client's package cache.
How so? Isn't the package-cache checked before hitting the server at all? It certainly should be.
No, it's not. Currently that's not possible, because different files can have the same name. And currently we have no way to tell them apart.
No. No two MCZ's may have the same name, certainly not within the same repository, because MCRepository cannot support that. So maybe we need project subdirectories under package-cache to properly simulate each cached Repository. I had no idea we were neutering 90% of the benefits of our package-cache because of this too, and just sitting here, I can't help wondering whether this is why MCProxy doesn't work properly either!
That would be only true if we never rewrote history or moved packages from the inbox anywhere else…
The primary purpose of a cache is to *check it first* to speed up access to something, right? What you say about package-cache sounds really bad; we should fix that, not surrender to it.
- Chris
On 27.01.2019, at 21:48, Levente Uzonyi leves@caesar.elte.hu wrote:
On Sun, 27 Jan 2019, Chris Muller wrote:
Hi guys,
A couple of weeks ago I had a problem loading something via SqueakMap that resulted in a 504 error. Chris M quite rightly pointed out that responding to a timeout with an immediate retry might not be the best thing (referencing some code I published to try to handle this problem); looking at the error more closely I finally noticed that a 504 is a *gateway* timeout rather than anything that seems likely to be a problem at the SM or MC repository server. Indeed the error came back much quicker than the 45 seconds timeout that we seem to have set for our http connections.
I'm a long way from being an expert in the area of connecting to servers via gateways and what their timeouts might be etc. so excuse stupid-question syndrome - I know this isn't Quora where stupid-question is the order of the day. Am I right in thinking that a 504 error means that some *intermediate* server timed out according to some setting in its internal config ? Am I right in imagining that we can't normally affect that timeout?
Well, we can.
What happens here:
- All our websites, including all HTTP services, such as the Map, arrive together at squeak.org, aka alan.box.squeak.org
That is an nginx server, and also the server that eventually spits out the 504.
- alan then sees we want a connection to the Map, and does a HTTP request to ted.box.squeak.org (=> alan is a _reverse proxy_)
and upon response gets us that back.
Thanks for the great explanation! I want to learn more about admin'ing, so it's great to have this in-context example of a reverse-proxy. Thanks for setting that up!
- if ted fails to respond in 60s, alan gives a 504.
60s seems like an ideally balanced timeout setting -- the longest any possible request should be expected to wait ... and yet clients can still shorten it to 45s or 30s if they want a shorter timeout.
Simple as that. This limits the possibility that we wait too long (ie >60s) on ted.
Elephant in the room: why not directly ted? the nginx on alan is configured as hardened as I thought best, and actually can handle a multitude of requests much better than our squeak-based "application servers". This distinction between reverse proxy and application server is btw quite standard and enables some things. For example:
We can tune a lot of things on alan with regards to how it should handle things. The simplest being:
- we can tune the timeout: https://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_read_timeout
that's where the 60s come from, and we could simply crank it up.
- HOWEVER: this could mean we eventually run into other timeouts, for example on the server or even in TCP or so.
- so increasing this just like that _may_ help or _may_ make the Map useless altogether, so please be careful y'all :)
Tim reported shorter than 45s timeouts, so it is very likely an issue with the SqueakMap image.
Yes, the SqueakMap server image is one part of the dynamic, but I think another is a bug in the trunk image. I think the reason Tim is not seeing 45 seconds before error is because the timeout setting of the high-up client is not being passed all the way down to the lowest-level layers -- e.g., from HTTPSocket --> WebClient --> SocketStream --> Socket. By the time it gets down to Socket which does the actual work, it's operating on its own 30 second timeout.
I would expect subsecond response times. 30 seconds is just unacceptably long.
It is a fixed amount of time, I *think* still between 30 and 45 seconds, that it takes the SqueakMap server to save its model after an update (e.g., adding a Release, etc.). It's so long because the server is running on a very old 3.x image on an interpreter VM. It's running an HttpView2 app which doesn't even compile in modern Squeak. That's why it hasn't been brought forward yet, but I am working on a new API service to replace it, with the eventual goal of SqueakMap being an "App Store" experience, and it will not suffer timeouts.
but also:
- we can cache: https://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_cache
- we could make alan not even ask ted when we know the answer already.
- Attention: we need a lot of information on what is stable and what is not, to do this.
- (it's tempting to try, tho)
- (we probably want that for squeaksource/source.squeak for the MCZ requests. but we lose the download statistics then…)
If squeaksource/mc used ETags, then the squeaksource image could simply return 304 and let nginx serve the cached mczs while keeping the statistics updated.
Tim's email was about SqueakMap, not SqueakSource. SqueakSource
That part of the thread changed direction. It happens sometimes.
serves the mcz's straight off the hard-drive platter. We don't need to trade away download statistics to save a few ms on a mcz request.
Download statistics would stay the same despite being flawed (e.g. you'll download everything multiple times even if those files are sitting in your package cache). You would save seconds, not milliseconds by not downloading files again.
I think we trivially could make that happen by using X-Sendfile (Apache) or X-Accel-Redirect (nginx). (https://www.nginx.com/resources/wiki/start/topics/examples/x-accel/)
The image gets the request, but instead of searching for and serving the file, it answers with such a header and the reverse-proxy takes care of the rest. Problem here: the reverse-proxy must have access to the files, which it currently does not.
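(Roughly, on the image side -- the method name and the /protected-mcz/ location below are hypothetical, and `response` is assumed to understand a WebClient-style #headerAt:put:;)

    respondWithAccelRedirectTo: fileName on: response
        "Hypothetical sketch: emit the header nginx intercepts instead of
        streaming the file body ourselves. nginx must map the internal
        location /protected-mcz/ onto the repository files."
        response headerAt: 'X-Accel-Redirect' put: '/protected-mcz/', fileName.
        response headerAt: 'Content-Type' put: 'application/octet-stream'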
That would also let us save bandwidth by not downloading files already sitting in the client's package cache.
How so? Isn't the package-cache checked before hitting the server at all? It certainly should be.
No, it's not. Currently that's not possible, because different files can have the same name. And currently we have no way to tell them apart.
Levente
Best, Chris
We could also use nginx to serve files instead of the image, but then the image would have to know that it's sitting behind nginx.
- Note: a lot of time is probably spent by ted generating HTTP and by alan parsing HTTP. Using FCGI, for example, reduces that, and is supported by both nginx (https://nginx.org/en/docs/http/ngx_http_fastcgi_module.html) and GemStone, but I don't know whether we already have one in squeak.
I'm 99% sure http overhead is negligible.
Levente
If I have any reasonable grasp on this then we should probably detect the 504 (in part by explicitly using a WebClient and its error handling rather than the slightly wonky httpSocket facade we have currently) and retry the connection ? Any other error or a timeout at *our* end would still be best handled as an error.
All 500-ish codes essentially say "the server is to blame" and the client can do nothing about that. I don't think that 504 is meaningfully better handled than 503 or 502 in the WebClient. I think it's ok to pass that through.
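(If one did want to special-case it anyway, a minimal sketch of the retry idea, assuming WebClient's class-side #httpGet: convenience; `url` stands in for whatever was being fetched:)

    | response |
    response := WebClient httpGet: url.
    response code = 504 ifTrue: [
        "the gateway timed out upstream; one careful retry, then give up"
        response := WebClient httpGet: url].
    (response code between: 200 and: 299)
        ifFalse: [self error: 'HTTP ', response code printString]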
Except of course a 418 which has well defined error handling...
At least not 451…
Best regards -Tobias
tim
tim Rowledge; tim@rowledge.org; http://www.rowledge.org/tim You forgot to do your backup 16 days ago. Tomorrow you'll need that version.