Hi, there.
For some weeks or months now, I have not been able to update a single package without getting either a gateway error or a connection timeout. Luckily, a timeout means that the code update was at least completed, which I can confirm in my email inbox.
What's going on there?! This used to work fine: timeouts were rare, and gateway errors non-existent.
Best, Marcel
On Thu, Jan 18, 2018 at 08:10:31AM +0100, Marcel Taeumel wrote:
I think that the problem goes back longer than that, although it does seem to be getting worse in recent months.
My guess (and it is only a guess) is that there are two possible causes:
1) If I recall right, the VM that is installed with source.squeak.org (which is quite old now) came from a time at which there were problems with the garbage collector that led to noticeable delays. It is possible that updating the VM to a more recent version would make this go away.
2) The image is backed by Magma, and it is possible that something there is eating time when an update is made to a repository.
Dave
Hi,
I was able to VNC right into the server image. It is responsive; however, there are a ton of processes apparently stuck in a Mutex>>#critical: block. I think that explains the timeouts.
The service was last restarted 204 days ago. I'll contact box-admins and the board about restarting the service; that should clear it up.
- Chris
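[For readers who want to reproduce this kind of diagnosis: the following is an untested sketch of what one might evaluate in a workspace on the server image to find processes blocked inside Mutex>>#critical:. It uses standard Squeak reflection (allSubInstances, suspendedContext, sender chains); the filtering logic is illustrative, not what was actually run on the server.]

```smalltalk
"Find processes whose call stack is currently inside Mutex>>#critical:.
Untested sketch; evaluate in a workspace and inspect the result."
| blocked |
blocked := Process allSubInstances select: [:p |
	| ctx found |
	found := false.
	ctx := p suspendedContext.
	[ctx notNil and: [found not]] whileTrue: [
		(ctx method selector == #critical:
			and: [ctx receiver isKindOf: Mutex])
				ifTrue: [found := true].
		ctx := ctx sender].
	found].
blocked inspect
```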
On Thu, Jan 18, 2018 at 5:42 PM, Chris Muller asqueaker@gmail.com wrote:
And, for my information, what version of Squeak and what VM is it running?
Clearly I'm well out of the loop at this point, so I'm likely wrong. But any Squeak-hosted service that was set up and managed by the Box-Admins team in the past will automatically restart if it quits (via daemontools).
Hi Ken,
On Thu, Jan 18, 2018 at 08:32:54PM -0600, Ken Causey wrote:
Yes, the daemontools setup is still in effect and works like a champ, thank you :-)
I think Chris is just being cautious in asking, since this is our main source repository server.
Dave
and for my information what version of Squeak and what VM is it running?
The production VM released with Squeak 5.1.
5.0-201608171728 Sun Sep 25 16:02:24 UTC 2016 gcc 4.6.3 [Production Spur VM]
It's been a few months since I tried the most recent VM. All the newer ones I'd ever tried since the GC rewrite would crash more often than I could bear.
I run this same code base and VM to support my own code repository as a local daemontools service. It doesn't have the volume source.squeak.org has, but it has been stable for me.
On Thu, Jan 18, 2018 at 09:09:00PM -0600, Chris Muller wrote:
I think that my mention of garbage collection as a possible cause is a red herring. Likewise my mention of Magma backing store. Those were just the only two things I could think of that were obviously different from the other squeaksource image that we are running.
In any case, 204 days of continuous service without a restart is nothing to be unhappy about :-)
Dave
:) Your memory was keener than mine, actually. As I tailed the log when it came back up, I saw the message "Starting Garbage Collection", and it reminded me of this issue from a couple of years back: a strange phenomenon with this application (SqueakSource+Magma) and VM where, some time after the initial loading of the root SSRepository object completed, the first garbage collection would take something like two minutes. But after that, it was pretty much fine, pretty snappy.
So, rather than enduring that pain at a random time, I decided it was better at a known time: on startup.
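[The tradeoff Chris describes, taking the long first collection at a known time, can be forced explicitly in the image's startup code. A minimal sketch; where exactly this hooks into the SqueakSource startup sequence is assumed, not shown in the thread:]

```smalltalk
"After the root SSRepository object has loaded, trigger the expensive
first full garbage collection deliberately, so the roughly two-minute
pause happens at startup rather than during a user request."
Transcript show: 'Starting Garbage Collection'; cr.
Smalltalk garbageCollect.
Transcript show: 'Garbage Collection done'; cr.
```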
It's restarted. You should be able to use it normally; however, the last commit Magma got was on 6-Jan-2018 (every one since then got stuck on the Mutex), so every commit since then will be recovered in the background (its revision history indexed into the Magma DB), and you may experience some sluggishness for the next few days.
Thanks for your patience, sorry for any inconvenience.
- Chris
Hi All,
On Wed, Jan 17, 2018 at 11:10 PM, Marcel Taeumel marcel.taeumel@hpi.de wrote:
I think the main problem is that the server is unresponsive while it generates the diff email to send to the mailing lists. I say this because committing VMMaker.oscog, a huge package, always times out, and the server can be unresponsive thereafter for many minutes, whereas committing the Cog package to the very same repository, which is far smaller, does not cause a timeout. Of course it could be storing the package to the file system, but I doubt that very much.
So I think we need to rewrite the server to move the computation and mailing of the diff to a lower priority, so that answering and receiving versions gets priority over reporting changes to the mailing list. In the case of VMMaker.oscog the diff often gets thrown away anyway because it is often very large.
I'm not familiar with the packages that implement the server, nor what the development, testing and installation process is, but I'd love to pair with someone on fixing the responsiveness issue and learn.
_,,,^..^,,,_ best, Eliot
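[A minimal sketch of the restructuring Eliot proposes above: store and acknowledge the commit first, then fork the diff/mail work at background priority. The selectors handleCommit:, storeVersion:, and computeAndMailDiffFor: are hypothetical placeholders, not the server's actual API; forkAt: and userBackgroundPriority are standard Squeak.]

```smalltalk
"Hypothetical commit handler: acknowledge the client promptly, then
compute and mail the diff in a background-priority process so HTTP
serving stays responsive."
handleCommit: aVersion
	self storeVersion: aVersion.	"write the version and answer the HTTP request"
	[self computeAndMailDiffFor: aVersion]
		forkAt: Processor userBackgroundPriority
```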
On Tue, Jan 23, 2018 at 01:26:37PM -0800, Eliot Miranda wrote:
Chris, are you interested in working with Eliot on this? I don't think I can help directly but I do have some experience with the older squeaksource.com system, and I'm interested in getting that updated at some point so if I can offer some help without getting in the way I am happy to do so.
Eliot, I suspect that Chris cleared up one problem when he recently restarted the image, but that the diff processing that you mention is /also/ a problem and is worth follow up separately. The reason I say this is that I was getting commit timeouts on even trivial updates, and that problem went away after the server restart. But if commit timeouts still happen for a VMMaker commit, then it is very likely due to the diff processing.
If in fact the diff processing for mailing list updates is the culprit, and if this is something that could be relegated to a background process completely separate from the user interactions, then I would be tempted to try putting the mailing list processing into a #forkHeadlessSqueakAndDoThenQuit: block. Any interest?
Dave
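[Dave's #forkHeadlessSqueakAndDoThenQuit: idea might be sketched as follows. The selector is, I believe, provided on the class side of UnixProcess in the OSProcess package, though the exact receiver is assumed here; computeAndMailDiffFor: is a hypothetical placeholder for the server's diff/mail step.]

```smalltalk
"Sketch: fork a headless child Squeak image to do the diff mailing, so
the parent image keeps serving requests; the child quits when done.
Assumes OSProcess is loaded. Untested."
UnixProcess forkHeadlessSqueakAndDoThenQuit: [
	self computeAndMailDiffFor: aVersion]
```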
Hi David,
On Tue, Jan 23, 2018 at 5:08 PM, David T. Lewis lewis@mail.msen.com wrote:
If in fact the diff processing for mailing list updates is the culprit, and if this is something that could be relegated to a background process completely separate from the user interactions, then I would be tempted to try putting the mailing list processing into a #forkHeadlessSqueakAndDoThenQuit: block. Any interest?
It's certainly worth looking at. And that suggests that there could be two separate images running concurrently, one doing the serving, and one doing the diffs, possibly prompted by the server image.
_,,,^..^,,,_ best, Eliot
On Tue, Jan 23, 2018 at 05:36:00PM -0800, Eliot Miranda wrote:
It's certainly worth looking at. And that suggests that there could be two separate images running concurrently, one doing the serving, and one doing the diffs, possibly prompted by the server image.
At the risk of embarrassing myself by posting untested code that probably will not work, the attached change set shows what I had in mind.
Dave
I'm not familiar with the packages that implement the server, nor what the development, testing and installation process is, but I'd love to pair with someone on fixing the responsiveness issue and learn.
It's all here:
http://wiki.squeak.org/squeak/6365
This is what source.squeak.org is running. It installs and runs clean (in Linux). It never saves the running image.
Every serious Squeak developer should do it on their laptop, so they can have the revision history for their own proprietary code, not just the source.squeak.org repositories.
Anyone wanting to learn about and work on our code repository should do it on their laptop, as it's a great place to test fixes and upgrades before putting them into the production source.squeak.org server. Once you do the installation step, my guess is you'll be able to find the diff'ing code in a short amount of time. But it needs to be tested.
Chris, are you interested in working with Eliot on this?
Yes, but I am leaving in less than 5 hours to depart for a month-long holiday, and I still need to sleep. I just finished packing all day, sat down for a brief unwind, and saw "URGENT".. :)
I went through a lot of work to make the above process lucid and smooth. It's time to cash in. :) If you try it, you will go from 5% to 95% knowledge about it in one evening.
I plan to check online things in the evenings during my holiday, so I can assist in a limited way.
- Chris