[Seaside] my site is completely dead..

Wed Jan 30 12:26:14 UTC 2013

Hi Sven.

Thank you for the clarifications and recommendations, really useful. I have
implemented some of the things you list on my servers to minimize problems,
but I pay special attention to saving the image in a "good state" every time
I update a package in it, because otherwise it does not matter how many bad
images you put in a cluster :)

Regards.

2013/1/30 Sven Van Caekenberghe <sven at stfx.eu>

>
> On 29 Jan 2013, at 19:29, Gastón Dall' Oglio <gaston.dalloglio at gmail.com>
> wrote:
>
> > Hi Sergio.
> >
> > Some weeks ago I had to deal with an image that worked normally, while a
> > Seaside app within it was not responding (until then, whenever an app was
> > not responding, it was always because the image was hung).
> >
> > I did some forensic analysis on this image :), and I saw several zombie
> > forked processes, actually with a very long timeout in a semaphore. See
> > the screenshot: on the left side I inspected the Semaphore to see the
> > DelayWaitTimeout of one sane forked process, while the right side shows
> > the same for a broken one. Also, note that these broken processes don't
> > die if you stop and start the Seaside server adaptor.
> >
> > To check this in your image, open a Process Browser (and turn on
> > auto-update) and see if there are several "ZnManagingMultiThreadedServer
> > HTTP worker" processes. If so, terminate some of them and see if the site
> > begins to respond. My app began to respond after I terminated several of
> > them.
> >
> > I guess that this problem occurred when I saved AND QUIT the image
> > while those forked processes existed.
> >
> > The solution to make the image respond again was to manually kill
> > (terminate) all "ZnManagingMultiThreadedServer HTTP worker" processes,
> > and in the future to make sure that there are no workers running when I
> > save the image.
> >
> > I don't know if it is a bug; if we think it is, I can give more data
> > about the context (my image, package versions, OS, …).
>
> Some clarifications: ZnManagingMultiThreadedServer has one server process
> listening for and accepting incoming requests, forking a worker process
> each time. Such a worker process will loop over HTTP 1.1 request/response
> cycles until the other end closes or something goes wrong. There is
> currently no timeout as such but of course the socket connection dies
> eventually, so that is almost the same thing.
>
> The 'Managing' aspect means that the server keeps track of all open
> connections or socket streams. When the server is stopped, all the
> connections will be closed. The idea is that all the worker processes using
> these connections (a one to one mapping) will eventually get an exception
> that is then handled by cleaning up and finally stopping.
>
> This last mechanism, the closing of a socket stream from another process
> resulting in an exception in a process using that connection, does not work
> identically or equally well on the different platforms (Mac, Windows,
> Linux) because these have completely different socket implementations in
> the VM. Saving an image interacts with this in various subtle ways.
>
> On my main development platform, Mac, I see no problems. In my production
> deploys on Linux things are fine too. But I do various things to minimise
> problems:
>
> - my images hold no 'running' server(s), these are always created and
> started freshly using a startup script
> - I never save images after that
> - all the images are controlled by init.d scripts to start automatically
> with the machine
> - all my images are controlled by monit so that they restart automatically
> when they stop working
> - most of the time, I have multiple images under a load balancer,
> stateful or stateless, to improve availability and capacity
> - the load balancer also functions as a sanitizer and controller of
> incoming requests, protecting the images
> - the load balancer can handle static resources directly, offloading work
> from the images
>
> http://zn.stfx.eu/zn/index.html#livedemo
> http://stfx.eu/pharo-server/
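
A minimal sketch of the kind of external check a supervisor could run for
the "restart automatically when they stop working" item above. It is written
in Python only because it has to run outside the image. The function name
image_responds, the example URL, and the 5 second default timeout are
assumptions for illustration; none of them come from monit, Zinc, or Seaside.
It probes over HTTP rather than checking that the VM process is alive,
because the image can look healthy while the Seaside app no longer answers.

```python
import http.client
import urllib.error
import urllib.request

# Ignore any system proxy settings so the probe talks to the image directly.
_DIRECT = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def image_responds(url, timeout=5.0):
    """Return True if `url` gives a non-5xx HTTP answer within `timeout` seconds."""
    try:
        with _DIRECT.open(url, timeout=timeout):
            # Reaching this point means a successful reply arrived in time.
            return True
    except urllib.error.HTTPError as error:
        # The server did answer, but with a 4xx or 5xx status.
        return error.code < 500
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Refused connection, timeout, or a broken reply: not responding.
        return False
```

A wrapper script could call image_responds("http://localhost:8080/") (a
hypothetical URL) and exit non-zero on False, leaving the restart decision to
whatever supervises the image.
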
>
> Yes, like any computer program, a Smalltalk VM+image combination has
> limits: there is some maximum number of processes and connections that can
> be running and open at the same time, and there are general memory limits.
> I am pretty sure that with a setup like the one I described above,
> production systems handling hundreds to thousands of requests per second
> are possible.
>
> Sven
>
> > 2013/1/29 sergio_101 <sergio.rrd at gmail.com>
> > i think i need to bring the image local, and see what's going on.. i am
> > moving it to a new server this week anyway..
> >
> > thanks!
> >
> >
> > On Tue, Jan 29, 2013 at 8:18 AM, sergio_101 <sergio.rrd at gmail.com> wrote:
> > hey, dale.. it seems like lately, i am seeing this problem at least once
> > a week. there were times when i would run problem free for months, but not
> > lately..
> >
> >
> > On Tue, Jan 29, 2013 at 1:24 AM, Dale Henrichs <dhenrich at vmware.com> wrote:
> > Sergio,
> >
> > Most of my experience is from working with GemStone, which is a different
> > animal, so take what I say with a grain of salt.
> >
> > If the running image is completely frozen, then you don't have much
> > choice but to kill it and restart ... hopefully you haven't lost any data
> > ...
> >
> > If after restart you see the problem again, then you might be able to
> > debug the issue by copying the image to a local machine and bringing it up
> > ... If the problem doesn't reproduce, I'd still be inclined to take a copy
> > of the image and attempt to understand the particular problem.
> >
> > It's hard to tell from the screen shot what the thread is doing or even
> > which thread it is ... it's not likely that the thread is a Seaside
> > application thread because those are normally forked and will sit around
> > with an open debugger, but not necessarily affect the image itself. So I
> > can't really guess what operation is causing trouble ...
> >
> > If you're lucky you can reproduce the problem on your local machine ...
> > If you search the Pharo bug list you might find a bug in this area and from
> > that we might be able to figure out which thread is the bad boy and there
> > might even be a fix ..
> >
> > You mentioned stability...are you seeing this particular problem occur
> > often or are you seeing different issues?
> >
> > Dale
> >
> > ----- Original Message -----
> > | From: "sergio t. ruiz" <sergio.rrd at gmail.com>
> > | To: "discussion" <seaside at lists.squeakfoundation.org>
> > | Sent: Monday, January 28, 2013 9:59:11 PM
> > | Subject: [Seaside] my site is completely dead..
> > |
> > |
> > | my site completely died today. i tried logging in with vnc, and it
> > | seems just stuck.. i can't do anything to it..
> > |
> > | anyone have any ideas?  i really need this thing to run consistently
> > | ..
> > |
> > | here is a screenshot of its current state:
> > |
> > | http://db.tt/eVxJX6lr
> > |
> > | thanks!
> > |
> > |
> > | ----
> > | peace,
> > | sergio
> > | photographer, journalist, visionary
> > |
> > | http://www.ThoseOptimizeGuys.com
> > | http://www.CodingForHire.com
> > | http://www.coffee-black.com
> > | http://www.painlessfrugality.com
> > | http://www.twitter.com/sergio_101
> > | http://www.facebook.com/sergio101
> > |
> > |
> > |
> > | _______________________________________________
> > | seaside mailing list
> > | seaside at lists.squeakfoundation.org
> > | http://lists.squeakfoundation.org/cgi-bin/mailman/listinfo/seaside
> > |
> > <image1.png>
>
> --
> Sven Van Caekenberghe
> http://stfx.eu
> Smalltalk is the Red Pill
>