On Apr 9, 2005 10:20 PM, Nevin Pratt nevin@bountifulbaby.com wrote:
I'm not inclined to re-launch my biggest images, because they are on the production machine right now. And I'm not too keen on downloading a 1 GB image to my dev machine. But I *did* look at the number of sessions that were around, and if I remember correctly, there were about 1500 of them.
Ok, 1500 concurrent sessions is a very big number; that's pretty consistent with your image sizes, so I agree it probably doesn't have anything to do with GLORP etc. But even with a 100-minute timeout that sounds awfully high; do you really think you get, say, 10,000 visitors per day? Or is something else going on? Either the peaks are very, very heavy, or there's something wrong with the expiry.
One piece of instrumentation that would help here would be to simply log every time a session is created. That should barely affect performance and would give a good indication of what's going on.
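A minimal sketch of that instrumentation, assuming the app already uses its own WASession subclass and that #initialize runs once per new session (the class name and category below are made up):

  WASession subclass: #BBLoggedSession
      instanceVariableNames: ''
      classVariableNames: ''
      poolDictionaries: ''
      category: 'BountifulBaby'

  BBLoggedSession >> initialize
      "Write one Transcript line for every session that gets created,
       so you can see whether 1500 live sessions is plausible traffic."
      super initialize.
      Transcript show: 'new session: ', Date today printString, ' ', Time now printString; cr

Counting those lines per hour then tells you whether sessions arrive at a steady trickle or in bursts.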
I might have gotten hit with a DoS attack or something on Thursday and Friday, I don't know. The traffic numbers certainly would seem to support that possibility.
Yes. I wonder what strategies we can use to detect and cope with that. One I can think of is to link the expiry time to how much the application has been used: if all you do is request the homepage, your session will expire very quickly, but if you look around a little more you're given more time. Does that seem reasonable?
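A rough sketch of how that might look, purely illustrative: hitCount is an instance variable you would add to your WASession subclass, and #incomingRequest: and #timeoutSeconds stand in for whichever per-request and expiry hooks your Seaside version actually exposes (they may be named differently, or not exist at all):

  MySession >> incomingRequest: aRequest
      "Hypothetical per-request hook: remember how active this session has been."
      hitCount := (hitCount ifNil: [0]) + 1.
      ^ super incomingRequest: aRequest

  MySession >> timeoutSeconds
      "Hypothetical expiry hook: a single page view keeps the session alive for
       two minutes, each further request buys another minute, capped at the
       usual 100-minute limit."
      ^ (120 + ((hitCount ifNil: [0]) * 60)) min: 6000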
Avi
On Sat, 9 Apr 2005 23:11:59 +0200, Avi Bryant avi.bryant@gmail.com wrote:
Ok, 1500 concurrent sessions is a very big number; that's pretty consistent with your image sizes, so I agree it probably doesn't have anything to do with GLORP etc. But even with a 100-minute timeout that sounds awfully high; do you really think you get, say, 10,000 visitors per day? Or is something else going on? Either the peaks are very, very heavy, or there's something wrong with the expiry.
Sounds a lot like a broken spider that cannot deal with Seaside's URLs... I'd do some request logging, especially of the User-Agent header.
I had the MSN bot go wild on dynamic sites of mine, with much the same symptoms (different world, this was a VW-backed site with a homebrew toolkit, but bad bots will kill anything). I blocked the MSN range (who wants to be listed on MSN anyway... :P)
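A sketch of that request logging at the adaptor level, so bots show up even when they never hold on to a session; the WAKom subclass, the #process: override and the way the User-Agent header is read are assumptions to check against your Comanche/KomHttpServer version:

  WAKom subclass: #LoggingKom
      instanceVariableNames: ''
      classVariableNames: ''
      poolDictionaries: ''
      category: 'MyApp'

  LoggingKom >> process: aRequest
      "Log URL and User-Agent for every incoming request, then dispatch as usual."
      | agent |
      agent := [aRequest headerAt: 'user-agent'] on: Error do: [:e | '(unknown)'].
      Transcript
          show: Time now printString, ' ', aRequest url asString, ' ', agent asString;
          cr.
      ^ super process: aRequest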
We just ran into strangely high session counts recently. With each request the number of sessions magically increased by one... The problem, as we figured out, is the favicon.ico that the browser requests. Our Apache rewrite rule forwarded that request to Seaside, which created a new session! So I don't know whether that has anything to do with your high number of sessions, but it's something to take care of in any case...
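For anyone with a similar setup, one illustrative way to keep such requests away from Seaside is to exclude them before the catch-all rule; the port and the /seaside/ path below are placeholders for whatever your proxy rule actually looks like:

  # Let Apache answer favicon.ico and robots.txt itself instead of
  # forwarding them to Seaside, where each hit would start a session.
  RewriteEngine On
  RewriteCond %{REQUEST_URI} !^/favicon\.ico$
  RewriteCond %{REQUEST_URI} !^/robots\.txt$
  RewriteRule ^/(.*)$ http://localhost:9090/seaside/$1 [P,L]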
Adrian
Just a me-too here to confirm that this can be a catch when using Seaside, at least v2.3. I've recently seen the same two problems:
i. spiders hitting Seaside every few seconds, apparently due to the dynamically generated unique URLs, and bringing the session count to thousands (in my case the spider was Jeeves/Teoma)
ii. WADocumentHandler never expiring (isActive always returns true) and thus never getting collected, again inflating the count of Seaside entities sticking around.
Michal
On Apr 11, 2005 11:36 PM, Michal miso.list@auf.net wrote:
Just a me-too here to confirm that this can be a catch when using Seaside, at least v2.3. I've recently seen the same two problems:
i. spiders hitting Seaside every few seconds, apparently due to the dynamically generated unique URLs, and bringing the session count to thousands (in my case the spider was Jeeves/Teoma)
Ok, so clearly this is a problem we need to deal with (I've never seen it because all the apps I've deployed have login pages :). Does anyone have any suggestions? Cees already has code in Janus for detecting spiders; what do we do with them once we have them?
ii. WADocumentHandler never expiring (isActive always returns true) and thus never getting collected, again inflating the count of Seaside entities sticking around.
Yeah, that's a bit of a problem, but I'm not really sure what to do about it. You should have a finite number of static documents being used anyway, shouldn't you?
Avi
On Apr 11, 2005, at 4:48 PM, Avi Bryant wrote:
On Apr 11, 2005 11:36 PM, Michal miso.list@auf.net wrote:
Just a me-too here to confirm that this can be a catch when using Seaside, at least v2.3. I've recently seen the same two problems:
i. spiders hitting Seaside every few seconds, apparently due to the dynamically generated unique URLs, and bringing the session count to thousands (in my case the spider was Jeeves/Teoma)
Ok, so clearly this is a problem we need to deal with (I've never seen it because all the apps I've deployed have login pages :). Does anyone have any suggestions? Cees already has code in Janus for detecting spiders; what do we do with them once we have them?
Heh, we can hardcode a handler for /robots.txt and tell them to go away... or point to some static pages ;)
Ok, so clearly this is a problem we need to deal with (I've never seen it because all the apps I've deployed have login pages :). Does anyone have any suggestions? Cees already has code in Janus for detecting spiders; what do we do with them once we have them?
When it happened, as a stop-gap I patched WAKom>>process: to return a hardcoded deny-all string for /robots.txt requests, just like Brian suggests (and temporarily kicked out Jeeves/Teoma). Worked for me.
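For the archives, a sketch of that stop-gap written as a WAKom subclass instead of a direct patch; the HttpResponse fromString: call is an assumption, so substitute whatever your Comanche version uses to answer a plain string:

  WAKom subclass: #RobotsKom
      instanceVariableNames: ''
      classVariableNames: ''
      poolDictionaries: ''
      category: 'MyApp'

  RobotsKom >> process: aRequest
      "Answer a deny-all robots.txt before the request ever reaches Seaside."
      (aRequest url asString endsWith: '/robots.txt') ifTrue: [
          ^ HttpResponse fromString:
              'User-agent: *', (String with: Character cr), 'Disallow: /'].
      ^ super process: aRequest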
ii. WADocumentHandler never expiring (isActive always returns true) and thus never getting collected, again inflating the count of Seaside entities sticking around.
Yeah, that's a bit of a problem, but I'm not really sure what to do about it. You should have a finite number of static documents being used anyway, shouldn't you?
I didn't spend time looking into it, but I did note that I had about 130 handlers for the same document...
On Tue, 12 Apr 2005 00:48:21 +0200, Avi Bryant avi.bryant@gmail.com wrote:
Ok, so clearly this is a problem we need to deal with (I've never seen it because all the apps I've deployed have login pages :). Does anyone have any suggestions?
Seaside should probably catch requests for /?.*/robots.txt and return a file prohibiting robots from accessing any URL. That would stop 99.99% of the bots before they can do any damage.
Then invent some mechanism (Janus-like, maybe - it is quite a generic thing) to selectively open up parts of Seaside apps to bots. Or use 'my' HV+Seaside suggestion, forbidding bots to enter the Seaside part.
Cees de Groot wrote:
Seaside should probably catch requests for /?.*/robots.txt and return a file prohibiting robots from accessing any URL. That would stop 99.99% of the bots before they can do any damage.
Then invent some mechanism (Janus-like, maybe - it is quite a generic thing) to selectively open up parts of Seaside apps to bots. Or use 'my' HV+Seaside suggestion, forbidding bots to enter the Seaside part.
Forgive this late response to Cees' post, but I just noticed it.
Anyway, Bountiful Baby uses Janus-like code to detect spider bots and feed them cached pages. The cached pages, in turn, have every link pointing to yet another cached page. So, if the spider is detected, it has no particular effect on the site. It is when the spider goes undetected that the damage can be done.
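The detection itself can be as simple as a substring match on the User-Agent, done wherever requests are dispatched and before any session is created; the selector and the pattern list below are only illustrative, and Janus is considerably more thorough:

  isSpiderAgent: agentString
      "Crude User-Agent check; extend the list as new bots show up in the logs."
      | lower |
      lower := agentString asLowercase.
      #('bot' 'crawler' 'spider' 'slurp' 'teoma' 'jeeves') do: [:pattern |
          (lower indexOfSubCollection: pattern) > 0 ifTrue: [^ true]].
      ^ false

Anything that matches gets the cached page; everyone else goes to Seaside as usual.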
If everyone remembers, I started this thread by commenting that my Bountiful Baby image had recently grown to some pretty huge numbers (around a gig). Well, the image is currently hovering at around 40 MB, and has been hovering there for a couple of weeks now, without an image restart or anything. That is a much more reasonable size.
I now think my runaway image growth was due to either an undetected spider (or spiders) run amok, or else a deliberate denial-of-service attack. The way it ramped up, though, over a period of more than a week (it definitely wasn't sudden), leads me to think it was more likely an undetected spider run amok than a DoS. And I think it suddenly stopped because the spider author changed their code to be more benign; I'd bet that was because Bountiful Baby was not the only site the spider "bothered". But all of that is just a guess, and I have no hard data to substantiate it.
But I also think the following suggestion by Avi is a genius one:
Yes. I wonder what strategies we can use to detect and cope with that. One I can think of is to link the expiry time to how much the application has been used: if all you do is request the homepage, your session will expire very quickly, but if you look around a little more you're given more time. Does that seem reasonable?
This should be a preferences-tunable parameter.
Nevin