[Seaside] Static sites and spider handling
Nevin Pratt
nevin at smalltalkpro.com
Tue Aug 26 07:08:04 CEST 2003
Cees de Groot wrote:
>On Tue, 2003-08-26 at 10:07, Avi Bryant wrote:
>
>
>>['that will not work']
>>
>>
>
>Needless to say, Avi was right. We chatted over this and this is roughly
>the proposed solution:
>
>1. every time you send GWSession>>addToPath:, you effectively indicate
>that you are about to set up some semi-static page. This might not always
>be true, but then you don't need to use the code I'm about to write :-);
>
>2. so when you do this, a flag is set; the next time an HTMLResponse is
>seen, its HTMLDocument object is added to a cache keyed by the path (and
>the site prefix, etcetera);
>
>3. when a robot hits the site (robot detection mechanism can still be
>done through the /robots.txt + UserAgent recognition trick), it gets an
>index page containing a static link to every page in the cache;
>
>4. robot follows index page, happily munches all the pages, and puts
>your site on rank #1. All the pages are rendered so that they don't have
>IMG or local HREF links, effectively presenting a flattened version of
>your site to the bot.
>
>I'm working on this (although I don't think I'll finish before tonight).
>Tentative package name is Janus, after the Roman God with two faces.
>
>I'll probably enhance this so you can keep a dictionary around, keyed
>by page name and pointing to metatags, so you can do quality SEO with
>Seaside.
>
>
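Sounds good. Just to be sure I follow, here is roughly how I picture
steps 2 and 3 in code. This is only a sketch: 'MySession' stands for a
GWSession subclass with a 'cachePending' instance variable, and
'cachePending', #pageCache, #noteResponse: and #renderBotIndexOn: are
names I just made up, not Cees's actual code (I'm also assuming the
renderer has #anchorWithUrl:text:):

MySession>>addToPath: aString
    "Step 2: flag the session so that the next response gets cached."
    cachePending := true.
    ^ super addToPath: aString

MySession>>noteResponse: anHtmlResponse
    "When flagged, stash the rendered document in a class-side
    dictionary, keyed by the path."
    cachePending ifTrue: [
        self class pageCache at: self path put: anHtmlResponse document.
        cachePending := false]

MySession>>renderBotIndexOn: html
    "Step 3: the robot gets a plain index page with a static link to
    every cached page."
    self class pageCache keysDo: [:each |
        html anchorWithUrl: each text: each]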
Here's what I tried (and what the result was):
1. For "robots" detection, I merely checked the log to see the IP range
for the google bots, then test for that IP range (your "robots"
detection scheme is obviously better).
2. I created an abstract superclass for my components to subclass, and
it has the following three methods (note that I also added a
'komRequest' instance variable to the session class):
isGoogleIP
    "Answer whether the current request comes from the IP range I saw
    the google bots using."
    ^ self session komRequest ipString beginsWith: '64.68.80'

renderContentOn: html
    "Dispatch to either the bot version or the normal version of the page."
    self isGoogleIP
        ifTrue: [self renderGoogleContentOn: html]
        ifFalse: [self renderNormalContentOn: html]

renderGoogleContentOn: html
    "By default the bot sees the normal content; subclasses override this."
    ^ self renderNormalContentOn: html
3. Now I placed the component code that the google bot is supposed to
see in #renderGoogleContentOn:, and the normal content in
#renderNormalContentOn:. Common content shared between those two
methods (which is actually most of the render code) is refactored into
separate methods (see the sketch after this list).
4. For #renderGoogleContentOn:, I turned links into explicit URLs using
the scheme I discussed a couple months ago (in a Seaside thread titled
"[Seaside] anchors behaving like buttons?").
*******************************
OK, the above was mainly just a clumsy experiment. Did it work? Not
really: I can't tell that it made any difference to the google bot.
Part of the reason may be that even though I turned links into explicit
URLs, Seaside just changed them right back to the usual funny Seaside
URLs, and I don't think the google bot liked that.
Plus, it left me with two sets of rendering code, much of which now
needed to be kept in sync.
In other words, I don't have a solution. I only have another experience
data point.
Nevin
--
Nevin Pratt
Bountiful Baby
http://www.bountifulbaby.com
(801) 992-3137