[Seaside] Static sites and spider handling

Nevin Pratt nevin at smalltalkpro.com
Tue Aug 26 07:08:04 CEST 2003



Cees de Groot wrote:

>On Tue, 2003-08-26 at 10:07, Avi Bryant wrote:
>
>>['that will not work']
>
>Needless to say, Avi was right. We chatted over this and this is roughly
>the proposed solution:
>
>1. every time you send GWSession>>addToPath:, you effectively indicate
>that you are about to set up a semi-static page. This might not always
>be true, but then you don't need to use the code I'm about to write :-);
>
>2. so when you do this, a flag is set; the next time an HTMLResponse is
>seen, its HTMLDocument object is added to a cache keyed by the path (and
>the site prefix, etc.);
>
>3. when a robot hits the site (robot detection can still be done
>through the /robots.txt + UserAgent recognition trick), it gets an
>index page containing a static link to every page in the cache;
>
>4. the robot follows the index page, happily munches all the pages, and
>puts your site at rank #1. All the pages are rendered so that they don't
>have IMG or local HREF links, effectively presenting a flattened version
>of your site to the bot.
>
>I'm working on this (although I don't think I'll finish before tonight).
>Tentative package name is Janus, after the Roman god with two faces.
>
>I'll probably enhance this so you can have a dictionary around, keyed by
>page name and pointing to metatags, so you can do quality SEO with
>Seaside.
>
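
(As an aside, for the UserAgent half of that detection trick, even a
crude string check like the one below would do.  How you actually fetch
the User-Agent header out of the Kom request depends on your Kom
version, so that part is left out of this sketch.)

isRobotUserAgent: agentString
    "Sketch only: answer whether the given User-Agent string looks
     like one of the known crawlers.  Extend the pattern list as needed."
    ^ ('*googlebot*' match: agentString)
        or: ['*slurp*' match: agentString]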

Here's what I tried (and what the result was):

1. For "robots" detection, I merely checked the log to see the IP range
for the Google bots, then tested for that IP range (your "robots"
detection scheme is obviously better).

2. I created an abstract superclass for my components to subclass.  It
has the following three methods (note that I also added a 'komRequest'
instance variable to the session class):

isGoogleIP
    "Answer whether the requesting IP falls in the 64.68.80.* range
     the Google bots were using (taken from my server logs)."
    ^ self session komRequest ipString beginsWith: '64.68.80'

renderContentOn: html
    "Dispatch to bot-specific or normal rendering."
    self isGoogleIP
        ifTrue: [self renderGoogleContentOn: html]
        ifFalse: [self renderNormalContentOn: html]

renderGoogleContentOn: html
    "By default the bot sees the normal content; subclasses override this."
    ^ self renderNormalContentOn: html

3. I placed the component code that the Google bot is supposed to see
in #renderGoogleContentOn:, and the normal content in
#renderNormalContentOn:.  Content shared between those two methods
(which is actually most of the render code) is refactored into separate
methods; the sketch below shows the shape of that refactoring.
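
To make that concrete, it looks roughly like the following (the helper
selectors #renderCommonContentOn:, #renderSessionLinksOn:, and
#renderStaticLinksOn: are just placeholder names for the refactored
methods, not real Seaside selectors):

renderNormalContentOn: html
    "Shared markup plus the normal, session-backed links."
    self renderCommonContentOn: html.
    self renderSessionLinksOn: html

renderGoogleContentOn: html
    "Shared markup plus plain, static links only."
    self renderCommonContentOn: html.
    self renderStaticLinksOn: html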

4. For #renderGoogleContentOn:, I turned links into explicit URLs, using
the scheme I discussed a couple of months ago (in a Seaside thread titled
"[Seaside] anchors behaving like buttons?").  A rough sketch follows
below.

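Something like this (the URL is made up for illustration, and
#anchorWithUrl:text: may be spelled differently in your Seaside
version-- check the renderer protocol before copying this):

renderStaticLinksOn: html
    "Sketch only: emit a hard-coded, spider-friendly link instead of a
     callback anchor."
    html anchorWithUrl: 'http://www.bountifulbaby.com/products' text: 'Products'
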
*******************************

OK, the above was mainly just a clumsy experiment.  Did it work?  Not
really-- I can't tell that it made any difference to the Google bot.
Part of the reason may be that even though I turned links into explicit
URLs, Seaside just changed them right back to the usual funny Seaside
URLs, and I don't think the Google bot liked that.

Plus, it left me with two sets of rendering code, much of which now
needed to be kept in sync.

In other words, I don't have a solution.  I only have another data
point from experience.

Nevin
 

-- 
Nevin Pratt
Bountiful Baby
http://www.bountifulbaby.com
(801) 992-3137

