Cees de Groot wrote:

> On Tue, 2003-08-26 at 10:07, Avi Bryant wrote:
>> ['that will not work']
>
> Needless to say, Avi was right. We chatted over this, and the proposed
> solution is roughly the following:
>
> 1. Every time you send GWSession>>addToPath:, you effectively indicate
>    that you are about to set up some semi-static page. This might not
>    always be true, but in that case you simply don't need the code I'm
>    about to write :-).
> 2. So when you do this, a flag is set; the next time an HTMLResponse
>    is seen, its HTMLDocument object is added to a cache keyed by the
>    path (plus the site prefix, etcetera).
> 3. When a robot hits the site (robot detection can still be done
>    through the /robots.txt + User-Agent recognition trick), it gets an
>    index page containing a static link to every page in the cache.
> 4. The robot follows the index page, happily munches all the pages,
>    and puts your site at rank #1. All the pages are rendered without
>    IMG tags or local HREF links, effectively presenting a flattened
>    version of your site to the bot.
>
> I'm working on this (although I don't think I'll finish before
> tonight). The tentative package name is Janus, after the Roman god
> with two faces. I'll probably enhance it so you can have a dictionary,
> keyed by page name, pointing to meta tags, so you can do quality SEO
> with Seaside.

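For concreteness, I read the proposal above as something like the
following sketch. GWSession>>addToPath: and the HTMLResponse/HTMLDocument
objects are Cees's names; the PageCache dictionary, the cachePending
flag, the hook method, and the renderer call are my guesses, so treat
this as pseudocode rather than as Janus itself:

addToPath: aString
	"Step 1: sending this marks the session so that the next
	response gets cached. Assumes we subclass GWSession."
	cachePending := true.
	^ super addToPath: aString

noteResponse: anHtmlResponse
	"Step 2 (a hypothetical hook): wherever the session sees the
	HTMLResponse, stash its document in a cache keyed by the path."
	cachePending ifTrue:
		[PageCache at: self fullPath put: anHtmlResponse document.
		cachePending := false]

renderRobotIndexOn: html
	"Step 3: a detected robot gets an index page with a static link
	to every cached page (the anchor call is schematic)."
	PageCache keysDo:
		[:path | html anchorWithUrl: path text: path]
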
Here's what I tried (and what the result was):

1. For "robot" detection, I merely checked the logs to find the IP range
the Google bots were coming from, then tested for that range (your
robots.txt + User-Agent detection scheme is obviously better).

2. I created an abstract superclass to subclass my components from. It
has the following three methods (note that I also added a 'komRequest'
instance variable to the session class):

isGoogleIP
	"Crude robot detection: in my logs, the Google bots were all
	coming from the 64.68.80.* range."
	^ self session komRequest ipString beginsWith: '64.68.80'

renderContentOn: html
	"Dispatch to the robot-specific or the normal rendering method."
	self isGoogleIP
		ifTrue: [self renderGoogleContentOn: html]
		ifFalse: [self renderNormalContentOn: html]

renderGoogleContentOn: html
	"By default a robot sees the normal content; subclasses override
	this with a flattened version."
	^ self renderNormalContentOn: html

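In hindsight, matching on the User-Agent header would have been less
brittle than a hard-coded IP range. Something like this sketch, where
#userAgent stands in for however your komRequest actually exposes that
header:

isRobotRequest
	"Match common crawler User-Agent substrings. #userAgent is a
	hypothetical accessor on the Comanche request; adapt it to
	whatever your version provides."
	| agent |
	agent := self session komRequest userAgent asLowercase.
	^ (#('googlebot' 'slurp' 'crawler' 'spider')
		detect: [:bot | agent includesSubString: bot]
		ifNone: [nil]) notNil
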
3. I then placed the component code that the Google bot is supposed to
see in #renderGoogleContentOn:, and the normal content in
#renderNormalContentOn:. Common content shared between those two methods
(which is actually most of the render code) is refactored into separate
methods, as in the sketch below.

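The shape of that refactoring, with hypothetical component and helper
names (the anchor calls are schematic; use whatever your renderer
provides):

renderNormalContentOn: html
	"Interactive version: the shared body plus a normal Seaside
	action anchor."
	self renderProductBodyOn: html.
	html anchorWithAction: [self addToCart] text: 'Add to cart'

renderGoogleContentOn: html
	"Robot version: the same shared body, but with a plain static
	link instead of a session-bound action."
	self renderProductBodyOn: html.
	html anchorWithUrl: '/store/cart' text: 'Add to cart'
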
4. For #renderGoogleContentOn:, I turned links into explicit URLs using
the scheme I discussed a couple of months ago (in the Seaside thread
titled "[Seaside] anchors behaving like buttons?"); the sketch below
shows the idea.

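What "explicit URL" meant in practice, roughly (the path prefix and the
#sku and #title accessors are made up for illustration):

renderGoogleLinkTo: aProduct on: html
	"Emit a hand-assembled, bookmarkable href rather than letting
	Seaside register a callback for the anchor."
	html anchorWithUrl: '/store/product/', aProduct sku
		text: aProduct title
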
*******************************

OK, the above was mainly just a clumsy experiment. Did it work? Not
really; I can't tell that it made any difference to the Google bot. Part
of the reason may be that even though I turned links into explicit URLs,
Seaside just changed them right back into the usual funny Seaside URLs,
and I don't think the Google bot liked that.

Plus, it left me with two sets of rendering code, much of which now
needed to be kept in sync.

In other words, I don't have a solution. I only have another data point
of experience.

Nevin

--
Nevin Pratt
Bountiful Baby
http://www.bountifulbaby.com
(801) 992-3137