[Seaside] Static sites and spider handling

Cees de Groot cg at cdegroot.com
Tue Aug 26 11:50:10 CEST 2003


On Tue, 2003-08-26 at 10:07, Avi Bryant wrote:
> ['that will not work']

Needless to say, Avi was right. We chatted about this, and this is roughly
the proposed solution:

1. every time you send GWSession>>addToPath:, you effectively indicate
that you are about to set up some semi-static page. This might not always
be true, but in that case you don't need to use the code I'm about to
write :-);

2. so when you do this, a flag is set; the next time an HTMLResponse is
seen, its HTMLDocument object is added to a cache keyed by the path (and
the site prefix, etcetera). A rough sketch of this cache follows the list;

3. when a robot hits the site (robot detection can still be done through
the /robots.txt + UserAgent recognition trick), it gets an index page
containing a static link to every page in the cache;

4. the robot follows the index page, happily munches all the pages, and
puts your site at rank #1. All the pages are rendered so that they don't
have IMG or local HREF links, effectively presenting a flattened version
of your site to the bot.
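
To make steps 2 and 3 a bit more concrete, here's a rough sketch of what
the cache could look like. JanusCache and its method names are placeholder
names I made up for this mail, not existing Seaside (or Janus) code:

  Object subclass: #JanusCache
      instanceVariableNames: 'pages'
      classVariableNames: ''
      poolDictionaries: ''
      category: 'Janus'

  JanusCache>>pages
      "Lazily create the path -> HTMLDocument dictionary."
      pages isNil ifTrue: [pages := Dictionary new].
      ^ pages

  JanusCache>>cacheDocument: anHtmlDocument atPath: aPathString
      "Step 2: remember the rendered document under its path."
      self pages at: aPathString put: anHtmlDocument

  JanusCache>>robotIndexPage
      "Step 3: a plain page with one static link per cached path."
      ^ String streamContents: [:stream |
          stream nextPutAll: '<html><body><ul>'.
          self pages keysDo: [:path |
              stream
                  nextPutAll: '<li><a href="';
                  nextPutAll: path;
                  nextPutAll: '">';
                  nextPutAll: path;
                  nextPutAll: '</a></li>'].
          stream nextPutAll: '</ul></body></html>']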

I'm working on this (although I don't think I'll finish before tonight).
The tentative package name is Janus, after the Roman god with two faces.

I'll probably enhance this so you can keep a dictionary around, keyed by
page name and pointing to meta tags, so that you can do quality SEO with
Seaside; a small sketch of that idea follows.
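
Roughly what I have in mind for that dictionary (the page names and tags
below are just examples, and splicing them into a cached HTMLDocument is
left out):

  | metaTags pageName |
  metaTags := Dictionary new.
  metaTags
      at: 'index'
      put: '<meta name="description" content="Front page">'.
  metaTags
      at: 'products'
      put: '<meta name="keywords" content="smalltalk, seaside, web">'.

  "When flattening a cached page, look up the tags for its <head> section."
  pageName := 'products'.
  Transcript show: (metaTags at: pageName ifAbsent: ['']); cr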



