[Seaside] Static sites and spider handling

Cees de Groot cg at cdegroot.com
Tue Aug 26 10:46:12 CEST 2003


Here's my idea on how to handle spiders for static sites:

1. assumption: a site like http://www.tric.nl/, which consists largely of
static pages that are directly reachable;

2. when a client fetches /robots.txt, the user agent is added to the UA
database and a special session id based on the user agent string is
created;

3. whenever a session needs to be created and the user agent matches an
entry in the UA database, that same session id is reused. This ensures
that URLs appear static to the spider;

4. when a link is followed with a 'robot session id' but the UA doesn't
match, a new session is created - this represents a real user coming in
(a rough sketch of this logic follows the list).
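To make that concrete, here is a rough sketch of the session-selection
logic in Python, not Seaside/Smalltalk code; the names (UA_DATABASE,
ROBOT_SESSIONS, robot_session_id, and so on) are made up for
illustration, and the real thing would hook into wherever Seaside
decides to create a session:

import hashlib

# Hypothetical stand-ins for the structures described above.
UA_DATABASE = set()      # user agents that have fetched /robots.txt
ROBOT_SESSIONS = {}      # user agent string -> fixed 'robot' session id


def robot_session_id(user_agent):
    # Derive a stable id from the user agent string, so the same
    # spider always gets the same session and hence the same URLs.
    return hashlib.md5(user_agent.encode('utf-8')).hexdigest()[:12]


def on_robots_txt_fetch(user_agent):
    # Step 2: register the UA and create its special session id.
    UA_DATABASE.add(user_agent)
    ROBOT_SESSIONS[user_agent] = robot_session_id(user_agent)


def choose_session(user_agent, presented_session_id, create_new_session):
    # Step 3: a known spider always reuses its fixed session id,
    # so the generated links look static to it.
    if user_agent in UA_DATABASE:
        return ROBOT_SESSIONS[user_agent]
    # Step 4: a robot session id presented by some other UA means a
    # real user followed a spidered link; start a fresh session.
    if presented_session_id in ROBOT_SESSIONS.values():
        return create_new_session()
    # Otherwise, normal session handling applies.
    return presented_session_id or create_new_session()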

I want to make robots.txt accessible, maybe with a simple Comanche module
(or is there a way in Seaside to return data without first redirecting to
a session?), so that all hits on that file can be handled from the
application server image.
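The sketch below shows that idea in Python rather than as a Comanche
module: answer /robots.txt directly, with no redirect into a session,
and use the hit to record the spider's user agent (step 2 above). The
handler class, the port, and the robots.txt body are all placeholders:

from http.server import BaseHTTPRequestHandler, HTTPServer

ROBOTS_BODY = b"User-agent: *\nDisallow:\n"   # example body: allow everything
SEEN_SPIDER_UAS = set()                       # stand-in for the UA database


class RobotsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == '/robots.txt':
            # Record the user agent so later requests from it can be
            # given the fixed 'robot' session id.
            SEEN_SPIDER_UAS.add(self.headers.get('User-Agent', ''))
            self.send_response(200)
            self.send_header('Content-Type', 'text/plain')
            self.send_header('Content-Length', str(len(ROBOTS_BODY)))
            self.end_headers()
            self.wfile.write(ROBOTS_BODY)
        else:
            # Everything else would be passed on to the normal dispatcher.
            self.send_response(404)
            self.end_headers()


if __name__ == '__main__':
    HTTPServer(('', 8080), RobotsHandler).serve_forever()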

Does this sound like a reasonable idea?


