Blocks from strings
renggli at student.unibe.ch
Thu Jan 1 20:18:31 UTC 2004
> To see this whole thing in action, first go here:
> This will make the system think you are a spider.
> Now just hit the site normally:
> You will see that all the Seaside gobble-de-goop is now missing from
> all URLs and all pages.
> Go ahead and hover the mouse over any anchor, or any "link" image.
> You will see that the URL that was generated for each link is also a
> static-looking link.
> The site will continue to think you are a spider until one hour after
> your last access.
I suppose you are either working with the Referer header or you are
remembering the IP of the spider accessing robots.txt. Are you sure this
works reliably?
As far as I know, Google runs its spiders on a cluster of Linux boxes. A
site isn't scanned all at once; every page is scheduled, fetched and
indexed from a different machine with a different IP. Are you using a
different trick to keep the 'isSpider' information?
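The IP-remembering scheme speculated about above can be sketched roughly as follows. This is a hypothetical illustration in Python, not Seaside's actual implementation: an IP that fetches robots.txt is flagged as a spider, and the flag expires one hour after that IP's last access, matching the behavior described in the quoted post. The class and method names are invented for this sketch.

```python
import time

SPIDER_TTL = 3600  # seconds: treat an IP as a spider for one hour after its last access


class SpiderRegistry:
    """Hypothetical sketch of per-IP spider tracking (not Seaside code).

    An IP is marked as a spider when it fetches /robots.txt; the mark is
    refreshed on every subsequent access and expires after `ttl` seconds
    of inactivity.
    """

    def __init__(self, ttl=SPIDER_TTL, clock=time.time):
        self.ttl = ttl
        self.clock = clock          # injectable clock, handy for testing
        self.last_seen = {}         # ip -> timestamp of last spider access

    def note_robots_fetch(self, ip):
        # Fetching /robots.txt is the signal that this IP is a spider.
        self.last_seen[ip] = self.clock()

    def is_spider(self, ip):
        ts = self.last_seen.get(ip)
        if ts is None:
            return False
        if self.clock() - ts > self.ttl:
            # Window elapsed since the last access: forget this IP.
            del self.last_seen[ip]
            return False
        # Refresh the window, so expiry is one hour after the *last* access.
        self.last_seen[ip] = self.clock()
        return True
```

As the reply points out, this breaks down for a distributed crawler: if each page is fetched from a different IP, only the machine that happened to fetch robots.txt would ever be flagged.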
More information about the Squeak-dev mailing list