Blocks from strings

Lukas Renggli renggli at student.unibe.ch
Thu Jan 1 20:18:31 UTC 2004


> To see this whole thing in action, first go here:
> 
>    http://www.bountifulbaby.com/robots.txt
> 
> This will make the system think you are a spider.
> 
> Now just hit the site normally:
> 
>    http://www.bountifulbaby.com
> 
> You will see that all the Seaside gobble-de-goop is now missing from 
> all URLs and all pages.
>
> Go ahead and hover the mouse over any anchor, or any "link" image.  
> You will see that the URL that was generated for each link is also a 
> static-looking link.
> 
> The site will continue to think you are a spider until one hour after 
> your last access.

I suppose you are either working with the Referer header, or you are 
remembering the IP address of the spider that accessed robots.txt. Are 
you sure this actually works? 
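If it is the second approach, a minimal Python sketch of that mechanism might look like the following. This is purely illustrative (the class and method names are hypothetical, not from the site being discussed): remember the IP that fetched /robots.txt, and treat it as a spider for one hour after its last access, as the quoted post describes.

```python
import time

SPIDER_TTL = 60 * 60  # one hour, per "one hour after your last access"

class SpiderTracker:
    """Hypothetical IP-based spider detection with a sliding window."""

    def __init__(self):
        self._last_seen = {}  # ip -> timestamp of last spider-like access

    def saw_robots_txt(self, ip, now=None):
        """Record that this IP fetched /robots.txt."""
        self._last_seen[ip] = now if now is not None else time.time()

    def is_spider(self, ip, now=None):
        """True if this IP hit /robots.txt within the last hour.

        Each positive check refreshes the timestamp, so the hour
        window slides with the spider's last access."""
        now = now if now is not None else time.time()
        last = self._last_seen.get(ip)
        if last is None or now - last > SPIDER_TTL:
            return False
        self._last_seen[ip] = now  # refresh on each access
        return True
```

The weakness of this scheme is exactly the question below: it assumes the machine that fetched robots.txt is the same machine that later fetches the pages.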

As far as I know, Google runs its spiders on a cluster of Linux boxes. 
A site isn't scanned all at once; instead, each page is scheduled, 
fetched and indexed by different machines with different IPs. Are you 
using a different trick to keep the 'isSpider' information?

Cheers,
Lukas


-- 
Lukas Renggli
http://renggli.freezope.org

More information about the Squeak-dev mailing list