Blocks from strings

Lukas Renggli renggli at student.unibe.ch
Fri Jan 2 08:36:25 UTC 2004


> Lukas, I'm guessing based on your response above, that a better 
> algorithm for the "autoSpider" detection feature might be to only use 
> the first 3 numbers of the IP instead of all 4 numbers for the 
> "autoSpider" feature.  Thus, if, say, a request comes in for a 
> "robots.txt" file, and it has a "FooBar" for the user-agent field, and 
> the IP of that request is, say, 10.25.50.75, then any subsequent 
> requests from any IP beginning with 10.25.50 that also have "FooBar" 
> for the user-agent field would be deemed to be from the same spider.
> 
> What does everybody think?
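The prefix-matching rule proposed in the quoted message can be illustrated with a small sketch. It is written in Python rather than Squeak Smalltalk and is not the actual autoSpider code. The names `SpiderTracker`, `ip_prefix`, `record_request` and `is_spider` are hypothetical. It assumes dotted IPv4 addresses and a request path of exactly `/robots.txt`. A client counts as a spider once any address sharing its first three octets has fetched `robots.txt` with the same user-agent string.

```python
def ip_prefix(ip: str, octets: int = 3) -> str:
    """Return the first `octets` dotted components of an IPv4 address, e.g. '10.25.50'."""
    parts = ip.split(".")
    if len(parts) != 4:
        raise ValueError(f"not a dotted IPv4 address: {ip!r}")
    return ".".join(parts[:octets])


class SpiderTracker:
    """Hypothetical tracker: remembers (IP prefix, user-agent) pairs that fetched robots.txt."""

    def __init__(self, octets: int = 3) -> None:
        # octets=3 is the proposed rule; octets=4 reproduces exact-IP matching.
        self.octets = octets
        self.known: set[tuple[str, str]] = set()

    def record_request(self, ip: str, user_agent: str, path: str) -> None:
        # Assumption: only a request for /robots.txt marks a client as a spider.
        if path == "/robots.txt":
            self.known.add((ip_prefix(ip, self.octets), user_agent))

    def is_spider(self, ip: str, user_agent: str) -> bool:
        # A later request matches if it shares the prefix AND the user-agent string.
        return (ip_prefix(ip, self.octets), user_agent) in self.known


if __name__ == "__main__":
    tracker = SpiderTracker()
    tracker.record_request("10.25.50.75", "FooBar", "/robots.txt")
    print(tracker.is_spider("10.25.50.200", "FooBar"))   # same 10.25.50 prefix
    print(tracker.is_spider("10.25.51.200", "FooBar"))   # different prefix
    print(tracker.is_spider("10.25.50.200", "Mozilla"))  # different user-agent
```

Keeping the number of octets as a parameter makes the trade-off easy to compare. Three octets groups a spider's neighbouring machines together. Four octets falls back to matching the single address that fetched `robots.txt`.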

Thanks for the description. 

I think that most smaller spiders do not run on as many machines as 
Google does, and therefore modifications to your/Cee's code are not worth 
the trouble. Furthermore, I think my concerns are no longer valid, as 
long as you keep #knownSpiderUserAgentPatterns updated ...

Lukas

-- 
Lukas Renggli
http://renggli.freezope.org
