Blocks from strings
renggli at student.unibe.ch
Fri Jan 2 08:36:25 UTC 2004
> Lukas, I'm guessing, based on your response above, that a better
> algorithm for the "autoSpider" detection feature might be to only use
> the first 3 numbers of the IP instead of all 4 numbers for the
> "autoSpider" feature. Thus, if, say, a request comes in for a
> "robots.txt" file, and it has a "FooBar" for the user-agent field, and
> the IP of that request is, say, 10.25.50.75, then any subsequent
> requests from any IP beginning with 10.25.50 that also have "FooBar"
> for the user-agent field would be deemed to be from the same spider.
> What does everybody think?
Thanks for the description.
I think that most smaller spiders do not have as many machines as
Google does, and therefore modifications to your/Cee's code are not worth
the trouble. Furthermore, I think my concerns no longer apply, as
long as you keep #knownSpiderUserAgentPatterns updated ...
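A minimal Python sketch of the /24 grouping proposed in the quoted message, combined with a static pattern list. It assumes each request exposes its path, User-Agent string and IPv4 address. The class and method names are hypothetical, and `known_patterns` only stands in for #knownSpiderUserAgentPatterns; the actual Squeak code and its matching rules are not shown in this thread.

```python
import ipaddress
import re
from typing import Iterable, Set, Tuple


class SpiderDetector:
    """Hypothetical detector: static User-Agent patterns plus auto-detection
    of clients that fetch robots.txt, grouped by their /24 network."""

    def __init__(self, known_patterns: Iterable[str] = ()):
        # Stand-in for #knownSpiderUserAgentPatterns (hypothetical contents).
        self.known_patterns = [re.compile(p, re.IGNORECASE) for p in known_patterns]
        # (user agent, /24 network) pairs that have requested robots.txt.
        self.auto_spiders: Set[Tuple[str, ipaddress.IPv4Network]] = set()

    @staticmethod
    def subnet(ip: str) -> ipaddress.IPv4Network:
        # Keep only the first three octets: 10.25.50.75 -> 10.25.50.0/24.
        return ipaddress.IPv4Network(f"{ip}/24", strict=False)

    def is_spider(self, path: str, user_agent: str, ip: str) -> bool:
        # 1. Known spiders are matched by User-Agent alone.
        if any(p.search(user_agent) for p in self.known_patterns):
            return True
        key = (user_agent, self.subnet(ip))
        # 2. A robots.txt request marks this agent + /24 network as a spider.
        if path == "/robots.txt":
            self.auto_spiders.add(key)
            return True
        # 3. Later requests match if agent and /24 network were seen before.
        return key in self.auto_spiders


detector = SpiderDetector(known_patterns=[r"Googlebot"])
first = detector.is_spider("/robots.txt", "FooBar", "10.25.50.75")
same_net = detector.is_spider("/index.html", "FooBar", "10.25.50.200")
other_net = detector.is_spider("/index.html", "FooBar", "10.25.51.1")
known = detector.is_spider("/index.html", "Mozilla/5.0 (compatible; Googlebot/2.1)", "66.249.66.1")
```

Keying on the (User-Agent, /24 network) pair means an unknown spider is remembered only for the subnet it fetched robots.txt from, so a request from 10.25.51.1 is not flagged. As the reply above notes, a well-maintained static pattern list may make this refinement unnecessary for smaller spiders.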
More information about the Squeak-dev mailing list