[squeak-dev] SqueakSource indexability (aka should we just ask crawlers to desist?)

Levente Uzonyi leves at elte.hu
Wed Apr 28 21:05:45 UTC 2010


On Wed, 28 Apr 2010, Bert Freudenberg wrote:

> On 28.04.2010, at 22:08, Ken Causey wrote:
>>
>>> -------- Original Message --------
>>> Subject: Re: [squeak-dev] SqueakSource indexability (aka should we just
>>> ask crawlers to desist?)
>>> From: Bert Freudenberg <bert at freudenbergs.de>
>>> Date: Wed, April 28, 2010 2:59 pm
>>> To: The general-purpose Squeak developers list
>>> <squeak-dev at lists.squeakfoundation.org>
>>>
>>>
>>> On 28.04.2010, at 21:07, Ken Causey wrote:
>>>>
>>>> At times access to source.squeak.org becomes slower, as has been the
>>>> case today.  I can see in the logs that various web-crawlers are the
>>>> likely culprit.  Having the information there accessible via search
>>>> engines is a wonderful thing but I have to suspect that the Seaside
>>>> session IDs eliminate this option.  (Of course when URLs like
>>>> http://source.squeak.org/trunk.html are found on other sites they then
>>>> become indexed.)
>>>
>>> Which URLs are the bots accessing?
>>
>> Well, without detailed analysis it seems to be everything.  Feel free to
>> look at ~squeaksource/apachelogs/.
>>
>>>
>>>> Unless I'm mistaken about this, and I would appreciate any guidance, it
>>>> seems like we need to add a robots.txt to the site which guides or
>>>> simply asks crawlers to stay away.  Thoughts?  I'm no SEO expert.
>>>
>>> We do have a robots.txt:
>>> http://source.squeak.org/robots.txt
>>
>> Aha.  Well, I know little about this subject.  But if this means what I
>> think it means it seems that the crawlers are ignoring it.
>
> I just read up on it. Glob patterns are *not* allowed; the single asterisk in the User-agent line is a special character, not a pattern match. We used
>
> User-agent: *
> Disallow: /@*
>
> But it should be
>
> User-agent: *
> Disallow: /@

Just realized that links generated by Seaside begin with @. Tricky. :)
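
To illustrate the difference between the two rules, here is a minimal Python sketch of the literal-prefix reading Bert describes, where a Disallow value is compared character by character against the start of the request path. The function name is_disallowed and the sample path /@abc123/def are made up for illustration, and some crawlers do accept wildcard extensions, so this only models the strict interpretation.

```python
def is_disallowed(path, disallow_rules):
    """Return True if `path` starts with any non-empty Disallow value.

    Models only the strict reading of robots.txt: each Disallow value
    is a literal path prefix, so '*' is an ordinary character here.
    """
    return any(rule and path.startswith(rule) for rule in disallow_rules)


old_rules = ["/@*"]  # the rule that was in use
new_rules = ["/@"]   # the corrected rule

# Hypothetical Seaside-style path, used only for illustration.
seaside_path = "/@abc123/def"

# Under the old rule the path would have to begin with the literal
# characters "/@*", so it is not blocked.
print(is_disallowed(seaside_path, old_rules))   # False
# Under the new rule every path beginning with "/@" is blocked.
print(is_disallowed(seaside_path, new_rules))   # True
# Ordinary pages such as /trunk.html stay crawlable either way.
print(is_disallowed("/trunk.html", new_rules))  # False
```

Under that reading the old rule would only match paths containing a literal asterisk right after the @, which would explain why the crawlers seemed to ignore it.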


Levente

>
> I'm going to fix that, let's see how it works out.
>
> - Bert -


