View Full Version : re: blocked bots list
TiMoGo
20-06-2006, 16:55/04:55PM
Split from blocked bots list (http://www.ihelpyou.com/forums/showthread.php?s=&threadid=22485)
Deb, excellent list! Thank You!
Perhaps adding a brief explanation about where and how to use it might be helpful for people that do not know what the list is for?
Honestly, until a few weeks ago, I never blocked any bot using this method.
WebSavvy
20-06-2006, 20:17/08:17PM
Yep, I should have added that into the original post. I'll add that in now.
gio3
21-06-2006, 12:39/12:39PM
Thanks a lot Deb for your precious post.
I have never found such a comprehensive list of bad robots on the web.
I think it may become a reference and a useful resource for webmasters.
I hope we will all post any other suspicious bots we find in our logs.
Gio
Comeran
21-06-2006, 15:45/03:45PM
Deb,
What a great list! It is much larger than I had expected :p
I am going to get my tech guy to e-mail me ours. From what I understand there are two: one is blocked from .htaccess, and another, for known scrapers, is scripted to point them back to themselves.
Thanks again.
Comeran-
WebSavvy
21-06-2006, 17:53/05:53PM
I have a few that get turned back onto themselves too. You can do that in .htaccess also. :D
Connie
07-07-2006, 19:19/07:19PM
I'm locking this thread temporarily. There have been so many different bots discussed it is confusing.
If you have a question about a particular bot please start a new thread in regard to that bot.
In the meantime I will try to straighten this thread out for general discussion.
WebSavvy
07-07-2006, 19:53/07:53PM
I had a few minutes so I've sorted them out, Connie. :)
WebSavvy
07-07-2006, 20:10/08:10PM
Discussions of other bots have been moved out of this thread and into their own specific threads:
MojeekBot (http://www.ihelpyou.com/forums/showthread.php?s=&threadid=22643)
psycheclone / a.k.a. Digital Infinity (http://www.ihelpyou.com/forums/showthread.php?s=&threadid=22641)
Exabot (http://www.ihelpyou.com/forums/showthread.php?s=&threadid=22642)
bot questions (http://www.ihelpyou.com/forums/showthread.php?s=&threadid=22644)
Connie
07-07-2006, 20:10/08:10PM
Originally posted by savvy1
I had a few minutes so I've sorted them out, Connie. :)
You're fast. That is exactly what I was going to do. Then again, I'm old and slow. But I'm good, based on what I have been told. :D
Connie
07-07-2006, 20:16/08:16PM
The other good thing about you doing this: I would probably have lost some posts or something in the splitting.
WebSavvy
07-07-2006, 20:21/08:21PM
Hey Connie, maybe you can list that huge bot resource site in a post in the locked bots list thread?
That way it's in the same thread (maybe put it in the opening post I made), so anyone wanting to research some of the bots can do so at that bot site.
WebSavvy
20-08-2006, 13:59/01:59PM
When blocking bots using this method, it's better to set the block rule this way:
SetEnvIfNoCase User-agent "^BlockedBotName" blocked=yes
Vs this way:
SetEnvIfNoCase User-agent "BlockedBotName" blocked=yes
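The difference is the caret (^), which anchors the match to the start of the user-agent string.
Using "in" as an example:
If we block "in" ... it will also block any user-agent that contains "in" anywhere, such as: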
instant
begin
fine
If we block "^in" ... it will only block user-agents starting with "in", like:
injure
incase
Tony (grungee) couldn't get to my site because he was somehow matching something in the blocked bots rule.
I changed it to the "^" form and now he can get to my site just fine.
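To make the anchored and unanchored forms concrete, here is a minimal sketch of how such a rule is usually wired up in .htaccess. BlockedBotName is the placeholder from the post above, not a real bot. The Order/Allow/Deny lines are an assumption, since the thread does not show how the "blocked" variable is enforced; they use Apache 2.2-style syntax.

```apache
# Anchored: matches only when the User-Agent string STARTS with the name
# (case-insensitive). "BlockedBotName" is a placeholder, not a real bot.
SetEnvIfNoCase User-Agent "^BlockedBotName" blocked=yes

# Unanchored, shown for comparison only: matches the name anywhere in the
# string, so a short pattern such as "in" would also catch "begin" or "fine".
# SetEnvIfNoCase User-Agent "BlockedBotName" blocked=yes

# Assumed enforcement (Apache 2.2 syntax): refuse any request for which
# the "blocked" environment variable has been set.
Order Allow,Deny
Allow from all
Deny from env=blocked
```

On Apache 2.4 the Order/Allow/Deny lines are replaced by Require directives, but the SetEnvIfNoCase patterns behave the same way.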
jfrovich
07-10-2006, 12:45/12:45PM
Does this work?
SetEnvIfNoCase User-agent "BlockedBotName" blocked=no
I seem to have blocked Ask.
I'm checking the list... I'll keep checking.
Or is this a fake bot?
/Spyware-Block-List.html
Http Code: 403 Date: Oct 07 11:40:18 Http Version: HTTP/1.0 Size in Bytes: -
Referer: -
Agent: Mozilla/2.0 (compatible; Ask Jeeves/Teoma; +http://sp.ask.com/docs/about/tech_crawling.html)
/test/spyware-detection.html
Http Code: 403 Date: Oct 07 11:50:42 Http Version: HTTP/1.0 Size in Bytes: -
Referer: -
Agent: Mozilla/2.0 (compatible; Ask Jeeves/Teoma; +http://sp.ask.com/docs/about/tech_crawling.html)
I do have this blocked:
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/2.0\ \(compatible;\ NEWT\ ActiveX;\ Win32\) [NC,OR]
Can I just remove "compatible;" so that Ask will be able to crawl my site?
This is Ask's page:
http://about.ask.com/en/docs/about/webmasters.shtml
User-Agent: Mozilla/2.0 (compatible; Ask Jeeves/Teoma)
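On the blocked=no question: assuming the list is enforced with "Deny from env=blocked", that test only checks whether the variable exists, not what value it holds, so blocked=no would most likely still deny the request. A sketch of one way to exempt Ask instead uses the "!" prefix that SetEnvIf accepts for removing a variable. The pattern is taken from the logged user-agent above; placing the line after the block list is an assumption about how the file is laid out.

```apache
# ... existing SetEnvIfNoCase block list goes above this line ...

# Remove the "blocked" variable again for Ask's crawler. This has to come
# AFTER the rules that set it, because SetEnvIf directives are evaluated
# in the order they appear.
SetEnvIfNoCase User-Agent "Ask Jeeves/Teoma" !blocked
```

As for the RewriteCond quoted above, it requires the literal text "NEWT ActiveX; Win32" right after "compatible; ", so it should not match Ask's user-agent as logged. The 403 is more likely coming from some other rule, possibly an unanchored pattern in the block list.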