View Full Version : Question for IncrediBill
WebSavvy
19-09-2006, 01:26/01:26AM
Bill, you mentioned that the best way to block bots is to allow access to good bots only and then all others would be blocked by default.
I brought our bad bots list back out from the private forums. There are some bots on this list that have been hitting my site, and I haven't seen them listed anywhere else.
Just thought you might want to have a look through it, and see if any of them are the same as the ones you've talked to me about.
bad bots list (http://www.ihelpyou.com/forums/showthread.php?s=&threadid=22485)
IncrediBILL
19-09-2006, 20:48/08:48PM
See, your bot block list is a BLACKLIST which is a waste of time.
I WHITELIST only, meaning Googlebot, Slurp, MSNbot and Teoma get thru as well as IE, FIREFOX, SAFARI, NETSCAPE and OPERA and everything else bounced.
At the moment the uber nerds trying to surf with cell phones and Treos get spanked as well and I don't care, their hands should be on the wheel of the car anyway.
Then I do check the SE bots against their IP ranges to bounce fakers and proxy hijackers, and the browsers are filtered for obvious errors and only the best get thru.
Also, I filter out HTML/SGML errors from page requests and bounce idiots that look for "/#top" and other stupid things.
Basically I'm doing it 180 degrees different than most people and I'm not the only one.
Try a whitelist and you'll sleep better at night knowing the latest and greatest bot from the Ukraine just splattered on the first page access.
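The IP check on search-engine bots could be sketched in .htaccess with mod_rewrite, assuming that module is enabled. The 192.0.2. prefix below is a documentation placeholder, not a real Googlebot range; the real ranges would have to come from the search engine itself. This is only an illustration, not the script Bill runs.

```apache
# Sketch only: bounce anything that claims to be Googlebot
# but does not come from an expected address range.
# 192.0.2. is a placeholder prefix (an assumption), NOT a real Google range.
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} Googlebot [NC]
RewriteCond %{REMOTE_ADDR} !^192\.0\.2\.
RewriteRule .* - [F]
```

The two RewriteCond lines are ANDed, so the 403 is only served when the user agent says Googlebot and the address falls outside the listed prefix.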
WebSavvy
19-09-2006, 21:29/09:29PM
Please provide an example of how to do a whitelist as others reading this will benefit from this as well.
Thanks Bill! :cheers:
IncrediBILL
19-09-2006, 21:38/09:38PM
I actually use a dynamic real-time script to do this, not .htaccess, but here's a conceptual example of how I would go about it.
Note, this is just a starter sample of HOW to code it in .htaccess, nothing I've actually tested...
#allow just search engines we like, we're OPT-IN only
#a catch-all for Google
BrowserMatchNoCase Google good_pass
#a couple for Yahoo
BrowserMatchNoCase Slurp good_pass
BrowserMatchNoCase Yahoo-MMCrawler good_pass
#looks like all MSN starts with MSN or Sand
BrowserMatchNoCase ^msnbot good_pass
BrowserMatchNoCase SandCrawler good_pass
#don't forget ASK/Teoma
BrowserMatchNoCase Teoma good_pass
BrowserMatchNoCase Jeeves good_pass
#allow Firefox, MSIE, Opera etc., will punt Lynx, cell phones and PDAs, don't care
BrowserMatchNoCase ^Mozilla good_pass
BrowserMatchNoCase ^Opera good_pass
#Let just the good guys in, punt everyone else to the curb
#which includes blank user agents as well
<Limit GET POST PUT HEAD>
order deny,allow
deny from all
allow from env=good_pass
</Limit>
Some day I may flesh out a complete .htaccess file as it will stop a lot of noise with very little maintenance.
WebSavvy
19-09-2006, 21:47/09:47PM
Some people use Lynx browsers, or they use Avant, (and others like it). Will they still be able to gain access if one is using the codes from above?
If the browser agent gives a string like this:
irejspekpeeneypfnahthEtim
-- OR --
7ppvi7u_hqfgyjagxyyV_urohjsohVg
-- OR --
eiUjscpsqlhlkowteqh_Uyorftscql
Will they be stopped dead in their tracks?
Those are not ones I pulled from thin air. They're from my actual logs just yesterday. I get over 200 of these daily on my directory, and it's been going on for months now.
IncrediBILL
19-09-2006, 21:52/09:52PM
Remember, I posted a shell to start your own file ;)
You would need to add a line for Lynx but the gibberish user agent strings, as well as blank user agents, would bounce off your site harder than a rubber ball dropped from a 747 at 30K feet.
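Lynx user agents normally start with "Lynx/", so the extra line for the shell might look like this (untested, same as the shell itself):

```apache
# Opt Lynx in alongside the other whitelisted agents
BrowserMatchNoCase ^Lynx/ good_pass
```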
IncrediBILL
19-09-2006, 21:54/09:54PM
FYI, I bounce Lynx because almost nobody uses it except snoopy SEOs and some wacky scripts. I get more hits from mobile devices than from Lynx and I bounce them too ;)
WebSavvy
19-09-2006, 22:05/10:05PM
Cool, thanks so much for this, Bill ... really. :)
I'll use this as a starting point, and come up with something that accounts for everything, and then post it here as a working example for anyone that wants to use it.
:cheers:
You really are incrediBill! ;)
jfrovich
19-09-2006, 22:42/10:42PM
Originally posted by savvy1
Bill, you mentioned that the best way to block bots is to allow access to good bots only and then all others would be blocked by default.
bad bots list (http://www.ihelpyou.com/forums/showthread.php?s=&threadid=22485)
That's genius...
I've never seen anyone think of it that way.
IncrediBILL
19-09-2006, 22:49/10:49PM
There are other caveats, like RSS feeds and robots.txt need full access for anyone, but that's nit picky ;)
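One way to handle those caveats within the same opt-in scheme is to hand out good_pass based on the requested path rather than the user agent. A rough sketch, assuming mod_setenvif and an .htaccess file in the document root; /feed.xml is a made-up feed path, so substitute wherever the feed actually lives.

```apache
# Anyone may fetch robots.txt, whatever the user agent
SetEnvIf Request_URI "^/robots\.txt$" good_pass
# Hypothetical feed location (an assumption), open to all feed readers
SetEnvIf Request_URI "^/feed\.xml$" good_pass
```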
WebSavvy
25-09-2006, 14:37/02:37PM
I did the .htaccess whitelist this morning. Over the next few days I'll monitor my logs for any "adjustments" that might be needed.
Bill, I made some modifications to your .htaccess shell, and put allow, deny with their own rules into one block.
<Limit GET POST PUT HEAD>
order allow,deny
allow from env=good_pass
deny from env=bad_pass
</Limit>
#allow Firefox, MSIE, Opera
SetEnvIfNoCase User-agent "Mozilla" good_pass
SetEnvIfNoCase User-agent "Opera" good_pass
SetEnvIfNoCase User-agent "Msie" good_pass
SetEnvIfNoCase User-agent "Firefox" good_pass
SetEnvIfNoCase User-agent "Netscape" good_pass
SetEnvIfNoCase User-agent "Safari" good_pass
SetEnvIfNoCase User-agent "Lynx" good_pass
SetEnvIfNoCase User-agent "Konqueror" good_pass
SetEnvIfNoCase User-agent "WebTV" good_pass
SetEnvIfNoCase User-agent "Camino" good_pass
SetEnvIfNoCase User-agent "K-Meleon" good_pass
SetEnvIfNoCase User-agent "Galeon" good_pass
# allow Google
SetEnvIfNoCase User-agent "Google" good_pass
# allow Yahoo
SetEnvIfNoCase User-agent "Slurp" good_pass
SetEnvIfNoCase User-agent "Yahoo" good_pass
SetEnvIfNoCase User-agent "MMCrawler" good_pass
# allow MSN
SetEnvIfNoCase User-agent "^msnbot" good_pass
SetEnvIfNoCase User-agent "SandCrawler" good_pass
SetEnvIfNoCase User-agent "^MSRBOT" good_pass
# allow ASK/Teoma
SetEnvIfNoCase User-agent "Teoma" good_pass
SetEnvIfNoCase User-agent "Jeeves" good_pass
# allow CaRP
SetEnvIfNoCase User-agent "^CaRP" good_pass
# allow Clush
SetEnvIfNoCase User-agent "^Clushbot" good_pass
# allow Voyager
SetEnvIfNoCase User-agent "^Voyager" good_pass
#allow Voila
SetEnvIfNoCase User-agent "^Voila" good_pass
# allow MoJeek
SetEnvIfNoCase User-agent "^MoJeek" good_pass
# allow WISENutbot
SetEnvIfNoCase User-agent "^WISENutbot" good_pass
# deny spammers
SetEnvIfNoCase User-agent "Indy" bad_pass
The Indy Library is a harvester/scraper and gives its UA as Mozilla 3.0.
By openly allowing *mozilla*, Indy Library would be allowed in too, so I set a deny rule to keep it out.
Also, MSN has another bot "MSRBOT", and it does crawl my site so I've added that to the # allow lines for MSN.
If anyone else has anything to contribute to this, please do.
IncrediBILL
25-09-2006, 15:09/03:09PM
Savvy, nice to see you giving it a shot!
Couple of things I would consider mistakes, such as introducing:
"Indy" bad_pass
That's a blacklist method, and it's already blocked by default since only what you add to good_pass gets in, so don't muddy the waters with bad_pass as that's old hat.
The next was taking off the anchor "^" and "/" from "^Mozilla/" and "^Opera/" as those two entries alone allow most, if not all legit browsers, including MSIE, OPERA, NETSCAPE, SAFARI, FIREFOX, etc. without having to name them explicitly.
I use all those browsers to test my bot blocker so I know they all get in ;)
Maybe I didn't explain so well the first time but this is all good as we're all learning from each other and I'll know better how to explain it next time.
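Put into SetEnvIfNoCase form, the anchored entries described above would look roughly like this. The ^ ties the match to the start of the user agent and the / requires the version separator, so a string that merely contains "mozilla" somewhere in the middle does not qualify.

```apache
# Two anchored entries cover most mainstream browsers
SetEnvIfNoCase User-Agent "^Mozilla/" good_pass
SetEnvIfNoCase User-Agent "^Opera/" good_pass
```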
WebSavvy
25-09-2006, 15:19/03:19PM
Actually, I didn't have Indy there the first time and then I went to CPanel and looked at the "Latest Visitors" and there were about 40+ accesses from Indy Library all with 200 Server Response Code.
Then, I added that line to block it and now it's back to serving Indy a nice fat juicy 403. :D
I don't use any other browsers except FF, IE, Opera, and AOL -- so I didn't know if the others were able to get through or not.
I added Lynx because I have a few editors that have Linux OS and use that browser, plus we get about 200-300 users a month who also use it. We get around 100+ for WebTV so I added that also.
jfrovich
25-09-2006, 22:36/10:36PM
so Deb
this is ready for us and will allow all the main spiders?
IncrediBILL
25-09-2006, 22:45/10:45PM
AH HA!
I looked up Indy to see why it got thru for you, remember, I use a script, not htaccess...
"Mozilla/3.0 (compatible; Indy Library)"
I don't permit anything that starts with "^Mozilla/" or "^Opera/" and only has 2 parameters like this; without the OS in the string I kick 'em to the curb.
So, you did the right thing with what I call a 2nd pass filter in my code, which is to filter out extraneous stuff that starts with "^Mozilla/". Some other things I would punt also are:
SetEnvIfNoCase User-agent "script" bad_pass
SetEnvIfNoCase User-agent "a href" bad_pass
However, I do all the 2nd pass in, yes, a second pass after all the GOOD stuff is filtered out, then I subfilter for BAD stuff after the fact. So it's OPT-IN pass, then a FILTER pass, which is much smaller than the old BLACKLIST.
Keeps the rules clean ;)
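In .htaccess, the two passes could be kept on a single variable by using the !name form of SetEnvIf, which removes a variable that an earlier line set. A sketch under that assumption, not the actual script:

```apache
# Pass 1 (opt-in): anything that looks like a browser gets the pass
SetEnvIfNoCase User-Agent "^Mozilla/" good_pass
SetEnvIfNoCase User-Agent "^Opera/" good_pass

# Pass 2 (filter): take the pass away again from known junk
SetEnvIfNoCase User-Agent "Indy Library" !good_pass
SetEnvIfNoCase User-Agent "a href" !good_pass

<Limit GET POST PUT HEAD>
Order deny,allow
Deny from all
Allow from env=good_pass
</Limit>
```

Order matters here: the filter lines have to come after the opt-in lines, since the directives are processed in the order they appear.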
WebSavvy
25-09-2006, 23:51/11:51PM
Yes, Jason. It's ready to use.
I also have added the following to my own:
SetEnvIfNoCase User-agent "^Mozilla/3" bad_pass
SetEnvIfNoCase User-agent "^Mozilla/2" bad_pass
Indy Library sends the UA as Mozilla 3 and Mozilla 2. I've never had legit users with the Mozilla 2/3 UA, ever.
Since blocking Indy using bad_pass, I've found some instances where they've removed "Indy Library" from the UA string and send it as just Mozilla/3.
Now that's been applied to the rules it should keep them out.
Also, I've added something to .htaccess for images to prevent others from hotlinking.
SetEnvIfNoCase Referer "domain\.com" local_ref=1
<FilesMatch "\.(gif|jpg|png|swf)$">
Order Allow,Deny
Allow from env=local_ref
</FilesMatch>
Replace domain.com with your own domain name.
This will stop your images from being called to other domains via hotlinking and will only allow your images to be served from your own site to users that are physically viewing your pages.
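Some browsers, proxies and firewalls strip the Referer header entirely, and those visitors would be refused images by the block above. If that turns out to be a problem, one possible addition is to treat an empty Referer as local. It is a trade-off, since it also lets through direct requests that send no Referer.

```apache
# Optional: also accept requests that carry no Referer at all
SetEnvIf Referer "^$" local_ref=1
```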
jfrovich
26-09-2006, 00:10/12:10AM
so if i need to add say
dogpile,Mamma,Lycos and a few other smaller se's
would this work
# allow dogpile
SetEnvIfNoCase User-agent "dogpile" good_pass
WebSavvy
26-09-2006, 00:14/12:14AM
Yep, that would work Jason. :)
SetEnvIfNoCase User-agent "dogpile" good_pass
SetEnvIfNoCase User-agent "mamma" good_pass
SetEnvIfNoCase User-agent "lycos" good_pass
:)
jfrovich
26-09-2006, 19:16/07:16PM
well i will watch my server logs
traffic and see what happens now that i uploaded the changes
also i only need to add se's that crawl my site right? not every one that gives me traffic?
i would assume these are the errors
client denied by server configuration
im getting several of these
failed to open stream: HTTP request failed! HTTP/1.1 403 Forbidden\r\n in
http://www.supportcave.com/spyware/spyware_protection/Spyware-Blaster.html
its blocking my review script...
DOH!!
could it be my kontra ads,digitalpoint.com tracker, my StatCounter or google analytics?
ill see if i can nail it down
ps how can i tell if its a bot or a person?
ive been to http://www.dnsstuff.com/
to check them out, but im not sure
thanks
jfrovich
26-09-2006, 21:05/09:05PM
guess i have to allow xenu?
but it was easy..
very cool
WebSavvy
26-09-2006, 23:36/11:36PM
Jason, it's not blocking your review script. That's reporting to you the page that was being accessed by someone/thing that didn't have permission to access.
Yes, you will see a rise in reports on your error log and that's how you know it's working. Whatever isn't supposed to access your site and tries to anyway will automatically generate an error which gets reported to you on your log file.
The "cannot open stream" bit is telling you what "page" the access attempt was made on.
Yes, you'd need to allow Xenu if you want it to have access. I don't have it allowed on mine because I don't use it but others who have it are crawling my site with it and that's BS!
So far, it seems to be going along smoothly. I haven't checked the logs for today yet but will in a little while.
jfrovich
26-09-2006, 23:56/11:56PM
HI Deb
for some reason it IS blocking my reviews
i remove the lines and it works
I put them back and it's down again
not sure why, it's just php
ive done this several times tonight
ive looked in the code , no idea why its doing this..
the odd thing is they all work when viewed here
http://www.supportcave.com/review/
click any link and they show up fine
i posted this at the scripts forum as well
:confused:
WebSavvy
27-09-2006, 09:44/09:44AM
Jason, do you have CPanel where SupportCave is at? I know you have it on my server.
Go into CPanel, and click the stats area to go into AwStats. Then, from the left column click the link for "Unknown Browsers" and see what's listed there that might be a UA for your script.
If there's one there, add that to .htaccess as allowed, and it should fix the problem.
I use CaRP on my server and it has a UA of "CaRP" and I added that to .htaccess as allowed, and so it runs without a hitch.
jfrovich
27-09-2006, 10:06/10:06AM
cool never noticed that before
it's quite a list
Mediapartners-Google/2.1 27 Sep 2006 - 07:53
Feedfetcher-Google;_(_http://www.google.com/feedfetcher.html) 27 Sep 2006 - 07:31
AWSM_Bot_1 26 Sep 2006 - 23:54
AWSM_Bot_2 26 Sep 2006 - 23:54
- 26 Sep 2006 - 23:53
Xenu_Link_Sleuth_1.2h 26 Sep 2006 - 21:03
Firefox_kastaneta03@hotmail.com 26 Sep 2006 - 19:41
contype 26 Sep 2006 - 19:13
SiteUptime.com 26 Sep 2006 - 19:09
Missigua_Locator_1.9 26 Sep 2006 - 18:43
AWSM_Bot_3 26 Sep 2006 - 18:38
teoma_agent1 26 Sep 2006 - 17:01
AWSM_Bot_4 26 Sep 2006 - 13:01
MSRBOT 26 Sep 2006 - 12:58
PubSub-RSS-Reader/1.1_(http://www.pubsub.com/) 26 Sep 2006 - 11:56
AR 26 Sep 2006 - 11:47
Liferea/1.0.23_(Linux;_en_ZW.UTF-8;_http://liferea.sf.net/) 26 Sep 2006 - 11:26
dragonfly(ebingbong@playstarmusic.com) 26 Sep 2006 - 10:35
Java/1.5.0_06 26 Sep 2006 - 08:29
Forum_Poster_-_fp.icontool.com 26 Sep 2006 - 08:24
larbin_2.6.3_larbin2.6.3@unspecified.mail 26 Sep 2006 - 08:20
Liferea/1.0.12_(Linux;_en_ZW.UTF-8;_http://liferea.sf.net/) 26 Sep 2006 - 04:57
Snappy/1.1_(_http://www.urltrends.com/_) 26 Sep 2006 - 03:59
BlogSearch/1.1__http://www.icerocket.com/ 26 Sep 2006 - 03:06
SuperBot/4.6.0.69_(Windows_XP) 26 Sep 2006 - 02:13
lwp-trivial/1.40 25 Sep 2006 - 23:46
Java/1.4.1_04 25 Sep 2006 - 22:09
sdcresearchlabs-testbot/0.8-dev_(www.shopping.com/bot.html;_http://lucene.apache.org/nutch/bot.html;_researchbot@shopping.com) 25 Sep 2006 - 20:54
FDM_2.x 25 Sep 2006 - 20:04
AWSM_Bot_6 25 Sep 2006 - 19:32
AWSM_Bot_5 25 Sep 2006 - 18:56
AWSM_Bot_7 25 Sep 2006 - 18:56
AWSM_Bot_8 25 Sep 2006 - 18:55
Jyxobot/1 25 Sep 2006 - 16:47
MJ12bot/v1.0.8_(http://majestic12.co.uk/bot.php?_) 25 Sep 2006 - 13:15
Blog_Conversation_Project;_blogs@iq.harvard.edu;_http://gking.harvard.edu/blogconv/ 25 Sep 2006 - 07:32
Snoopy_v1.2 25 Sep 2006 - 07:24
SurveyBot/2.3_(Whois_Source) 25 Sep 2006 - 06:58
Python-urllib/1.16 25 Sep 2006 - 02:41
lwp-trivial/1.41 24 Sep 2006 - 20:27
LWP::Simple/5.803 24 Sep 2006 - 20:27
cks 24 Sep 2006 - 19:50
Google-Sitemaps/1.0 24 Sep 2006 - 19:40
miniRank/2.0_(miniRank;_http://minirank.com/;_website_ranking_engine) 24 Sep 2006 - 16:52
POE-Component-Client-HTTP/0.65_(perl;_N;_POE;_en;_rv:0.650000) 24 Sep 2006 - 16:15
Exabot/3.0 24 Sep 2006 - 01:25
ZixyBot_3.16.2 24 Sep 2006 - 00:06
Internet_Ninja_6.0 23 Sep 2006 - 22:27
Java/1.5.0_04 23 Sep 2006 - 21:14
Java/1.4.2_10 23 Sep 2006 - 19:11
CJNetworkQuality;_http://www.cj.com/networkquality 23 Sep 2006 - 14:58
yacybot_(x86_Windows_XP_5.1;_java_1.5.0_08;_Europe/de)_yacy.net 23 Sep 2006 - 10:36
NG/2.0 23 Sep 2006 - 10:10
TravelLazerBot/1.0 23 Sep 2006 - 06:28
UtilMind_HTTPGet 22 Sep 2006 - 18:39
SBIder/0.8-dev_(SBIder;_http://www.sitesell.com/sbider.html;_http://support.sitesell.com/contact-support.html) 22 Sep 2006 - 11:20
Webdup/0.9 22 Sep 2006 - 08:57
Seekbot/1.0_(http://www.seekbot.net/bot.html)_HTTPFetcher/2.2 22 Sep 2006 - 00:36
Secure_IE 21 Sep 2006 - 22:49
Java/1.4.2_05 21 Sep 2006 - 17:35
Java/1.4.2_02 21 Sep 2006 - 16:48
Zeus_15109_Webster_Pro_V2.9_Win32 21 Sep 2006 - 16:17
GT::WWW/1.022 21 Sep 2006 - 14:15
wwwster/1.4_(Beta,_mailto:gue@cis.uni-muenchen.de) 21 Sep 2006 - 10:52
Technoratibot/0.7 20 Sep 2006 - 20:08
Jakarta_Commons-HttpClient/3.0 20 Sep 2006 - 20:00
ozelot/2.7.3_(Search_engine_indexer;_www.flying-cat.de/ozelot;_ozelot@flying-cat.de) 20 Sep 2006 - 19:57
LWP::Simple/5.65 20 Sep 2006 - 19:35
lwp-trivial/1.35 20 Sep 2006 - 19:35
bloghunter.net 20 Sep 2006 - 17:49
OmniExplorer_Bot/6.70_(_http://www.omni-explorer.com)_WorldIndexer 20 Sep 2006 - 17:49
Blogslive_(info@blogslive.com) 20 Sep 2006 - 17:49
Jakarta_Commons-HttpClient/3.0-rc2 20 Sep 2006 - 17:49
Java/1.5.0_03 20 Sep 2006 - 17:49
ping.blo.gs/2.0 20 Sep 2006 - 17:48
NetMonitor/2.0_(See_http://expertmonitor.com/explanation/) 20 Sep 2006 - 10:54
Xenu_Link_Sleuth_1.2g 20 Sep 2006 - 02:33
Download_Master 19 Sep 2006 - 22:32
Xenu_Link_Sleuth_1.2f 19 Sep 2006 - 19:38
Exabot-Images/1.0 19 Sep 2006 - 10:38
Maxthon 19 Sep 2006 - 10:37
LWP::Simple/5.45 19 Sep 2006 - 07:10
cfetch/1.0 19 Sep 2006 - 05:42
Java/1.5.0 18 Sep 2006 - 17:07
BilgiBot/1.0(beta)_(bilgi.com(beta);_http://lucene.apache.org/nutch/bot.html;_nutch-agent@lucene.apache.org) 18 Sep 2006 - 12:30
Download_Ninja_7.0 18 Sep 2006 - 08:49
Microsoft_URL_Control_-_6.01.9782 18 Sep 2006 - 08:25
ShopWiki/1.0_(__http://www.shopwiki.com/wiki/Help:Bot) 17 Sep 2006 - 23:40
Ken_Church 16 Sep 2006 - 20:07
Microsoft_URL_Control_-_6.00.8862 16 Sep 2006 - 08:26
AWSM_Bot_9 15 Sep 2006 - 23:01
NaverBot-1.0_(NHN_Corp._/__82-31-784-1989_/_nhnbot@naver.com) 15 Sep 2006 - 18:47
Avant_Browser_(http://www.avantbrowser.com) 15 Sep 2006 - 15:45
CFNetwork/129.16 15 Sep 2006 - 07:00
Java/1.5.0_01 15 Sep 2006 - 01:43
Marvin_v0.3 15 Sep 2006 - 00:17
Microsoft_Internet_Explorer 14 Sep 2006 - 21:01
Java/1.5.0_07 14 Sep 2006 - 17:35
Nutch-test/Nutch-0.9-dev 14 Sep 2006 - 14:11
hgrepurl/1.0 14 Sep 2006 - 05:33
GT::WWW/1.026 13 Sep 2006 - 19:09
Snapbot/1.0 13 Sep 2006 - 16:42
<a_href='http://www.netforex.org'>_Forex_Trading_Network_Organization_</a>_info@netforex.org 13 Sep 2006 - 13:04
Bitacle_bot/1.1 13 Sep 2006 - 10:42
Java/1.5.0_05 13 Sep 2006 - 07:26
Java/1.5.0_08 13 Sep 2006 - 02:36
PEAR_HTTP_Request_class_(_http://pear.php.net/_) 12 Sep 2006 - 20:03
blogsearchbot-pumpkin-2 12 Sep 2006 - 11:41
Java/1.4.2_03 12 Sep 2006 - 10:24
Browsershots_URL_Check 11 Sep 2006 - 00:26
BDFetch 09 Sep 2006 - 20:40
nicebot 09 Sep 2006 - 13:54
Java/1.4.1_03 09 Sep 2006 - 07:09
NutchCVS/0.7.1_(Nutch;_http://lucene.apache.org/nutch/bot.html;_nutch-agent@lucene.apache.org) 09 Sep 2006 - 03:43
Wells_Search_II 08 Sep 2006 - 19:25
!Susie_(http://www.sync2it.com/susie) 07 Sep 2006 - 20:25
Mister_Pix_II_2.15 07 Sep 2006 - 18:28
Link_Checker/1.2.3_(PPC_Mac_OS_X) 07 Sep 2006 - 17:45
FDM_1.x 06 Sep 2006 - 14:14
Xenu_Link_Sleuth_1.2d 05 Sep 2006 - 07:45
Link_Valet_Online_1.1 04 Sep 2006 - 17:48
DA_7.0 04 Sep 2006 - 16:00
page_verifier_http://www.securecomputing.com/goto/pv 03 Sep 2006 - 22:55
www.adressendeutschland.de 03 Sep 2006 - 15:22
Liferea/1.0.21_(Linux;_en_US.UTF-8;_http://liferea.sf.net/) 03 Sep 2006 - 10:44
iSearch/2.16 03 Sep 2006 - 00:55
NutchCVS/0.8-dev_(Nutch;_http://lucene.apache.org/nutch/bot.html;_nutch-agent@lucene.apache.org) 01 Sep 2006 - 15:20
IlseBot/1.0 01 Sep 2006 - 14:54
Xenu's_Link_Sleuth_1.1c 01 Sep 2006 - 11:43
jfrovich
27-09-2006, 10:14/10:14AM
or could it be unknown os's
YahooFeedSeeker/2.0_(compatible;_Mozilla_4.0;_MSIE_5.5;_http://publisher.yahoo.com/rssguide;_users_3;_views_122) 27 Sep 2006 - 09:14
YahooFeedSeeker/2.0_(compatible;_Mozilla_4.0;_MSIE_5.5;_http://publisher.yahoo.com/rssguide) 27 Sep 2006 - 09:14
Mozilla/4.0_(compatible;) 27 Sep 2006 - 09:08
Mediapartners-Google/2.1 27 Sep 2006 - 07:53
YahooFeedSeeker/2.0_(compatible;_Mozilla_4.0;_MSIE_5.5;_http://publisher.yahoo.com/rssguide;_users_0;_views_0) 27 Sep 2006 - 07:45
Feedfetcher-Google;_(_http://www.google.com/feedfetcher.html) 27 Sep 2006 - 07:31
Mozilla/8.0 27 Sep 2006 - 04:17
AWSM_Bot_1 26 Sep 2006 - 23:54
AWSM_Bot_2 26 Sep 2006 - 23:54
- 26 Sep 2006 - 23:53
YahooFeedSeeker/2.0_(compatible;_Mozilla_4.0;_MSIE_5.5;_http://publisher.yahoo.com/rssguide;_users_62;_views_4010) 26 Sep 2006 - 22:33
Xenu_Link_Sleuth_1.2h 26 Sep 2006 - 21:03
Mozilla/5.0_(compatible;_Google_Desktop) 26 Sep 2006 - 20:33
Firefox_kastaneta03@hotmail.com 26 Sep 2006 - 19:41
contype 26 Sep 2006 - 19:13
SiteUptime.com 26 Sep 2006 - 19:09
YahooFeedSeeker_Testing/2.0_(compatible;_Mozilla_4.0;_MSIE_5.5;_http://publisher.yahoo.com/rssguide;_users_0;_views_0) 26 Sep 2006 - 18:52
Missigua_Locator_1.9 26 Sep 2006 - 18:43
AWSM_Bot_3 26 Sep 2006 - 18:38
Mozilla/5.0_(compatible;_heritrix/1.6.0__http://www.worio.com/) 26 Sep 2006 - 17:30
teoma_agent1 26 Sep 2006 - 17:01
Mozilla/3.0_(compatible;_Indy_Library) 26 Sep 2006 - 16:44
findlinks/1.1.3-beta9_(_http://wortschatz.uni-leipzig.de/findlinks/) 26 Sep 2006 - 14:44
AWSM_Bot_4 26 Sep 2006 - 13:01
MSRBOT 26 Sep 2006 - 12:58
Mozilla/5.0_(compatible;_LinksManager.com_bot__http://linksmanager.com/linkchecker.html) 26 Sep 2006 - 12:10
PubSub-RSS-Reader/1.1_(http://www.pubsub.com/) 26 Sep 2006 - 11:56
AR 26 Sep 2006 - 11:47
dragonfly(ebingbong@playstarmusic.com) 26 Sep 2006 - 10:35
Mozilla/5.0_(compatible;_BecomeBot/2.3;_MSIE_6.0_compatible;__http://www.become.com/site_owners.html) 26 Sep 2006 - 09:34
Wget/1.5.3.1 26 Sep 2006 - 09:11
Java/1.5.0_06 26 Sep 2006 - 08:29
Forum_Poster_-_fp.icontool.com 26 Sep 2006 - 08:24
larbin_2.6.3_larbin2.6.3@unspecified.mail 26 Sep 2006 - 08:20
Mozilla/4.0_(compatible;_MSIE_6.0;) 26 Sep 2006 - 05:06
Snappy/1.1_(_http://www.urltrends.com/_) 26 Sep 2006 - 03:59
BlogSearch/1.1__http://www.icerocket.com/ 26 Sep 2006 - 03:06
lwp-trivial/1.40 25 Sep 2006 - 23:46
Mozilla/4.0_(compatible;_Globel;_Traffic_Sent_From:_http://www.free-stuff.me.uk) 25 Sep 2006 - 23:46
YahooFeedSeeker_Testing/2.0_(compatible;_Mozilla_4.0;_MSIE_5.5;_http://publisher.yahoo.com/rssguide) 25 Sep 2006 - 23:06
Java/1.4.1_04 25 Sep 2006 - 22:09
sdcresearchlabs-testbot/0.8-dev_(www.shopping.com/bot.html;_http://lucene.apache.org/nutch/bot.html;_researchbot@shopping.com) 25 Sep 2006 - 20:54
FDM_2.x 25 Sep 2006 - 20:04
AWSM_Bot_6 25 Sep 2006 - 19:32
AWSM_Bot_5 25 Sep 2006 - 18:56
AWSM_Bot_7 25 Sep 2006 - 18:56
AWSM_Bot_8 25 Sep 2006 - 18:55
Jyxobot/1 25 Sep 2006 - 16:47
MJ12bot/v1.0.8_(http://majestic12.co.uk/bot.php?_) 25 Sep 2006 - 13:15
Mozilla/4.0_(compatible;_Google_Desktop) 25 Sep 2006 - 08:32
Blog_Conversation_Project;_blogs@iq.harvard.edu;_http://gking.harvard.edu/blogconv/ 25 Sep 2006 - 07:32
Snoopy_v1.2 25 Sep 2006 - 07:24
SurveyBot/2.3_(Whois_Source) 25 Sep 2006 - 06:58
Python-urllib/1.16 25 Sep 2006 - 02:41
Mozilla/3.01_(compatible;) 24 Sep 2006 - 22:53
lwp-trivial/1.41 24 Sep 2006 - 20:27
LWP::Simple/5.803 24 Sep 2006 - 20:27
cks 24 Sep 2006 - 19:50
Mozilla/3.0_(compatible;) 24 Sep 2006 - 19:41
Google-Sitemaps/1.0 24 Sep 2006 - 19:40
miniRank/2.0_(miniRank;_http://minirank.com/;_website_ranking_engine) 24 Sep 2006 - 16:52
POE-Component-Client-HTTP/0.65_(perl;_N;_POE;_en;_rv:0.650000) 24 Sep 2006 - 16:15
RMA/1.0_(compatible;_RealMedia) 24 Sep 2006 - 14:00
Exabot/3.0 24 Sep 2006 - 01:25
ZixyBot_3.16.2 24 Sep 2006 - 00:06
Internet_Ninja_6.0 23 Sep 2006 - 22:27
Mozilla/2.0_(compatible;_MS_FrontPage_5.0) 23 Sep 2006 - 21:48
Java/1.5.0_04 23 Sep 2006 - 21:14
Java/1.4.2_10 23 Sep 2006 - 19:11
CJNetworkQuality;_http://www.cj.com/networkquality 23 Sep 2006 - 14:58
NG/2.0 23 Sep 2006 - 10:10
TravelLazerBot/1.0 23 Sep 2006 - 06:28
W3C_Validator/1.432.2.10 22 Sep 2006 - 22:31
UtilMind_HTTPGet 22 Sep 2006 - 18:39
Mozilla/4.0_(compatible;_MSIE_5.5) 22 Sep 2006 - 11:37
SBIder/0.8-dev_(SBIder;_http://www.sitesell.com/sbider.html;_http://support.sitesell.com/contact-support.html) 22 Sep 2006 - 11:20
Webdup/0.9 22 Sep 2006 - 08:57
Seekbot/1.0_(http://www.seekbot.net/bot.html)_HTTPFetcher/2.2 22 Sep 2006 - 00:36
Secure_IE 21 Sep 2006 - 22:49
StackRambler/2.0_(MSIE_incompatible) 21 Sep 2006 - 22:25
Java/1.4.2_05 21 Sep 2006 - 17:35
Java/1.4.2_02 21 Sep 2006 - 16:48
Mozilla/5.0_(compatible;_MOSBookmarks/v2.6-Plus;_Link_Checker) 21 Sep 2006 - 15:51
Windows-Media-Player/9.00.00.3349 21 Sep 2006 - 15:42
GT::WWW/1.022 21 Sep 2006 - 14:15
wwwster/1.4_(Beta,_mailto:gue@cis.uni-muenchen.de) 21 Sep 2006 - 10:52
Technoratibot/0.7 20 Sep 2006 - 20:08
Jakarta_Commons-HttpClient/3.0 20 Sep 2006 - 20:00
ozelot/2.7.3_(Search_engine_indexer;_www.flying-cat.de/ozelot;_ozelot@flying-cat.de) 20 Sep 2006 - 19:57
LWP::Simple/5.65 20 Sep 2006 - 19:35
lwp-trivial/1.35 20 Sep 2006 - 19:35
bloghunter.net 20 Sep 2006 - 17:49
OmniExplorer_Bot/6.70_(_http://www.omni-explorer.com)_WorldIndexer 20 Sep 2006 - 17:49
Blogslive_(info@blogslive.com) 20 Sep 2006 - 17:49
Jakarta_Commons-HttpClient/3.0-rc2 20 Sep 2006 - 17:49
Java/1.5.0_03 20 Sep 2006 - 17:49
ping.blo.gs/2.0 20 Sep 2006 - 17:48
NetMonitor/2.0_(See_http://expertmonitor.com/explanation/) 20 Sep 2006 - 10:54
SIE-M65/50_UP.Browser/7.0.2.2.d.3(GUI)_MMP/2.0_Profile/MIDP-2.0_Configuration/CLDC-1.1 20 Sep 2006 - 10:29
Xenu_Link_Sleuth_1.2g 20 Sep 2006 - 02:33
Download_Master 19 Sep 2006 - 22:32
Xenu_Link_Sleuth_1.2f 19 Sep 2006 - 19:38
Mozilla/5.0_(compatible;_linktiger/1.0;__http://www.linktiger.com/) 19 Sep 2006 - 12:44
Exabot-Images/1.0 19 Sep 2006 - 10:38
Maxthon 19 Sep 2006 - 10:37
LWP::Simple/5.45 19 Sep 2006 - 07:10
LinkScan/11.6c_Windows 19 Sep 2006 - 06:05
cfetch/1.0 19 Sep 2006 - 05:42
Windows-Media-Player/11.00.00.4715 19 Sep 2006 - 01:59
Java/1.5.0 18 Sep 2006 - 17:07
Opera/8.01_(J2ME/MIDP;_Opera_Mini/2.0.4012/1316;_en;_U;_ssr) 18 Sep 2006 - 17:02
BilgiBot/1.0(beta)_(bilgi.com(beta);_http://lucene.apache.org/nutch/bot.html;_nutch-agent@lucene.apache.org) 18 Sep 2006 - 12:30
Download_Ninja_7.0 18 Sep 2006 - 08:49
Microsoft_URL_Control_-_6.01.9782 18 Sep 2006 - 08:25
YahooSeeker/1.2_(compatible;_Mozilla_4.0;_MSIE_5.5;_yahooseeker_at_yahoo-inc_dot_com_;_http://help.yahoo.com/help/us/shop/merchant/) 18 Sep 2006 - 07:15
ShopWiki/1.0_(__http://www.shopwiki.com/wiki/Help:Bot) 17 Sep 2006 - 23:40
Mozilla/4.0_(compatible;_MS_FrontPage_6.0) 17 Sep 2006 - 12:20
Windows-Media-Player/10.00.00.4036 17 Sep 2006 - 08:43
Ken_Church 16 Sep 2006 - 20:07
Jigsaw/2.2.5_W3C_CSS_Validator_JFouffa/2.0 16 Sep 2006 - 17:17
Windows-Media-Player/10.00.00.3646 16 Sep 2006 - 14:41
Forum_Poster_-_fp.icontool.com,__DynaWeb_http://www.dit-inc.us/disclaimer.php 16 Sep 2006 - 10:43
Mozilla/4.0_compatible 16 Sep 2006 - 08:31
Microsoft_URL_Control_-_6.00.8862 16 Sep 2006 - 08:26
Nokia3220/2.0_(05.50)_Profile/MIDP-2.0_Configuration/CLDC-1.1 16 Sep 2006 - 01:45
AWSM_Bot_9 15 Sep 2006 - 23:01
NaverBot-1.0_(NHN_Corp._/__82-31-784-1989_/_nhnbot@naver.com) 15 Sep 2006 - 18:47
Avant_Browser_(http://www.avantbrowser.com) 15 Sep 2006 - 15:45
Windows-Media-Player/9.00.00.3250 15 Sep 2006 - 12:25
CFNetwork/129.16 15 Sep 2006 - 07:00
Java/1.5.0_01 15 Sep 2006 - 01:43
Marvin_v0.3 15 Sep 2006 - 00:17
libwww-perl/5.803 15 Sep 2006 - 00:16
Mozilla/4.0_compatible_FurlBot/Furl_Search_2.0_(FurlBot;_http://www.furl.net;_wn.furlbot@looksmart.net) 14 Sep 2006 - 22:42
Microsoft_Internet_Explorer 14 Sep 2006 - 21:01
Java/1.5.0_07 14 Sep 2006 - 17:35
Mozilla/5.0 14 Sep 2006 - 15:39
Nutch-test/Nutch-0.9-dev 14 Sep 2006 - 14:11
hgrepurl/1.0 14 Sep 2006 - 05:33
Lynx/2.8.5dev.16_libwww-FM/2.14_SSL-MM/1.4.1_OpenSSL/0.9.7a 14 Sep 2006 - 00:28
GT::WWW/1.026 13 Sep 2006 - 19:09
Snapbot/1.0 13 Sep 2006 - 16:42
<a_href='http://www.netforex.org'>_Forex_Trading_Network_Organization_</a>_info@netforex.org 13 Sep 2006 - 13:04
Bitacle_bot/1.1 13 Sep 2006 - 10:42
Java/1.5.0_05 13 Sep 2006 - 07:26
Mozilla/5.0_(compatible;_OnetSzukaj/5.0;__http://szukaj.onet.pl) 13 Sep 2006 - 06:20
Java/1.5.0_08 13 Sep 2006 - 02:36
Wget/1.10.1 12 Sep 2006 - 23:37
PEAR_HTTP_Request_class_(_http://pear.php.net/_) 12 Sep 2006 - 20:03
Mozilla/4.0_(compatible;_IE-Favorites-Check-0.5) 12 Sep 2006 - 18:05
blogsearchbot-pumpkin-2 12 Sep 2006 - 11:41
Java/1.4.2_03 12 Sep 2006 - 10:24
Wget/1.10.2_(Red_Hat_modified) 12 Sep 2006 - 07:56
Mozilla/4.0_(compatible;_MSIE_5.0) 11 Sep 2006 - 23:54
Windows-Media-Player/10.00.00.3802 11 Sep 2006 - 21:35
Mozilla/5.0_(compatible;_heritrix/1.6.0__http://www.researcher.cz) 11 Sep 2006 - 11:38
Browsershots_URL_Check 11 Sep 2006 - 00:26
BDFetch 09 Sep 2006 - 20:40
nicebot 09 Sep 2006 - 13:54
Lynx/2.8.4pre.5_libwww-FM/2.14FM 09 Sep 2006 - 11:30
Java/1.4.1_03 09 Sep 2006 - 07:09
NutchCVS/0.7.1_(Nutch;_http://lucene.apache.org/nutch/bot.html;_nutch-agent@lucene.apache.org) 09 Sep 2006 - 03:43
Wells_Search_II 08 Sep 2006 - 19:25
Mozilla/4.0 08 Sep 2006 - 02:17
!Susie_(http://www.sync2it.com/susie) 07 Sep 2006 - 20:25
Mister_Pix_II_2.15 07 Sep 2006 - 18:28
Mozilla/3.0_(compatible) 07 Sep 2006 - 03:20
Mozilla/2.0_(compatible;_T-H-U-N-D-E-R-S-T-O-N-E) 06 Sep 2006 - 22:40
FDM_1.x 06 Sep 2006 - 14:14
NSPlayer/10.0.0.3650_WMFSDK/10.0 06 Sep 2006 - 13:27
IP*Works!_HTTP_Component_-_www.dev-soft.com,_Mozilla/4.0_(compatible;_MSIE_5.0) 06 Sep 2006 - 06:01
Mozilla/4.0_(compatible;_NaverBot/1.0;_nhnbot@naver.com) 06 Sep 2006 - 01:13
Windows-Media-Player/11.0.5705.5043 05 Sep 2006 - 23:22
Xenu_Link_Sleuth_1.2d 05 Sep 2006 - 07:45
Mozilla/4.0_(compatible;_Arachmo) 05 Sep 2006 - 06:49
Mozilla/3.0_(compatible;_TweakMASTER) 04 Sep 2006 - 23:58
Link_Valet_Online_1.1 04 Sep 2006 - 17:48
DA_7.0 04 Sep 2006 - 16:00
page_verifier_http://www.securecomputing.com/goto/pv 03 Sep 2006 - 22:55
www.adressendeutschland.de 03 Sep 2006 - 15:22
findlinks/1.1.3-beta8_(_http://wortschatz.uni-leipzig.de/findlinks/) 03 Sep 2006 - 11:21
Wget/1.9.1 03 Sep 2006 - 07:00
iSearch/2.16 03 Sep 2006 - 00:55
Lynx/2.8.5rel.4_libwww-FM/2.14 02 Sep 2006 - 07:43
Mozilla/5.0_(compatible;_heritrix/1.8.0__http://wiki.office.aol.com/wiki/SEO) 01 Sep 2006 - 16:13
NutchCVS/0.8-dev_(Nutch;_http://lucene.apache.org/nutch/bot.html;_nutch-agent@lucene.apache.org) 01 Sep 2006 - 15:20
IlseBot/1.0 01 Sep 2006 - 14:54
Xenu's_Link_Sleuth_1.1c 01 Sep 2006 - 11:43
Mozilla/5.0_(000000000;_0;_000_000_00_0_000000;_00000;_0000000000)_00000000000000_000000000000000 01 Sep 2006 - 00:21
WebSavvy
27-09-2006, 10:21/10:21AM
You're getting hit by a lot of the same spam crap/bad bot crap, as I am.
This one: Firefox_kastaneta03
is a spam bot.
Add that to bad_pass
SetEnvIfNoCase User-agent "^Firefox_kastaneta" bad_pass
I have no idea what the name of your script is, or what UA it might possibly have. Maybe at that script author's forums they can let you know this?
Just ask them if the script has a UA or some type of agent string identifier?
If it does, and they let you know, post it here and I'll post how to add it to good_pass if you're not sure.
:)
jfrovich
27-09-2006, 10:37/10:37AM
thanks will do
added the spam bot
i posted asking him and posted the bots as well
and man this IP is hammering my forum
69.88.74.194
Pages 14075
Hits 14075
MB 478.13 MB
26 Sep 2006 - 23:54
this is what my server is showing
[client 69.88.74.194] client denied by server configuration: /home/support/public_html/403.shtml
[Wed Sep 27 10:49:05 2006] [error] [client 69.88.74.194] client denied by server configuration: /home/support/public_html/forum/member.php
jfrovich
27-09-2006, 11:09/11:09AM
shoot
i broke this site also hosted on my ip
http://www.scarfaceblog.com
none of the videos work
i also get this
client denied by server configuration: /home/support/public_html/videos/ebaysong.wmv
will check the logs
WebSavvy
27-09-2006, 12:10/12:10PM
From your logs you posted this:
Forum_Poster_-_fp.icontool.com 26 Sep 2006 - 08:24
It's probably what's hammering your forum. Add that to bad_pass
SetEnvIfNoCase User-agent "Forum_Poster" bad_pass
If you added the image hotlinking bit to .htaccess you need to add the .wmv format
edit that section to reflect the following:
SetEnvIfNoCase Referer "domain\.com" local_ref=1
<FilesMatch "\.(gif|jpg|png|swf|wmv)$">
Order Allow,Deny
Allow from env=local_ref
</FilesMatch>
Change domain.com to your own domain
jfrovich
27-09-2006, 12:59/12:59PM
Thanks Deb
I don't have image hotlinking on, as it also broke that site...
If I remove the code you provided, the videos work..
I'll go test it now after adding what you recommended
jfrovich
27-09-2006, 15:10/03:10PM
Just got this back from the guy
Something you put in your .htaccess file is blocking the script from being included. You have an insert on your page to display review_insert.php. Since the script works when not using the include that means that your .htaccess file is blocking the include stream.
The script does not use "UA or some type of agent string identifier".
Connie
27-09-2006, 15:20/03:20PM
I think you have an error somewhere in the .htaccess file.
For stopping hot linking you might want to try this.
RewriteEngine on
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^http://(www\.)?domain\.com.*$ [NC]
RewriteCond %{HTTP_REFERER} !^https://secure\.domain\.com/.*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://(www\.)?IP address(/)?.*$ [NC]
RewriteRule \.(gif|jpg|jpeg|png|bmp|tif|zip)$ - [F,NC]
Of course make changes to apply to your site, and add any files to the blocked list that pertain to your site, such as your video file.
This may not be the best way but it works for me.
WebSavvy
27-09-2006, 15:47/03:47PM
Jason, when Bill suggested to add the following:
SetEnvIfNoCase User-agent "script" bad_pass
SetEnvIfNoCase User-agent "a href" bad_pass
Did you add that to yours?
I didn't add that to mine at all. If you did add it, that's what's causing the problems.
Your script is review_script
which is a match for "script" in bad_pass
If it's there, remove that line from your .htaccess file and see if your script is working properly or not?
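A SetEnvIfNoCase User-agent pattern is compared with the User-Agent header of the request, not with the name of the file being requested, and an unquoted word like "script" matches anywhere in that header. If the intent of those two lines was to catch user agents that embed HTML markup (an assumption about the original sample), a narrower sketch would be:

```apache
# Flag only user agents that contain markup fragments,
# rather than any string that happens to contain "script".
SetEnvIfNoCase User-agent "<script" bad_pass
SetEnvIfNoCase User-agent "<a href" bad_pass
```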
jfrovich
27-09-2006, 15:49/03:49PM
This is what was hitting my forum
yesterday..
Forum_Poster_-_fp.icontool.com
I found this
http://fp.icontool.com/
Forum Poster allow you to post any message you want to over 26000 forum boards in just minutes.
The current version can post to
1. phpBB Forum Boards from version 2.0.0 to 2.0.21 (http://www.phpbb.com/)
2. phpBB 3.0 "Olympus" Beta 2 (http://www.phpbb.com/development/)
3. Invision Power Board (http://www.invisionboard.com/)
4. Snitz Forums 2000 (http://forum.snitz.com/)
5. vBulletin 3 (http://www.vbulletin.com/)
Forum Poster automatically register a user with the username, e-mail and password you typed on the board. It login as the registered user on the board and then post it. All made automatically. With just one click! (Please browse our demo boards for posting)
Forum Poster support add, edit, delete forums URL, Import and export forum URL list.
Forum boards are an effective way to drive traffic to your site. With your posted Ads you can bring hundreds of new visitors to your site and increase your search
engine rankings which counts on link popularity like Google.
Please download the Trial version and see yourself. The trial version can make 10 complete postings.
but little did he know i changed the default signup page..
hehe
and now he's blocked..
jerks
WebSavvy
27-09-2006, 15:51/03:51PM
Yep, I posted that a few posts up and also the code to add to .htaccess to block the little s-ucker. :D
jfrovich
27-09-2006, 15:54/03:54PM
Originally posted by savvy1
Yep, I posted that a few posts up and also the code to add to .htaccess to block the little s-ucker. :D
ya thanks
I was wondering what that was.. damn spam software..
jfrovich
27-09-2006, 16:06/04:06PM
Originally posted by savvy1
Jason, when Bill suggested to add the following:
SetEnvIfNoCase User-agent "script" bad_pass
SetEnvIfNoCase User-agent "a href" bad_pass
Did you add that to yours?
I didn't add that to mine at all. If you did add it, that's what's causing the problems.
Your script is review_script
which is a match for "script" in bad_pass
If it's there, remove that line from your .htaccess file and see if your script is working properly or not?
Almost missed this post
Deb
all i added from YOUR list was
SetEnvIfNoCase User-agent "dogpile" good_pass
SetEnvIfNoCase User-agent "mamma" good_pass
SetEnvIfNoCase User-agent "lycos" good_pass
SetEnvIfNoCase User-agent "xenu" good_pass
# deny spammers
SetEnvIfNoCase User-agent "Indy" bad_pass
SetEnvIfNoCase User-agent "^Firefox_kastaneta" bad_pass
SetEnvIfNoCase User-agent "Forum_Poster" bad_pass
so no, I didn't add what Bill suggested
and why does it work in the /review/ folder
why should that matter
here is a working example
http://www.supportcave.com/review/index2.php?item_id=1
I'm confused :confused:
WebSavvy
27-09-2006, 16:46/04:46PM
Jason, I don't know. There's no way for me to know unless I knew what all else was in your .htaccess file (for security reasons do not post the contents to this thread).
Some servers act differently than others. For example, I set the cms software up on Connie's site, Glo's site, grungee's site, and yours.
On all of them, mod_rewrite works using the standard rewrite format except on Connie's domain.
He has other rewrite rules in there that's needed for his blog software and there's a conflict between that and the other mod_rewrite for the cms.
So, no matter what ... the cms mod_rewrite can't and won't work because of the other rewrite (which is a different format) for his blog software.
These things are quite common. What works on one server might not work on another depending on what modules are installed on the server, depending on how mod_rewrite is installed (e.g., as a cgi) and depending on what other rules live in a .htaccess file.
Were you having any troubles with your scripts when you were using the blacklist method (from the original bad bots thread)?
If not, then I'd suggest just going with the old format instead of using a whitelist, at least until we can figure out the problem.
I'm not a "code guru" ... I know only enough to be dangerous. ;)
Maybe Bill might be able to shed some light on some of this?
Glo
27-09-2006, 18:59/06:59PM
This info is priceless but I have a question about feeds. If I wanted to allow feed bot from say Yahoo do I need to add it to:
# allow Yahoo
SetEnvIfNoCase User-agent "Slurp" good_pass
SetEnvIfNoCase User-agent "Yahoo" good_pass
SetEnvIfNoCase User-agent "MMCrawler" good_pass
SetEnvIfNoCase User-agent "YahooFeedSeeker" good_pass
Or do I need to do it another way?
WebSavvy
27-09-2006, 19:14/07:14PM
Yep, that's exactly how you'd do it Glo. Then watch your logs over a few days and make changes if necessary to allow in or push out, something you hadn't noticed.
So far, it seems to work pretty well for me with regard to my site. I still am finding one or two bots that are getting through and I plan to go over my logs and find what's needed to block them completely.
The only thing I'd added that I've now removed is the image hotlinking.
It does allow users to your site to see your images as long as they're not using Opera.
I looked at my site using Opera and every one of my images were missing on every page. Took that out of .htaccess and then looked again using Opera, and they were all back.
So for now, I'm leaving that out. There might be a better way of blocking it than the codes I posted above which I found on the linux site itself.
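Browsers or proxies that send no Referer header at all never match the local_ref pattern, so under Order Allow,Deny they are refused every image. Whether that explains the Opera result is an assumption, but a common variation is to treat a blank referer as local too. A sketch, using the same domain.com placeholder as the earlier post:

```apache
# Requests that come from pages on our own domain.
SetEnvIfNoCase Referer "^https?://(www\.)?domain\.com" local_ref=1
# Requests with no Referer header at all (mod_setenvif treats a
# missing header as an empty string, so "^$" matches it).
SetEnvIf Referer "^$" local_ref=1

<FilesMatch "\.(gif|jpg|png|swf|wmv)$">
    Order Allow,Deny
    Allow from env=local_ref
</FilesMatch>
```

The trade-off is that a hotlinker who strips the referer also gets through, so this stops casual hotlinking rather than determined scraping.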
Glo
27-09-2006, 19:28/07:28PM
Thanks, Deb!
I have a PHP script that prevents hotlinking very well - it will even allow Google to cache your site if that's what you want. I'll send it to you if you want it, or tell you where to get it.
WebSavvy
27-09-2006, 19:39/07:39PM
Hey thanks! That'd be great. I'll PM you my email address and you can send it to me if you would?
Thanks!
IncrediBILL
27-09-2006, 19:43/07:43PM
Yes, you'd need to allow Xenu if you want it to have access. I don't have it allowed on mine because I don't use it but others who have it are crawling my site with it and that's BS!
Yup, I hacked a copy of Xenu and replaced the User Agent name in the binary file with something like "IncrediBILL" so I could let it in without letting other Xenu's on my server ;)
From looking over some posts, you probably want to add the following to your list. Remember, I said my code was a SAMPLE, and I don't use Google's Sitemaps; it's WAY too complicated with my dynamic site to figure out what to submit.
Let in, at a minimum...
Feedfetcher-Google
YahooFeedSeeker
Google-Sitemaps
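In the .htaccess style used earlier in this thread, those three names could be added to the whitelist like this (a sketch that assumes the good_pass variable is already tied to an Allow from env=good_pass rule):

```apache
# Feed readers and sitemap fetchers to let through.
SetEnvIfNoCase User-agent "Feedfetcher-Google" good_pass
SetEnvIfNoCase User-agent "YahooFeedSeeker" good_pass
SetEnvIfNoCase User-agent "Google-Sitemaps" good_pass
```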
Don't be fooled by crap like Microsoft_URL_Control as it's garbage, nothing to do with MS.
Now you see why I do it this way, there's a TON of junk that just gets nailed but you do have to spend a week or two making sure your blog bots and readers you allow are getting into your site.
What I actually do is let EVERYTHING pass if they are looking for my .xml file but I forbid RSS readers and bots from pulling whole articles off the site, which is important for controlling your content AND getting eyeballs on your ads :)
They can read the snippets anywhere but they have to click to come to my site to see the full page.
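Bill does this with his own script, so the following is only a rough .htaccess approximation, not his method. It assumes the feed lives in a file ending in .xml and that good_pass and bad_pass are wired up as in the earlier sample:

```apache
# Whitelist any request whose path ends in .xml, whatever the user agent.
# Request_URI does not include the query string.
SetEnvIf Request_URI "\.xml$" good_pass
```

Because Deny rules are checked after Allow rules under Order Allow,Deny, anything that also matches bad_pass is still refused.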
jfrovich
27-09-2006, 21:10/09:10PM
Originally posted by savvy1
Jason, I don't know. There's no way for me to know unless I knew what all else was in your .htaccess file (for security reasons do not post the contents to this thread).
Some servers act differently than others. For example, I set the cms software up on Connie's site, Glo's site, grungee's site, and yours.
On all of them, mod_rewrite works using the standard rewrite format except on Connie's domain.
He has other rewrite rules in there that's needed for his blog software and there's a conflict between that and the other mod_rewrite for the cms.
So, no matter what ... the cms mod_rewrite can't and won't work because of the other rewrite (which is a different format) for his blog software.
These things are quite common. What works on one server might not work on another depending on what modules are installed on the server, depending on how mod_rewrite is installed (e.g., as a cgi) and depending on what other rules live in a .htaccess file.
Were you having any troubles with your scripts when you were using the blacklist method (from the original bad bots thread)?
If not, then I'd suggest just going with the old format instead of using a whitelist, at least until we can figure out the problem.
I'm not a "code guru" ... I know only enough to be dangerous. ;)
Maybe Bill might be able to shed some light on some of this?
Deb
I Removed every thing from my .htaccess file except the code you provided and my review and video site are broken..
I hate to remove it, but until I figure it out, I have no choice. I don't use any other kind of block list; this was my first.
WebSavvy
27-09-2006, 21:41/09:41PM
Yeah, there might be something about it that's not compatible with some of your scripts.
Bill uses a PHP script to do his, rather than .htaccess.
You can do basically the same thing using PHP instead of .htaccess if you'd rather do it that way?
I can write something up a little later tonight and post it to this thread for you after I've tested it out to make sure it works.
jfrovich
28-09-2006, 00:04/12:04AM
if you have the time
for sure
thanks
g1smd
28-09-2006, 06:24/06:24AM
>> I Removed every thing from my .htaccess file except the code you provided and my review and video site are broken.. <<
I hope you don't mean everything - without checking what all the other code was actually for?
jfrovich
28-09-2006, 09:37/09:37AM
Originally posted by g1smd
>> I Removed every thing from my .htaccess file except the code you provided and my review and video site are broken.. <<
I hope you don't mean everything - without checking what all the other code was actually for?
Well, I just removed everything for a few minutes to test; I put it all back..
I have several backup files and kept the original content in an open Notepad file..
and I know what all the code is for
I have a few lines to redirect non-www to www, direct index.html to www
and a few other redirects
nothing too complicated..
I also wanted to make sure it wasn't the other code that broke the script and videos
jfrovich
02-10-2006, 13:43/01:43PM
Originally posted by IncrediBILL
Remember, I posted a shell to start your own file ;)
You would need to add a line for Lynx but the gibberish user agent strings, as well as blank user agents, would bounce off your site harder than a rubber ball dropped from a 747 at 30K feet.
IncrediBILL, how would I create a file to do this? Would I then just add the file to my footer, using a PHP include to add it to every page?
I didn't have much luck with the .htaccess file, so I was hoping to try your method
jfrovich
05-10-2006, 02:28/02:28AM
im going to try the bad bots list
as it doesnt break my php review script..
is that list updated?
WebSavvy
05-10-2006, 02:34/02:34AM
Yeah, it's updated Jason.
In the thread itself, there's two or three other bots listed, just add those to the main one.
You can remove/add whatever you wish to. The ones in the list were just from my personal block file.
btw ... has anyone seen geo3 lately? He kind of just disappeared. Wonder if he's OK?
jfrovich
05-10-2006, 23:15/11:15PM
this might be of interest to all of us who want to block bad bots
http://www.closetnoc.org/tinytrap/
not sure if the project is still going
the dates are old
WebSavvy
05-10-2006, 23:22/11:22PM
Thanks, Jas. I'll have to take a look at it (probably later tonight).
I'm soooo swamped. :(
IncrediBILL
05-10-2006, 23:43/11:43PM
What might be breaking your video is the PUT portion of the sample code.
Try changing to this:
<Limit GET POST HEAD>
order allow,deny
allow from env=good_pass
deny from env=bad_pass
</Limit>
That might fix the problem, unless the code downloading the video uses a user agent not in the list.
Check your log files or the error log and see if anything obvious shows up.
jfrovich
05-10-2006, 23:57/11:57PM
Thanks Bill
didnt help
still kills the videos (I told my bro to move them to Google Video, so this shouldn't matter).
But it still kills my php reviews
the page with the review shows this
Warning: main(http://www.supportcave.com/review/review_insert.php?item_id=42): failed to open stream: HTTP request failed! HTTP/1.1 403 Forbidden in /home/support/public_html/anti_virus/virus-antivirus.html on line 53
lots of these
ter(http://www.supportcave.com/footer.php): failed to open stream: HTTP request failed! HTTP/1.1 403 Forbidden\r\n in /home/support/public_html/review/body.php on line 58
[Thu Oct 5 23:55:12 2006] [error] [client 72.29.76.101] client denied by server configuration: /home/support/public_html/403.shtml
[Thu Oct 5 23:55:12 2006] [error] [client 72.29.76.101] client denied by server configuration: /home/support/public_html/footer.php
[Thu Oct 5 23:55:12 2006] [error] PHP Warning: bodyfooter(http://www.supportcave.com/footer.php): failed to open stream: HTTP request failed! HTTP/1.1 403 Forbidden\r\n in /home/support/public_html/review/body.php on line 58
[Thu Oct 5 23:55:12 2006] [error] [client 72.29.76.101] client denied by server configuration: /home/support/public_html/403.shtml
[Thu Oct 5 23:55:12 2006] [error] [client 72.29.76.101] client denied by server configuration: /home/support/public_html/footer.php
[Thu Oct 5 23:55:12 2006] [error] PHP Warning: bodyfooter(): Failed opening 'http://www.supportcave.com/footer.php' for inclusion (include_path='.') in /home/support/public_html/review/body.php on line 46
[Thu Oct 5 23:55:12 2006] [error] PHP Warning: bodyfooter(http://www.supportcave.com/footer.php): failed to open stream: HTTP request failed! HTTP/1.1 403 Forbidden\r\n in /home/support/public_html/review/body.php on line 46
[Thu Oct 5 23:55:12 2006] [error] [client 72.29.76.101] client denied by server configuration: /home/support/public_html/403.shtml
[Thu Oct 5 23:55:12 2006] [error] [client 72.29.76.101] client denied by server configuration: /home/support/public_html/footer.php
[Thu Oct 5 23:55:12 2006] [error] PHP Warning: bodyfooter(http://www.supportcave.com/footer.php): failed to open stream: HTTP request failed! HTTP/1.1 403 Forbidden\r\n in /home/support/public_html/review/body.php on line 46
[Thu Oct 5 23:55:12 2006] [error] [client 72.29.76.101] client denied by server configuration: /home/support/public_html/403.shtml
[Thu Oct 5 23:55:12 2006] [error] [client 72.29.76.101] client denied by server configuration: /home/support/public_html/footer.php
[Thu Oct 5 23:55:12 2006] [error] PHP Warning: bodyheader(): Failed opening 'http://www.supportcave.com/t-menu.php' for inclusion (include_path='.') in /home/support/public_html/review/body.php on line 35
[Thu Oct 5 23:55:12 2006] [error] PHP Warning: bodyheader(http://www.supportcave.com/t-menu.php): failed to open stream: HTTP request failed! HTTP/1.1 403 Forbidden\r\n in /home/support/public_html/review/body.php on line 35
[Thu Oct 5 23:55:12 2006] [error] [client 72.29.76.101] client denied by server configuration: /home/support/public_html/403.shtml
[Thu Oct 5 23:55:12 2006] [error] [client 72.29.76.101] client denied by server configuration: /home/support/public_html/t-menu.php
[Thu Oct 5 23:55:12 2006] [error] PHP Warning: bodyheader(http://www.supportcave.com/t-menu.php): failed to open stream: HTTP request failed! HTTP/1.1 403 Forbidden\r\n in /home/support/public_html/review/body.php on line 35
i still dont understand WHY the reviews work in this folder
http://www.supportcave.com/review/index2.php?item_id=34&PHPSESSID=56a8e36a6a6b82c7b7bedf5f010106e5&PHPSESSID=56a8e36a6a6b82c7b7bedf5f010106e5
but are broken in all others
I so want this to work
IncrediBILL
05-10-2006, 23:58/11:58PM
What's the user agent name?
Is it showing anything at all?
Of course this is why I wrote my own, .htaccess is just too damn annoying to debug.
jfrovich
06-10-2006, 00:01/12:01AM
Originally posted by IncrediBILL
What's the user agent name?
Is it showing anything at all?
Of course this is why I wrote my own, .htaccess is just too damn annoying to debug.
im checking
but this is what the programmer is telling me
Something you put in your .htaccess file is blocking the script from being included. You have an insert on your page to display review_insert.php. Since the script works when not using the include that means that your .htaccess file is blocking the include stream.
The script does not use "UA or some type of agent string identifier".
nothing shows up on
Unknown OS (useragent field)
Unknown browsers (useragent field)
its still odd it working in the install folder /review/
IncrediBILL
06-10-2006, 00:08/12:08AM
BTW, you can always try this anti-spam approach:
http://www.blackholenews.com/Forum/index.php
jfrovich
06-10-2006, 00:10/12:10AM
is there a better way to include PHP?
When I have the link as the FULL URL
<?php include("http://www.supportcave.com/t-menu.php"); ?>
i get this
Warning: main(http://www.supportcave.com/t-menu.php): failed to open stream: HTTP request failed! HTTP/1.1 403 Forbidden in
but it works fine coded this way
<?php include("t-menu.php"); ?>
this might be the issue
bill i get this
For all you spammers...You are the reason the forum has shutdown. Take your viagra and stick it up your ass.
and register doesnt work
WebSavvy
06-10-2006, 00:10/12:10AM
Jason, look at this URI string you posted (the one where it works)
http://www.supportcave.com/review/index2.php?item_id=34&PHPSESSID=56a8e36a6a6b82c7b7bedf5f010106e5&PHPSESSID=56a8e36a6a6b82c7b7bedf5f010106e5
It contains double (i.e., TWO) PHP session IDs!
It's not supposed to do that! Looks like there's some bad wrapping going on with that script. Without seeing the actual code behind it all I can do is guess.
See if there's a session_start(); call in there before the headers are sent.
Make sure it's only in there once and not twice.
jfrovich
06-10-2006, 00:12/12:12AM
no idea what that means
did you see the post above yours
is that possible
WebSavvy
06-10-2006, 00:15/12:15AM
Jason my dear ... lol ... it's because of the way you're doing your includes. :D
Try this instead:
<?php
@include("/home/path/to/t-menu.php");
?>
Change /home/path/to/
To your actual path from root, to the file to be included.
You can't add the "http://" into the include when it's local.
If you're including a file from a different server, it works -- but not when they're living on the same site.
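The error log posted earlier in the thread shows the denied client for footer.php and t-menu.php as 72.29.76.101, which would be consistent with the server fetching its own pages over HTTP whenever an include uses a full URL; that request arrives without a whitelisted user agent and bounces off the whitelist. If that address really is the server's own IP (an assumption; check with your host), one way to let those self-requests through is sketched here:

```apache
# Whitelist requests that originate from the server itself,
# e.g. PHP includes written as full http:// URLs.
# 72.29.76.101 is taken from the error log in this thread.
SetEnvIf Remote_Addr "^72\.29\.76\.101$" good_pass
```

Using a filesystem path in the include, as suggested above, avoids the HTTP request altogether and is usually the simpler fix.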
jfrovich
06-10-2006, 00:25/12:25AM
Originally posted by savvy1
Jason my dear ... lol ... it's because of the way you're doing your includes. :D
Try this instead:
<?php
@include("/home/path/to/t-menu.php");
?>
Change /home/path/to/
To your actual path from root, to the file to be included.
You can't add the "http://" into the include when it's local.
If you're including a file from a different server, it works -- but not when they're living on the same site.
COol
I'm a PC tech and a half hack at this stuff..
i got my menu to work with your code
im going to try the review script...
and it doesnt work
http://www.supportcave.com/spyware/spyware_removal/spy-sweeper.html
no errors but no script
ill take this to the programmer
thanks
WebSavvy
06-10-2006, 00:32/12:32AM
Take the at sign @ off the front of include
e.g., use
include
Vs using
@include
and see what error it gives you on that one.
Using @include will suppress errors, so if your script kacks on you, scrapers/hackers/snoops
won't see your full path from root (security danger!)
jfrovich
06-10-2006, 00:39/12:39AM
i get this
Warning: main(/home/support/public_html/review/review_insert.php?item_id=1): failed to open stream: No such file or directory in /home/support/public_html/spyware/spyware_removal/spy-sweeper.html on line 69
Warning: main(/home/support/public_html/review/review_insert.php?item_id=1): failed to open stream: No such file or directory in /home/support/public_html/spyware/spyware_removal/spy-sweeper.html on line 69
Warning: main(): Failed opening '/home/support/public_html/review/review_insert.php?item_id=1' for inclusion (include_path='.:/usr/lib/php:/usr/local/lib/php') in /home/support/public_html/spyware/spyware_removal/spy-sweeper.html on line 69
jfrovich
06-10-2006, 00:41/12:41AM
as a side note how do i allow
validator.w3.org
I now get this error
I got the following unexpected response when trying to retrieve <http://www.supportcave.com/>:
403 Forbidden
sorry for being such a pain in the a:confused: s
WebSavvy
06-10-2006, 00:51/12:51AM
Change the single quotes to double quotes. Single quotes should never be used in a URI or variable path. The programmer should know this!
OK, it looks like there's a mis-matched path here:
you have:
home | support | public_html | review
and also:
home | support | public_html | spyware | spyware_removal
Not sure w/o seeing the actual backend code. I could take a look for you and see if I can locate the problem, but I won't be able to do that for at least a week.
I'm so packed under with work right now there's just zero free time for anything.
RE: validator.w3.org
Look in CPanel (last 300 visitors)
Find the reference for validator.w3.org there
It should give you the UA
add that UA to good_pass and then it should be fine.
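As a sketch of what that line might look like: the W3C Markup Validator has typically identified itself with a user agent beginning W3C_Validator/, but treat that token as an assumption and confirm it against the string your own logs show.

```apache
# Let the W3C Markup Validator fetch pages for checking.
# "W3C_Validator" is the assumed token; replace it with what your logs report.
SetEnvIfNoCase User-agent "W3C_Validator" good_pass
```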
Jason, you're not a pain in the @$$
You're simply asking for help, which is what these forums are here for.
If any one of us had problems with offering help, we wouldn't be members here to begin with.
So don't worry. Ask as many questions as you need to, and as often as you need to. It's how we all learn and grow.
jfrovich
06-10-2006, 01:09/01:09AM
thanks
the latest visitors has a lot of cool stuff in it
i now see why my forum is showing a TON of hits
this AWSM Bot 1-4 is hitting it like crazy..
is from the forum poster with this ip
69.88.74.194
this is where the PHP script is installed
home | support | public_html | review
this is one of the folders where the script is running
home | support | public_html | spyware | spyware_removal
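If the hammering keeps coming from one address, the same environment-variable approach can flag it by IP rather than by user agent. A minimal sketch, assuming the bad_pass variable from this thread is already tied to a Deny from env=bad_pass rule:

```apache
# Flag the address seen hammering the forum, whatever user agent it sends.
SetEnvIf Remote_Addr "^69\.88\.74\.194$" bad_pass
```

The requests will still appear in the logs, only now as 403 responses.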
WebSavvy
06-10-2006, 01:22/01:22AM
That one's actively set up on supportcave right now, right?
If you give me a few days, I can probably find the error and fix it for you. It's probably something pretty simple and shouldn't take long.
You can add this line to the top of the page and it'll help with debugging:
error_reporting(E_ALL ^ E_NOTICE);
I'm exhausted. It's been a very long day and I've had a migraine now for 3 days. So, at this point ... I'm becoming more and more unclear and I need to go get some sleep.
I'll pop in tomorrow Jas, and check up on this. I'm sure we'll be able to get it working.
:)
jfrovich
06-10-2006, 21:37/09:37PM
thanks
im just going to have to stick to a block list..
jfrovich
07-10-2006, 09:54/09:54AM
WOW, I'm AMAZED at how well the block list is working..
my visits and uniques are about the same
but my
page loads/views are down 6-10 TIMES
hits are cut in HALF
Not sure if that's a big deal, but it will help me better analyze my stats,
as all these bots skew the true information.
Thanks for all your help, everyone.
PS: is there no way to make these bots stop?
Or make them think my site is gone, or block them so they don't show up in my server logs?
WebSavvy
07-10-2006, 13:36/01:36PM
There's no way to make them go away. Once they know your site is alive they will always try to gain access because they think there's something there to steal.
They won't stop unless your site is dead.
They'll always show up in your log files as long as they're making "attempts" ... and you need to know about that. If one actually "slips through the cracks" it's the only way you're going to catch that it has.
So, it's both a pro and a con. Your number of reported 403 errors is going to skyrocket. Mine has too. In fact, mine is at about 83%
We went down by 2+ GIG BW just over a 10 day period.
That's really A LOT!
jfrovich
28-01-2007, 10:30/10:30AM
Since I've changed the site
and no longer use that review script
im going to use this block list again
Im already using it on my family website to block everything but the user..
will try it on supportcave again.
jfrovich
02-09-2007, 22:56/10:56PM
Wow, it's been a while
But with my new website design
I've gone back to the suggested way of blocking bad bots in this thread..
Guess ill monitor it and see how well it works..
Connie
03-09-2007, 19:45/07:45PM
Jason have you installed a bot trap? That won't catch all the bad guys, but it will catch a lot.
Yes it would be based on the black list or deny method that you have used before.
A bot trap would automatically ban bots that follow certain links on your web page.
I have written a little about bot traps (http://www.spam-whackers.com/blog/2007/07/23/googlebot-banned/). This is the
bot trap (http://danielwebb.us/software/bot-trap/) I use right now. It's fairly simple to install and works with html or php sites.
For a site that is totally reliant on .htaccess you cannot have a 100% opt-in list. Bill does a lot with scripts he has written that most of us have to do with .htaccess.
What Deb posted was a good compromise, for those of us totally reliant on .htaccess, on what Bill was talking about in regard to opt-in only.
Most of us will never get rid of a deny list or blacklist.
IMHO what Deb posted (based on Bill's original suggestion of opt-in only) was a way to cut down on the size of your blacklist.
jfrovich
04-09-2007, 00:06/12:06AM
HI Connie
Thanks, but I can't use that. My CMS doesn't allow ErrorDocument in the .htaccess to work, or somehow ignores it.
I can't find a way around this..
If I do, then I can use it..
Connie
04-09-2007, 00:36/12:36AM
All I did was add ErrorDocument 403 /bot-trap/forbid.php to the .htaccess file. Works for me on my html sites, and my WP blogs, which I think are CMS.
What CMS are you using?
WebSavvy
04-09-2007, 08:13/08:13AM
That's because you're issuing the wrong HTTP response.
ErrorDocument = 404
Forbidden = 403
Try changing that around and see if it works for you, Jas?
jfrovich
04-09-2007, 16:36/04:36PM
Originally posted by WebSavvy
That's because you're issuing the wrong HTTP response.
ErrorDocument = 404
Forbidden = 403
Try changing that around and see if it works for you, Jas?
Joomla seems to completely ignore that command, so I had to really hack it to get a pathetic 404 page.
But yeah, no idea why it doesn't work. I did search for a fix a few months ago and found none
Connie
04-09-2007, 18:17/06:17PM
Without knowing how you installed joomla I'm not sure what is going on, but I would think joomla would work similar to WP.
Did you install Joomla directly into your public html directory, or did you install it in a folder within the public html directory?
Joomla should not really have anything to do with your 403 error files.
The server, not Joomla, should be the one sending the 403 error file, and the directions I provided should tell the server to send that error to the custom error page rather than the generic 403 error page, which will basically say something like Forbidden.
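For reference, ErrorDocument takes the HTTP status code as its first argument and the page to serve as its second, so the same directive handles both 403 and 404. A short sketch; /bot-trap/forbid.php comes from the post above, while /404.html is a hypothetical page name used only for illustration:

```apache
# Custom page for refused requests (403 Forbidden).
ErrorDocument 403 /bot-trap/forbid.php
# Custom page for missing files (404 Not Found); the path is a placeholder.
ErrorDocument 404 /404.html
```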
jfrovich
04-09-2007, 20:53/08:53PM
Connie
I have used custom 404's in the .htaccess file for years with basic html & php websites.
But Joomla just wont work with it.
I just tried it on 3 Joomla sites
Joomla 1.5 beta at least has a better 404 page than Joomla 1
and the install is always in the root folder
Friendly Bruno
05-11-2007, 11:54/11:54AM
Hi there,
I just read the posts from 25/09/2006, where WebSavvy shows an example of a very good .htaccess file.
This code whitelists some visitors and blacklists some other visitors.
I would like to ask WebSavvy: in your case, what happens to visitors who are neither whitelisted nor blacklisted?
Thanks,
Friendly Bruno
WebSavvy
05-11-2007, 12:37/12:37PM
All browsers (user agents) are whitelisted (allowed access by default) with exception of those that are specifically denoted as blacklisted -- so there aren't any visitors that fall between the cracks.
I've been using this for almost a year now, and so far haven't had any problems. You have to monitor your logs though, just to make sure that there isn't something you're blocking that you didn't intend to.
g1smd
05-11-2007, 15:00/03:00PM
The code you posted on 2006-09-25 allows only the listed User Agents specified in the file to gain access, and blacklists everything else.
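The behaviour described here follows from how Apache 2.2 evaluates Order Allow,Deny. An annotated sketch, assuming the good_pass and bad_pass variables are set by SetEnvIfNoCase lines like those earlier in the thread:

```apache
# Order Allow,Deny: Allow rules are checked first, then Deny rules.
# A request must match an Allow and must not match any Deny;
# a request that matches neither is refused by default.
Order Allow,Deny
Allow from env=good_pass
Deny from env=bad_pass

# Outcome per visitor:
#   good_pass only          -> allowed
#   good_pass and bad_pass  -> refused (Deny wins)
#   bad_pass only           -> refused
#   neither                 -> refused (the default under Allow,Deny)
```

So a visitor that is neither whitelisted nor blacklisted is turned away, and the bad_pass lines only matter for user agents that would otherwise match a good_pass pattern.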
Connie
05-11-2007, 15:03/03:03PM
Originally posted by g1smd
The code you posted on 2006-09-25 allows only the listed User Agents specified in the file to gain access, and blacklists everything else.
I think that is the idea.
g1smd
05-11-2007, 15:55/03:55PM
Sure. I was replying to Debbie who said:
All browsers (user agents) are whitelisted (allowed access by default) with exception of those that are specifically denoted as blacklisted.
Friendly Bruno
05-11-2007, 15:59/03:59PM
That is good to know, WebSavvy. I do understand that was the idea, Connie, thank you very much.
I asked the question because I figured the server could also interpret the .htaccess file precisely the other way around:
Not all bad visitors are blacklisted (denied by default) with exception of those that are denoted as whitelisted -- if the server interprets the .htaccess file this way, then some visitors which are not blacklisted could fall between the cracks as well.
But as you already said, it does not happen to you.
IncrediBill wrote he uses .htaccess to whitelist and scripts to blacklist (please correct me if I understood him wrongly). WebSavvy managed to do both only with .htaccess and I am amazed by that as I do not understand why it works.
Thank you, WebSavvy.
WebSavvy
05-11-2007, 16:33/04:33PM
Browsers that aren't IE are usually Mozilla based. Therefore by adding allow Mozilla like this "Mozilla"
it means anything with mozilla in it (a match) is allowed to pass through.
If you begin the rule like this ^Mozilla
it would mean it must START with Mozilla in order to be a match and allowed in.
By using it just inside quotes, it allows for a "match" to happen anywhere in the string.
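The two forms side by side, as a small sketch (in practice you would keep only one of them):

```apache
# Unanchored: matches "Mozilla" anywhere in the User-Agent string.
SetEnvIfNoCase User-agent "Mozilla" good_pass

# Anchored with ^: matches only when the string starts with "Mozilla".
SetEnvIfNoCase User-agent "^Mozilla" good_pass
```

The anchored form is stricter, so a bot that merely mentions Mozilla somewhere in the middle of its string would not be whitelisted by it.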
You can use just .htaccess to accomplish this. Bill uses .htaccess and scripts because he wants to. His scripts are a bit more intense and he actually has programs that he's written to keep bad bots out of his sites.
A few years ago I used to use .htaccess and a script I wrote because of the type of bots my site was getting hit by. They were rank checking bots and weren't passing a user-agent string so I had to devise other ways to prevent them from hammering my server.
Connie
05-11-2007, 16:48/04:48PM
By the way, Deb, something I've been meaning to ask about. I can't use this on Condells. It causes problems with Authorize.net. In short, when using this and an order is placed, the customer gets a 403 error page from Authorize.net rather than my order confirmation page.
The actual order is approved, but Authorize.net cannot post that information back to Condells, which results in the error page.
extta4f.authorize.net - - [30/Sep/2007:08:10:35 -0400] "POST /cgi-bin/checkout.pl HTTP/1.1" 403 2500 "-" "-"
Other blank referrers do not cause a problem, and my other shopping cart pages which also use POST do not cause a problem.
Any thoughts?
WebSavvy
05-11-2007, 17:07/05:07PM
Probably because the user-agent string being passed in this is "authorize.net" and you don't have that user-agent string set as "allowed" in your file.
Add it to good_pass and use it inside quotes like this "authorize.net" which will allow it to match anywhere in the string and will grant access.
That should fix it. :)
Connie
05-11-2007, 18:36/06:36PM
Nope, just tried it. Still get the same error.
An error occurred while trying to report this transaction to the merchant. An e-mail has been sent to the merchant informing them of the error. The following is the result of the attempt to charge your credit card.
This transaction has been approved.
It is advisable for you to contact the merchant to verify that you will receive the product or service.
At this point the page the customer should get is a page that shows a summary of the order. A receipt, confirmation page.
Where do you see Authorize.net as the user agent? It appears to me the user agent is blank. Then again there is a lot I'm still learning about all this.
WebSavvy
05-11-2007, 18:39/06:39PM
Connie, I was taking that from this part: extta4f.authorize.net
Sorry, but I thought you posted that as that being the "user-agent" ... ignore me. I've had a migraine since yesterday so am explaining things less than clearly today.
I might well leave this alone until tomorrow when I can think.
Maybe Bill might catch this and have some suggestions? :)
ihelpyou
05-11-2007, 18:43/06:43PM
Uncle Bill Uncle Bill?
Connie
05-11-2007, 20:08/08:08PM
Sorry, but I thought you posted that as that being the "user-agent" ... ignore me. I've had a migraine since yesterday so am explaining things less than clearly today.
That's the entire line from the logs. "extta4f.authorize.net" is where you would normally see an IP address. I suppose that is the referrer.
IncrediBILL
05-11-2007, 20:49/08:49PM
In cases where the user agent is blank you need to authorize by the IP range.
Personally, I'd write to authorize.net and ask them what kind of amateur hour operation they run when they can't even set the user agent to "authorize.net" so you know it's them!
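A rough, untested sketch of authorizing by IP range. In the log line quoted earlier, the first field (extta4f.authorize.net) is the client host, and the two "-" fields at the end are the referrer and the user agent, both blank. The range 192.0.2.0/24 below is a documentation placeholder, not Authorize.net's real range; the real one would have to come from Authorize.net. This assumes Apache 2.2 style access control and the good_pass variable from the whitelist file:

```apache
# Deny by default; allow whitelisted user agents plus one trusted IP range.
Order Deny,Allow
Deny from all
Allow from env=good_pass

# Hypothetical placeholder range: replace with the range the payment
# gateway actually publishes for its callback servers.
Allow from 192.0.2.0/24
```

Because the range is matched on the connecting address, it works even when the request carries no user agent at all.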
Friendly Bruno
16-03-2008, 09:53/09:53AM
Hi there,
Since November 2007 I’ve been using my own adapted version of WebSavvy’s .htaccess file and, indeed, it works very well! Thank you all!
Now I want to go a step further and build a coherent spider trap in order to block visitors using fake user agents. For months I have had a question in my mind which is actually keeping me from even starting work, because if the answer to that question is “no”, then I had better not even start building it:
-> Are there behavioural differences between a bad spider and an innocent user’s web accelerator?
If the answer is “yes”, then:
-> I don’t want web accelerators clicking everywhere. Are there ways to block people’s web accelerators without keeping those people from visiting my website?
As far as possible, could you guys help me with some “yes” or “no” answers and some links as reference?
Thanks in advance,
Bruno
WebSavvy
16-03-2008, 14:26/02:26PM
Hi Bruno,
IMO, I'd say there are differences between requests from regular users Vs requests from let's say, someone's "script."
I had one guy that kept requesting index.php on my server (which does NOT exist) and he was doing it 1000s of times a day. For what purpose, I have no clue -- but the guy was still doing it.
I had to end up blocking that one by IP range because he kept switching user-agent strings with every request, though they all kept the same IP address. This was definitely a script being run from someone's site.
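One possible way to combine a user-agent whitelist with an IP range block in .htaccess alone is sketched below. It is untested, uses Apache 2.2 style directives, and the range 198.51.100.0/24 is a documentation placeholder standing in for the offending addresses:

```apache
# Whitelist by user agent (good_pass is the variable name used in this thread).
SetEnvIfNoCase User-Agent "Mozilla" good_pass

# Order Allow,Deny: Allow lines are evaluated first, Deny lines second.
# A Deny match overrides an Allow match, and a request that matches
# neither is refused by default.
Order Allow,Deny
Allow from env=good_pass

# Hypothetical range: blocked no matter which user agent it sends.
Deny from 198.51.100.0/24
```

The ordering matters: under Order Deny,Allow a matching Allow would win instead, so a script that rotates through whitelisted user-agent strings would still get in.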
I found a really nice site that posts a lot of .htaccess examples:
http://www.askapache.com/
There are some code examples to prevent prefetching from X-MOZILLA browsers, plus another snippet that allows FF in.
I haven't tested that one yet, myself -- though IncrediBill uses (and has used for a long time) prefetching prevention on his sites.
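For reference, a commonly seen pattern for refusing prefetch requests looks roughly like the sketch below. It is not the askapache code and not IncrediBill's setup, just an untested illustration that assumes mod_rewrite is available:

```apache
# Assumes mod_rewrite is enabled for this directory.
RewriteEngine On

# Firefox marks link-prefetch requests with the header "X-moz: prefetch".
RewriteCond %{HTTP:X-moz} ^prefetch [NC]

# Refuse those requests with 403 Forbidden; ordinary clicks do not send
# the header and are unaffected.
RewriteRule .* - [F]
```

This only catches clients that announce prefetching through that header; anything that prefetches silently would need a different approach.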
Note that some of the code samples in that askapache site cause 500 errors on some servers and in some cases, the code simply does not work.
Not all servers are set up the same, so what might not work on my server, might work on yours.
kensplace
24-03-2008, 00:44/12:44AM
Argh 10 pages, after several drinks :)
Impossible task...
OK, it may have been said already, so apologies if so, but you can't rely on user agents or IPs.
You can block KNOWN bad ones, but that's it, as anyone with half a brain can get a new IP, or fake the user agent.
You're on a never ending task trying to block the bad.
And you can't win trying to allow the good either, as the same applies, you can fake user agents (but not IPs as far as I know yet....)
Spend more time on content and monitoring, and block when something happens is my advice. But I am a little tipsy :_)
Friendly Bruno
28-03-2008, 09:50/09:50AM
Quote:"You're on a never ending task trying to block the bad (visitors)."
You may say that again! I am very glad I don't work for a bank hihihi ;)
---
Quote:"I found a really nice site that posts a lot of .htaccess examples:
http://www.askapache.com/"
Thanks again! :)
---
Kind regards,
Bruno