How many bad bots are there?
Forrest
21-06-2007, 20:17/08:17PM
17. No, wait, 171. Er, make that 1,771. I'm not looking for an actual number, just wondering how prevalent they are. Anyone who hasn't looked at IncrediBill's blog should, it's ... disturbing.
Google Analytics underreports by about 1/3 compared to Awstats, and people are suggesting a lot of the difference should be chalked up to bots that my server doesn't recognize. I'm not sure if that's the case, but it probably makes up some of the mystery traffic.
Now I get lots of page "views" from Inktomi Slurp, MSNBot, GoogleBot, and all the rest, and I really can't think of much reason I'd see daily visits from bots cloaking themselves as browsers to get by my server logs. Except for the obvious, to grab content and cloak it to the SEs. I hadn't realized that this has you competing against yourself, and potentially being associated with spam, porn, illegal software, and all kinds of great sites. I don't mind if someone is writing a bot thinking they're going to invent the next great search engine, but I don't think I'm getting a couple hundred trial runs a day from student bots.
So, I wonder if other people see the same kind of discrepancy and think it really is due to bots? And how people deal with this? I know captchas and IP range blocks are really stylish, but I wonder if I should be looking more deeply into the server logs, searching for snippets of text I've written, or anything else like that?
And if I see a lot of referral spam from porn, sports, and gambling sites, should I take that as a sign of interest from these neighborhoods, and assume I'm probably being crawled by them? Or are the two unrelated?
Dave Hawley
21-06-2007, 21:07/09:07PM
How do you know that Google Analytics underreports and AWStats doesn't overreport? For me both are pretty close.
IMO, what you describe is nothing to worry about. SEs won't associate you with other sites unless you link to them.
Connie
21-06-2007, 21:17/09:17PM
I really can't answer your question. I block around 350 by user agent or IP address.
I suspect that is just a drop in the bucket to what Bill blocks.
I don't find a lot of discrepancies between Awstats, Google Analytics, and a private tracking script I use. This is on my e-commerce site.
Anytime you compare stats with different programs you will see some variations.
I don't use GA on spam whackers, but there is a lot of variation there between Awstats, and my private tracking script.
I suspect that is partly because Awstats records screen readers.
If your logs are outside your public directory, there is nothing to worry about in regard to referrer spam, but it will throw your visitor stats off some with Awstats.
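Blocking by user agent or IP address, as described above, comes down to a lookup against two lists. Here is a minimal Python sketch of that check. The blocklist entries and the function name `is_blocked` are made up for illustration; they are not anyone's actual list, and a real setup would more likely live in the web server's configuration.

```python
import ipaddress

# Hypothetical blocklist entries, for illustration only.
BLOCKED_AGENT_SUBSTRINGS = ["badbot", "sitesucker"]
BLOCKED_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]


def is_blocked(ip, user_agent):
    """Return True if the request matches a blocked user agent or IP range."""
    agent = user_agent.lower()
    # Case-insensitive substring match on the user-agent string.
    if any(fragment in agent for fragment in BLOCKED_AGENT_SUBSTRINGS):
        return True
    # Membership test against each blocked network.
    addr = ipaddress.ip_address(ip)
    return any(addr in network for network in BLOCKED_NETWORKS)
```

A user-agent match only catches bots that announce themselves. A bot sending a browser's user-agent string would get through unless its IP address is also listed.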
WebSavvy
21-06-2007, 21:22/09:22PM
Connie, recording "screen readers" isn't a separate instance. Screen readers do not have an identifiable user-agent or "browser string" because they are browser dependent -- meaning they rely on whatever browser the user has installed on their computers.
It'd be nice if they did have a defined user-agent because it'd make accessibility a lot easier for webmasters as they could serve accessible pages/css to screen readers when needed and other visitors could have pages given normally (e.g., minus tabindex, accesskeys, skiplinks, etc.)
Forrest
21-06-2007, 21:23/09:23PM
Originally posted by Dave Hawley
How do you know that Google Analytics underreports and AWStats doesn't overreport? For me both are pretty close.
You're right ... I don't know for a fact which of them is right, or even that either of them is. But since awstats is a front end for the server logs, and Google Analytics is JavaScript that only counts visitors whose browsers run it, I would expect Analytics to be 5 to 10% lower. And maybe for some "age of data" issues to crop up ... my server takes 25 to 26 hours to update, so I'm forever looking at yesterday and a fraction of today.
If the two are pretty close on your end, that makes me wonder if I should be looking into more things...
Forrest
21-06-2007, 21:26/09:26PM
Originally posted by Connie
If your logs are outside your public directory, there is nothing to worry about in regard to referrer spam, but it will throw your visitor stats off some with Awstats.
The logs aren't published by default, and I've never been able to think of a good reason to change that. Up 'till now I've always thought the referral spam was a minor inconvenience, but since bandwidth is so cheap, not a real problem. Then I started wondering if it could be people looking for server vulnerabilities, or your site winding up on a spammer's list ... but it sounds like that's not the case.
ihelpyou
21-06-2007, 21:28/09:28PM
There isn't any such thing as "accurate" stats. It does not matter what you do as no two programs will yield the same. All are estimates with some giving you more info than others and some giving you the info you need per your requirements. To each his own. Find one you like and stick with it.
Heck, you hear an owner say, "I get 10000 hits per day!!"
What the hell does that mean? LOL It could mean anything. In reality though, that same owner is really getting about 50 unique visitors per day but doesn't know how to read his stats. :D
Dave Hawley
21-06-2007, 21:30/09:30PM
Forrest, are you positive you have the Google JavaScript properly added to ALL pages?
Forrest
21-06-2007, 21:39/09:39PM
Originally posted by ihelpyou
There isn't any such thing as "accurate" stats. It does not matter what you do as no two programs will yield the same. All are estimates with some giving you more info than others and some giving you the info you need per your requirements. To each his own. Find one you like and stick with it.
Well, Analytics gives me much more useful data than awstats, but they each have their place. But to the point, this is worrying. My background is in IT, specifically database and logic tier programming, with too much reporting... It makes sense that they wouldn't agree 100 %, but a person ought to be able to explain the difference until it's zero, right?
Originally posted by ihelpyou
Heck, you hear an owner say, "I get 10000 hits per day!!"
What the hell does that mean? LOL It could mean anything. In reality though, that same owner is really getting about 50 unique visitors per day but doesn't know how to read his stats. :D
Probably that s/he doesn't know how to read the stats ... it takes about 20 "hits" on average to render one page on my site. I've been seeing about 350 unique visitors a day, which I realize is pretty low, but that's what it is.
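The gap between "hits", pages, and unique visitors is easy to see by counting all three from the same log lines. Below is a rough Python sketch, assuming Apache-style combined log lines. The regular expression, the list of asset extensions, and the rule of one visitor per IP address are simplifying assumptions for illustration, not how Awstats or Analytics actually compute their figures.

```python
import re

# Assumed Apache-style log line: ip, two ignored fields, [timestamp], "METHOD path ...", status
LOG_LINE = re.compile(
    r'^(\S+) \S+ \S+ \[[^\]]+\] "(?:GET|POST|HEAD) (\S+)[^"]*" (\d{3}) '
)

# Requests for these file types count as hits but not as pages (an assumption).
ASSET_EXTENSIONS = (".gif", ".jpg", ".jpeg", ".png", ".css", ".js", ".ico")


def summarize(lines):
    """Count raw hits, page views, and distinct visitor IPs."""
    hits = 0
    pages = 0
    visitors = set()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that do not look like a request
        ip, path, status = match.groups()
        hits += 1
        path_only = path.split("?", 1)[0].lower()
        if status == "200" and not path_only.endswith(ASSET_EXTENSIONS):
            pages += 1
            visitors.add(ip)
    return {"hits": hits, "pages": pages, "unique_visitors": len(visitors)}
```

On a site where one page pulls in about 20 files, a count like this would show hits running at roughly 20 times the page count, which is how "10000 hits" can shrink to a much smaller number of visitors.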
Forrest
21-06-2007, 21:40/09:40PM
Originally posted by Dave Hawley
Forrest, are you positive you have the Google JavaScript properly added to ALL pages?
For a while, I didn't, and that's what I thought would explain this. Now, I'm pretty sure the urchin call is at the bottom of all but two or three pages that probably aren't linked to anymore ... sort of died on the vine.
ihelpyou
21-06-2007, 21:48/09:48PM
No. You cannot reconcile two different stats programs that way. It's impossible. There are waaaaay too many variables involved and also server issues as well. One hit may not be counted on one or both as either one or both servers may have had a hitch at that second. ONE image may not load for some reason. One stats code may not load for a whole session. Too many variables to compare. Impossible. They will never equate to zero sum.
Connie
21-06-2007, 22:39/10:39PM
Originally posted by WebSavvy
Connie, recording "screen readers" isn't a separate instance. Screen readers do not have an identifiable user-agent or "browser string" because they are browser dependent -- meaning they rely on whatever browser the user has installed on their computers.
It'd be nice if they did have a defined user-agent because it'd make accessibility a lot easier for webmasters as they could serve accessible pages/css to screen readers when needed and other visitors could have pages given normally (e.g., minus tabindex, accesskeys, skiplinks, etc.)
Sorry to confuse you Deb. I'm referring to RSS feeds, and RSS readers.
WebSavvy
21-06-2007, 22:46/10:46PM
Ah, OK, well that makes better sense now ... LOL
Connie
22-06-2007, 00:58/12:58AM
I'm playing around with a bot trap. Since bots follow links, what prevents them from following a link on the 403 error page?
If you provide a link on the 403 error page in case a real user ends up there, that will allow the real user to access the site. By clicking the link they have unbanned themselves.
What prevents a bad bot from following that link?
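The trap-and-unban flow described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the paths `/trap/` and `/unban`, the in-memory `banned` set, and the `handle` function are all hypothetical, and a real trap would sit in the web server or application rather than in a standalone function.

```python
TRAP_PATH = "/trap/"   # hypothetical hidden link that only a bot would follow
UNBAN_PATH = "/unban"  # hypothetical link shown on the 403 page

banned = set()  # in-memory ban list, for illustration only


def handle(ip, path):
    """Return (status_code, body) for a request from ip to path."""
    if path == UNBAN_PATH:
        # Following the link on the 403 page lifts the ban.
        banned.discard(ip)
        return 200, "You have been unbanned."
    if path.startswith(TRAP_PATH):
        # Requesting the trap gets the address banned.
        banned.add(ip)
    if ip in banned:
        return 403, 'Forbidden. Human? <a href="%s">Click here</a>.' % UNBAN_PATH
    return 200, "Normal page"
```

Nothing in this sketch stops a bot that reads the body of the 403 response and follows the link, which is exactly the question being asked. It only works against bots that give up when they see the 403 status.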
chrishirst
22-06-2007, 04:19/04:19AM
bots wouldn't necessarily "see" the page, depending on how they are programmed.
A 403 response should mean they simply turn & leave.