Finding out when Mr Google drops by
Proofreader
17-04-2002, 03:57/03:57AM
Hi all
What's the easiest, cheapest, quickest way to take a peep at my web logs, say daily, to see if/when the Google spider visited? What's the best gizmo to do that?
Many thanks
liketoseeyoutry
17-04-2002, 04:22/04:22AM
Using Linux and Apache, I generally just type:
"fgrep googlebot name-of-access-log"
Easy :)
marcus-miller
17-04-2002, 05:47/05:47AM
Hi Proofreader,
I do it like this:
Edit the crontab for the site you are interested in so that it greps the logs and sends you the results every day.
50 3 * * * cat /this/is/log/location | grep "sitename" | grep "Googlebot" | mail -s "Google Visits" marcus
The first bit cats the log, the second bit greps for the site name so you only get those details, the third bit greps for the good old Googlebot, and finally it mails the result to me.
Oh, yeah, marcus is set up as a user for the site and therefore has an associated email address; you could just put your full address at the end instead.
If you check this one mail every day you will see when Googlebot has just popped by and when he has had a good look around.
( i think i must be reading this forum 2 much, starting to write about googlebot as if he,.. she,.. it were a person.. )
Hope this helps, mail me if you have a problem..
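For anyone who wants to try the pipeline above by hand before putting it in cron, here is a minimal sketch as a shell function. The function name google_visits, the sample log lines, the addresses and the site name www.example.com are all made up for illustration, and the sample assumes the site name appears at the start of each log line, which may not match your server's log format. The mail step is left as a comment because it depends on a working local mailer.

```shell
#!/bin/sh
# Hypothetical helper: print log lines for one site that mention Googlebot.
# $1 = access log, $2 = site name as it appears in the log.
google_visits() {
    # grep can read the file directly, so the leading cat is optional;
    # -F treats the site name as a plain string rather than a pattern
    grep -F -- "$2" "$1" | grep "Googlebot"
}

# Made-up sample log so the sketch can be run anywhere.
sample_log=$(mktemp)
cat > "$sample_log" <<'EOF'
www.example.com 192.0.2.10 - - [17/Apr/2002:03:10:11 -0400] "GET /index.html HTTP/1.0" 200 5120 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
www.example.com 192.0.2.99 - - [17/Apr/2002:03:12:40 -0400] "GET /index.html HTTP/1.0" 200 5120 "-" "Mozilla/4.0 (compatible; MSIE 6.0)"
www.other.example 192.0.2.11 - - [17/Apr/2002:03:15:02 -0400] "GET /about.html HTTP/1.0" 200 2048 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
EOF

# Prints only the first sample line: right site, right visitor.
google_visits "$sample_log" "www.example.com"

# In a crontab the output would be piped to mail, as in the post above:
# google_visits /this/is/log/location sitename | mail -s "Google Visits" marcus
```

The result is the same as the cat | grep | grep version; dropping cat just saves one process.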
ihelpyou
17-04-2002, 08:23/08:23AM
( i think i must be reading this forum 2 much, starting to write about googlebot as if he,.. she,.. it were a person.. )
LOL. It seems she does become larger than life from time to time. Very good PR on her part. LOL
( PR )
lol
bigblock
17-04-2002, 14:21/02:21PM
You beat me to it MM :). I go like this:
grep googlebot /path/to/access_log > gbot.txt ;
mail -s "`date`" gbot@mysite.com < gbot.txt
Then I cron it to run daily. Only major difference is I have the current server date as my subject.
Question for you UNIX gurus: how can we get this to only send the googlebot hits for that particular day?
Maybe somehow tweak the output of `date` so that it matches the date format in the logfiles, store it in a variable, and then use it to grep with?
So that all daily googlebot hits would match something kinda like this:
`date`.*googlebot
So we could just grep out lines like that, and have them emailed to us?
Any ideas?
JellyBelly
17-04-2002, 16:01/04:01PM
I just download our logs every night and check them with Analog (http://www.analog.cx/). Analog is a free and highly configurable logfile analyzer.
bigblock
17-04-2002, 16:28/04:28PM
Yup -- Analog is a good tool, especially since it is free :).
I've used Analog, WebTrends, Wusage, and others. Summary blows them all away. There is simply no comparison, especially when you need to analyze for SEO purposes :D .
Proofreader
17-04-2002, 18:57/06:57PM
Hi BB
When you refer to "Summary" is that a program in its own right, or are you referring to a summary part of one of the others?
Fraid all this talk of crons and fregs and frogs and grips (or whatever the heck it is) might as well be in Swahili to me!
Thanks anyway :confused:
Alan Perkins
17-04-2002, 19:10/07:10PM
Originally posted by bigblock
how can we get this to only send the googlebot hits for that particular day?
Rotate your logs daily. :)
You can then append your daily log to offline weekly, monthly, quarterly and annual log files (take your pick).
You can run your stats package on these individual log files to generate results for the periods they represent.
Blue
17-04-2002, 19:54/07:54PM
crons and fregs and frogs and grips
LOL......got my laugh for the day.
Not laughing AT you, Nona, just laughing at your humor :D .
bigblock
17-04-2002, 20:23/08:23PM
You can then append your daily log to offline weekly, monthly, quarterly and annual log files
mmmmmm....there might be a less clunky solution (c'mon UNIX geeks -- kick down ;) )
My server date format looks like this:
Wed Apr 17 21:50:44 EDT 2002
My logfile date format looks like this:
07/Apr/2002:04:36:05 -0400
So if we can change the output of the `date` command to something like this: 07/Apr , and store it in something called $date or whatever, then we could grep with that value, like this:
$date.*Googlebot
and only send the matching lines every day. Wish I knew more UNIX. That should do it though, right?
marcus-miller
18-04-2002, 10:02/10:02AM
The easiest way I can think of, BB, is to rotate the logs. I am fairly new to the Linux way of things (3 months ish) but learning fast. (hard, painfully but still fairly fast) :)
Set up logrotate as a daily cron job; then your grepped details above will only be a couple of lines long, unless Googlebot has had a good old look around.
Alan Perkins
18-04-2002, 14:17/02:17PM
Originally posted by bigblock
there might be a less clunky solution
Hi bigblock
Not sure what version of Unix you have, but mine supports date formatting. This works for me:
grep `date '+%d/%b'` /path/to/access_log | grep googlebot
Make sure you get those single quotes right!
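A slightly stricter variant of the same idea, as a sketch only: build the date in the full log format (17/Apr/2002) and match it together with the opening bracket, so that a string like 17/Apr elsewhere in a request line cannot match. The function name googlebot_hits_for_day and the sample lines are made up; LC_ALL=C is an assumption to keep the month abbreviation in English whatever the server locale is.

```shell
#!/bin/sh
# Hypothetical helper: print Googlebot hits for one day.
# $1 = access log, $2 = day in log format, e.g. 17/Apr/2002 (defaults to today).
googlebot_hits_for_day() {
    day=${2:-$(LC_ALL=C date '+%d/%b/%Y')}
    # -F makes "[" a literal character; "[$day:" only matches the timestamp field
    grep -F "[$day:" "$1" | grep -i "googlebot"
}

# Made-up sample log so the sketch can be run anywhere.
sample_log=$(mktemp)
cat > "$sample_log" <<'EOF'
192.0.2.10 - - [17/Apr/2002:04:36:05 -0400] "GET / HTTP/1.0" 200 5120 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
192.0.2.10 - - [18/Apr/2002:04:40:12 -0400] "GET /faq.html HTTP/1.0" 200 2048 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
192.0.2.99 - - [18/Apr/2002:09:01:00 -0400] "GET / HTTP/1.0" 200 5120 "-" "Mozilla/4.0 (compatible; MSIE 6.0)"
EOF

# Prints only the 18 April Googlebot request for /faq.html.
googlebot_hits_for_day "$sample_log" "18/Apr/2002"
```

If the date command is typed straight into a crontab line rather than kept in a script, each % needs a backslash in front of it (\%), because cron treats a bare % as a newline.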
bigblock
18-04-2002, 14:25/02:25PM
mmmm....what about people on a shared box? They don't have control over when their logs are rotated.
I think that having a cleaner grep command is preferable to rotating the logs. Bleh -- I hate these situations -- when my lack of knowledge forces me to compromise and do the "dirty" solution.
Measuring the number of grepped-out lines before sending them might not be a bad idea, as you said, in case Google has a good look around. I usually only get a few pages pulled per day, but during the last deep-crawl she pulled over 5,000 pages per day. Maybe gzip the grepped lines if there are more than a certain number of them? Or don't send them at all, in favor of an email saying that the file was too large.
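The "measure before sending" idea could look roughly like this. It is a sketch under assumptions: the function name report_action, the three action names and the limit of 2 lines are invented for the example, and the mail and gzip steps are only shown as comments.

```shell
#!/bin/sh
# Hypothetical helper: decide how to deliver a file of grepped lines.
# $1 = file of matching lines, $2 = line limit. Prints the chosen action.
report_action() {
    count=$(wc -l < "$1" | tr -d ' ')
    if [ "$count" -eq 0 ]; then
        echo "skip"        # nothing to send today
    elif [ "$count" -le "$2" ]; then
        echo "send"        # small enough to mail as-is
    else
        echo "compress"    # too long: gzip it or mail a short notice instead
    fi
}

# Made-up input: three "hits" against a limit of two.
hits=$(mktemp)
printf 'hit 1\nhit 2\nhit 3\n' > "$hits"

case $(report_action "$hits" 2) in
    send)     cat "$hits" ;;   # in cron: mail -s "Google Visits" you@example.com < "$hits"
    compress) echo "Too many Googlebot hits to mail in full today" ;;   # or: gzip "$hits" and keep the .gz
    skip)     : ;;
esac
```

A real limit would be much higher than 2; something in the hundreds is probably a sensible starting point if a deep-crawl pulls thousands of pages a day.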
Alan Perkins
18-04-2002, 14:40/02:40PM
mmmmm ... doughnuts.
Seems like we cross posted, bigblock (see above for the "clean grep"). But if you have access to the shell, cron and the ability to create your own files, that's all you need for your own log rotator.
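One way a home-made "rotator" might look on a shared box, as a rough sketch. Note that this does not rotate the server's own log at all: it only copies one day's lines into a file you own, which is a different technique that happens to need nothing more than shell, cron and a writable directory. The function name archive_day, the file naming scheme and the sample data are all assumptions.

```shell
#!/bin/sh
# Hypothetical helper: copy one day's lines from the shared log into a file of your own.
# $1 = access log, $2 = directory you can write to, $3 = day in log format (default: today).
# Prints the name of the file it wrote.
archive_day() {
    day=${3:-$(LC_ALL=C date '+%d/%b/%Y')}
    # turn 17/Apr/2002 into 17-Apr-2002 so it is safe in a file name
    out="$2/access_log.$(echo "$day" | tr '/' '-')"
    mkdir -p "$2"
    # grep exits non-zero when nothing matches; an empty day is not an error here
    grep -F "[$day:" "$1" > "$out" || true
    echo "$out"
}

# Made-up sample log and archive directory so the sketch can be run anywhere.
sample_log=$(mktemp)
archive_dir=$(mktemp -d)
cat > "$sample_log" <<'EOF'
192.0.2.10 - - [17/Apr/2002:04:36:05 -0400] "GET / HTTP/1.0" 200 5120 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
192.0.2.99 - - [18/Apr/2002:09:01:00 -0400] "GET / HTTP/1.0" 200 5120 "-" "Mozilla/4.0 (compatible; MSIE 6.0)"
EOF

daily=$(archive_day "$sample_log" "$archive_dir" "17/Apr/2002")
echo "wrote $daily"
```

Run from cron shortly before midnight, this leaves one file per day; those files can then be appended to weekly or monthly files with cat and >>, or fed to a stats package one at a time.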
bigblock
18-04-2002, 15:28/03:28PM
d'oh!
Thanks Alan -- you tha man :).
Proofreader
19-04-2002, 07:12/07:12AM
Hi BB
WRT Summary, what do you suggest are the best reports to get used to for my purposes? Yes, I actually got it to work!
Thanks
bigblock
19-04-2002, 15:50/03:50PM
Proofreader-- nice job. Summary has a significant learning curve, a good portion of which is overcome when you get it installed :).
Here's what helps me the most:
1. Goal and value reports. If you can, establish relevant dollar values to certain parts of your site. If the dollar values would just be arbitrary, then establish goals. A newsletter "thank you for subscribing" page is an excellent example of a goal. This will give you relative values for visitors from each referrer. Understanding the value of a visitor as it differs from each referrer is crucial to me.
2. Crunch, configure, and re-crunch the referrer reports until you are blue in the face. You'll notice that every single report is separately configurable via the little wrench graphic in the upper right. Same for the search engine report. The "search engines by keyword" report is especially useful. Since I am a total nutcase, I check my "daily referrers by search engine" every night. I also like to track the "referrers by search engine" over time, to see how my optimization efforts are working.
3. If you have CGI on your site, use the CGI reports. Very useful. Also, make sure you add ".cgi" as a legitimate suffix for a pageview.
4. Pore over the docs. They are superb. Everything is in there.
5. Many of the reports are searchable. Very useful.
6. Sub-reports. Sub-reports. Sub-reports.
7. Sub-reports.
8. Log-crunching is addictive. Be careful. Especially as it's something that is traditionally done at the end of a day. 8 PM turns into 3 AM mighty quick for me.
Don't forget to laugh at all the chumps who keep saying that "When it comes to log analyzing, WebTrends is the standard."
bigblock
19-04-2002, 18:40/06:40PM
I forgot this. Here's how to set up a separate sub-report for google traffic.
From the main overview screen:
Configure==>Add New Sub-Report
name it Google Visitors or whatever.
Then, go back to that main configuration screen. Go to:
Edit Settings for Google Visitors==>Configure:Filtering==>Visit initiating referrers to ignore
In the text box, enter:
+*google*
*
Then save your settings and re-crunch the logs. You'll get your main, regular analysis, and you'll get a separate analysis for all your google traffic. You can do this for every search engine :thebomb:.
Proofreader
19-04-2002, 23:27/11:27PM
Thanks so much BB. I think I have a spare year lurking somewhere in this weekend. :hi:
ihelpyou
05-07-2002, 10:32/10:32AM
**bump**
scottiecl
23-07-2002, 08:23/08:23AM
I missed the link to the Summary log analyzer program. Where can it be found?