Kick spiders out!
usbnuts
27-08-2001, 16:07/04:07PM
How do you prevent Google spiders from entering a site? I heard there's a META TAG that does this.
Thanks!
Sharon & Roy
27-08-2001, 23:36/11:36PM
Hello usbnuts,
If you want to prevent all robots from indexing individual pages on your site, place the following meta tag in the page's HTML code (inside the <head></head> tags). NOINDEX asks robots not to index the page, and NOFOLLOW asks them not to follow the links on it:
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
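For anyone curious how a robot reads that tag, here is a minimal Python sketch using only the standard library's html.parser. The names RobotsMetaParser and robots_directives are made up for illustration; real crawlers use their own parsers and may treat edge cases differently.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the content of <meta name="..."> tags, keyed by lower-cased name."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names, so <META NAME=...> matches too.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content is not None:
            self.meta[name.lower()] = content


def robots_directives(html):
    """Return the set of directives found in <meta name="robots">, upper-cased."""
    parser = RobotsMetaParser()
    parser.feed(html)
    content = parser.meta.get("robots", "")
    return {part.strip().upper() for part in content.split(",") if part.strip()}


page = """<html><head>
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
</head><body>Under construction</body></html>"""

print(sorted(robots_directives(page)))  # ['NOFOLLOW', 'NOINDEX']
```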
If you want to let other robots index individual pages on your site and keep out only Google's robot, use the following tag instead (also inside the <head></head> tags):
<META NAME="GOOGLEBOT" CONTENT="NOINDEX, NOFOLLOW">
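As a simplified model of how the two kinds of tag combine, assuming a crawler that honours both the generic "robots" tag and a tag addressed to its own name, and applies the union of their restrictions, a hypothetical helper might look like this. The dict input format and the name "otherbot" are illustrative only.

```python
def effective_directives(meta, bot):
    """Merge the generic "robots" tag with a tag addressed to one crawler.

    meta: hypothetical input, a dict of lower-cased meta name -> content string
    bot:  the crawler's meta name, e.g. "googlebot"
    """
    directives = set()
    for name in ("robots", bot.lower()):
        content = meta.get(name, "")
        directives |= {p.strip().upper() for p in content.split(",") if p.strip()}
    return directives


tags = {"googlebot": "NOINDEX, NOFOLLOW"}
print(sorted(effective_directives(tags, "googlebot")))  # ['NOFOLLOW', 'NOINDEX']
print(sorted(effective_directives(tags, "otherbot")))   # []
```

Only the robot named in the tag picks up the restrictions; every other robot sees no directives at all.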
Mel
28-08-2001, 00:00/12:00AM
Hi USBnuts,
As Sharon and Roy suggested, you can use the robots meta tag on individual pages, or you can use a robots.txt file to exclude entire portions of your site.
The robots.txt file is just a plain text file located in the root directory of your site (http://mysite.com/robots.txt), and it must be named robots.txt.
Its contents look similar to this typical file:
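Because crawlers only look for the file at the root of the host, its location can be worked out from any page URL. A small sketch with urllib.parse (the page path below is hypothetical):

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url):
    """Return the robots.txt URL a crawler would check for any page on the same host."""
    parts = urlsplit(page_url)
    # Keep scheme and host (including any port); replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("http://mysite.com/websitepromotion-www/images/logo.gif"))
# http://mysite.com/robots.txt
```

This is also why a robots.txt file placed in a subdirectory has no effect.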
User-agent: *
Disallow: /websitepromotion-logs
Disallow: /websitepromotion-mail
Disallow: /websitepromotion-secure
Disallow: /websitepromotion-www/_private
Disallow: /websitepromotion-www/_vti_bin
Disallow: /websitepromotion-www/_vti_cnf
Disallow: /websitepromotion-www/_vti_log
Disallow: /websitepromotion-www/_vti_pvt
Disallow: /websitepromotion-www/_vti_txt
Disallow: /websitepromotion-www/images
Disallow: /websitepromotion-www/stats
The file above prevents all (well-behaved) spiders from indexing the directories it lists.
You can also name a specific spider, for example:
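You can check what a rule set like this means with Python's standard urllib.robotparser module. The sketch below parses a shortened copy of the file above; "AnyBot" is a made-up agent name. Keep in mind that robots.txt is advisory: it only stops robots that choose to honour it.

```python
from urllib.robotparser import RobotFileParser

# Shortened copy of the example file: one catch-all record.
rules = """\
User-agent: *
Disallow: /websitepromotion-logs
Disallow: /websitepromotion-www/images
Disallow: /websitepromotion-www/stats
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

for path in ("/websitepromotion-www/images/logo.gif", "/index.html"):
    url = "http://mysite.com" + path
    print(path, parser.can_fetch("AnyBot", url))
# /websitepromotion-www/images/logo.gif False
# /index.html True
```

Each Disallow value is a path prefix, so everything under a listed directory is covered, while unlisted paths stay open.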
User-agent: googlebot
Disallow: /websitepromotion-logs
Disallow: /websitepromotion-mail
Disallow: /websitepromotion-secure
Disallow: /websitepromotion-www/_private
Disallow: /websitepromotion-www/_vti_bin
Disallow: /websitepromotion-www/_vti_cnf
Disallow: /websitepromotion-www/_vti_log
Disallow: /websitepromotion-www/_vti_pvt
Disallow: /websitepromotion-www/_vti_txt
Disallow: /websitepromotion-www/images
Disallow: /websitepromotion-www/stats
That version would keep only Googlebot out of these directories; all other spiders would be allowed by default.
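The same module can show that the googlebot-only version leaves other robots unaffected. Again this uses a shortened copy of the rules, and "SomeOtherBot" is hypothetical. Python's parser compares agent names case-insensitively, so "Googlebot" matches the "googlebot" line.

```python
from urllib.robotparser import RobotFileParser

# Record addressed to googlebot only; no catch-all "*" record.
rules = """\
User-agent: googlebot
Disallow: /websitepromotion-logs
Disallow: /websitepromotion-www/stats
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

url = "http://mysite.com/websitepromotion-www/stats/index.html"
print(parser.can_fetch("Googlebot", url))     # False
print(parser.can_fetch("SomeOtherBot", url))  # True
```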
JuniorHarris
28-08-2001, 01:12/01:12AM
You can also use the robots.txt syntax checker (http://www.tardis.ed.ac.uk/~sxw/robots/check/) to verify your file's syntax.
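If the online checker is unavailable, a very rough local check is easy to sketch. This is a hypothetical helper, not the checker linked above, and it only knows the two fields used in this thread (User-agent and Disallow), so it will flag any other field as unknown.

```python
KNOWN_FIELDS = {"user-agent", "disallow"}  # only the fields used in this thread


def lint_robots_txt(text):
    """Return a list of (line_number, message) problems; a rough check only."""
    problems = []
    seen_user_agent = False
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding spaces
        if not line:
            continue
        field, sep, _ = line.partition(":")
        field = field.strip().lower()
        if not sep:
            problems.append((number, "missing ':' separator"))
        elif field not in KNOWN_FIELDS:
            problems.append((number, f"unknown field {field!r}"))
        elif field == "user-agent":
            seen_user_agent = True
        elif not seen_user_agent:
            problems.append((number, "Disallow before any User-agent line"))
    return problems


sample = "Disallow: /stats/\nUser-agent *\nUser-agent: *\nDisalow: /\n"
for number, message in lint_robots_txt(sample):
    print(number, message)
# 1 Disallow before any User-agent line
# 2 missing ':' separator
# 4 unknown field 'disalow'
```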
usbnuts
28-08-2001, 02:26/02:26AM
If I do something like this, spiders won't come into my site, right?
User-agent: *
Disallow: /stats/
Disallow: /
I'm working on a new site and spiders are not welcome at the moment.
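A quick way to confirm this is to feed the file to urllib.robotparser. With Disallow: / in place every path is refused, so the /stats/ line is redundant (harmless, but not needed). "AnyBot" and the page paths are hypothetical.

```python
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Disallow: /stats/
Disallow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

for path in ("/", "/index.html", "/stats/report.html"):
    print(path, parser.can_fetch("AnyBot", "http://mysite.com" + path))
# / False
# /index.html False
# /stats/report.html False
```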
Mel
28-08-2001, 04:37/04:37AM
Hi USBnuts:
Yep, that will keep them all out of the root and therefore everything under it. But I would be a bit careful if this is a rework of an old site: if the spider comes around a couple of times and finds nothing to index, he may decide not to visit again.
I would suggest giving him at least one keyword-rich file to chew on.
highman
28-08-2001, 08:12/08:12AM
>I would suggest giving him at least one keyword rich file to chew on.
Agreed: a holding page outlining the new site with a few keywords (not many), just to keep the little Googlebot happy.
JuniorHarris
29-08-2001, 08:50/08:50AM
Great idea about leaving a "seed" page for the engines. Not only does it "reserve" your indexing position, it could also be used to provide links to all the new pages once the site is complete! ;)