
exclude bot?


loki
02-12-2004, 06:40 AM
Is this the correct way to exclude Googlebot from crawling a specific page?

User-agent: googlebot
Disallow: specific-page.html


Googlebot visits every day. Any idea how long before Google drops the page from its SERPs?

WebSavvy
02-12-2004, 08:44 AM
Hi loki,

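Yes, that's the correct format to block a bot from a specific page, except that the path should begin with a slash:

User-agent: googlebot
Disallow: /specific-page.html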
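Here is a minimal sketch that shows why the leading slash matters. It uses Python's standard urllib.robotparser module, which does plain prefix matching on the URL path and does not implement Google's * and $ extensions. The host example.com and the helper name googlebot_may_fetch are placeholders chosen for illustration.

```python
from urllib.robotparser import RobotFileParser


def googlebot_may_fetch(robots_txt: str, url: str) -> bool:
    """Parse robots.txt text and ask whether the 'googlebot' agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("googlebot", url)


as_posted = "User-agent: googlebot\nDisallow: specific-page.html\n"
with_slash = "User-agent: googlebot\nDisallow: /specific-page.html\n"
url = "http://example.com/specific-page.html"  # placeholder host

# The URL path is "/specific-page.html", which does not start with "specific-page.html".
print(googlebot_may_fetch(as_posted, url))   # True: the rule never matches
print(googlebot_may_fetch(with_slash, url))  # False: the page is blocked
```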

You can also block certain file extensions from within a folder like this:

Disallow: /folder/*.php

Replace folder with your own folder name.

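If you have a page that is generated dynamically and gets a number appended to its file name, you can block all the variants this way:

Disallow: /folder/file*.php

Replace /folder/file with your own folder and file name. The * matches any run of characters. A $ marks the end of the URL, so it cannot stand in for the number.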
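Google documents * as matching any sequence of characters and $ as marking the end of the URL. The sketch below is a hypothetical helper (google_rule_matches is not a real library function) that turns a Disallow path into a regular expression under just those two rules, so you can test which URL paths a pattern would catch. It assumes a $ is special only at the very end of a pattern and treats one anywhere else as a literal character. It also ignores percent-encoding and Allow/Disallow precedence.

```python
import re


def google_rule_to_regex(rule_path: str) -> re.Pattern:
    """Translate a Google-style Disallow path into a compiled regex (sketch)."""
    anchored = rule_path.endswith("$")
    if anchored:
        rule_path = rule_path[:-1]
    # Escape every literal piece, then join the pieces with ".*" wherever a '*' was.
    body = ".*".join(re.escape(part) for part in rule_path.split("*"))
    # A trailing '$' means the URL must end exactly where the pattern ends.
    return re.compile(body + (r"\Z" if anchored else ""))


def google_rule_matches(rule_path: str, url_path: str) -> bool:
    # re.match anchors at the start of the string: the rule matches "from the left".
    return google_rule_to_regex(rule_path).match(url_path) is not None


for path in ["/folder/file12.php", "/folder/page.php", "/folder/page.html"]:
    print(path, google_rule_matches("/folder/file*.php", path))
```

With `/folder/file*.php`, only `/folder/file12.php` is reported as blocked. By contrast, `/folder/file$.php` would not match it, because the $ there is not at the end of the pattern.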

loki
02-12-2004, 09:13 AM
Thanks, Deb.

Any thoughts on question #2?

WebSavvy
02-12-2004, 09:54 AM
Oh, sorry, I missed that one, loki. Usually a page you no longer want indexed will be dropped from the index within one or two indexing cycles after Google fetches your robots.txt file again.

:)

loki
02-12-2004, 11:55 AM
Hmmm. I wonder if there's a faster way to get the page out of their SERPs?

Noindex, possibly?

Thanks in advance.

WebSavvy
02-12-2004, 12:11 PM
I'm not sure. Maybe someone else knows?

Blue
02-12-2004, 02:26 PM
Check this, loki: http://www.google.com/remove.html

loki
02-12-2004, 03:14 PM
Many thanks to you both.

loki
10-10-2005, 01:47 PM
Originally posted by savvy1


If you have a page that is generated dynamically and gets a number appended to its file name, you can block all the variants this way:

Disallow: /folder/file*.php



Different job, similar query...

Will the Disallow line below block dynamic pages such as these?

ApartmentInquiry.aspx?CultureCode=en-GB&RegionId=1&PropertyId=101&ControlConfig=0
ApartmentInquiry.aspx?CultureCode=en-GB&RegionId=1&PropertyId=102&ControlConfig=0

Disallow: ApartmentInquiry.aspx?CultureCode=en-GB&RegionId=1&PropertyId=$&ControlConfig=0

g1smd
10-10-2005, 03:34 PM
The Disallow path should start with a /.

Every URL that matches the path exactly "from the left" (that is, every URL beginning with that string) will be disallowed.
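The "from the left" rule can be checked with Python's standard urllib.robotparser, which also treats a rule path as a prefix. This is a sketch that assumes the page lives at the site root of a placeholder host (example.com). Because the rule stops right after PropertyId=, every PropertyId value is covered without any wildcard, while a URL with a different RegionId is not.

```python
from urllib.robotparser import RobotFileParser

# Rule path starts with "/" and ends where the varying part of the URL begins.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /ApartmentInquiry.aspx?CultureCode=en-GB&RegionId=1&PropertyId=
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

base = "http://example.com/ApartmentInquiry.aspx?CultureCode=en-GB"  # placeholder host
for url in [
    base + "&RegionId=1&PropertyId=101&ControlConfig=0",  # blocked: prefix matches
    base + "&RegionId=1&PropertyId=102&ControlConfig=0",  # blocked: prefix matches
    base + "&RegionId=2&PropertyId=101&ControlConfig=0",  # allowed: RegionId differs
]:
    print(url, parser.can_fetch("Googlebot", url))
```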


Putting Disallow statements into the robots.txt file for files and folders that are already indexed will NOT cause Google to drop them. The only way to do that is to put the <meta name="robots" content="noindex"> tag on the page itself, and to leave the page crawlable so that Googlebot can actually fetch it and see the tag.
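To confirm that a page really carries the tag, you can parse its HTML. The sketch below uses Python's standard html.parser. The class RobotsMetaFinder and the function has_noindex are hypothetical names for illustration. The code only inspects the markup you give it and says nothing about what Google has indexed.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects the directives from every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, but not attribute values.
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            self.directives.update(
                d.strip().lower() for d in content.split(",") if d.strip()
            )


def has_noindex(html: str) -> bool:
    finder = RobotsMetaFinder()
    finder.feed(html)
    return "noindex" in finder.directives


page = '<html><head><meta name="robots" content="noindex"></head><body></body></html>'
print(has_noindex(page))
```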

For new pages that have just gone online, adding a Disallow statement to the robots.txt file means that when Google discovers a link to the page, it will list the page as a URL-only entry in the SERPs. Again, the only way to prevent that (and have no listing at all) is to use the meta tag instead of the robots.txt method.

WebSavvy
10-10-2005, 05:18 PM
If Google has already indexed them, you need to add noindex, nofollow to a robots meta tag between <head> and </head>:

<meta name="robots" content="noindex, nofollow">
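If the pages come from templates, the tag can be added programmatically. This is a minimal sketch for simple, well-formed templates only. add_robots_meta is a hypothetical helper, and a regular expression is not a general HTML parser. It places the tag immediately after the opening <head> tag.

```python
import re

ROBOTS_META = '<meta name="robots" content="noindex, nofollow">'


def add_robots_meta(html: str, tag: str = ROBOTS_META) -> str:
    """Insert tag right after the opening <head> tag (case-insensitive)."""
    # \b stops "<header>" from being mistaken for "<head>".
    match = re.search(r"<head\b[^>]*>", html, flags=re.IGNORECASE)
    if match is None:
        raise ValueError("no <head> element found")
    return html[: match.end()] + tag + html[match.end():]


print(add_robots_meta("<html><head><title>t</title></head></html>"))
```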

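If Google has not already indexed them, you can block the files with this format in robots.txt:

User-agent: Googlebot
Disallow: /*ApartmentInquiry.aspx?CultureCode=en-GB&RegionId=1&PropertyId=

Because rules match from the left, nothing is needed after PropertyId=; every PropertyId value is covered.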

However, using method #2 (robots.txt) does not mean Google will NOT index the URLs! It just means Google won't index the title, description, page content, etc.; it will still list the URL in its database.

Google is the only search engine I am aware of that has this problem. All the other search engines understand that if you list a URL under Disallow in robots.txt, you don't want it indexed: they leave it alone and don't even index the URI.

Why Google doesn't understand this very simple thing, only they can answer.