robot.txt
jovanie
18-01-2003, 02:37/02:37AM
I am very new to all this and still trying to get my site straight.
But I keep getting an error message for "robot.txt" in my site reports. I have placed <META NAME="ROBOTS" CONTENT="INDEX, FOLLOW"> between the keywords and description tags, thinking that would fix it, but it hasn't. Can anyone please help me out?
Greatly appreciated.
Dan0
18-01-2003, 13:49/01:49PM
robots.txt is a separate text file that you place on your server, in the "root" directory (same place you put index.html). Here is a tutorial on making one:
http://www.devnewz.com/2002a/0924.html
ihelpyou
18-01-2003, 20:44/08:44PM
Welcome to the forums jovanie! :hi:
zayin
19-01-2003, 21:04/09:04PM
If I do not have any content that I want to hide from robots, do I need this robots.txt file?
Dan0
19-01-2003, 21:36/09:36PM
No, you don't need one. If you would rather not have the 404 error showing up in your logs, you can create a minimal robots.txt file that says:
User-agent: *
Disallow:
This tells all spiders that they can crawl whatever they want.
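If you want to double-check what that file means, Python's standard-library `urllib.robotparser` module can parse robots.txt rules offline. This is only a sketch: `www.example.com` and the robot name `SomeBot` are placeholders, not real sites or crawlers.

```python
from urllib.robotparser import RobotFileParser

# The "allow everything" file: one record for all robots, nothing disallowed.
ALLOW_ALL = """\
User-agent: *
Disallow:
"""

parser = RobotFileParser()
# parse() takes an iterable of lines, so no network access is needed.
parser.parse(ALLOW_ALL.splitlines())

# An empty Disallow value means "no restrictions" for the matching robots.
for path in ("/", "/index.html", "/any/deep/page.html"):
    url = "http://www.example.com" + path  # placeholder host
    print(path, parser.can_fetch("SomeBot", url))
```

Every path should come back as allowed, which is exactly what the empty `Disallow:` line is for.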
menj
08-02-2003, 04:13/04:13AM
I tried using the META robot tag, but it doesn't seem to work. Since then I've switched to putting the robots.txt file in the root directory of my website, as can be seen here:
http://www.bismikaallahuma.org/robots.txt
The User-agent line defines which of the search engine robots visiting the site a record applies to; I have marked it with an * for "all robots". Each Disallow line blocks those robots from crawling the specified HTML files or directories.
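To see how User-agent and Disallow work together, here is a small sketch using Python's `urllib.robotparser`. The paths `/private/` and `/drafts/old-page.html` and the host `www.example.com` are hypothetical; they are not the contents of the file linked above.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: one record for all robots ("*") with two Disallow lines.
# Disallow values are prefix matches: "/private/" blocks everything under
# that directory, and a full file path blocks that file (and anything
# whose path starts with it).
RULES = """\
User-agent: *
Disallow: /private/
Disallow: /drafts/old-page.html
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

def allowed(path: str, agent: str = "SomeBot") -> bool:
    """Return True if a compliant robot named `agent` may crawl `path`."""
    return parser.can_fetch(agent, "http://www.example.com" + path)

for path in ("/index.html", "/private/notes.html",
             "/drafts/old-page.html", "/drafts/new-page.html"):
    print(f"{path:24} {'allowed' if allowed(path) else 'blocked'}")
```

Anything not matched by a Disallow line stays crawlable, so `/drafts/new-page.html` is still allowed even though a sibling file is blocked.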
Blue
08-02-2003, 13:37/01:37PM
As an addendum to this thread:
Some webmasters like to specify certain robots as being disallowed from crawling their site.
These may include "spammy" search engines, adult-related robots, and the like.
Lists of these "bad" robots can be found on the net.
Whether or not these "bad" robots actually obey the robots.txt protocol is another matter.
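A per-robot block is just a record whose User-agent lines name the robots, followed by `Disallow: /`. The sketch below checks such a record with Python's `urllib.robotparser`; the two bot names are ones that come up later in this thread, `Googlebot` stands in for an ordinary crawler, and the host is a placeholder. It only shows what a robot that honours robots.txt would do.

```python
from urllib.robotparser import RobotFileParser

# Two named robots get "Disallow: /" (the whole site); every other robot
# falls through to the "*" record, whose empty Disallow allows everything.
RULES = """\
User-agent: URL_Spider_Pro
User-agent: CherryPicker
Disallow: /

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

url = "http://www.example.com/index.html"  # placeholder URL
for agent in ("CherryPicker", "URL_Spider_Pro", "Googlebot"):
    print(agent, "allowed" if parser.can_fetch(agent, url) else "blocked")
```

A "bad" robot that ignores the protocol never runs this check at all, which is why blocking it in robots.txt may have no effect.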
TrueBlue
14-04-2003, 20:14/08:14PM
Good Evening:
Is there a statement that you can add to the robots.txt file that would disallow spidering of any pages like https://www.sample.com/secured/example.asp?
Thank you in advance.
Chuck
Kal
14-04-2003, 20:35/08:35PM
Yes there is. Read the tutorial linked above. :)
Alan Perkins
22-04-2003, 07:20/07:20AM
The official robots.txt resource is at http://www.robotstxt.org - lots of good info there.
Originally posted by TrueBlue
Is there a statement that you can add into the robot.txt file that would disallow spidering of any pages like https://www.sample.com/secured/example.asp
https sites have their own robots.txt files. If you don't want any of your https content to be read, use
User-agent: *
Disallow: /
in the root of your https server. Ensure you don't accidentally publish this file to the root of your http server!
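Because robots.txt rules apply per scheme and host, a compliant robot looks up https://www.sample.com/robots.txt and http://www.sample.com/robots.txt separately. The sketch below models that with one `urllib.robotparser` parser per origin; the file contents are hypothetical, `SomeBot` is a placeholder agent, and the "no file known means unrestricted" fallback is an assumption that mirrors how a missing robots.txt is normally treated.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

# Hypothetical contents of the two separate files.
HTTPS_ROBOTS = "User-agent: *\nDisallow: /\n"  # block all https content
HTTP_ROBOTS = "User-agent: *\nDisallow:\n"     # allow all http content

def make_parser(text: str) -> RobotFileParser:
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser

# Map each (scheme, host) to the rules served at that origin's /robots.txt.
PARSERS = {
    ("https", "www.sample.com"): make_parser(HTTPS_ROBOTS),
    ("http", "www.sample.com"): make_parser(HTTP_ROBOTS),
}

def can_crawl(url: str, agent: str = "SomeBot") -> bool:
    parts = urlsplit(url)
    parser = PARSERS.get((parts.scheme, parts.hostname))
    # No robots.txt known for this origin: treat it as unrestricted.
    return True if parser is None else parser.can_fetch(agent, url)

print(can_crawl("https://www.sample.com/secured/example.asp"))  # False
print(can_crawl("http://www.sample.com/index.html"))            # True
```

If the `Disallow: /` file were published on the http side by mistake, the second lookup would flip to False, which is the accident warned about above.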
robwatts
22-04-2003, 08:00/08:00AM
Just thought I'd add a useful tip into the pot.
Not having a robots.txt file can actually have its advantages too.
If you are one of those people who like to know when a particular bot has been around, you can quickly find out by looking at your error_log.
Error logs are usually much, much smaller than the main site log file and, in the absence of a real-time stats package or other log file analysis tools, are very simple to flick through to find what you need.
Most bots worth having around will request a robots.txt file; those that do not aren't worth considering (unless you want to find their owners and give them a rollocking).
[Tue Apr 22 06:55:16 2003] [error] [client 216.39.50.159] File does not exist: /www/yourdomain.tld/www/robots.txt
The example above is a line taken from my error log this morning. It indicates that a bot which, judging by its IP address, came from Google paid me a visit and requested the robots.txt file.
If I wanted to know whether the bot had spidered new content or particular files, I could then look at my main log in more depth.
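Flicking through the error log can also be scripted. This is a minimal sketch that assumes every relevant line has exactly the shape of the example quoted above (Apache-style `[date] [error] [client IP] File does not exist: path`); other servers or log settings will need a different pattern.

```python
import re

# Assumed line format, taken from the example error_log entry:
# [Tue Apr 22 06:55:16 2003] [error] [client 216.39.50.159] File does not exist: /path
MISSING_FILE = re.compile(
    r"^\[(?P<when>[^\]]+)\] \[error\] \[client (?P<ip>[\d.]+)\] "
    r"File does not exist: (?P<path>\S+)"
)

def robots_requests(lines):
    """Yield (timestamp, client IP) for each failed request for robots.txt."""
    for line in lines:
        match = MISSING_FILE.match(line)
        if match and match.group("path").endswith("/robots.txt"):
            yield match.group("when"), match.group("ip")

sample = [
    "[Tue Apr 22 06:55:16 2003] [error] [client 216.39.50.159] "
    "File does not exist: /www/yourdomain.tld/www/robots.txt",
]
for when, ip in robots_requests(sample):
    print(when, ip)
```

To scan a real log you could pass an open file instead, e.g. `with open("error_log") as log: ...`; that filename is an assumption, since log locations vary from server to server.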
raj80
31-08-2005, 10:30/10:30AM
Hi ,
When I analyzed my site with a company's analyzer tool to check whether my meta tags were set up properly, it listed this:
"WARNING! The robots.txt file for this site blocks the above URL from being indexed by the robot(s) with the name of:"
URL_Spider_Pro
CherryPicker and many more...
Can you explain what it means?
Cheers,
Rajesh.M :cheers:
g1smd
31-08-2005, 10:43/10:43AM
They are just listing all the bots that are blocked from the site, so that you can check that the list is OK.
Many of those bots really do need to be blocked, as they are no help to your site.
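How the analyzer builds its list is not known; a plausible guess is that it runs the site's robots.txt against a list of robot names and reports every name that is refused. The sketch below does that with Python's `urllib.robotparser`. The function name `blocked_agents`, the robots.txt contents and the URL are all hypothetical.

```python
from urllib.robotparser import RobotFileParser

def blocked_agents(robots_txt: str, url: str, agents):
    """Return the agent names that robots_txt forbids from fetching url.

    A guess at what a checker reporting "blocks the above URL from being
    indexed by the robot(s) with the name of: ..." would have to compute.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]

# Hypothetical robots.txt blocking two of the bots named in the warning.
ROBOTS = """\
User-agent: URL_Spider_Pro
Disallow: /

User-agent: CherryPicker
Disallow: /
"""

print(blocked_agents(ROBOTS, "http://www.example.com/",
                     ["URL_Spider_Pro", "CherryPicker", "Googlebot"]))
# ['URL_Spider_Pro', 'CherryPicker']
```

With no `User-agent: *` record in this file, any robot not named (here `Googlebot`) is allowed everywhere, so it does not appear in the warning.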
raj80
31-08-2005, 11:55/11:55AM
Originally posted by g1smd
Many of those bots really do need to be blocked, as they are no help to your site.
Can you explain it more clearly? I'm new to this. What do I need to put in the robots.txt file so that search engines can crawl my site, and how does the whole process work?
Cheers
Rajesh.M:cheers:
Blue
31-08-2005, 12:21/12:21PM
Hi Rajesh! :hi:
Search engines and other Internet entities have programs that send 'robots' or 'bots' or 'spiders' out following links on web pages.
In most search engines' cases, the purpose of these bots is to note the existence of each web page they follow links to and from, so that the search engine can then apply its algorithm to the page in question, index it in its database and serve it up to searchers in its SERPs.
The robots.txt file on any given website can include rules that tell these bots whether a given file or folder may be spidered and indexed.
Be aware that some SEs' bots do not follow the robots.txt protocol and will disobey the rules you have set out there.
More info here (http://www.robotstxt.org/).