Bypassing robots.txt?
seoRank
17-03-2003, 22:31/10:31PM
I was just wondering: if a crawler reaches an inner page of your site, such as xyz.com/dir/dir/index.htm, through a direct link from another site, will it first read (and respect) the exclusion rule in robots.txt if that particular page is set to be excluded?
Hope
18-03-2003, 07:10/07:10AM
From my experience, it depends on the spider. Some will hit the page right away and only then ask for robots.txt. Others will spider the page/site the first time, then return at a later date to do a full spidering and look for robots.txt at that point. Then there are the spiders that don't read robots.txt at all.
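For reference, here is a rough sketch of what a "polite" spider would do before requesting a deep link: parse robots.txt and test the URL against it. It uses Python's standard urllib.robotparser; the robots.txt content, the Disallow paths, and the "ExampleBot" user agent are all made up for illustration, and nothing here says how any particular search engine actually behaves.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for xyz.com; the Disallow paths are assumptions
# chosen to match the example URL from the question.
ROBOTS_TXT = """\
User-agent: *
Disallow: /dir/dir/
Disallow: /logreports/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("http://xyz.com/dir/dir/index.htm", "http://xyz.com/index.htm"):
        verdict = "allowed" if is_allowed(ROBOTS_TXT, "ExampleBot", url) else "excluded"
        print(url, "->", verdict)
```

A spider that skips this check, or runs it only on a later visit, is the kind that can end up fetching an excluded page through a direct link.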
seoRank
19-03-2003, 09:29/09:29AM
Hmm... any idea which one does what, Hope?
Hope
20-03-2003, 07:14/07:14AM
Actually, I never really paid that much attention. I have always wanted most of the site indexed, so it didn't matter. Most spiders don't do a deep crawl the first time they hit a site. They make note of the site and return at a later date.
I do remember that for a while AltaVista didn't read robots.txt or robots meta tags. I don't know whether they read them now. To be honest, I don't think I have seen the AltaVista spider in ages. :eek:
seoRank
21-03-2003, 03:24/03:24AM
Thanks. You are dead right about spiders not doing a deep crawl on their first visit. As a practice, I only submit the main URL and the site-map URL to the search engines and expect them to crawl the rest of the site from the site-map links. So far, I see that most of the search engines have indexed only the site map and the home page.
The pages we usually want to exclude are the ones not meant for the casual surfer, e.g. log report pages, bank wire transfer pages, demo pages, test URLs, etc.
Thanks
Hope
21-03-2003, 07:17/07:17AM
You should not have to worry about those pages. Most search engines are polite enough to read robots.txt before they actually index anything. If you are concerned that a spider isn't reading robots.txt, you can always use the robots meta tag. I don't know how much that will help either. Again, it is a matter of how polite they want to be.
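To show what the robots meta tag looks like from the spider's side, here is a minimal sketch that pulls the directives out of a page with Python's standard html.parser. The sample page and the function and class names are hypothetical, and a spider is still free to ignore whatever it finds.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. a bare attribute), so default to "".
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        for token in attr.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def robots_directives(html: str) -> set:
    """Return the set of robots meta directives found in an HTML document."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


# Hypothetical page that asks spiders not to index it or follow its links.
SAMPLE_PAGE = """
<html><head>
<title>Wire transfer details</title>
<meta name="robots" content="noindex, nofollow">
</head><body>Not meant for the casual surfer.</body></html>
"""

if __name__ == "__main__":
    found = robots_directives(SAMPLE_PAGE)
    print("index this page?", "noindex" not in found)
    print("follow its links?", "nofollow" not in found)
```

Note that a spider has to fetch the page to see this tag, so it only controls indexing and link following, not whether the page gets requested in the first place.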
scottiecl
21-03-2003, 07:33/07:33AM
Welcome to the forum, Seorank! :hi: