WebPosition **** robots.txt discussion
Susan Goodson
26-06-2003, 10:07/10:07AM
[Moderator Note: This discussion has broken out from this post (http://www.ihelpyouservices.com/forums/showthread.php?s=&postid=98870#post98870)]
Just to clarify, WebPosition does analyze your specific competitor pages as well.
Webmaster T
27-06-2003, 09:22/09:22AM
Originally posted by Susan Goodson
Just to clarify, WebPosition does analyze your specific competitor pages as well.
Yes, and that is also another piece of crap no self-respecting SEO would even consider!
Matt B
27-06-2003, 09:33/09:33AM
I wouldn't go that far, T. WPG can be a good tool if used properly.
. . . but that was a very awkward plug, IMO.
Susan Goodson
27-06-2003, 10:07/10:07AM
WebMaster T: you are certainly entitled to your opinion. However, it is YOUR opinion, and yet I have seen you shoving it down people's throats in almost every post you make, regardless of whether the topic is anywhere near related to WebPosition ****. I have seen you sneak in slams in threads that have nothing at all to do with the software. But the fact is WebPosition **** does work, and it works well if you use it properly. I get emails every day from people new to SEO who have been successful with the software. I still get emails from experienced SEOs who say they never would have been able to start mastering SEO without the guidance of the software in the beginning. After a point some may have grown out of needing the Critic or the Page Generator for optimizing, but they still use the software for monitoring hits and positions.
All that aside, Doug: if I am no longer welcome here, just say the word and I won't post here any longer. I recall a while back you personally asked me to start posting here to clarify and answer any questions about WebPosition ****. At that time I was resistant because I was not sure how I would be received as a company representative, and you assured me I was both welcome and wanted. If you no longer feel that way, just let me know.
Webmaster T
27-06-2003, 10:18/10:18AM
I'd go farther but..................... decency stops me!
This software has more similarities to an email harvester than it does to a legitimate SEO tool!
Proper Usage:
1. don't query Google because it is against their TOS.
2. don't use the rank checker on any SE with a robots.txt
3. don't use the page analyzer on any sites with a robots.txt disallowing it
Of course, where robots.txt is concerned you would have to do that yourself, because they don't even know of its existence, or at least you'd get that impression from the way it is programmed.
Should it even be able to analyze my pages, since it doesn't obey robots.txt and I can't stop them from utilizing my resources in a manner I find objectionable? Proper use would be not using the very feature she just plugged.
Proper usage would be as a doorstop and it ain't very good for that either!:D
Susan Goodson
27-06-2003, 11:09/11:09AM
I think either I am not understanding you or you have obtained some incorrect information. Either way I will do my best to address your issues (at least until I hear back from Doug as to whether he would like me to stay or go):
Proper Usage:
1. don't query Google because it is against their TOS.
*This is true. Any automated query by any software or script is against Google's TOS. We don't lie about it, we state it right in the software and in several of our newsletter articles. We also updated the software to have additional features to slow searches and make them more polite and more respectful of their resources. Use it or don't on Google that is up to each user.
2. don't use the rank checker on any SE with a robots.txt
*This one I am lost on. I am not sure where you got this information or what it is based on, but if you want to clarify for me I will do my best to address it.
3. don't use the page analyzer on any sites with a robots.txt disallowing it
*Again I am not sure what you are saying here. Are you saying you should not run the page analyzer on any page with a robots.txt, or are you saying you should not optimize pages without having a robots.txt to disallow some of the engines?
*If it is the former, please explain and I will do my best to help. If it is the latter: if you create high quality optimized pages, there is no reason to disallow any engines. I always tell anyone I deal with who uses the Critic that they need to optimize their existing pages, and *if* they decide to create additional pages for the purpose of optimization, they need to make sure those pages are unique, relevant, quality pages that are just as vital to their site as any of their existing pages. They should not be duplicates or anywhere near duplicates. That way there is no need to "hide" them from any engines.
Of course, where robots.txt is concerned you would have to do that yourself, because they don't even know of its existence, or at least you'd get that impression from the way it is programmed.
*Actually we did an article about exactly how to make a robots.txt file a while back in the newsletter and referenced it in the Critic. However, creating a robots.txt file does not prevent the engines from accessing or reading your pages. It merely keeps them from indexing the content of those protected pages. Therefore you do not want to have anything on your site that you would not want the engines to see. It's that simple.
Should it even be able to analyze my pages, since it doesn't obey robots.txt and I can't stop them from utilizing my resources in a manner I find objectionable?
*The Critic only analyzes the specific URL or file you point it to. It does not read any other content and should not use any of your web resources. If you are encountering or did encounter something to that effect please send me details at susan@firstplacesoftware.com and I will take a look, because I promise you the software is designed to give the users full control of what it does.
Webmaster T
27-06-2003, 11:26/11:26AM
Originally posted by Susan Goodson
I have seen you shoving it down people's throats in almost every post you make
"In almost every post you make"? I have posted over 800 times to this forum; if 10% are about FirstPlace software, WPG, or your newsletter then I would be very surprised. Of course, stretching the truth to the point of the absurd is a typical marketing strategy from WPG. Who cares if it isn't factual, if it sells product.
Personally I don't care for your implying that I do this when it isn't relevant. For instance, I may mention WPG in regard to bad bots that don't follow robots.txt, or mention where some of the spam techniques started, but they are relevant examples and information that people should have, since many of the problems we encounter here were started by publications that irresponsibly teach and proliferate spam techniques.
If you want to shut me up then just do something about that! Brett knows me; I used to sell WPG until they turned it into a spam machine. He knows **precisely** why I get on them about this. I was saying these things before it was even "in vogue" to say them. If you think this is just recent then look at SeoPros.org, I-search and other places where I think it does some good to express my concerns and inform people about what WPG really is.
We all know where the hallway and doorway page techniques originated, don't we? A while ago you did that multiple domain article; I posted several times explaining why that is just a bad idea. So IYO, it's fine for me to help clean up your mess, but I'm shoving my opinion down people's throats when I identify the cause? That makes a lot of sense, doesn't it?
It is the reckless disregard for others who don't know any better that ticks me off! If you don't like it, well, I already told you how to shut me up. It's pretty simple and nothing personal, because I think Brett is a good guy, just a little misguided in some directions he has taken the software and the way it is promoted.
Alan Perkins
27-06-2003, 11:50/11:50AM
Originally posted by Susan Goodson
However, creating a robots.txt file does not prevent the engines from accessing or reading your pages. It merely keeps them from indexing the content of those protected pages.
That's totally wrong. A robots.txt file *does* prevent compliant robots accessing or reading pages. That's the point of it.
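Alan's point about compliant robots can be sketched with Python's standard `urllib.robotparser`. This is only an illustration: the robots.txt content, the bot name "ExampleBot", and the example.com URLs are assumptions made up for the sketch, not anything taken from WPG or a real site. A compliant robot runs a check like this before requesting a page and simply never fetches a disallowed URL.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example site: keep all robots out of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

def allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if a compliant robot with this user agent may request the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse rules without any network access
    return parser.can_fetch(user_agent, url)

# A compliant robot checks first and skips the request when this is False.
print(allowed("ExampleBot", "http://www.example.com/index.html"))
print(allowed("ExampleBot", "http://www.example.com/private/page.html"))
```

The check happens on the robot's side, which is why the file only binds robots that choose to honor it.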
Webmaster T
27-06-2003, 12:30/12:30PM
Originally posted by Susan Goodson
Use it or don't on Google that is up to each user.
Wouldn't it be more responsible to not include it, since it is evident that WPG knows it is wrong and risky to query Google at all? That's like saying a crime has only been committed if you get caught. So don't get caught. That mentality is at the root of most of the problems this industry is encountering with regard to perception of those in it.
*This one I am lost on, I am not sure where you got this information or what it is based on, but if you want to clarify for me I will do my best to address it.
WPG is a user agent, a bot, whatever you want to call it; it should be obeying the robots.txt protocol. For more info: robots.txt (http://www.robotstxt.org/wc/robots.html). Most SEs have a robots.txt which basically says if you aren't a user browsing, we don't want you accessing the results. WPG's error messages when it is blocked from access seem to imply it is a problem with the engine, when in fact it is a problem with the software not updating the program to get by blocks. It is within an SE's right to block WPG if they choose. The program shouldn't make the engine look bad for its own limitations. To think otherwise is to think SEs are public billboards to be used however WPG wishes.
I'm not positive, but IMO WPG represents itself as something it's not, so this would be contrary to what the SEs want and the protocols for robots and remote user agents. Who do you think pays for managing the server loads and supplies the bandwidth to serve the results to WPG users? A post in I-search from the president of Northern Light put the number of requests by WPG at over 200,000 a day. Northern Light was by no means a major engine, and that was several years ago, so it has to be even higher than that for many.
If WPG did pay, or even took some proactive measures to curb abuse and redundant requests, then I would be content to let this slide. IMO it is just exploitation of SE resources for financial gain. That, IMO, is just another kind of spam.
*The Critic only analyzes the specific URL or file you point it to. It does not read any other content and should not use any of your web resources.
Perhaps, but if I own the site you are analyzing then I should be able to stop that if I desire to do so. I am paying the freight after all, since I am paying to serve that page. It isn't the money; it is just the fact that I should have full control over how my resources are used. I don't particularly mind if someone wants to come to the page and sift through the code the way I do myself. I resent that it is automated. Perhaps I'm wrong on this point; it has been a while since I even looked at what is in WPG. If so, I apologize for my ignorance. ;)
Advisor
27-06-2003, 13:11/01:11PM
All that aside, Doug: if I am no longer welcome here, just say the word and I won't post here any longer. I recall a while back you personally asked me to start posting here to clarify and answer any questions about WebPosition ****. At that time I was resistant because I was not sure how I would be received as a company representative, and you assured me I was both welcome and wanted. If you no longer feel that way, just let me know.
Susan, Doug is on vacation this week, but I think I speak for him when I say that you're most certainly still welcome to post here and answer questions that may arise about WPG.
For the record, I still like WPG for my reporting of rankings (although I won't use it on Google due to the TOS concerns).
Jill
Susan Goodson
27-06-2003, 15:05/03:05PM
Thanks Jill :)
WebMaster T, I can see where you are coming from now. Okay you don't like the fact that the software can analyze anyone's pages without permission and search the search engines without permission. And I won't argue that, you are right it can do both without permission.
But isn't that what you do as a web master as well? Granted on a smaller scale and granted it takes longer. But don't you analyze your competitor's pages? Do you ask their permission to look at their page? Do you search the engines to find out where your sites or client's sites rank? Do you do those searches without paying the engines? Do you click their banner ads or sponsored listings?
The whole premise of the software was to automate the steps that SEMs do manually to help make the processes easier. It was meant to help SEMs both new and experienced with their tasks.
I am not trying to turn you around or convert you in any way ;) I honestly thought that perhaps you had either misunderstood how the software works or you had gotten some misinformation and I was trying to clear it up, while explaining that while entitled to your opinion it does not necessarily make what you say fact nor does it mean that it is correct, or the only opinion on the matter. I wanted to speak our side so at least both opinions (although mine and yours are the extreme ends - most lie somewhere between us) are here.
My reason for posting originally was to correct a piece of misinformation earlier in the thread where someone stated that WPG did not or could not analyze a specific competitor page but instead was trying to reverse engineer the engines' algos and I only wanted to clarify that it can analyze specific competitor pages as well.
Alan Perkins: I know what the Web Robots Standard of Exclusion is and why it was created. However, what I am saying is there are no guarantees with it, and I have seen for a fact mainstream engines that crawl restricted pages, either purposefully or by accident. Here is a quote directly from the Standard's site:
It is not an official standard backed by a standards body, or owned by any commercial organization. It is not enforced by anybody, and there no guarantee that all current and future robots will use it. Consider it a common facility the majority of robot authors offer the WWW community to protect WWW server against unwanted accesses by their robots.
qwerty
27-06-2003, 15:17/03:17PM
Part of the problem is that Google's webmaster guidelines page (http://www.google.com/webmasters/guidelines.html) specifically states "Don't send automated queries to Google." WPG, in response to that, mimics a normal browser rather than revealing itself to be a bot. You really ought to specifically tell your users that using WPG to query Google is a violation of their guidelines. I haven't used WPG in at least a couple of years, but I don't remember anything specifically telling me that.
Alan Perkins
27-06-2003, 15:22/03:22PM
Originally posted by Susan Goodson
Alan Perkins: I know what the Web Robots Standard of Exclusion is and why it was created. However, what I am saying is there are no guarantees with it
What you said was "However, creating a robots.txt file does not prevent the engines from accessing or reading your pages. It merely keeps them from indexing the content of those protected pages." That is wrong.
The robots.txt standard, as designed, has nothing to do with indexing. It was designed for all robots, not just search engine spiders. It was designed to prevent robots accessing resources that Webmasters did not want them to access:
A Standard for Robot Exclusion (http://www.robotstxt.org/wc/norobots.html)
In 1993 and 1994 there have been occasions where robots have visited WWW servers where they weren't welcome for various reasons. Sometimes these reasons were robot specific, e.g. certain robots swamped servers with rapid-fire requests, or retrieved the same files repeatedly. In other situations robots traversed parts of WWW servers that weren't suitable, e.g. very deep virtual trees, duplicated information, temporary information, or cgi-scripts with side-effects (such as voting).
These incidents indicated the need for established mechanisms for WWW servers to indicate to robots which parts of their server should not be accessed. This standard addresses this need with an operational solution.
Sure, some robots attempt to comply and still make mistakes occasionally.
But some robots make no attempt to comply at all. I'm led to believe that the WPG robot, for example, does not publish a unique User Agent name and does not obey robots.txt - is that correct?
Webmaster T
27-06-2003, 17:39/05:39PM
Originally posted by Susan Goodson
WebMaster T, I can see where you are coming from now. Okay you don't like the fact that the software can analyze anyone's pages without permission and search the search engines without permission. And I won't argue that, you are right it can do both without permission.
Nice to see that there is some agreement. As I go through your post and give another viewpoint, please understand this has never been a vendetta. Others may think that is the case, but it isn't.
"But isn't that what you do as a web master as well?"
Yep, and it would be foolish to dispute that, since I've been writing about doing exactly that for many years.
"Granted on a smaller scale and granted it takes longer."
That is my point. I'm not bombing a server with requests; I would spend a lot of time on each page.
I will admit I've written a program that will actually compare 4 pages, mine and three competitors', but it also will not access pages that disallow the spider. Tmeister has a constant user name and an email address to contact me if you want to know what it was doing. Since it has a constant user agent and does obey robots.txt, anyone can easily stop it from doing that if they don't like it. If WPG was accessing resources in that manner then, hey, more power to you.
"But don't you analyze your competitor's pages? Do you ask their permission to look at their page?"
Of course not, but that has been standard SEO operating procedure for years. I expect that of my competition, and it doesn't bother me a lot when I see what I know came from my page.
"Do you search the engines to find out where your sites or client's sites rank? Do you do those searches without paying the engines? Do you click their banner ads or sponsored listings?"
No to all of the above, and when I want to know placement I go to the SE. Even they wouldn't care if I do that. I do see the banner or text ad, which means it does have some branding value for the page view, but I admit I haven't clicked a banner on a SE in a long time.
Actually, a number of the mods in here, myself included, see little really valuable information coming from a ranking report. They are for clients, not SEOs. Anyone who pays for them was duped and sold a bill of goods. A good SEO does traffic analysis; a wannabe provides a ranking report. If a client wants a ranking report and doesn't see how thorough traffic analysis is a competitive advantage, then probably a) I don't want them as a client, or b) they can't afford my service.
I agree the others deserve something but I'm not convinced a ranking report is in their best interests. But the kind of service I provide is likely overkill.
Yes, I would gladly pay a search engine for that access and have actually enquired about doing so. IMO that is the real unsung value in Inclusion: the reporting! I could have 40 #1s, but if they don't drive traffic, or they don't convert, what use is #1? It's all relative. I know it was submitted, and I know some of it places, because that is in the logs or tracking software. Does it really matter if a sale/conversion originates from the first or thirtieth result? I know it is better to be first, but I haven't been real obsessive about ranks for a number of years.
Log analysis is where the real **** mine of information resides. However, I prefer to write my own tracking for a site, built right into the backend of the sites I build.
"The whole premise of the software was to automate the steps that SEMs do manually to help make the processes easier. It was meant to help SEMs both new and experienced with their tasks."
Email harvesters provide the same services to email marketers. Does that make email harvesters good netizens, or the programs any less of a nuisance to others whose resources are used to harvest addresses?
"I am not trying to turn you around or convert you in any way ;)"
Well, I was a big supporter of WPG, and even made a few bucks doing it in the past when I was just getting my business off the ground, but I was less knowledgeable about these issues and SEO in general. Personally I think there is a lot to be gained by going directly to the source. I get paid by the hour, so does it really pay to use a ranking report? Even when I used it I never used the programs that evaluate pages or SE algos; IMO they were always wacked and made absolutely no sense.
"I honestly thought that perhaps you had either misunderstood how the software works or you had gotten some misinformation and I was trying to clear it up, while explaining that while entitled to your opinion it does not necessarily make what you say fact nor does it mean that it is correct, or the only opinion on the matter."
You're right, it doesn't make it fact; the users here decide if what I say is BS, bashing or factual. Believe me, they love to call you on it when they think it is any of the above.
There was only one misunderstanding between myself and those I spoke to at FirstPlace, we agreed to disagree on whether it is abusive of SE or not. I see SE as private property, I wouldn't go into my neighbors yard and steal his newspaper to get the news I want or need and I sure as heck don't want him stealing mine.
"My reason for posting originally was to correct a piece of misinformation earlier in the thread where someone stated that WPG did not or could not analyze a specific competitor page but instead was trying to reverse engineer the engines' algos and I only wanted to clarify that it can analyze specific competitor pages as well."
Well, that was me, and I was talking about the article in which I recall them discussing reverse engineering engines. I know many people don't like that feature of WPG, so it was an example of what, IMO, is not a good strategy. Positioning of a site is granular; to a large degree I don't care what the engines' algos are. I'm looking for what the other sites did poorly that I know I can do better.
The rest I'll leave for Alan, with one thought. You don't have to be a good citizen and help that old lady across the street; you do it because it is just the decent thing to do.
Susan Goodson
30-06-2003, 12:09/12:09PM
Alan, you are correct: we do not obey robots.txt and we do not use a unique user agent.
On the other hand we do not bomb a competitor's web site server with requests for pages to analyze either. When you analyze a competitor page manually you go to their URL that you are interested in, and you look at the page in your browser, you most likely look at their source code, you study it and you decide what you can do and are willing to do to get more traffic or better rankings. You then make those changes to your pages. With WebPosition **** you do pretty much the same thing. It accesses the URL you are interested in as a browser, it analyzes the page, presents you with a statistical analysis of the page, and then you decide what you can and are willing to do to get more traffic or better rankings. You then make those changes to your pages.
WebPosition accesses pages like a browser would. Browsers do not obey robots.txt protocol. It sounds like what you do when analyzing a page is not so different than what WebPosition **** does. If you look at a competitor page in your browser you are doing the same thing WebPosition **** does. You are accessing one page only using your browser, for the purpose of analyzing the page.
In addition to that WebPosition does have the ability to analyze more than one top ranking page, however, in those cases it merely downloads a copy of each page one at a time and analyzes them locally, it does not bombard the server with page requests.
Regarding the ranking reports, some like you may feel that ranking reports do not give you or your clients a full picture of your efforts. Traffic reports can provide additional detailed information and that is why we provide those as well.
I guess our basic difference of opinions comes down to the fact that we think we have the right to provide a suite of tools that can help search engine marketers and web designers automate much of the process that they would do manually. They can choose which modules they use and which they don't. Also with the new features in 2.0 you can, if you or your clients were interested in ranking reports, slow them down to the point that there is no difference from a manual search in regard to their resource usage or the browser request.
Alan Perkins
30-06-2003, 12:31/12:31PM
Originally posted by Susan Goodson
Alan, you are correct: we do not obey robots.txt and we do not use a unique user agent.
Thanks. :)
Any robot writer could say "You can do manually what our robot does automatically, so what's the problem?" The whole point is that there are problems when you do things automatically, and that's why the robots.txt protocol was invented. IMO if you're a robot writer and not signed up to the solution, then you're still part of the problem.
Susan Goodson
30-06-2003, 14:36/02:36PM
Sorry we could not come to terms.
dragonlady7
30-06-2003, 15:36/03:36PM
we do not obey robots.txt
:confused:
I'm pretty new at this. I thought it was a bad, bad, bad thing to ignore that. I thought that a bot not asking for robots.txt was the epitome of poor robot etiquette, on par with killing your pet dog. I've been going through my logs in my rudimentary way and trying to spot bots who don't ask for robots.txt ever, and ban them because they'd have to be up to no good. I was just going to post to ask if anyone knew of any software that could help me, as my eyes were about to fall out. I'd heard all kinds of terrible things about that stuff.
So why would someone come right out and say that they didn't obey it? I thought it was this big deal and saying that would get you drawn and quartered by the anti-spam militants.
:confused: :confused: :confused:
Man... You just start to think you're figuring out what's going on. I'm starting to figure out where the nagging feeling that they don't pay me enough for this was coming from...
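The log check described in this post could be sketched roughly as follows. It assumes the server writes Apache "combined" format access logs, and the function name, the threshold, and the idea of grouping clients by IP address plus user-agent string are all choices made up for this sketch. Ordinary browsers never request robots.txt either, so the result is only a list of candidates to look at by hand, not a ban list.

```python
import re
from collections import defaultdict

# Assumed Apache "combined" log format:
# ip ident user [time] "METHOD path proto" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] '
    r'"(?:GET|HEAD|POST) (?P<path>\S+)[^"]*" \d{3} \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)

def clients_skipping_robots(lines, min_requests=20):
    """List (ip, user_agent) pairs with many requests but none for /robots.txt."""
    counts = defaultdict(int)  # requests seen per client
    asked = set()              # clients that fetched /robots.txt at least once
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that are not in the assumed format
        key = (match.group("ip"), match.group("agent"))
        counts[key] += 1
        if match.group("path").split("?")[0] == "/robots.txt":
            asked.add(key)
    return sorted(k for k, n in counts.items() if n >= min_requests and k not in asked)
```

A possible use is `clients_skipping_robots(open("access.log"))`, where `access.log` is a placeholder file name. Raising `min_requests` filters out casual human visitors, at the cost of missing slow bots.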
polarmate
30-06-2003, 15:47/03:47PM
Hey dragonlady,
My basic understanding of this is that a bot has to identify itself as a unique user-agent before it can be banned. Susan says that WPG does not use a unique user-agent and operates instead like a regular browser. My guess is that you cannot ban WPG from accessing your site and making automated queries. My other guess is that if they did do that, then they would lose a lot of customers simply because a lot of discerning webmasters like you and me would block or ban them from their web sites.
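The point about unique user agents can be sketched with Python's standard `urllib.robotparser`. The robots.txt content and both user-agent strings below are made up for illustration; "Tmeister" is used as the agent token only because Webmaster T mentions that name earlier in the thread, and nothing here reflects what WPG actually sends.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: ban one named bot, allow everyone else.
ROBOTS_TXT = """\
User-agent: Tmeister
Disallow: /

User-agent: *
Disallow:
"""

def may_fetch(user_agent: str, url: str = "http://www.example.com/page.html") -> bool:
    """Return True if the rules above let this user agent request the URL."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())  # parse the rules without network access
    return parser.can_fetch(user_agent, url)

# A bot that announces a unique name can be singled out by the site owner...
print(may_fetch("Tmeister/1.0"))
# ...but a client sending a browser-style string only ever matches the "*" rules.
print(may_fetch("Mozilla/4.0 (compatible; MSIE 6.0)"))
```

Even for the named bot, the rule only takes effect if the bot reads robots.txt and chooses to honor it.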
dragonlady7
30-06-2003, 15:58/03:58PM
So... that'd be why it doesn't use a unique user-agent, then?
Ah.
I think I understand this whole debate a lot better now. So WPG isn't doing anything technically illegal, as those on its side say, and those who are against it point out that while it's not technically illegal, it's darn bad manners. And against the TOS of many sites. :/
That's a finer line than I'd want to base a business model on, but maybe I'm just chicken. :(
csmith
30-06-2003, 17:14/05:14PM
Originally posted by Webmaster T
Yes, and that is also another piece of crap no self respecting SEO would even consider!
I suppose, based on your statement, that you have never used it... If so I rather doubt that you would have made the statement at all.
Webmaster T
01-07-2003, 02:54/02:54AM
Originally posted by csmith
I suppose, based on your statement, that you have never used it... If so I rather doubt that you would have made the statement at all.
You'd be wrong! I have used it and sold it, and I know the version I used as well as anyone. When I was touting it my testimonial was all over the net. I don't make statements like that because I like the sound of my own voice. Since you have made assumptions about me, how about a little *** for tat! Based on your statement I would say you are a user of WPG and dependent on it. I never was and never will be; I learned the old fashioned way, by spending hours studying SERPs and pages that do well. That is one characteristic common to all SEOs who are any good at all. They don't take any shortcuts in implementing their craft or the techniques they use.
Catfish
28-10-2003, 16:11/04:11PM
I use WPG to keep track of my Inktomi listings and I use Shawn's tool for keeping track of my Google rankings. These are the only two databases I worry about right now. It's interesting to see what page critic says about pages, but I don't use the advice very often as I have my own formulas. WPG is good for what I use it for. They should be more clear about Google's TOS, however. This sentence from the front page of www.webposition.com clearly implies that their software will help you with Google:
A top 10 ranking in a major search engine like AltaVista, Lycos, or Google will often generate more targeted traffic than an expensive banner advertising campaign - and, a good search engine position is like highly targeted advertising that is both FREE, and effective!
Obviously that is a little misleading, as Google has expressly said that WPG is against their TOS. I would recommend anyone who wants to keep track of their Google ranking to try Shawn's tool at http://www.digitalpoint.com/tools/keywords/
It's 100% free and all you need is an API code, which takes 3 minutes to register for at Google (the link is found on Shawn's signup page). I think when more people realize how cool Shawn's code is, it will be the most popular SEO site on the internet. WPG is still the best thing I have found to keep track of my Inktomi listings and, in truth, many of my clients are fond of their webposition reports. That's my 2 cents.
Bernard
28-10-2003, 16:50/04:50PM
Catfish, I'm not sure how WPG queries Inktomi for ranking reports, but since the Inktomi site itself does not offer a means to search their index, if WPG is querying Hotbot, MSN or another Inktomi-driven engine, it is likely violating that engine's TOS. Hotbot is owned by Lycos and their TOS clearly states:
Prohibited Conduct
You agree that you will not use Lycos Network Products and Services to:
...
p. Use automated means, including spiders, robots, crawlers, or the like to download data from any Lycos Network database.
Lycos Terms and Conditions (http://info.lycos.com/legal/legal.asp)
MSN has similar verbiage:
Unless otherwise specified, the MSN Sites/Services are for your personal and non-commercial use. You may not modify, copy, distribute, transmit, display, perform, reproduce, publish, license, create derivative works from, transfer, or sell any information, software, products or services obtained from the MSN Sites/Services. Without the advance express written permission of MSN, you may not 'meta-search' the MSN Sites/Services, send, or cause to be sent, any automated queries of any sort to the MSN Sites/Services, or use the MSN Sites/Services in any commercial manner. "Automated queries" shall include but not be limited to using any software that sends queries to any MSN Sites/Services to determine how a web site "ranks" on any MSN Sites/Services.
MSN Terms of Use (http://privacy.msn.com/tou/)
Catfish
28-10-2003, 17:48/05:48PM
Very interesting. I hadn't noticed that. Well, that's a bit of a pain in the ass, isn't it??? :thebomb: