Great Job Joeant
stoner3221
05-05-2004, 14:20/02:20PM
Joeant has 85,000 pages indexed by Google. Congrats to you folks, that's quite impressive.
Hbird64
05-05-2004, 17:34/05:34PM
I count 139,000 listings :D See http://www.google.com/search?hl=en&lr=&ie=UTF-8&oe=UTF-8&q=+site:www.joeant.com+joeant
Hugo
stoner3221
05-05-2004, 17:41/05:41PM
Originally posted by stoner3221
Joeant has 85,000 pages indexed by Google. Congrats to you folks, that's quite impressive.
I was using allinurl:"www.joeant.com" site:www.joeant.com
to get the 85,000 but regardless it's very impressive.
ihelpyou
05-05-2004, 17:56/05:56PM
Yes, real nice job! :cheers:
Hbird64
09-05-2004, 16:02/04:02PM
Google must have been busy the last few days. Now I got 123,000 results with allinurl:"www.joeant.com" site:www.joeant.com
Hugo
stoner3221
09-05-2004, 16:12/04:12PM
Very Nice!:cheers:
Hbird64
12-11-2004, 09:10/09:10AM
I hope the last Google update will give us a higher PR.
For the past few days we have had 184,000 (http://www.google.com/search?hl=en&lr=&c2coff=1&q=allinurl%3A%22www.joeant.com%22+site%3Awww.joeant.com&btnG=Search) links in Google.
Hugo
stoner3221
12-11-2004, 11:10/11:10AM
Originally posted by Hbird64
I hope the last Google update will give us a higher PR.
Since a few days we have 184,000 (http://www.google.com/search?hl=en&lr=&c2coff=1&q=allinurl%3A%22www.joeant.com%22+site%3Awww.joeant.com&btnG=Search) links in Google.
Hugo
I’m not too far behind you with 152,000 but the trend has been to add a lot of pages then fall again in a few weeks when the low or no content pages get dropped. A couple of the new directories are way high but they drop quickly after the link campaigns stop.
sitetutor
18-02-2005, 02:30/02:30AM
Now it's 241k !
Dave Hawley
18-02-2005, 02:56/02:56AM
The JoeAnt directory brings me more traffic than any other directory.
Also, Google estimates can be waaaay out. Going by estimates, I see that the other SEs still have JoeAnt pages in the hundreds.
sitetutor
18-02-2005, 03:07/03:07AM
Times change. Google doesn't like to be figured out anymore.
polarmate
18-02-2005, 11:07/11:07AM
Knowing your high levels of quality for listings and review, I'd say it's well-deserved. :thumb:
Hbird64
18-02-2005, 15:18/03:18PM
JoeAnt directory brings me more traffic than any other directory
It's always nice to see that someone likes us 8)
Hugo
sitetutor
18-02-2005, 15:36/03:36PM
Hey Hugo ... small world :D
Tygo.com
25-02-2005, 12:33/12:33PM
Good job.. does that include Alexa pages?
Tygo.com
26-02-2005, 08:46/08:46AM
I just looked it up, and it looks like about 90% of your indexed pages are from Alexa. Just wondering?
Glo
26-02-2005, 12:21/12:21PM
I'm not sure what you are asking. The pages that are indexed by Google have nothing to do with building the directory.
Quadrille
26-02-2005, 12:45/12:45PM
Sorry, I'm confused - I just cannot see why all these pages are a "Good Thing" for anyone - I've looked at a few of them, too ...
I'm one of those folk who think the number of pages listed from a directory would approximately equate to the number of categories in that directory. And I can see distinct disadvantages in having pages that really don't serve much of a purpose.
But if I'm missing some benefit for users, I'm willing to learn :)
Tygo.com
26-02-2005, 13:10/01:10PM
I'm wondering too, since most of the pages are their info. pages. Just a curious question, but if it works, good for them.
Glo
26-02-2005, 17:37/05:37PM
I don't think I can answer what you are trying to ask because our users do not benefit from JoeAnt's being indexed by Google or any other search engine. JoeAnt benefits by being indexed only in that it can be found by potential users.
If I understand your real question, you both are equating users to Webmasters and/or Web site owners. We do not consider Webmasters/owners as users, though they may use our directory. So, basically, I can not answer your question. I have passed it on to someone who may be able to provide an answer.
JoeAnt
26-02-2005, 18:20/06:20PM
The info. pages come nowhere near making up 90% of the pages indexed by Google. The bulk are from the Keywords searched and actual directory pages. The info. pages also play a VERY small role in the traffic we receive from Google. I have actually considered taking them down for bandwidth reasons, but continue to get positive responses from users who like the feature.
Quadrille
26-02-2005, 19:01/07:01PM
Originally posted by Glo
If I understand your real question, you both are equating users to Webmasters and/or Web site owners. We do not consider Webmasters/owners as users, though they may use our directory. So, basically, I can not answer your question. I have passed it on to someone who may be able to provide an answer.
I am not equating users to SEOs, webmasters or site owners. I am referring to human beings searching the web.
So these thousands of pages are made up out of search results, directory pages and "info pages"?
Still doesn't make sense to me; when I search Google, I find it extremely annoying to follow a top ten link and get another page of links - I search for content! From your response, you seem to be suggesting that people want to find Google packed with secondary searches?
I'm also still confused by the logic of allowing Google to spider 'info pages'.
How is JoeAnt's policy different from Google packing?
polarmate
26-02-2005, 19:10/07:10PM
Originally posted by JoeAnt
The info. pages come nowhere near making up 90% of the pages indexed by Google. The bulk are from the Keywords searched and actual directory pages. The info. pages also play a VERY small role in the traffic we receive from Google. I have actually considered taking them down for bandwidth reasons, but continue to get positive responses from users who like the feature.
I am not a proponent of directory SERPs being indexed by search engines, for many reasons. I'd rather find the topic pages, as all search scripts are not created alike. For keyword search, I would rather use search engines than directories.
Also, consider what would happen if DMOZ or Google Directory allowed their SERPs to be indexed.
ihelpyou
26-02-2005, 19:18/07:18PM
Oh boy, JoeAnt. This is exactly the type of directory I dislike. I also hate seeing SERPs from directories in search engines. Why do you think it's okay to get those types of pages indexed, and why does it look good?
Glo
26-02-2005, 19:26/07:26PM
Quadrille, since you quoted what I wrote I'm assuming your response questions are for me but I can't answer your questions as I already stated.
From your response, you seem to be suggesting that people want to find Google packed with secondary searches?
I never said that, in fact I never made any assumptions of what people might want from Google.
How is JoeAnt's policy different from Google packing?
I'm afraid I don't know what you are asking. Google packing means what?
Glo
26-02-2005, 19:31/07:31PM
Okay, I'm in over my head here. I know nothing of SERPs, indexed or otherwise. I'm bowing out of this exchange.
JoeAnt
26-02-2005, 19:49/07:49PM
When we created our recent keywords page (http://www.joeant.com/keywords.php), Google started to follow it. At the time I was under the impression that Google would not index links with programming text (e.g. http://www.joeant.com/DIR/search.php?keywords=Keywords%here), so nothing was done to prevent it. Now that the nofollow tag exists, that's something I'll implement shortly.
polarmate
26-02-2005, 19:51/07:51PM
Joeant does not have a robots.txt file. You could use that to disallow the bots. It will save both of you a lot of bandwidth.
Connie
26-02-2005, 21:31/09:31PM
I thought this had been discussed before when I read it earlier.
JoeAnt, what you are doing is allowing SEs to index search results on your site. Based on my limited understanding of the tech side of this, this will create a lot of pages that are useless duplicate pages in the SERPs. You need to disallow robots from crawling your search results.
I appreciate the traffic I get from JoeAnt. However, when Google knocks you down because of what you're doing, there won't be any traffic from JoeAnt.
In this last update there seems to be a lot of evidence that Google is cracking down on this kind of stuff. I don't think you had bad intentions in mind, but Google can't tell your intentions from the other 1000 directories out there that only exist for webmasters and for AdSense.
JoeAnt is a respected directory. Please keep it that way. Don't worry about the number of pages in the index. Worry about the quality of the pages in the index.
polarmate
26-02-2005, 21:43/09:43PM
Excellent advice, Connie! :cheers:
Quadrille
26-02-2005, 22:19/10:19PM
Glo:
Apologies for the confusion - the first part of my note was a response to you - just to make clear my concerns were for searchers not site folk, but the rest of my note was directed to JoeAnt's note, and I didn't make that clear.
Google Packing is a piece of jargon I made up to describe the practice of creating thousands of unreal, content-free directory pages 'on the fly' in order to get directories wider listing in Google - a practice that has recently led to many directories being delisted by Google. While I am not suggesting that JoeAnt is deliberately sabotaging Google (unlike many of the delisted directories!), I do believe that Google may fail to make that distinction.
Which would be catastrophic for JoeAnt, and a loss to all of us :)
ihelpyou
27-02-2005, 07:36/07:36AM
Exactly.
Yes, JoeAnt, simply use a robots.txt file to "disallow" those types of pages from being indexed. They will then eventually fall out of Google. We would all hate to see what's happened to many other directories lately happen to you as well. We all know you did not do this purposely, "unlike" many directories out there owned by "SEOs". Their stuff is "intentional".
JoeAnt
27-02-2005, 13:13/01:13PM
Can I ask a huge favor from you guys? :hi: I've tried my darnedest to read up on the robots.txt file, but haven't the slightest idea how to make one or even where to put it. It seems several posts offer contradicting methods. If you guys would help me walk through it, I'd greatly appreciate it.
Bernard
27-02-2005, 13:23/01:23PM
http://www.robotstxt.org/wc/norobots.html
User-agent: *
Disallow: /DIR/search
should work I believe.
Create a text file with Notepad, paste that text in it, save it as robots.txt and upload it to the root directory for the domain.
Glo
27-02-2005, 18:57/06:57PM
Bernard, are those info pages actually search pages? I'm not sure they are but then I am clueless with a lot of this kind of stuff. If they are not actual search pages how will that text stop the bots?
Bernard
27-02-2005, 19:32/07:32PM
Glo,
I'm assuming that the pages in question are following the format described by JoeAnt a few posts up:
http://www.joeant.com/DIR/search.php?keywords=Keywords%here
WebSavvy
27-02-2005, 19:43/07:43PM
Joe, it's very easy to do the robots.txt file (we also have one on websavvy too).
Use a text editor, create a text file which you will save as robots.txt
Inside the text file (robots.txt) add the following lines:
User-agent: *
Disallow: keywords.php
User-agent: *
Disallow: /DIR/search.php
Then afterwards, FTP the robots.txt file into your root folder (public_html) where your index page resides. The robots.txt file needs to be web accessible.
You should be able to see it by going to:
http://www.joeant.com/robots.txt
Using the wildcard as Bernard suggested (e.g., /DIR/search*) isn't supported by the robots standard. It is an acceptable method, however not all bots understand this.
I found this out myself while using a wildcard on our add.php pages, because they are in every category. I placed /*add.php and Google has stopped getting some of them on and off, but other bots are getting them anyway.
It doesn't matter for these though, as I'm not too concerned over it.
Glo, yes they are "search pages."
When a user comes to JoeAnt and does a search for something, this search is "recorded" by the keywords.php file as Joe indicated.
This creates the trail which is getting indexed by the bots and is ending up in the search engine database. This is what some less than ethical directories have done in an intentional effort to beef up their Google indexed page counts.
I know this was not, and is not, the case with JoeAnt. I applaud the fact that Joe is taking steps to prevent these pages from ending up in the search engines.
:cheers:
Glo
27-02-2005, 20:10/08:10PM
Thanks for the detailed explanation Deb. It should make the task easy for Jerry to implement. I use robots.txt on my site, which is why I asked Bernard that question. It just didn't look right but then I'm no expert either. :D
Bernard
28-02-2005, 10:22/10:22AM
Deb's solution is more precise than what I offered, but just to be clear, I did not suggest using any wildcard and my suggestion is correct with regard to the standard:
Disallow
The value of this field specifies a partial URL that is not to be visited. This can be a full path, or a partial path; any URL that starts with this value will not be retrieved. For example, Disallow: /help disallows both /help.html and /help/index.html, whereas Disallow: /help/ would disallow /help/index.html but allow /help.html.
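For anyone who wants to try that prefix rule out, here is a minimal Python sketch of it. This is only an illustration of the "starts with" matching the standard describes, not code from any real bot, and the function name is_disallowed is made up for the example.

```python
def is_disallowed(path, disallow_values):
    """Return True if path starts with any non-empty Disallow value.

    Hypothetical helper: plain prefix matching only, no wildcards.
    An empty Disallow value blocks nothing, so it is skipped.
    """
    return any(value and path.startswith(value) for value in disallow_values)


# The /help example quoted from the standard
print(is_disallowed("/help.html", ["/help"]))         # blocked by /help
print(is_disallowed("/help/index.html", ["/help"]))   # blocked by /help
print(is_disallowed("/help.html", ["/help/"]))        # not blocked by /help/
print(is_disallowed("/help/index.html", ["/help/"]))  # blocked by /help/
```

By the same logic, Disallow: /DIR/search would cover /DIR/search.php and anything else whose path begins with /DIR/search.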
WebSavvy
28-02-2005, 10:31/10:31AM
Oh, I know you didn't suggest using wildcards, Bernard. :)
I just gave the extra information about it, because I have used it.
I see lots of files around where it's being used, and it's not really "supported" by the robots standard, although it is an acceptable method to use.
I used it on our add.php pages as I stated ->
Disallow: /*add.php
Using * in the path is how wildcards are used. Some bots understand it, and some do not.
Joe said he's been reading around about it, and seeing too much conflicting information, and I had no idea if he read anything with regard to wildcards. So, to be thorough, I simply thought to include this information as well.
Hope I didn't confuse anyone. :)
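To show why some bots obey a rule like /*add.php and others do not, here is a rough Python sketch comparing two ways a bot might read the same Disallow value. Both functions are made up for illustration and the category path is hypothetical; real bots may behave differently from either one.

```python
import re


def prefix_match(rule, path):
    """Strict reading: the rule is a literal prefix, '*' has no special meaning."""
    return path.startswith(rule)


def wildcard_match(rule, path):
    """Extended reading: '*' stands for any run of characters.

    The rule is still anchored at the start of the path.
    """
    pattern = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.match(pattern, path) is not None


rule = "/*add.php"
path = "/Computers/add.php"  # hypothetical category path

print(prefix_match(rule, path))    # strict bot: rule does not apply
print(wildcard_match(rule, path))  # wildcard-aware bot: rule applies
```

A bot that only does the strict reading would keep fetching the add.php pages, which matches what Deb describes seeing.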
Alan Perkins
28-02-2005, 15:19/03:19PM
There is a slight error with what Deb posted. You need this code in your robots.txt file:
User-agent: *
Disallow: /keywords.php
User-agent: *
Disallow: /DIR/search.php
WebSavvy
28-02-2005, 22:58/10:58PM
Yep. 'Ole eagle eyes Alan. :)
I missed the forward slash on keywords.php
I wrote:
User-agent: *
Disallow: keywords.php
when it should have been:
User-agent: *
Disallow: /keywords.php
That's what I get for giving out tech advice when I'm tired. Too many long hours these past few days. :(
Connie
28-02-2005, 23:33/11:33PM
That's what I get for giving out tech advice when I'm tired. Too many long hours these past few days.
We all get caught up in that. :)
JoeAnt
28-02-2005, 23:37/11:37PM
See why it got complicated for me? :scattered
If there are no other corrections by tomorrow, I'll have it up tomorrow evening. I really appreciate the help you guys.
Let's hope the other directories follow the advice given here in this thread...*Cough*Tygo.com*Cough* ;)
WebSavvy
28-02-2005, 23:48/11:48PM
LOL @ Joe!
Ours has never had anything where SERPs would be followed by accident because we don't do a top searches/keywords list.
Even though we don't, our search.php page has been blocked via robots.txt ever since websavvy came online back in '98.
Bruce from WOW has his blocked too, and there are a few niche market ones I know of that block theirs too. However, you won't find too many directory owners who will take responsibility and block it.
In fact, I've heard from some other directory owners "Why should I block it? My directory search results being in Google brings traffic to the sites I have listed and to my directory."
My response, "Isn't that what your category pages exist for?"
From the perspective of a searcher, I don't want to go to Google and do a search for something and get some SERP from a directory where someone else was looking for the same thing. I want CONTENT and I want the SITE not the SERP from a directory. To me, that is useless.
[edited typo]
Quadrille
01-03-2005, 01:42/01:42AM
Originally posted by Alan Perkins
User-agent: *
Disallow: /keywords.php
User-agent: *
Disallow: /DIR/search.php
Just out of interest, how would that be different from:
User-agent: *
Disallow: /keywords.php
Disallow: /DIR/search.php
WebSavvy
01-03-2005, 01:52/01:52AM
The difference is that bots belonging to major SEs can follow stacked directives in robots.txt, whereas bots belonging to minor SEs might not have that capability.
This would cause you to run the risk of having Disallowed directories indexed by a bot from a minor SE because it doesn't have the ability to sort through stacked directives in robots.txt.
I myself do not stack the directives in robots.txt for this very reason.
Quadrille
01-03-2005, 01:59/01:59AM
It's not often that I get to learn something new before 7 am - Thanks! ;)
WebSavvy
01-03-2005, 04:19/04:19AM
Hey Joe, I just noticed that Ranking-Manager (http://websitemanagementtools.com/ranking-manager/engines.php) has your directory in his software now!
You won't have any luck trying to get him to remove your directory. We were in it too. Even our lawyer sent him notification, which he ignored.
His software ran up over 40 GB of bandwidth on my server in one month. I ended up writing a script to stop him. He tried to fix his lame crap two or three times after that, but it didn't help him. :D
If you'd like a copy of the code, and how to install it into your search script, please PM me and I will gladly supply it.
Alan Perkins
01-03-2005, 04:27/04:27AM
Originally posted by Quadrille
It's not often that I get to learn something new before 7 am - Thanks! ;)
No, actually you were right. This should be the code:
User-agent: *
Disallow: /keywords.php
Disallow: /DIR/search.php
You should not have two separate records for the same user agent. I didn't notice that the two records had the same user agent name.
WebSavvy
01-03-2005, 04:56/04:56AM
Alan, there's no User-agent name there -- it's a wildcard (e.g., * )
From everything that I've read about this, the recommendation is to not stack directives but to instead list the Disallow on a per file/folder basis.
The reasoning given for that practice was based on the suggestion that not all bots have the ability to sort through stacked directives, unless this has since changed?
Alan Perkins
01-03-2005, 05:14/05:14AM
By "not stacking directives", Deb, I think you are referring to multiple user-agent lines, something like this:
User-agent: Slurp
User-agent: Googlebot
User-agent: MSNBot
Disallow: /keywords.php
Disallow: /DIR/search.php
That may cause a problem with some bots, although it is valid.
There is definitely no problem with multiple Disallow lines - it's what you're supposed to do and all bots understand it. :)
What you're suggesting as valid is actually invalid:
from A Standard for Robot Exclusion (http://www.robotstxt.org/wc/norobots.html)
If the value is '*', the record describes the default access policy for any robot that has not matched any of the other records. It is not allowed to have multiple such records in the "/robots.txt" file
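As a quick way to catch that mistake before uploading, here is a small Python sketch that counts how many records in a robots.txt name the '*' user agent. It is a simplified reading of the record format (a record starts at the first User-agent line after a rule line or a blank line), and the function name count_wildcard_records is made up for illustration.

```python
def count_wildcard_records(robots_txt):
    """Count records whose User-agent lines include '*'.

    Simplified, hypothetical checker: consecutive User-agent lines belong
    to one record; a rule line or blank line ends the User-agent block.
    """
    count = 0
    in_agent_block = False
    block_has_star = False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line or ":" not in line:
            in_agent_block = False
            continue
        field, value = line.split(":", 1)
        if field.strip().lower() == "user-agent":
            if not in_agent_block:
                in_agent_block = True
                block_has_star = False
            if value.strip() == "*" and not block_has_star:
                block_has_star = True
                count += 1
        else:
            in_agent_block = False
    return count


two_records = "User-agent: *\nDisallow: /keywords.php\nUser-agent: *\nDisallow: /DIR/search.php\n"
one_record = "User-agent: *\nDisallow: /keywords.php\nDisallow: /DIR/search.php\n"

print(count_wildcard_records(two_records))  # more than one means the file breaks the rule above
print(count_wildcard_records(one_record))
```

Anything above one means the Disallow lines should be merged under a single User-agent: * line.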
WebSavvy
01-03-2005, 05:54/05:54AM
Yep, Alan, that's exactly what I meant. I was just now thinking about that when I realized that I didn't post it correctly.
I've had maybe 6 hours sleep over two days (too much work to do) and so, am not real on top of things right now.
Anyway, yes ... what I meant to post (and know is correct)
For the wildcard user-agent, where no specific name is given for the directive, it's done as:
User-agent: *
Disallow: /folder1
Disallow: /folder2
Specific agents must be in this format:
User-agent: LinkWalker
Disallow: /
User-agent: Walk-Alot-Bot
Disallow: /
You can stack directives for agents as in this manner (although it may create problems -- therefore I do not do it):
User-agent: LinkWalker
User-agent: Walk-Alot-Bot
Disallow: /
What you posted above in your reply, Alan, is exactly what I meant by stacking directives (for user agents).
OK, today I will get some sleep ... LOL
Alan Perkins
01-03-2005, 06:33/06:33AM
User-agent: *
Disallow: /keywords.php
Disallow: /DIR/search.php
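One way to sanity-check that file before uploading it is Python's standard urllib.robotparser module. This is just a rough sketch, assuming the three lines are saved exactly as posted; the keywords value and the /DIR/Computers/ category path in the sample URLs are made up for the test.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /keywords.php
Disallow: /DIR/search.php
"""


def build_parser(text):
    """Parse robots.txt text directly, without fetching anything."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

sample_urls = [
    "http://www.joeant.com/keywords.php",
    "http://www.joeant.com/DIR/search.php?keywords=example",  # hypothetical query
    "http://www.joeant.com/DIR/Computers/",                   # hypothetical category
]
for url in sample_urls:
    verdict = "allowed" if parser.can_fetch("Googlebot", url) else "blocked"
    print(verdict, url)
```

The keywords page and the search results should come back blocked, while ordinary category pages stay allowed.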
ihelpyou
01-03-2005, 07:56/07:56AM
Nice thread.
This will help many out there I'm sure. :)
JoeAnt
04-03-2005, 21:38/09:38PM
All done. Thanks for the help guys!
ihelpyou
05-03-2005, 05:01/05:01AM
Very good JoeAnt!
Alan Perkins
05-03-2005, 13:17/01:17PM
Originally posted by JoeAnt
All done. Thanks for the help guys!
I get a 404 from www.joeant.com/robots.txt
ihelpyou
05-03-2005, 14:47/02:47PM
lol So do I.
It helps to put your robots file into the right folder on the server.
WebSavvy
05-03-2005, 14:58/02:58PM
He sent me a PM yesterday, and he's sick right now. Maybe he just didn't feel well enough and thought he'd put it up after he's feeling better.
Get well soon, Joe. :)
JoeAnt
05-03-2005, 16:41/04:41PM
I put it in the folder JoeAnt's index file is in and all attributes are correct. Any ideas what went wrong?
Dave Hawley
05-03-2005, 17:11/05:11PM
Joe, it cannot go in any "folder", it must be uploaded to the root.
See: http://www.robotstxt.org/wc/exclusion.html where you can check it after uploading.
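To make "the root" concrete: bots only ask for /robots.txt at the top level of the host, whatever page they are about to crawl. Here is a small Python sketch that works out that address from any page URL. The function name robots_txt_url is made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url):
    """Return the only address where a bot looks for robots.txt for this page.

    Hypothetical helper: keeps the scheme and host, replaces the rest
    of the URL with /robots.txt.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("http://www.joeant.com/DIR/search.php?keywords=example"))
```

If that address gives a 404 in a browser, the bots are getting a 404 too, no matter where else on the server the file sits.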
WebSavvy
05-03-2005, 17:14/05:14PM
If your index page is in /home/joeant/public_html/
.... then robots.txt goes there too.
It must be a text file and have the .txt extension.
If you have mod_rewrite enabled to do dynamic folders, on a UNIX server with SUExec installed, it causes mod_rewrite to run as a CGI module, which therefore requires apache handlers to be added (or you can set txt/html to parse as php).
If you've chosen the latter, it will cause any .txt extension files to produce either a 404 error message, or if a 200 response, the file usually wraps and makes it difficult at best for the bots to read/obey/follow your directives.
polarmate
05-03-2005, 18:05/06:05PM
Check the name of the file and ensure that it is all in lower case.
JoeAnt
05-03-2005, 20:53/08:53PM
It's definitely in the correct folder and is all lower case. As for the mod_rewrite, I couldn't tell ya. I tried removing "Options ExecCGI Includes" from the directives, but the robots.txt file still wasn't showing. That was the only thing I noticed under the directives that mentions CGI. Since I don't use Perl or CGI, I figured I could remove it. I can give my host a call on Monday and see what they think.
Thanks for the get well message. Thought I was over it, but woke up feeling like poo this morning. Nyquil, Chloraseptic and UNIX don't mix. :sleep:
spectregunner
05-03-2005, 22:07/10:07PM
Nyquil, Chloraseptic and UNIX don't mix.
Maybe, maybe not, but there is a school of thought that Neosporin cures everything.
First you get a very large tube......
polarmate
05-03-2005, 22:10/10:10PM
Are you sure, Frank?
I thought it was Windex (http://www.imdb.com/title/tt0259446/)...
:p
John Brown
05-04-2005, 09:04/09:04AM
Joeant has 85,000 pages indexed by Google. Congrats to you folks, that's quite impressive.
That's not nearly as impressive as Yahoo indexing 880,000 pages (http://search.yahoo.com/search?p=site%3awww.joeant.com&ei=UTF-8&n=20&fl=0&dups=0&xargs=&fr=sfp&dups=1) !
I am not sure if congratulations are due, as some entries are not 'right', according to the discussion here (http://www.ihelpyouservices.com/forums/showthread.php?s=&threadid=15302). What do the experts think?
Quadrille
05-04-2005, 10:21/10:21AM
I'm no expert in these matters, but I suspect it's because Yahoo! has yet to read the new robots.txt - it'll probably be correct by 2013!
JoeAnt
05-04-2005, 10:25/10:25AM
The robots.txt file isn't visible yet. :( I'm working on it.
<edit>I must have been extremely sick. I found the problem after making every attempt to get the file shown. It was a ridiculous mistake on my part. I accidentally loaded the file in the JoeAnt.com directory from my old server which I still use to host other sites and that still has my old JA content. Stupid mistake, but it's up and working now. If you guys can check it to make sure it's done properly, I'd greatly appreciate it.</edit>
I just want to go on the record and say that indexing search results is SPAM! We don't agree with others doing it and regret having our results spidered and indexed. Please know that we did not do this on purpose.
JoeAnt
05-04-2005, 11:58/11:58AM
While I had the Rackspace.com Tech. on the phone, I also solved another one of our issues with the Charset. We're now able to display those special characters from Europe. :cheers: :umbre: 8)
sitetutor
06-04-2005, 07:57/07:57AM
Nice! I believe that Joe Ant has a ton of potential. Live up to it :)