Phase II of My World Domination Plan


MazY
25-08-2001, 16:46/04:46PM
I asked this question some time ago in the SE newsgroups, only to be hit with a resounding "I dunno".

As we all know, Google happily indexes PDF files. But what would happen if the content (the actual text) of the PDF were the same as that of the web pages?

I ask because I want to make PDF files of each of my web pages, for reasons that are best known only to my bizarre mind.

Anyone think it would be flagged as duplication? I can't see it myself but I'm damned if I want to be the one to test that theory.

ihelpyou
25-08-2001, 17:02/05:02PM
errr. yes they would.

Googlebot is programmed to look for "duplication" in any form. Since Google now indexes PDF, the Googlians would certainly program Googlebot to do the same thing as they do for all pages.

IMO

ihelpyou
25-08-2001, 17:03/05:03PM
World Domination Plan 11 is blown up on first strike. :D

MazY
25-08-2001, 17:11/05:11PM
Not that I doubt you, but I have decided to write to Google to get the definitive answer once and for all. It is a vital part of my world domination plan, so I had better get it right!

ihelpyou
25-08-2001, 17:12/05:12PM
Thinking about it some more, when Google spots "duplication", normally one of the pages is dropped, not both. Problem is, it could be either one.

Let us know if you get a reply back.

MazY
25-08-2001, 17:15/05:15PM
I just thought of a good example as to why I should doubt you. Some months ago, I designed a site for a client. For reasons that I won't go into, they wanted both PDF and Word 2000 versions of their pages.

Link (http://www.google.com/search?sourceid=navclient&q=sulphate+heave)

The particular site is listed at #1 and #2.

Be sure to look at the site map where you will see the differing downloads.

ihelpyou
25-08-2001, 17:18/05:18PM
The link is broken. I have no doubt that duplicates could last a while, but I would be very hesitant to test it out unless Google actually said "sure, no problem".

ihelpyou
25-08-2001, 17:24/05:24PM
Yes. I can certainly see the advantages of having both in the index. I would be scared to try it though. :ignore:

MazY
25-08-2001, 17:29/05:29PM
I fixed the link.

I should also say that I have, in my time, come across quite a few sites that have a "Download this article in PDF Format" link. Now it could be that they are using an exclusion rule to prevent these files from being indexed.

Hmmm. Methinks I shall have to wait until I hear from Google to find out for sure.

MazY
25-08-2001, 17:30/05:30PM
I should also point out that neither the web page nor the PDF files in question have been dropped by Google.

ihelpyou
25-08-2001, 17:33/05:33PM
Yep. But I would worry about them eventually being dropped, or at least one of them. Sometimes it does take a while. Then, it could be either one.

Get the direct answer from the Googlians themselves, as they are the only ones who know for sure.

Mel
27-08-2001, 01:56/01:56AM
Hi MazY

I assume (perhaps incorrectly) that you want to make "printer friendly" versions of your site pages.

If that is the case, the files would be so different that I doubt Google would see them as duplicates. I would guess that most of your graphics, navigation links, and the content between your <head> tags would be quite different, and in any case the result would be a different file size.

ihelpyou
27-08-2001, 07:22/07:22AM
yes. If they are printer friendly, no problem. These forums have many printer-friendly pages in the index. I do not worry about that at all. I do not think Maz is talking about that though.

MazY
27-08-2001, 10:44/10:44AM
Originally posted by Mel
Hi MazY

I assume (perhaps incorrectly) that you want to make "printer friendly" versions of your site pages.

If that is the case, the files would be so different that I doubt Google would see them as duplicates. I would guess that most of your graphics, navigation links, and the content between your <head> tags would be quite different, and in any case the result would be a different file size.


I think you've got it. What I would be duplicating is the text that the visitor reads on the page. No HTML code at all.

To my mind, that in itself would make enough of a difference to warrant it being classed as non-duplicate?

ihelpyou
27-08-2001, 10:50/10:50AM
By gosh Maz, you should be fine with that! :up:

JuniorHarris
27-08-2001, 12:32/12:32PM
Superb analysis...I think he's alive Igor!~ ;)

Both documents would have to be stripped of their respective rendering and compared on their plain word content. Google's good, but I'm not sure it's that good?!?
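For illustration, the comparison described above (strip each document down to its words, then compare) can be sketched in Python. This is only a minimal sketch and not Google's method, which is not public. The three-word shingles, the Jaccard score, the sample page, and the names `TextExtractor`, `html_to_words`, `shingles`, and `jaccard` are all assumptions made for the example, and the PDF text is assumed to have been extracted already.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes, skipping anything inside <head>, <script>, or <style>."""

    SKIP = {"head", "script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def to_words(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def html_to_words(html):
    """Strip the markup from an HTML page and return its words."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return to_words(" ".join(parser.parts))


def shingles(words, k=3):
    """Return the set of overlapping k-word sequences in a word list."""
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Share of shingles common to both sets (1.0 means identical sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical web page with navigation, and the same article text from a PDF.
page = (
    "<html><head><title>Sulphate heave</title></head><body>"
    "<div>Home | About</div>"
    "<p>Sulphate heave damages concrete floor slabs.</p>"
    "</body></html>"
)
pdf_text = "Sulphate heave damages concrete floor slabs."

score = jaccard(shingles(html_to_words(page)), shingles(to_words(pdf_text)))
print(f"similarity: {score:.2f}")
```

The navigation words on the web page pull the score below 1.0 even though the article text is identical, which is the kind of difference Mel was pointing at. Where a search engine would draw the line between "similar" and "duplicate" is anyone's guess.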

ihelpyou
27-08-2001, 12:38/12:38PM
Compared with what the new index is revealing, Google is very bad right now.

Sheesh, pathetic ranking of doorway and cloaked pages. One site has 2 top ten ranks on the same keyword phrase using the exact same domain. Pathetic.

The thing is, Google has been told in prior months about the sites in question but they choose to ignore all of the requests to get the crap out of the rankings. I am shocked.

ihelpyou
01-09-2001, 20:04/08:04PM
hey Maz, question for you. I get referrals from Google for a slew of different phrases, as you can imagine. The latest was this:

http://www.google.com/search?q=how+yo+get+listed+on+google%3F&btnG=Google+Search

You can see that the "printthread" page is listed alongside the regular page as well. This happens a BUNCH. Should I worry about this? I am paranoid but do not think I have to worry.

My worries would be much less if it was not for the fact that the forums come up 1 and 2 on many different things.

MazY
01-09-2001, 21:53/09:53PM
Absolutely not. In fact, I would be a happy camper if I were you.

Whilst not on Google so much these days, there was a time not so long ago on Lycos when, for about three phrases, I owned nine out of the top ten positions. My worst case (or best case, whichever way you look at it) was something like 23 out of 30!

I have had similar circumstances to what you describe on Google too but not so much. Never have I had any negative effect from it.

You know me - I'm all for "catch as many phrases as you can". I used to be a big believer in "If I own 90% of the top ten then that narrows my competition down quite a lot." After asking many clients' views, however, they said that they didn't like sites that appear like that. They see it as cheating, even though it isn't and even though they wouldn't know if it was or not; that is how they perceived it to be.

I would not worry unless you really began to swamp the positions; then you may just want to make some tweaks before the SE Gods do it for you.

Just shows though - how influential a typo can be. "how yo get listed on google?" Run the same search with the correct spelling and yer nowhere about...

I think that's answered yer question?

ihelpyou
01-09-2001, 22:03/10:03PM
LOL. Yea but the thing is I am not trying to get the ranks much at all. They simply come. If there were problems, I would have to put a "deny Googlebot" on all the "printthread" URLs, which I do not wish to do. Sometimes those printthread pages are ranked by themselves as well.
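For reference, one common way to "deny" Googlebot on a set of URLs is a robots.txt rule. The sketch below is only an illustration: the `/printthread.php` path, the `forum.example.com` host, the `threadid` parameter, and the helper name `googlebot_may_fetch` are assumptions, not details taken from these forums. It uses Python's standard `urllib.robotparser` to check what such a rule would block.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep Googlebot away from the printer-friendly
# thread URLs only. The script path is an assumption for this example.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /printthread.php
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def googlebot_may_fetch(url):
    """Return True if the rules above allow Googlebot to fetch the URL."""
    return parser.can_fetch("Googlebot", url)


# Hypothetical URLs on a placeholder host.
print_view = "http://forum.example.com/printthread.php?threadid=42"
normal_view = "http://forum.example.com/showthread.php?threadid=42"

print(googlebot_may_fetch(print_view))
print(googlebot_may_fetch(normal_view))
```

A rule like this only asks a well-behaved crawler to stay away from the printthread URLs; the regular thread pages remain crawlable, so any rankings earned by the printer-friendly copies alone would be given up.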

sheesh, never had this kind of problem before. Wait til the PageRank goes even higher, which it will over time.

Leads me to another thought. The 'profile' pages are never spidered and indexed. The fact that these forums allow a signature file is very important to me because of this. All members can reap the benefits of it, especially when the PR goes way up over time. It can only serve to give everyone a boost in some way, whereas if I just allowed the URL to be in the profile, members would get zero for it.

oh yea, I noticed the "yo" as well. You would be amazed at all the different spellings and phrases these forums are coming up for. The things people type in the box truly boggle the mind. :)

MazY
01-09-2001, 22:07/10:07PM
Originally posted by ihelpyou
LOL. Yea but the thing is I am not trying to get the ranks much at all.

They are my favourite ones. When I check the stats at the end of each month, I look at the search phrases and absolutely marvel at some of the ones that come up. Obviously if it's only one a month then it's not much use, but I have one phrase that gets used loads and in no way did I optimise for it! But I'm happy to have it.

Once I verify that my "competition" is finally dead and buried in the rankings next month after following me for so long, I shall be including a sig in the forums.