
How does a database-driven web site get indexed?


DoubleV
13-02-2003, 12:34/12:34PM
I understand the spiders have problems following "?" in URLs, but from what I have read here, Google is slowly but surely starting to index more and more of those kinds of URLs. But what about other SEs? Also, if the entire content of the site is DB-driven (i.e. the only thing hard-coded into the page is a "template", and all the text and images are pulled from the database), does that mean a spider can read your database? Or how else would those kinds of pages get indexed?

Matt B
13-02-2003, 14:16/02:16PM
The spider follows your links: since you have links to pages generated from your database, the spider follows them. It is good to have a few static pages with links, like a site map.
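To make "the spider follows your links" concrete, here is a minimal Python sketch of the link-extraction step, assuming the spider only reads the href attribute of <a> tags in the HTML it receives. The page URL and links are hypothetical examples; the point is that a dynamic URL with a "?" is extracted exactly like a static one.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag, resolved against the page URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # A URL with a query string is resolved like any other link.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links
```

For example, a home page containing `<a href="sitemap.html">` and `<a href="product.php?id=3">` yields both absolute URLs, and the spider would queue them alike.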

There is a lot of misinformation out there, especially concerning spiders and dynamic sites. From my perspective, the problems have been with the site, not the SE. Most good SEs can follow dynamic links, but only as well as the links are written. Some code-generating software can wreak havoc with spiders, and they'll never get through.

DoubleV
13-02-2003, 14:53/02:53PM
Originally posted by SEO Guy
The spider follows your links, as you have links to pages generated from your database, the spider follows them.
Uhhh, I see. I was thinking that the spider would only see the actual code of your page, and not the "end result" of what that page generates.
Most good SEs can follow the dynamic links, but only as well as the links are written.
Not sure I get the point here. Could you please explain?
Somebody asked why this site (http://www.displayconcepts.net/) hardly appears in SEs. I told them they need more content, better title tags, some text on the home page, and text links, but with my limited knowledge of SEO I am not sure how their totally DB-driven site affects their search engine positioning. It is a Lotus Domino site.

Matt B
13-02-2003, 15:13/03:13PM
Originally posted by DoubleV
Uhhh, I see. I was thinking that the spider would only see the actual code of your page, and not the "end result" of what that page generates.

Spiders read through the code and follow the links published in the code. I'm not sure what you mean by the "end result."

You have 13 pages indexed in Google. My immediate guess would be that the JavaScript navigation, not the dynamic links, is the main culprit blocking the SEs. But that is just my first guess without looking into it deeper.
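A rough way to test the JavaScript-navigation theory is to compare the plain <a href> links in a page with URLs that appear only inside scripts or onclick handlers. The Python sketch below is a heuristic only: its regular expression assumes the menu navigates with `location.href = '...'` style assignments, which real menu scripts may not use, and it does not execute any JavaScript.

```python
import re
from html.parser import HTMLParser

# Heuristic: URLs assigned in JavaScript, e.g. location.href='page.php?id=2'
JS_URL = re.compile(r"""location(?:\.href)?\s*=\s*['"]([^'"]+)['"]""")


class NavAudit(HTMLParser):
    def __init__(self):
        super().__init__()
        self.anchor_links = set()  # plain <a href> links a spider can follow
        self.script_links = set()  # URLs found inside scripts or on* handlers
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        href = attrs.get("href")
        if tag == "a" and href and not href.startswith("javascript:"):
            self.anchor_links.add(href)
        if tag == "script":
            self._in_script = True
        for name, value in attrs.items():
            if name.startswith("on") and value:
                self.script_links.update(JS_URL.findall(value))

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        if self._in_script:
            self.script_links.update(JS_URL.findall(data))


def js_only_links(html):
    """URLs reachable only through JavaScript, i.e. likely invisible to a spider."""
    audit = NavAudit()
    audit.feed(html)
    audit.close()
    return audit.script_links - audit.anchor_links
```

Any URL this reports is a candidate for a plain text link at the bottom of the page.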

DoubleV
13-02-2003, 15:17/03:17PM
Yep, that's what I told these guys: at least put text links at the bottom of their pages.

polarmate
13-02-2003, 15:20/03:20PM
A link to a site map might help lead Google to all the pages if the JavaScript menus pose a problem.

I also read in other threads that adding a noscript tag with the navigational links could help, but there was some controversy about it: some folks said it could be perceived as spam, while others did not think so.

DoubleV
13-02-2003, 15:21/03:21PM
Originally posted by SEO Guy
Spiders read through the code and follow the links published in the code. I'm not sure what you mean by the "end result."
Is a DB query a link?
Let's say you have a .php page. In it you have code to connect to the database and send a query, and then, again using PHP code, insert the returned query result into the HTML sent to the browser. So where is the "link" component in here?
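One way to answer "where is the link component?": the query itself is not a link, but a list page can turn each database row into an ordinary <a href> whose URL carries a query string, and that is what the spider follows. This is a minimal sketch in Python rather than PHP, using an in-memory SQLite database; the products table and the product.php?id=... URL pattern are hypothetical.

```python
import sqlite3
from html import escape
from urllib.parse import urlencode


def build_demo_db():
    """Hypothetical products table standing in for the real database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO products (id, name) VALUES (?, ?)",
        [(1, "Apples"), (2, "Pears"), (3, "Plums")],
    )
    return conn


def render_product_list(conn):
    """Query the DB and emit one plain HTML link per row."""
    rows = conn.execute("SELECT id, name FROM products ORDER BY id").fetchall()
    items = []
    for product_id, name in rows:
        # Each row becomes a URL with a query string, e.g. product.php?id=2
        href = "product.php?" + urlencode({"id": product_id})
        items.append(f'<li><a href="{escape(href)}">{escape(name)}</a></li>')
    return "<html><body><ul>\n" + "\n".join(items) + "\n</ul></body></html>"
```

The spider never sees the SELECT; it only sees the finished list of links, and it requests each product.php?id=... URL in turn.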

DoubleV
13-02-2003, 15:27/03:27PM
Originally posted by SEO Guy
You have 13 pages indexed in Google.
I forgot how to check this.

Matt B
14-02-2003, 09:02/09:02AM
Originally posted by DoubleV
I forgot how to check this.

allinurl: www.domain.com site:www.domain.com

Also,
A DB query is not necessarily a link. I'm not sure I understand what you are getting at. Maybe I just need another cup of coffee.

Webmaster T
15-02-2003, 08:09/08:09AM
Originally posted by DoubleV
or how else would those kinds of pages get indexed?

OK, when you read the page, are you reading the database or the page? A spider is a browser, so to speak. Luckily it isn't Netscape or we'd be in a real pickle!

There are browser archives where you can get the very first browsers ever made. Take a look at a page using one of those; it is pretty close to what a SE sees. A SE makes an HTTP request exactly the same way your browser does. When the server receives the request, it first puts the page together from calls to the DB and your template. It then sends the assembled page back to the browser/SE as an HTTP response, and the browser/SE receives it and renders the HTML the server assembled. In a nutshell, that is how anything gets fetched over the web, i.e. the HTTP protocol.
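The request/response cycle described above can be simulated in-process with Python's standard WSGI tools. In this sketch the "database" is a hypothetical dictionary and the template is a single string; the server assembles the page the same way no matter which User-Agent asks, so a browser and a spider receive identical HTML.

```python
from html import escape
from urllib.parse import parse_qs
from wsgiref.util import setup_testing_defaults

TEMPLATE = "<html><head><title>{title}</title></head><body><p>{text}</p></body></html>"

# Hypothetical stand-in for the database: id -> (title, text)
FAKE_DB = {"1": ("Apples", "Crisp and red."), "2": ("Pears", "Sweet and soft.")}


def app(environ, start_response):
    """Server side: read the query string, look up the data, fill the template."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    row = FAKE_DB.get(params.get("id", [""])[0])
    headers = [("Content-Type", "text/html; charset=utf-8")]
    if row is None:
        start_response("404 Not Found", headers)
        return [b"<html><body>Not found</body></html>"]
    page = TEMPLATE.format(title=escape(row[0]), text=escape(row[1]))
    start_response("200 OK", headers)
    return [page.encode("utf-8")]


def fetch(path_and_query, user_agent):
    """Simulate one HTTP GET in-process and return (status, html)."""
    path, _, query = path_and_query.partition("?")
    environ = {"PATH_INFO": path, "QUERY_STRING": query, "HTTP_USER_AGENT": user_agent}
    setup_testing_defaults(environ)  # fill in the remaining keys of a GET request
    result = {}

    def start_response(status, headers):
        result["status"] = status

    body = b"".join(app(environ, start_response))
    return result["status"], body.decode("utf-8")
```

Calling `fetch("/product.php?id=1", ...)` with a browser User-Agent and with a spider User-Agent returns the same status and the same assembled HTML; neither client ever touches the data source directly.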

The query string, or qstring as I'll call it, passes parameters to the server-side script using those same protocols discussed above, and the script uses them to query the database. A query is executed and voilà, you're data-driven!
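As a sketch of how qstring parameters typically reach the database, the Python example below parses the query string of a fruitstand.com-style URL and passes the values to SQLite through ? placeholders. The fruit table, its columns, and the description text are hypothetical.

```python
import sqlite3
from urllib.parse import parse_qs, urlsplit


def build_demo_db():
    """Hypothetical fruit table standing in for the real database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE fruit (product TEXT, variety TEXT, description TEXT)")
    conn.execute(
        "INSERT INTO fruit VALUES (?, ?, ?)",
        ("apples", "macintosh", "Crisp, tart eating apples."),
    )
    return conn


def lookup_from_url(conn, url):
    """Read product/type from the query string and run a parameterized query."""
    params = parse_qs(urlsplit(url).query)
    product = params.get("product", [None])[0]
    variety = params.get("type", [None])[0]
    # The ? placeholders keep query-string values out of the SQL text itself,
    # which guards against SQL injection.
    row = conn.execute(
        "SELECT description FROM fruit WHERE product = ? AND variety = ?",
        (product, variety),
    ).fetchone()
    return row[0] if row else None
```

A missing parameter simply matches no row here; a real script would more likely return a 404 page.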

If it's the last thing I do, I will destroy the myth that data-driven sites aren't indexable, especially by Google! If Google doesn't find a page, it is seldom the result of qstrings alone.
I used them for the first time in 1998.

Because I didn't buy into this myth, I expected no problems. I did have one, but it wasn't what I expected! It was the duplicate pages I now had, because I was using scripts to generate printer-friendly pages with all images removed. Google can index PDFs; is a silly qstring going to stop it?
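Duplicates like these come from several URLs that differ only in a layout parameter. A minimal Python sketch, assuming the printer-friendly version is selected by a hypothetical `print` parameter: normalize each URL by dropping that parameter and sorting the rest, then group URLs that collapse to the same canonical form.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameter(s) that only switch the layout, not the content
LAYOUT_PARAMS = {"print"}


def canonical_url(url):
    """Drop layout-only parameters, sort the rest, and drop any #fragment."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in LAYOUT_PARAMS
    ]
    kept.sort()
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def find_duplicates(urls):
    """Map each canonical URL to the list of URLs that share it (2+ only)."""
    groups = {}
    for url in urls:
        groups.setdefault(canonical_url(url), []).append(url)
    return {canon: members for canon, members in groups.items() if len(members) > 1}
```

Sorting the remaining parameters also catches `?a=1&b=2` versus `?b=2&a=1`, another common source of duplicate dynamic pages.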

Google may, however, shy away from too many pages with qstrings for fear of getting caught in a badly written program (an endless loop, which is nasty and can do big damage). That said, yesterday I was discussing visibility with a client, and we decided to try a tool I'd found to see what was indexed. We found a competitor who has 33,000+ pages indexed.
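The endless-loop worry is easy to reproduce: a page such as a calendar whose "next month" link always produces a new qstring never runs out of unvisited URLs, so remembering visited pages alone does not stop a crawler. Below is a minimal Python sketch of one common safeguard, a page budget on top of a visited set. The site structure and the calendar.php URLs are hypothetical, and this is not a description of how Google actually limits its crawl.

```python
from collections import deque


def crawl(start, get_links, max_pages=100):
    """Breadth-first crawl that never revisits a URL and stops after max_pages."""
    seen = {start}
    queue = deque([start])
    fetched = []
    while queue and len(fetched) < max_pages:
        url = queue.popleft()
        fetched.append(url)
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return fetched


def demo_links(url):
    """Hypothetical site with a crawler trap: an endless calendar."""
    if url == "index.php":
        return ["about.php", "calendar.php?month=1"]
    if url.startswith("calendar.php?month="):
        month = int(url.split("=")[1])
        return ["index.php", f"calendar.php?month={month + 1}"]
    return ["index.php"]
```

With `max_pages=5` the crawl stops after three calendar pages; without the budget it would keep following month=4, month=5, and so on forever.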

Only two engines are potential problems. AV (AltaVista), because it is old and dumb as a fencepost!

INKY (Inktomi) is potentially a problem, easily fixed by using Inktomi's paid inclusion. I'm not sure about all the partners, but Position Pro by Position Technologies will for sure index links with qstrings.

Visibility problems with Google mostly come down to poor link architecture, i.e. feed the po' boy, he's starvin' for spider food! I'm often asked what the secret to my success on Google is. It is simple: I use what I call the two-click rule. Once on the site, every page should be reachable in two clicks. I don't care if you have 50,000 pages.
Hint: that's why they're called index pages, or indexing pages as I prefer to call them.
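The two-click rule can be checked mechanically with a breadth-first search over the site's link graph. This Python sketch assumes the rule is measured from the home page and that the link graph (the hypothetical `site` dictionary below) has already been collected; it reports pages more than two clicks deep, plus pages no link reaches at all.

```python
from collections import deque


def click_depths(links, home):
    """Minimum number of clicks from the home page to each reachable page (BFS)."""
    depth = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depth:
                depth[target] = depth[page] + 1
                queue.append(target)
    return depth


def rule_violations(links, home, max_clicks=2):
    """Pages deeper than max_clicks, or unreachable from home, sorted by name."""
    all_pages = set(links) | {t for targets in links.values() for t in targets}
    depth = click_depths(links, home)
    return sorted(p for p in all_pages if depth.get(p, float("inf")) > max_clicks)


# Hypothetical link graph: page -> pages it links to
site = {
    "index.html": ["fruit.html", "veg.html"],
    "fruit.html": ["apples.html", "pears.html"],
    "apples.html": ["macintosh.html"],
    "veg.html": [],
    "orphan.html": ["index.html"],
}
```

Here macintosh.html sits three clicks deep and orphan.html is never linked to, so both fail the rule; linking them from an index page fixes it.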

Good link architecture doesn't require a silly sitemap; every page is a sitemap if you are doin' it right. SEs will always reward you for feeding them well!

JuniorHarris
01-07-2003, 13:35/01:35PM
It is always important to remember that a search engine can only index what is returned in the HTML stream (be it to a user's browser or a spider). Database-driven sites typically query the database and build pages on the server, delivering the completed page (template and data) to the requester. Regardless of what occurs server side, the user/spider will only see the resulting [completed] page.

As far as query strings are concerned, yes, it is true that some engines are getting better at indexing these. But the use of query strings may affect not only search engines, but users too!

It is much easier for a user to remember (and share) a URL such as
fruitstand.com/apples/macintosh/
than it would be to remember
fruitstand.com?product=apples&type=macintosh.
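Sites that want the short form usually map the clean path onto the same parameters the script already expects, commonly with a server rewrite rule such as Apache's mod_rewrite. The Python sketch below shows only that mapping, under the hypothetical scheme where /product/type/ stands for ?product=...&type=..., and it does no URL-encoding or validation of the path segments.

```python
from urllib.parse import urlencode

# Hypothetical scheme: /<product>/<type>/  <->  ?product=<product>&type=<type>
PARAM_NAMES = ("product", "type")


def path_to_params(path):
    """Turn /apples/macintosh/ into {'product': 'apples', 'type': 'macintosh'}."""
    segments = [s for s in path.strip("/").split("/") if s]
    if len(segments) != len(PARAM_NAMES):
        return None  # path does not fit the scheme
    return dict(zip(PARAM_NAMES, segments))


def params_to_path(params):
    """Build the clean, shareable path for links on the site."""
    return "/" + "/".join(params[name] for name in PARAM_NAMES) + "/"


def internal_url(path):
    """The query string the existing script would receive for a clean path."""
    params = path_to_params(path)
    return None if params is None else "?" + urlencode(params)
```

Links on the site would be generated with `params_to_path`, so users and spiders only ever see the short form while the script still works from the same parameters.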

WebBug (http://www.cyberspyder.com/webbug.html) is a great utility for seeing exactly what is sent to the web server and exactly what the web server sends back. No affiliation here, I just use the heck out of it!