Essential reading on search engine development


runarb
05-06-2004, 11:46 AM
What is the essential reading on search engine development?

I have found help in these:

Books:
Modern Information Retrieval
Ricardo Baeza-Yates, Berthier Ribeiro-Neto
http://www.sims.berkeley.edu/~hearst/irbook/

Managing Gigabytes
Ian H. Witten, Alistair Moffat, Timothy C. Bell
http://www.cs.mu.oz.au/mg/


Webbooks:
INFORMATION RETRIEVAL
C. J. van RIJSBERGEN
http://www.dcs.gla.ac.uk/Keith/Preface.html


Articles:
The Anatomy of a Large-Scale Hypertextual Web Search Engine
Sergey Brin, Lawrence Page
http://citeseer.ist.psu.edu/brin98anatomy.html

Building a Distributed Full-Text Index for the Web
Sergey Melnik, Sriram Raghavan, Beverly Yang, Hector Garcia-Molina
http://citeseer.ist.psu.edu/478324.html

The PageRank Citation Ranking: Bringing Order to the Web
Larry Page, Sergey Brin, R. Motwani, T. Winograd
http://citeseer.ist.psu.edu/page98pagerank.html


Search Engines and Web Dynamics
Knut Magne Risvik, Rolf Michelsen
http://citeseer.ist.psu.edu/risvik02search.html

Focused Crawling Using Context Graphs
M. Diligenti, F.M. Coetzee, S. Lawrence, C.L. Giles, M. Gori
http://citeseer.ist.psu.edu/diligenti00focused.html

Authoritative Sources in a Hyperlinked Environment
Jon M. Kleinberg
http://citeseer.ist.psu.edu/kleinberg99authoritative.html

WebSavvy
05-06-2004, 02:53 PM
If you want to run your own search engine, you're not going to learn everything you need from a book. Every situation is different, and books can't cover the specifics of yours. That's not to say there isn't good information in them, however. :)

The first thing to do is decide which programming environment you want to develop in. Will it be Perl, PHP, XML, Python, ASP, or something else? Do you have any programming knowledge? If so, you're in good shape. If not, you'll need to find someone who can program for you.

Next, decide what type of database you plan to use. Will it be a flat file (text), MySQL (if MySQL -- MyISAM? BLOB? something else?), SQL, Postgres, or something else?
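
To make the storage question a bit more concrete: whichever backend you pick, the core structure is usually an inverted index, a mapping from each term to the documents that contain it. Here is a rough sketch of one stored in a SQL table. It uses Python's built-in sqlite3 module purely so it runs without a server; SQLite, the `postings` table, and the `open_index`, `add_document`, and `lookup` helpers are all assumptions made up for illustration.

```python
import sqlite3


def open_index(path=":memory:"):
    """Open (or create) a tiny inverted index. ':memory:' keeps it in RAM."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS postings ("
        " term TEXT NOT NULL,"
        " doc_id INTEGER NOT NULL,"
        " freq INTEGER NOT NULL,"
        " PRIMARY KEY (term, doc_id))"
    )
    return conn


def add_document(conn, doc_id, text):
    """Split text on whitespace and store one row per distinct term."""
    counts = {}
    for term in text.lower().split():
        counts[term] = counts.get(term, 0) + 1
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO postings (term, doc_id, freq) VALUES (?, ?, ?)",
            [(term, doc_id, freq) for term, freq in counts.items()],
        )


def lookup(conn, term):
    """Return [(doc_id, freq), ...] for every document containing term."""
    cursor = conn.execute(
        "SELECT doc_id, freq FROM postings WHERE term = ? ORDER BY doc_id",
        (term.lower(),),
    )
    return cursor.fetchall()
```

The same table layout should carry over to MySQL or Postgres with minor changes. A flat file would hold the same term-to-document pairs, just without a query engine in front of them.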

Do you have a name picked out for this engine? Is the domain name available? Is it easy to remember, free of hyphens, and preferably a .com TLD?

Have you decided on what type of search engine you want to run? Will it be subject-specific or general? Will you create a directory or just offer a database search?

If you create a directory, there's a whole list of things you'll need to know how to do, and you should finish that research before you begin the work.

If you offer only a database search, you will need some type of ranking algorithm in order to deliver relevant results. No one will use the search engine if the results are worthless, no matter how much you have indexed.
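
For a feel of what "some type of algorithm" can look like, here is a minimal sketch of a TF-IDF style scorer: a term counts for more when it appears often in a document and rarely across the collection. It is only one of many possible approaches, and the `tokenize`, `build_index`, and `search` names, the whitespace tokenizer, and the smoothed `log(1 + N/df)` weight are assumptions chosen for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Naive tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


def build_index(docs):
    """Map each term to {doc_id: number of occurrences in that doc}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, count in Counter(tokenize(text)).items():
            index[term][doc_id] = count
    return index


def search(query, index, num_docs):
    """Return (doc_id, score) pairs, best match first."""
    scores = defaultdict(float)
    for term in tokenize(query):
        postings = index.get(term)
        if not postings:
            continue  # term appears in no document
        # Rare terms get a larger weight than common ones.
        idf = math.log(1 + num_docs / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


docs = {
    "a": "search engine ranking and search relevance",
    "b": "cooking recipes for busy people",
    "c": "how a web search engine crawls pages",
}
index = build_index(docs)
results = search("search ranking", index, len(docs))
# Document "a" ranks above "c"; "b" matches nothing and is left out.
```

A real engine would add things like stemming, stop words, and link-based signals on top, but even this much shows why index size alone doesn't make results relevant.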

That's just a small bit to get you focused on the right track. The books you've listed above aren't going to tell you that. It's just stuff you learn when you're in the business.

If you have any other questions, just ask. :)

Hope this info helps. :)