Write a meta-search engine
ttpong
02-10-2001, 12:23/12:23PM
I am a student, and I need to write a meta-search engine for a project. I have read several papers and have a general knowledge of meta-search engines.
But I do not know how to start my project. For example, which language should I use to write it?
Can anyone help me?
Thx a lot~~
ihelpyou
02-10-2001, 12:26/12:26PM
Welcome to the forums ttpong! :hi:
I cannot help you with that question but we have some members here who are programmers and who may be able to steer you in the right direction.
JuniorHarris
02-10-2001, 13:02/01:02PM
Welcome ttpong!~ :hi:
I imagine the language would depend on the server platform. Are you interested in using Apache, Linux, Microsoft? The key to a great meta-search engine will be speed, so whatever the choice, it must be one which executes efficiently and allows linear scalability. (whoops, does that rule out Microsoft? <lol>) You might be able to leverage whatever programming experience you already have, if you are familiar with any of the web and/or server programming languages.
ttpong
02-10-2001, 13:24/01:24PM
Thx for help~~~
I want to ask: are there any references that explain how to write a meta-search engine?
I have come across many papers, but they only describe the architecture of their systems, not how to translate the query into suitable forms, how to send the queries to different search engines, etc.
Where can I find such references?
JuniorHarris
02-10-2001, 13:36/01:36PM
By the time the information is written down it is likely to change!~ You need to understand how each individual engine handles searches, then write your requests to pass your queries in the same format. For example, to search on AltaVista the URL would be http://www.altavista.com/sites/search/web?q=search+terms. Your code must pass the same URL, but with the search terms replaced by whatever your meta-searcher is given.
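A minimal sketch of that substitution in Python, assuming a small table of per-engine URL patterns. Only the AltaVista pattern comes from this thread; the `ENGINES` table, the `example_library` entry and the `build_search_url` name are made up for illustration, and each real engine has to be checked by hand.

```python
from urllib.parse import urlencode

# Hypothetical table: engine name -> (base URL, name of the query parameter).
# Only the AltaVista entry is taken from the post above; the second entry
# is a made-up placeholder showing that the parameter name can differ.
ENGINES = {
    "altavista": ("http://www.altavista.com/sites/search/web", "q"),
    "example_library": ("http://library.example.edu/search", "title"),
}


def build_search_url(engine, terms):
    """Return the search URL for one engine with the user's terms filled in."""
    base, param = ENGINES[engine]
    # urlencode escapes the terms and turns spaces into '+',
    # which matches the q=search+terms form shown above.
    return base + "?" + urlencode({param: terms})
```

The point of the table is that adding another engine should only mean adding another entry, not another function.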
Spider Man
02-10-2001, 14:06/02:06PM
I echo what JH says. Also keep in mind that it is not just the search query you need to know but normally also the format of their results page to extract the data (depending on just what you want to do). You will almost certainly need to look at the data for yourself rather than reading any papers about it - but it tends to be fairly obvious if you do view->source on a results page in your browser. I'd limit it to a max of 5 engines so it's not too much work and so you don't have to use multi-threading to keep a reasonable speed. Then any language should do - my personal preference is Perl because it's good at this sort of pattern matching, but definitely do it in whatever you already know.
Have fun!
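To give an idea of the pattern matching involved, here is a rough sketch in Python (the same idea works in Perl). The markup in `SAMPLE_PAGE` is invented for the example; a real results page will look different, so the pattern has to be written after looking at that page's source, and it will break whenever the engine changes its layout.

```python
import html
import re

# Invented results markup, standing in for what view->source might show.
SAMPLE_PAGE = """
<ol>
  <li><a class="result" href="http://example.com/a">First &amp; best</a></li>
  <li><a class="result" href="http://example.org/b">Second hit</a></li>
</ol>
"""

# One pattern per engine: capture the link target and the link text.
RESULT_RE = re.compile(r'<a class="result" href="([^"]+)">(.*?)</a>', re.S)


def extract_results(page):
    """Return (url, title) pairs in the order they appear on the page."""
    return [
        (url, html.unescape(title.strip()))
        for url, title in RESULT_RE.findall(page)
    ]
```

Keeping one pattern per engine next to that engine's query format makes it easier to fix things when a site changes.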
ttpong
03-10-2001, 00:50/12:50AM
Thanks~
Indeed, my project is related to book searching and needs to search for book information from different places on the web (such as libraries).
Now I understand how I can send the queries to different search engines.
But I still wonder how I can get the search results back from the different search engines and analyse them?
JuniorHarris
04-10-2001, 09:14/09:14AM
The algorithm for *your* engine would depend entirely upon you... it might gather the top 10 results from each engine, then combine them all together and possibly rank highest those listings which appear most often (and near the top) on each engine. Possibly you could take the position on each engine and accumulate this to provide the positioning in your meta results. For example, if a listing had the number one position on each of 10 engines, you could add up the positions and divide by the number of engines to get your position: (1 * 10) / 10 engines = 1st position on yours. Likewise, if the listing was number 10 on each: (10 * 10) / 10 = 10th position on your engine. Granted, in reality it may not be this simple, but hopefully this helps and you get the idea.
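A small Python sketch of that averaging idea. The post does not say what to do with a listing that one engine does not return at all; counting it as position 11 (one past a top-10 list) is purely an assumption made here, as are the names `merge_rankings` and `missing_rank`. Each engine's list is assumed to contain no duplicates.

```python
def merge_rankings(rankings, missing_rank=11):
    """Order URLs by their average position across all engines.

    rankings: one list of URLs per engine, best result first.
    missing_rank: position charged when an engine did not list a URL
                  (an assumption; the thread does not cover this case).
    """
    engines = len(rankings)
    positions = {}
    for results in rankings:
        for position, url in enumerate(results, start=1):
            positions.setdefault(url, []).append(position)

    scores = {}
    for url, seen in positions.items():
        # Accumulate the positions, pad for engines that missed the URL,
        # then divide by the number of engines.
        total = sum(seen) + missing_rank * (engines - len(seen))
        scores[url] = total / engines

    # Lowest average position first; ties broken alphabetically.
    return sorted(scores, key=lambda url: (scores[url], url))
```

With this scoring, a listing that is number one on every engine averages 1 and a listing that is number ten on every engine averages 10, as in the example above.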
ttpong
04-10-2001, 13:09/01:09PM
Thx..
But I want to know, after I send the query to the search engine, how do I get the result back, and in what form?
Will the result be sent back to me in HTML format, or in some other format?
=)
JuniorHarris
04-10-2001, 13:12/01:12PM
It depends on the language and method used to query the engine. There are a number of methods available for returning the entire page into a string, which then must be parsed by the program itself.
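As one illustration, in Python the standard library can return the whole page as a string. This is only a sketch: `fetch_page` is a name made up here, the 10-second timeout is arbitrary, and falling back to UTF-8 when the server names no character set is an assumption.

```python
from urllib.request import urlopen


def fetch_page(url, timeout=10):
    """Request a URL and return the response body as one string."""
    with urlopen(url, timeout=timeout) as response:
        raw = response.read()  # the page arrives as bytes
        # Use the charset from the Content-Type header if there is one;
        # otherwise assume UTF-8 (an assumption, not a rule).
        charset = response.headers.get_content_charset() or "utf-8"
    return raw.decode(charset, errors="replace")
```

For a search engine the string that comes back is normally the same HTML results page a browser would show, and it is then up to the program to parse it.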
ttpong
04-10-2001, 13:30/01:30PM
Indeed, I did not have any knowledge of search engines or meta-search engines before doing this project.
After reading some papers, I still do not have a clear concept of the whole process of searching.
The first step, surely, is that the user enters the keyword or sth at the web interface; after that, the query will be sent to ???
Then the query will be translated to a suitable form (but where is this task done?) and sent to the search engine ....., and the search engine will return the result (how does the search engine return it? Can I control it?). At last we will analyse the result, rank it and return it to the user.
Is the process like that? And where do the different parts need to run?
Thx~
ihelpyou
04-10-2001, 13:56/01:56PM
hey pong,... I would say that if you are not a programmer, you will have a hard time with any project on this.
I would never attempt any project such as that. You have to have programming skills.
ttpong
05-10-2001, 13:10/01:10PM
There may be a misunderstanding.
I do not mean that I do not have programming skills. What I mean is that I have not done any project related to search engines before.
Indeed I am a student studying computer-related subjects. My project just needs to be a simple library meta-search engine, which can connect to some universities' library search systems and get the results.
So now I just wonder what the whole process of searching is...
(especially after sending an HTTP request to a particular search engine... how can I get the result? by what method?)
JuniorHarris
05-10-2001, 15:38/03:38PM
ttpong, you might want to review ASPTear (http://www.alphasierrapapa.com/IisDev/Components/AspTear/). It is for ASP and is not meant as a solution, merely provided to possibly add insight.