I’m in the process of creating a metasearch engine and I’m stuck! Using PHP, I send a query to 3 search engines and pull the top 10 URLs from each one. I then store these URLs in a 2D array with a corresponding score for aggregation purposes, i.e. the 1st result gets 20 pts, the 2nd gets 18 pts, etc.
So in the following example I query the search engines with ‘php’ and get these results:
Blekko
Array ( [url] => php.about.com/ [score] => 20 )
Array ( [url] => php.net/ [score] => 18 )
Array ( [url] => en.wikipedia.org/wiki/PHP [score] => 16 )
Array ( [url] => http://www.phpbuilder.com/ [score] => 14 )
Array ( [url] => blekko.com/ws/http://php.about.com/+/seo [score] => 12 )
Array ( [url] => http://www.w3schools.com/php/default.asp [score] => 10 )
Array ( [url] => phpnuke.org/ [score] => 8 )
Array ( [url] => http://www.symfony-project.org/ [score] => 6 )
Array ( [url] => http://www.phpconference.co.uk/ [score] => 4 )

Entireweb
Array ( [url] => phpnuke.org/ [score] => 20 )
Array ( [url] => http://www.aardvarktopsitesphp.com/ [score] => 18 )
Array ( [url] => http://www.php.net/ [score] => 16 )
Array ( [url] => http://www.php.net/downloads.php [score] => 14 )
Array ( [url] => php.net/manual [score] => 12 )
Array ( [url] => http://www.php.net/manual/en/ [score] => 10 )
Array ( [url] => http://www.php.net/docs.php [score] => 8 )
Array ( [url] => http://www.php.net/license/ [score] => 6 )
Array ( [url] => http://www.phplinkdirectory.com/ [score] => 4 )

Bing
Array ( [url] => http://www.php.net/ [score] => 20 )
Array ( [url] => en.wikipedia.org/wiki/PHP [score] => 18 )
Array ( [url] => http://www.php.net/downloads.php [score] => 16 )
Array ( [url] => http://www.w3schools.com/php/default.asp [score] => 14 )
Array ( [url] => windows.php.net/download [score] => 12 )
Array ( [url] => windows.php.net/ [score] => 10 )
Array ( [url] => http://www.tizag.com/phpT/ [score] => 8 )
Array ( [url] => wiki.php.net/ [score] => 6 )
Array ( [url] => qa.php.net/ [score] => 4 )
Array ( [url] => http://www.php.com/ [score] => 2 )

What I’d like to do is combine all these results, remove duplicate
URLs while adding up their scores, and create a new list with the
aggregated results that might look something like:
Array ( [url] => http://www.php.net/ [score] => 54 )
Array ( [url] => en.wikipedia.org/wiki/PHP [score] => 34 )
Array ( [url] => http://www.w3schools.com/php/default.asp [score] => 24 )
etc.
I’m just looking for the most efficient way to achieve this. Any advice would be very much appreciated. Thanks
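One possible way to do the merge described above is to key an associative array by a normalized form of each URL and add the scores as you go. This is only a minimal sketch: normalizeUrl() and aggregateResults() are hypothetical helper names, and the normalization rule (drop the scheme, a leading "www." and a trailing slash) is an assumption chosen so that php.net/ and http://www.php.net/ count as the same entry, as the expected totals in the question imply.

```php
<?php
// Hypothetical helper: reduce a URL to a comparison key by dropping the
// scheme, a leading "www." and any trailing slash.
function normalizeUrl(string $url): string
{
    $key = preg_replace('#^https?://#i', '', trim($url));
    $key = preg_replace('#^www\.#i', '', $key);
    return rtrim($key, '/');
}

// Sum the scores of all results that share the same normalized URL.
function aggregateResults(array ...$resultLists): array
{
    $totals = [];
    foreach ($resultLists as $results) {
        foreach ($results as $result) {
            $key = normalizeUrl($result['url']);
            $totals[$key] = ($totals[$key] ?? 0) + $result['score'];
        }
    }
    arsort($totals); // highest combined score first, keys preserved
    return $totals;
}

// A few entries taken from the three result lists in the question.
$blekko = [
    ['url' => 'php.net/', 'score' => 18],
    ['url' => 'en.wikipedia.org/wiki/PHP', 'score' => 16],
    ['url' => 'http://www.w3schools.com/php/default.asp', 'score' => 10],
];
$entireweb = [
    ['url' => 'http://www.php.net/', 'score' => 16],
];
$bing = [
    ['url' => 'http://www.php.net/', 'score' => 20],
    ['url' => 'en.wikipedia.org/wiki/PHP', 'score' => 18],
    ['url' => 'http://www.w3schools.com/php/default.asp', 'score' => 14],
];

$combined = aggregateResults($blekko, $entireweb, $bing);
print_r($combined);
```

The result is a map from normalized URL to total score rather than a list of [url]/[score] pairs, so the keys lose their scheme and "www." prefix. Each URL is visited once and looked up by array key, so the work grows linearly with the number of results plus the final sort.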
1- You can trim the URLs; after that you can tell that www.php.net and php.net are the same website (www.php.net and php.net/downloads.php are also the same website).
2- Give more points to results returned by Bing, since Bing is, as you know, the most semantic search engine.
3- You can also capture the titles and save them in the arrays; this is a personal recommendation.
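The first two tips could be sketched roughly as follows. siteKey() is a hypothetical helper that reduces a URL to its host without "www.", and the weights (1.5 for Bing, 1.0 for the others) are made-up values for illustration only, not something taken from the thread.

```php
<?php
// Hypothetical helper: extract the host of a URL and drop a leading "www.".
function siteKey(string $url): string
{
    // parse_url() only reports a host when a scheme is present, so add one if missing.
    if (!preg_match('#^[a-z][a-z0-9+.-]*://#i', $url)) {
        $url = 'http://' . $url;
    }
    $host = strtolower((string) parse_url($url, PHP_URL_HOST));
    return preg_replace('#^www\.#', '', $host);
}

// Assumed weights, for illustration only: Bing counts 1.5x, the others 1x.
$weights = ['bing' => 1.5, 'blekko' => 1.0, 'entireweb' => 1.0];

// A few entries from the question, grouped by engine.
$results = [
    'bing' => [
        ['url' => 'http://www.php.net/', 'score' => 20],
        ['url' => 'http://www.php.net/downloads.php', 'score' => 16],
    ],
    'blekko' => [
        ['url' => 'php.net/', 'score' => 18],
    ],
];

// Add up the weighted scores per website.
$siteTotals = [];
foreach ($results as $engine => $list) {
    foreach ($list as $result) {
        $site = siteKey($result['url']);
        $siteTotals[$site] = ($siteTotals[$site] ?? 0) + $result['score'] * $weights[$engine];
    }
}
arsort($siteTotals); // highest weighted total first
print_r($siteTotals);
```

Note that grouping by host merges every page of a site into one entry, so php.net/downloads.php and php.net/ are no longer ranked separately. Whether that is what you want depends on whether the metasearch engine should rank pages or websites.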