I wrote a simple web crawler. I fetched all the websites and put them on

Question


Asked: May 22, 2026 (2026-05-22T03:07:12+00:00)


I wrote a simple web crawler. I fetched all the websites and put them on my HDD.
Now I want to analyze them so that I can build a simple interface like http://www.google.de and search for information in my fetched pages.

The problem is how to find the important information in a "fast" way, so the computation matters. It could run in real time or after the fetch. My idea is to build a dictionary from a list of English words and count the entries... or what should I do? I need reading material on how to extract information and compress it, but I don't know where to look.
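The dictionary-and-counting idea is essentially an inverted index: a map from each word to the pages that contain it, with a count per page. Below is a minimal in-memory sketch in C++, assuming the page text has already been extracted from the HTML. `DocId`, `InvertedIndex`, `tokenize`, `addDocument`, and `lookup` are hypothetical names, and the in-memory map stands in for whatever storage (for example, MySQL tables) a real crawler would use.

```cpp
#include <cctype>
#include <map>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical document id; in practice it could be the id of the page's
// row in the crawler's MySQL links table (an assumption).
using DocId = int;

// term -> (document id -> number of occurrences in that document)
using InvertedIndex = std::unordered_map<std::string, std::map<DocId, int>>;

// Split plain text into lowercase alphanumeric tokens. HTML markup is assumed
// to be stripped already; in the default "C" locale only ASCII letters and
// digits count as word characters, so bytes such as UTF-8 umlauts split words.
std::vector<std::string> tokenize(const std::string& text) {
    std::vector<std::string> tokens;
    std::string current;
    for (unsigned char c : text) {
        if (std::isalnum(c)) {
            current += static_cast<char>(std::tolower(c));
        } else if (!current.empty()) {
            tokens.push_back(current);
            current.clear();
        }
    }
    if (!current.empty()) {
        tokens.push_back(current);  // flush the final token
    }
    return tokens;
}

// Count every token of one page under that page's document id.
void addDocument(InvertedIndex& index, DocId id, const std::string& text) {
    for (const auto& term : tokenize(text)) {
        ++index[term][id];
    }
}

// Return the ids of all documents containing `term`, in ascending id order.
// The term is expected in the lowercase form that tokenize() produces.
std::vector<DocId> lookup(const InvertedIndex& index, const std::string& term) {
    std::vector<DocId> docs;
    auto it = index.find(term);
    if (it == index.end()) {
        return docs;
    }
    for (const auto& entry : it->second) {
        docs.push_back(entry.first);
    }
    return docs;
}
```

For example, after `addDocument(index, 1, "The quick brown fox.")` and `addDocument(index, 2, "The lazy dog; the end.")`, `lookup(index, "the")` returns `{1, 2}` and `index["the"][2]` is 2. The per-page counts are what a ranking function needs later on.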

The crawler is written in C++, and the links are stored in MySQL.

I hope my question is clear. 😀

BTW, sorry for my bad English, but there isn't a board like this in German. 😛


1 Answer

Editorial Team · Answer 1 · 2026-05-22T03:07:13+00:00

The science of Information Retrieval (IR) is a complicated one.

Have you looked at any of the standard texts, such as:

Introduction to Information Retrieval by Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze (Jul 7, 2008) – http://www.amazon.com/Introduction-Information-Retrieval-Christopher-Manning/dp/0521865719/ref=sr_1_1?s=books&ie=UTF8&qid=1305573574&sr=1-1

Information Retrieval: Implementing and Evaluating Search Engines by Stefan Büttcher, Charles L. A. Clarke and Gordon V. Cormack (Jul 23, 2010) – http://www.amazon.com/Information-Retrieval-Implementing-Evaluating-Engines/dp/0262026511/ref=sr_1_3?s=books&ie=UTF8&qid=1305573574&sr=1-3

Search for ‘information retrieval’ on Amazon for more.
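As a small illustration of the ranking side that these texts cover, the sketch below scores pages for a multi-word query with tf-idf weighting: each query term contributes its count in the page times log10(N / df), where N is the number of indexed pages and df the number of pages containing the term (the tf-idf weighting described in Introduction to Information Retrieval). This is an illustrative addition, not the answerer's method; the `Index` layout and the `rankTfIdf` name are hypothetical, and query words are assumed to be lowercased already.

```cpp
#include <algorithm>
#include <cmath>
#include <map>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical index layout: term -> (document id -> term frequency).
using Index = std::unordered_map<std::string, std::map<int, int>>;

// score(d) = sum over query terms t of tf(t, d) * log10(numDocs / df(t)).
// numDocs must be the total number of indexed documents (>= any df).
// Returns (doc id, score) pairs, highest score first; ties keep ascending id.
std::vector<std::pair<int, double>> rankTfIdf(const Index& index, int numDocs,
                                              const std::vector<std::string>& query) {
    std::map<int, double> scores;
    for (const auto& term : query) {
        auto it = index.find(term);
        if (it == index.end() || it->second.empty()) {
            continue;  // term not indexed: contributes nothing
        }
        const auto& postings = it->second;
        // Rare terms get a large idf; a term present in every page gets 0.
        double idf = std::log10(static_cast<double>(numDocs) /
                                static_cast<double>(postings.size()));
        for (const auto& [doc, tf] : postings) {
            scores[doc] += tf * idf;
        }
    }
    std::vector<std::pair<int, double>> ranked(scores.begin(), scores.end());
    std::stable_sort(ranked.begin(), ranked.end(),
                     [](const auto& a, const auto& b) { return a.second > b.second; });
    return ranked;
}
```

With three pages where "fox" appears twice in page 1 and once in page 3, and "the" appears once in every page, the query `{"fox", "the"}` ranks page 1, then page 3, then page 2. Because "the" occurs everywhere, its idf is log10(3/3) = 0, which is how tf-idf discounts very common words without a hand-made stop-word list.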

You might also take a look at my answer to "Design Question for Notification System", which outlines a general architecture for spidering websites for search.
