I am planning to develop a web-based application which could crawl Wikipedia for finding

Asked: May 22, 2026

I am planning to develop a web-based application that crawls Wikipedia to find relations and stores them in a database. By relations, I mean: search for a name, say 'Bill Gates', find his page, download it, pull out various pieces of information from it, and store them in a database. The information might include his date of birth, his company, and a few other things. What I need to know is whether there is any way to locate these specific data items on the page so that I can store them in a database. Any specific books or algorithms would be greatly appreciated, and suggestions for good open-source libraries would also help.

Thank you.
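One way to locate fields such as the date of birth is to skip the rendered HTML and read the article's `{{Infobox ...}}` template from its wikitext, which is also the structured source DBpedia extracts from. The sketch below is an illustration under stated assumptions, not a production parser: `fetch_wikitext` assumes the English Wikipedia MediaWiki API with the query parameters shown and a hypothetical User-Agent string (replace the contact address); `extract_infobox` uses simple brace/bracket depth tracking and ignores HTML comments, `<ref>` tags, and `{{{...}}}` parameters; `parse_date_template` is a hypothetical helper that only understands the plain `year|month|day` form of the birth-date template. Field names like `birth_date` vary between infobox types.

```python
import json
import re
import urllib.parse
import urllib.request
from typing import Optional

API_URL = "https://en.wikipedia.org/w/api.php"  # English Wikipedia's MediaWiki API


def fetch_wikitext(title: str) -> str:
    """Download one article's raw wikitext (needs network access)."""
    params = {
        "action": "query",
        "prop": "revisions",
        "rvprop": "content",
        "rvslots": "main",
        "titles": title,
        "redirects": "1",  # follow redirects such as alternate spellings
        "format": "json",
        "formatversion": "2",
    }
    req = urllib.request.Request(
        API_URL + "?" + urllib.parse.urlencode(params),
        # Hypothetical User-Agent; Wikipedia asks clients to identify themselves.
        headers={"User-Agent": "relation-crawler-sketch/0.1 (you@example.com)"},
    )
    with urllib.request.urlopen(req) as resp:
        page = json.load(resp)["query"]["pages"][0]
    if "revisions" not in page:  # missing or invalid title
        return ""
    return page["revisions"][0]["slots"]["main"]["content"]


def extract_infobox(wikitext: str) -> dict[str, str]:
    """Return the key/value fields of the first {{Infobox ...}} template."""
    start = wikitext.find("{{Infobox")
    if start == -1:
        return {}
    depth = 0  # nesting level of {{ }} and [[ ]] pairs
    fields: list[str] = []
    current: list[str] = []
    i = start
    while i < len(wikitext):
        two = wikitext[i:i + 2]
        if two in ("{{", "[["):
            depth += 1
            if depth > 1:  # keep nested templates/links as part of the value
                current.append(two)
            i += 2
        elif two in ("}}", "]]"):
            depth -= 1
            if depth == 0:  # closing braces of the infobox itself
                fields.append("".join(current))
                break
            current.append(two)
            i += 2
        else:
            ch = wikitext[i]
            if ch == "|" and depth == 1:  # a pipe at infobox level starts a new field
                fields.append("".join(current))
                current = []
            else:
                current.append(ch)
            i += 1
    result: dict[str, str] = {}
    for field in fields[1:]:  # fields[0] is the template name, e.g. "Infobox person"
        key, sep, value = field.partition("=")
        if sep:
            result[key.strip()] = value.strip()
    return result


def parse_date_template(value: str) -> Optional[str]:
    """Turn {{birth date and age|Y|M|D}} into an ISO date, else None."""
    m = re.search(
        r"\{\{\s*birth date(?: and age)?\s*\|\s*(\d{4})\s*\|\s*(\d{1,2})\s*\|\s*(\d{1,2})",
        value,
        re.IGNORECASE,
    )
    if not m:
        return None
    year, month, day = (int(g) for g in m.groups())
    return f"{year:04d}-{month:02d}-{day:02d}"
```

For example, `parse_date_template(extract_infobox(fetch_wikitext("Bill Gates")).get("birth_date", ""))` should yield an ISO date when the article uses the plain template form. Reading the infobox is far more reliable than scraping prose, because editors already put these facts into named slots.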

1 Answer

Editorial Team · Answer 1 · 2026-05-22T18:02:33+00:00

If you haven’t already, you should have a look at DBpedia. Many categories of Wikipedia articles have “infoboxes” holding exactly the kind of information you describe, and the DBpedia project has extracted them into a queryable database:

http://en.wikipedia.org/wiki/DBpedia
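As a rough illustration of querying DBpedia instead of parsing pages yourself, here is a sketch that asks its public SPARQL endpoint for every DBpedia-ontology property of one resource. Assumptions: the endpoint URL `https://dbpedia.org/sparql`, the convention that a resource IRI is `http://dbpedia.org/resource/` plus the article title with spaces turned into underscores, and content negotiation via the standard `Accept: application/sparql-results+json` header. Which properties come back (e.g. something like `dbo:birthDate`) depends on the resource and the current DBpedia release.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://dbpedia.org/sparql"  # DBpedia's public SPARQL endpoint


def build_query(title: str, limit: int = 100) -> str:
    """SPARQL query listing the DBpedia-ontology properties of one resource."""
    # Assumption: resource IRIs mirror Wikipedia titles with underscores.
    iri = "http://dbpedia.org/resource/" + title.replace(" ", "_")
    return (
        "SELECT ?p ?o WHERE {\n"
        f"  <{iri}> ?p ?o .\n"
        '  FILTER(STRSTARTS(STR(?p), "http://dbpedia.org/ontology/"))\n'
        f"}} LIMIT {limit}"
    )


def build_url(query: str) -> str:
    """GET URL for the SPARQL 1.1 protocol (query passed as a parameter)."""
    return ENDPOINT + "?" + urllib.parse.urlencode({"query": query})


def parse_bindings(results: dict) -> list[tuple[str, str]]:
    """Flatten W3C SPARQL JSON results into (property, value) pairs."""
    return [
        (row["p"]["value"], row["o"]["value"])
        for row in results["results"]["bindings"]
    ]


def fetch_properties(title: str) -> list[tuple[str, str]]:
    """Run the query against the live endpoint (needs network access)."""
    req = urllib.request.Request(
        build_url(build_query(title)),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(req) as resp:
        return parse_bindings(json.load(resp))
```

The `FILTER` keeps only the cleaned-up ontology properties; dropping it also returns the raw `http://dbpedia.org/property/` infobox fields, which are noisier but cover more attributes.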

You might also leverage some of the information in Metaweb’s Freebase (which overlaps with, and I believe may even integrate, the information from DBpedia). They have an API for querying their graph database, and there’s a Python wrapper for it called freebase-python.

UPDATE: Freebase is no more; Metaweb was acquired by Google, and Freebase was eventually folded into the Google Knowledge Graph. There is a Knowledge Graph Search API, but I don’t think it has anything like the formal syncing Freebase had with public sources like Wikipedia. I’m personally disappointed in how this seems to have turned out. :-/
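For completeness, a hedged sketch of calling the Knowledge Graph Search API. Assumptions, to be checked against Google's current documentation: the endpoint `https://kgsearch.googleapis.com/v1/entities:search`, the `query`/`key`/`limit` parameters, an API key you obtain from Google Cloud, and a JSON-LD response whose `itemListElement` entries carry a `result` object with `name`, `@type`, `description` and `detailedDescription.url`. It returns short entity summaries, not the rich attribute graph Freebase exposed.

```python
import json
import urllib.parse
import urllib.request

KG_ENDPOINT = "https://kgsearch.googleapis.com/v1/entities:search"


def build_kg_url(query: str, api_key: str, limit: int = 1) -> str:
    """Search URL; api_key must be your own Google Cloud key."""
    params = {"query": query, "key": api_key, "limit": limit}
    return KG_ENDPOINT + "?" + urllib.parse.urlencode(params)


def summarize(response: dict) -> list[dict]:
    """Pull name, types, description and article URL out of each hit."""
    hits = []
    for item in response.get("itemListElement", []):
        result = item.get("result", {})
        hits.append({
            "name": result.get("name"),
            "types": result.get("@type", []),
            "description": result.get("description"),
            # detailedDescription, when present, usually links a source article
            "article_url": result.get("detailedDescription", {}).get("url"),
        })
    return hits


def search(query: str, api_key: str) -> list[dict]:
    """Call the live API (needs network access and a valid key)."""
    with urllib.request.urlopen(build_kg_url(query, api_key)) as resp:
        return summarize(json.load(resp))
```

Missing fields come back as `None` rather than raising, since not every entity has a description or a linked article.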

As for the natural language processing bit, if you do make headway on that problem you might consider these databases as repositories for any information you do mine.
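Whatever you end up mining, storing it as subject/predicate/object triples keeps it close to the graph model that DBpedia and Freebase use, which makes later merging or export easier. A minimal sketch using Python's built-in `sqlite3`; the table name `facts`, the `source` column, and the predicate names in the usage are hypothetical choices for illustration.

```python
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a tiny triple store; ':memory:' keeps it in RAM."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS facts ("
        " subject TEXT NOT NULL,"
        " predicate TEXT NOT NULL,"
        " object TEXT NOT NULL,"
        " source TEXT,"  # where the fact came from, e.g. 'infobox' or 'text'
        " PRIMARY KEY (subject, predicate, object))"
    )
    return conn


def add_fact(conn: sqlite3.Connection, subject: str, predicate: str,
             obj: str, source: str = None) -> None:
    # INSERT OR IGNORE keeps repeated crawls from duplicating a fact.
    conn.execute(
        "INSERT OR IGNORE INTO facts VALUES (?, ?, ?, ?)",
        (subject, predicate, obj, source),
    )
    conn.commit()


def facts_about(conn: sqlite3.Connection, subject: str) -> list[tuple]:
    """All (predicate, object) pairs stored for one subject."""
    rows = conn.execute(
        "SELECT predicate, object FROM facts WHERE subject = ?"
        " ORDER BY predicate, object",
        (subject,),
    )
    return rows.fetchall()
```

For instance, after `add_fact(conn, "Bill_Gates", "birthDate", "1955-10-28", "infobox")`, `facts_about(conn, "Bill_Gates")` returns `[("birthDate", "1955-10-28")]`. Keeping the source column lets you later prefer infobox facts over ones mined from free text.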
