I want to create a tool that generates statistics on how often a certain word or phrase occurs in blogs, forums, social media and news sites, i.e. something like this:
20.11.2011;football;800302
21.11.2011;football;1000000
etc.
Every day the tool would run a search and save the number of mentions of the search term for that day.
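Whatever the search source turns out to be, the daily bookkeeping part is straightforward. Here is a minimal Java sketch that builds and appends lines in the date;term;count format shown above. The class name `MentionLog`, the file name `mentions.csv` and the hard-coded count are illustrative assumptions, not part of any existing tool.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

public class MentionLog {
    // Same dd.MM.yyyy layout as the sample lines (e.g. 20.11.2011).
    private static final DateTimeFormatter DATE = DateTimeFormatter.ofPattern("dd.MM.yyyy");

    /** Builds one "date;term;count" line, e.g. 20.11.2011;football;800302 */
    static String formatRecord(LocalDate day, String term, long count) {
        return DATE.format(day) + ";" + term + ";" + count;
    }

    /** Appends one record to the log file, creating the file if it does not exist yet. */
    static void appendRecord(Path log, LocalDate day, String term, long count) throws IOException {
        String line = formatRecord(day, term, count) + System.lineSeparator();
        Files.writeString(log, line, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    public static void main(String[] args) throws IOException {
        // Placeholder: the real number would come from whatever search source you end up using.
        long count = 800302;
        appendRecord(Path.of("mentions.csv"), LocalDate.now(), "football", count);
    }
}
```

Scheduling this once a day (for example from cron) is easy. The open problem is replacing the placeholder count with a real lookup, which is exactly what the question asks about.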
How can I implement this (i.e. run a Google/Yandex search programmatically) in Java or Ruby?
There is a Google Blog Search API (http://code.google.com/apis/blogsearch/), but it is now deprecated.
If you have specific sites in mind, you can scrape them once a day, but if you are looking at a broader set of sites, as described in your post, boy, that's a tough one. I would try Google Trends (http://www.google.com/trends?q=football) or Google Blog Search (http://www.google.com/search?q=football&tbm=blg).
That will save you a lot of trouble. Otherwise, you may need to write your own crawler and index a very large amount of data; in that case, you may want to look at Nutch (http://nutch.apache.org/) and Lucene (http://lucene.apache.org).
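For the specific-sites route, a minimal Java 11+ sketch of the daily scrape might look like the following. It assumes the pages can be fetched with a plain HTTP GET, it counts case-insensitive occurrences of the phrase in the raw HTML (so markup and navigation text are counted too), and the site list is hypothetical. It does not follow links, respect robots.txt or deduplicate pages, all of which a real crawler has to deal with.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;
import java.util.Locale;

public class SiteScraper {
    private static final HttpClient CLIENT = HttpClient.newHttpClient();

    /** Counts non-overlapping, case-insensitive occurrences of phrase in text. */
    static int countOccurrences(String text, String phrase) {
        if (phrase.isEmpty()) {
            return 0; // avoid an infinite loop on an empty needle
        }
        String haystack = text.toLowerCase(Locale.ROOT);
        String needle = phrase.toLowerCase(Locale.ROOT);
        int count = 0;
        int from = 0;
        while ((from = haystack.indexOf(needle, from)) != -1) {
            count++;
            from += needle.length(); // skip past this match so matches do not overlap
        }
        return count;
    }

    /** Downloads one page and returns its raw body (HTML markup included, status code ignored). */
    static String fetch(String url) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        return CLIENT.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical site list: replace with the sites you actually want to track.
        List<String> sites = List.of("https://example.com/");
        int total = 0;
        for (String url : sites) {
            total += countOccurrences(fetch(url), "football");
        }
        System.out.println("football: " + total);
    }
}
```

Running this once a day (e.g. from cron) and logging the total as date;term;count gives the kind of time series described in the question. Note that it only covers the sites you list, and that it counts occurrences on the current version of each page rather than new mentions made that day.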