I’m writing an RSS reader in Python as a learning exercise, and I would really like to be able to tag individual entries with keywords for searching. Unfortunately, most real-world feeds don’t include keyword metadata. I currently have about 60,000 entries in my test database from about 600 feeds, so manual tagging is not practical. So far I have only been able to find two solutions:
1: Use the Natural Language Toolkit (NLTK) to extract keywords:
- Pros: flexible; no dependencies on external services
- Cons: can only index the article summary, not the full article; non-trivial, since writing a high-quality keyword extraction tool is a project in itself
2: Use the Google AdWords API to fetch keyword suggestions from the article URL:
- Pros: very high quality keywords; based on the entire article text; easy to use
- Cons: not free(?); query rate limits unknown; I’m terrified of getting my account banned and not being able to run AdWords campaigns for my commercial sites
Can anyone offer any suggestions? Are my fears about getting my AdWords account banned unfounded?
You can use the Delicious suggested tags API.
An example of how to use the API from Python: http://www.michael-noll.com/projects/delicious-python-api/
Another alternative is Open Calais.
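For the first option in the question (local keyword extraction from summaries), here is a minimal sketch of the idea. It does not use NLTK; it swaps in a plain TF-IDF ranking built on the standard library so that it runs without any downloads. The names `STOPWORDS`, `tokenize`, and `extract_keywords` are hypothetical, the stopword list is a tiny illustrative one, and the summaries are assumed to be plain text with HTML already stripped.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, far from complete).
STOPWORDS = {
    "the", "and", "for", "with", "that", "this", "from", "are", "was",
    "has", "have", "not", "but", "its", "their", "about", "into",
}


def tokenize(text):
    """Lowercase the text and keep alphabetic words of 3+ letters that are not stopwords."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if len(w) > 2 and w not in STOPWORDS]


def extract_keywords(summaries, top_n=3):
    """Return, for each summary, up to top_n terms ranked by TF-IDF score."""
    tokenized = [tokenize(s) for s in summaries]
    n_docs = len(tokenized)

    # Document frequency: in how many summaries does each term appear?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    results = []
    for tokens in tokenized:
        if not tokens:
            results.append([])
            continue
        counts = Counter(tokens)
        scores = {}
        for term, count in counts.items():
            tf = count / len(tokens)
            # Smoothed IDF so a term found in every summary still gets a positive weight.
            idf = math.log((1 + n_docs) / (1 + doc_freq[term])) + 1
            scores[term] = tf * idf
        # Sort by descending score, then alphabetically for a stable order.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        results.append(ranked[:top_n])
    return results


if __name__ == "__main__":
    sample = [
        "Python RSS reader parses feeds with Python",
        "Google launches new search ads for Google users",
    ]
    for summary, keywords in zip(sample, extract_keywords(sample)):
        print(summary, "->", keywords)
```

With 60,000 entries the document frequencies become meaningful, so terms common across all feeds should sink while feed-specific terms rise. Replacing `tokenize` with NLTK's tokenizer, stopword corpus, and a stemmer would be the natural next step if the quality is not good enough.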