I’ll try to explain in detail what I need:
I’m parsing an RSS feed in Python using feedparser. This feed has, of course, a list of items, with title, link and description just like a common RSS feed.
On the other hand, I have a list of strings with some keywords I need to find in each item’s description.
What I need to do is find the item which has the most keyword matches.
Example:
RSS feed
<channel>
<item>
<title>Lion</title>
<link>...</link>
<description>
The lion (Panthera leo) is one of the four big cats in the genus
Panthera, and a member of the family Felidae.
</description>
</item>
<item>
<title>Panthera</title>
<link>...</link>
<description>
Panthera is a genus of the Felidae (cats), which contains
four well-known living species: the tiger, the lion, the jaguar, and the leopard.
</description>
</item>
<item>
<title>Cat</title>
<link>...</link>
<description>
The domestic cat is a small, usually furry, domesticated,
carnivorous mammal. It is often called the housecat, or simply the
cat when there is no need to distinguish it from other felids and felines.
</description>
</item>
</channel>
Keyword list
['cat', 'lion', 'panthera', 'family']
So in this case, the item with the most (unique) matches is the first one, because it contains all 4 keywords. It doesn’t matter that it says ‘cats’ instead of just ‘cat’ or ‘Panthera’ instead of ‘panthera’; I just need to find the keyword somewhere inside the string, ignoring case.
Let me clarify that even if some description contained the ‘cat’ keyword 100 times (and none of the other keywords), it would not be the winner, because I’m looking for the most distinct keywords contained, not the most times a keyword appears.
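To make the rule above concrete, here is a minimal sketch of the scoring I have in mind. It is only an illustration, not a definitive implementation: the function names `count_unique_matches` and `best_item` are made up for this example, the items are plain dicts holding the sample descriptions rather than real feedparser entries, and matching is assumed to be a case-insensitive substring test.

```python
def count_unique_matches(text, keywords):
    """Count how many distinct keywords occur in text.

    Matching is a case-insensitive substring test, so 'cat' matches 'cats'.
    Each keyword counts at most once, however often it appears.
    """
    lowered = text.lower()
    unique_keywords = {kw.lower() for kw in keywords}
    return sum(1 for kw in unique_keywords if kw in lowered)


def best_item(items, keywords):
    """Return the item whose description contains the most distinct keywords."""
    return max(items, key=lambda item: count_unique_matches(item["description"], keywords))


# Hypothetical stand-ins for the feed items (plain dicts, not feedparser objects).
items = [
    {"title": "Lion",
     "description": "The lion (Panthera leo) is one of the four big cats in the genus "
                    "Panthera, and a member of the family Felidae."},
    {"title": "Panthera",
     "description": "Panthera is a genus of the Felidae (cats), which contains four "
                    "well-known living species: the tiger, the lion, the jaguar, and the leopard."},
    {"title": "Cat",
     "description": "The domestic cat is a small, usually furry, domesticated, "
                    "carnivorous mammal. It is often called the housecat, or simply the "
                    "cat when there is no need to distinguish it from other felids and felines."},
]
keywords = ['cat', 'lion', 'panthera', 'family']

winner = best_item(items, keywords)
print(winner["title"])
```

With the sample data, the three descriptions score 4, 3 and 1 distinct keywords respectively, so the first item wins. If two items tie, `max` returns the first one it encounters.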
Right now, I’m looping over the rss items and doing it “manually”, counting the times a keyword appears (but I’m having the problem mentioned in the above paragraph).
I’m very new to Python and I come from a different kind of language (C#), so I’m sorry if this is pretty trivial.
How would you approach this problem?