Let’s pretend that I have a site where the users create topics and write threads on Fruit.
To keep the users informed of all Fruit conversations on the entire web, I collect tweets related to the specific topic and create threads based on the contents of the tweet.
It’s really important that the tweets are relevant to the topic, obviously. Let’s say that a user creates a topic called Apples and Oranges. I pull all tweets that contain the keywords Apples and/or Oranges.
The problem I’m having is that some Twitter users write a tweet that includes the keywords Apples, Oranges, and Pears, for example, and it gets collected and posted as a thread in the Apples and Oranges discussion topic. This makes the users angry!
So what I need is a way to filter out any tweet that includes fruit words other than Apples and/or Oranges.
For example, if a Twitter user writes “I love Apples, Oranges, Pears, and Grapes” then that tweet should not be included.
The Twitter search query can only be made so sophisticated, so the exclusion logic will have to be performed in Ruby after the tweets are collected.
Programmatically, how would you go about solving this?
Determine the words that are related to the topic name (Pears, Grapes, and so on). You can then exclude tweets that use any of these related words.
One way to do this is using Google Sets.
NOTE: I am in the unfortunate position of not fully condoning my own solution due to this service not having an official API (as awesome as this would be!). Though if you are going to use this strategy then I would suggest storing the Google Set result.
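Once the related words are stored somewhere, the exclusion step itself is small. Here is a minimal sketch in Ruby, assuming the related words live in a local set. The `RELATED_FRUIT` constant and the `relevant_tweet?` method are hypothetical names made up for illustration, and the word list is hand-written rather than fetched from any service.

```ruby
require "set"

# Hypothetical stand-in for the stored related-words result.
# In practice this would be loaded from wherever the lookup was saved.
RELATED_FRUIT = Set.new(%w[apples oranges pears grapes bananas]).freeze

# True when the tweet mentions at least one topic keyword
# and none of the other related words.
def relevant_tweet?(text, topic_keywords, related_words = RELATED_FRUIT)
  words    = text.downcase.scan(/[a-z]+/).to_set   # crude tokenizer, ASCII letters only
  topic    = topic_keywords.map(&:downcase).to_set
  excluded = related_words - topic                 # related words outside the topic
  words.intersect?(topic) && !words.intersect?(excluded)
end

tweets = [
  "I love Apples, Oranges, Pears, and Grapes",
  "Apples and Oranges are all I need"
]

# Keep only the tweets that stay on topic.
kept = tweets.select { |tweet| relevant_tweet?(tweet, %w[Apples Oranges]) }
```

This compares whole lowercase words only, so "Pear" would not match the stored "pears". If that matters, you would probably want to normalize singular and plural forms before comparing.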