I am working on a project where I want to classify the general mood of a Twitter user from their recent tweets. Since the tweets can belong to a huge variety of domains, how should I go about it?
I could use the Naive Bayes algorithm (like here: http://phpir.com/bayesian-opinion-mining) but since the tweets can belong to a large variety of domains, I am not sure if this will be very accurate.
The other option is to use a sentiment dictionary such as SentiWordNet or here. I don’t know whether this would be a better approach.
Also, where can I get data to train my classifier if I plan to use Naive Bayes or some other algorithm?
Just to add here, I am primarily coding in PHP.
It appears you could use SentiWordNet as the classifier data if you are focused on a word-by-word approach. This is how simple Bayesian spam filters work: they focus on each word.
The advantage here is that while many of the words in SentiWordNet have multiple meanings, each with different positive/objective/negative scores, you could experiment with using the scores of the other words in the tweet to narrow in on the most appropriate meaning for each multi-meaning word, which could give you a more accurate score for each word and for the overall tweet.
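To make that idea concrete, here is a rough sketch of one way it might look. It is written in Python for brevity, but it uses only plain arrays and loops, so it should port to PHP without much trouble. Everything in it is an assumption for illustration: `TOY_LEXICON` is a hand-made stand-in with invented scores (not real SentiWordNet entries), and "pick the sense whose polarity is closest to the average polarity of the unambiguous words" is just one simple heuristic you could try, not an established method.

```python
# Toy lexicon: word -> list of senses, each sense a (positive, negative) pair.
# ASSUMPTION: these scores are invented for illustration only; in practice
# you would load the real values from the SentiWordNet data file.
TOY_LEXICON = {
    "great": [(0.75, 0.0)],
    "love": [(0.625, 0.0)],
    "awful": [(0.0, 0.75)],
    # Two senses: "ill" (negative) and slang for "excellent" (positive).
    "sick": [(0.0, 0.625), (0.5, 0.0)],
}


def polarity(sense):
    """Collapse a (positive, negative) pair into one signed score."""
    pos, neg = sense
    return pos - neg


def context_polarity(tokens, skip_index):
    """Average polarity of the single-sense words other than tokens[skip_index]."""
    scores = [
        polarity(TOY_LEXICON[tok][0])
        for i, tok in enumerate(tokens)
        if i != skip_index and tok in TOY_LEXICON and len(TOY_LEXICON[tok]) == 1
    ]
    return sum(scores) / len(scores) if scores else 0.0


def score_tweet(text):
    """Sum per-word polarities, choosing a sense for multi-meaning words
    based on the polarity of the rest of the tweet."""
    tokens = [tok.strip(".,!?#") for tok in text.lower().split()]
    total = 0.0
    for i, tok in enumerate(tokens):
        senses = TOY_LEXICON.get(tok)
        if not senses:
            continue  # word not in the lexicon: treat as neutral
        if len(senses) == 1:
            total += polarity(senses[0])
        else:
            # Pick the sense whose polarity is closest to the context's.
            ctx = context_polarity(tokens, i)
            best = min(senses, key=lambda s: abs(polarity(s) - ctx))
            total += polarity(best)
    return total


if __name__ == "__main__":
    print(score_tweet("That concert was sick, love it!"))
    print(score_tweet("Feeling sick and awful"))
```

With this toy data, "sick" is resolved to its positive sense in the first tweet and its negative sense in the second, so the two tweets end up with scores of opposite sign. When a tweet has no unambiguous words, the context average falls back to 0 and the sense nearest neutral wins, which is a weak default you would probably want to revisit.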