I need an approach or an algorithm to pre-calculate a user's interests based on his tweets.
The user connects his account with his Twitter account, and after retrieving his tweets for the first time I will have to pre-calculate his tastes and interests.
As the user continues to use my system, I will have to make those predictions more accurate.
Is there an algorithm or a mathematical model that will help with this requirement?
Please provide existing research links, open source code, or examples that will help me get started.
You can use Machine-Learning for this task.
One possible approach is a Bag of Words representation combined with k-nearest neighbors:
Create a training set (users whose interests you already know), and use Bag of Words (preferably with n-grams) to turn each training user's tweets into features.
When a new user arrives, extract the words/n-grams from his tweets as features, and find the k nearest neighbors in the training set to determine what his interests are.
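The two steps above can be sketched in plain Python. This is only a minimal illustration, not a tuned implementation: the tiny training set, the interest labels, the cosine similarity measure, and the function names (`extract_features`, `predict_interests`) are all assumptions made for the example.

```python
from collections import Counter
import math
import re


def extract_features(text, max_n=2):
    """Bag of words with n-grams: count every word sequence of length 1..max_n."""
    tokens = re.findall(r"[a-z0-9#@']+", text.lower())
    features = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            features[" ".join(tokens[i:i + n])] += 1
    return features


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict_interests(training_set, tweets, k=3):
    """Let the k most similar training users vote, weighted by similarity."""
    query = extract_features(tweets)
    scored = sorted(
        ((cosine_similarity(query, extract_features(text)), interests)
         for text, interests in training_set),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = Counter()
    for similarity, interests in scored[:k]:
        for interest in interests:
            votes[interest] += similarity
    return [interest for interest, score in votes.most_common() if score > 0]


# Hypothetical training set: (concatenated tweets, known interests) per user.
training_set = [
    ("great match last night, what a goal by the striker", {"sports"}),
    ("the team won the league, best football season ever", {"sports"}),
    ("new python release, loving the pattern matching syntax", {"programming"}),
    ("debugging a memory leak in my code all day", {"programming"}),
]

print(predict_interests(training_set, "my code compiles but the python tests fail", k=2))
```

With real tweets you would probably also drop very common words such as "the", since they add similarity between users who have nothing in common.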
To improve over time, you can collect explicit feedback: users can click agree/disagree on what the algorithm predicted. You can later use this information to extend your training set, which will probably result in more accurate predictions.
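As a rough sketch of that feedback loop, assuming the training set is a list of (tweets, interests) pairs: the function name `apply_feedback` and the optional `corrected` argument (interests the user supplies when he disagrees) are made up for this example.

```python
def apply_feedback(training_set, tweets, predicted, agreed, corrected=None):
    """Return a new training set extended with one piece of user feedback."""
    if agreed:
        # The user confirmed the prediction, so it becomes a labeled example.
        label = set(predicted)
    elif corrected:
        # The user disagreed and told us what his interests actually are.
        label = set(corrected)
    else:
        # Disagreement without a correction gives no usable label.
        return list(training_set)
    return list(training_set) + [(tweets, label)]


training_set = [("debugging a memory leak in my code all day", {"programming"})]
training_set = apply_feedback(
    training_set, "watching the cup final tonight", ["programming"],
    agreed=False, corrected=["sports"],
)
print(len(training_set))
```

Every confirmed or corrected prediction becomes one more labeled user that later nearest-neighbor lookups can match against.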
This is a standard approach for classifying text by the words it contains, so you should at least use it as a guideline.
There is also an open source project that might help you: Apache Mahout.