Is there an open source engine that can classify online texts or articles, or check the similarity of two articles?
For example:
1. I have ten different texts or articles, and the engine is able to classify them into different fields, like sport, entertainment, or politics.
2. Two articles describe the same event, and the engine is able to group them together and treat them as the same article.
Thank you.
You can try the Alchemy API. Though it is not open source, it has a free usage tier. Its topic categorization and concept tagging might be useful for your example 1. For example 2, a Bayesian classifier such as naive Bayes can be used, although it has to be trained on labeled examples first. Weka, which is open source, is also a widely used tool.
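To give an idea of what "naive Bayes with training" means in practice, here is a minimal sketch in plain Python. It is not how Alchemy API or Weka work internally; the class name `NaiveBayes`, the `tokenize` helper, and the six tiny training sentences are all made up for illustration, and a real system would need far more training data.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Hypothetical helper: lowercase the text and keep alphabetic words only.
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Illustrative multinomial naive Bayes with add-one smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of training docs
        self.vocab = set()                       # every word seen in training

    def train(self, text, label):
        words = tokenize(text)
        self.word_counts[label].update(words)
        self.doc_counts[label] += 1
        self.vocab.update(words)

    def classify(self, text):
        words = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label in self.doc_counts:
            # Log prior: share of training documents carrying this label.
            score = math.log(self.doc_counts[label] / total_docs)
            total_words = sum(self.word_counts[label].values())
            for w in words:
                # Add-one smoothing so unseen words do not zero out the score.
                count = self.word_counts[label][w] + 1
                score += math.log(count / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data (invented for this example).
nb = NaiveBayes()
nb.train("the team won the match with a late goal", "sport")
nb.train("the striker scored twice in the final", "sport")
nb.train("the senate passed the election bill", "politics")
nb.train("the president met the opposition party", "politics")
nb.train("the actor stars in a new movie", "entertainment")
nb.train("the singer released a new album", "entertainment")

print(nb.classify("the goalkeeper saved a goal in the match"))
print(nb.classify("the party lost the election"))
```

With this toy data the two test sentences should come out as sport and politics. The point is only that the classifier learns word frequencies per category from labeled articles, which is the training step mentioned above.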