I need to analyze users’ posts and categorize them. For example, I have to categorize every post as a “buy” post or a “sell” post based on its text: “I’m looking to sell my house” is categorized as “sell”. The problem is that often it’s not so simple: “I’m looking to get rid of my old house” also needs to be categorized as “sell”, while “I’m looking for a house” becomes “buy”. I would also like to categorize these posts by the item in question; for example, the last post above would be categorized as both “buy” and “house”.
Can anyone recommend a good approach / good framework / technique when it comes to analyzing and understanding user input?
Thanks.
You’re right; it’s a hard thing to do.
Yahoo! has a Term Extraction API/Web service you can use. It’s a pretty good way to run language analysis on your own text without writing a million lines of code yourself. I haven’t used it, though, so I can’t say how well it handles different phrasings with similar meanings, which is what your question is really about.
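To show why differing phrasings are the hard part, here is a minimal rule-based sketch in Python. It is not the Yahoo! service and not a recommended framework, just a naive baseline that handles the three example posts from the question. The phrase lists (`SELL_PHRASES`, `BUY_PHRASES`, `ITEMS`) and the `classify` function are hypothetical and were made up for illustration.

```python
import re
from typing import Optional, Tuple

# Hypothetical, hand-written phrase lists (assumption, for illustration only).
# A real system would need far larger lists or a trained classifier.
SELL_PHRASES = ["sell", "selling", "get rid of", "for sale", "offload"]
BUY_PHRASES = ["buy", "buying", "looking for", "want to get", "searching for"]
ITEMS = ["house", "car", "bike", "sofa", "laptop"]


def contains_phrase(text: str, phrase: str) -> bool:
    """True if `phrase` occurs in `text` as whole words (case-insensitive)."""
    pattern = r"\b" + re.escape(phrase) + r"\b"
    return re.search(pattern, text, re.IGNORECASE) is not None


def classify(post: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (intent, item); either is None when no rule matches.

    Sell phrases are checked before buy phrases, so a post matching both
    is labelled "sell".
    """
    intent = None
    if any(contains_phrase(post, p) for p in SELL_PHRASES):
        intent = "sell"
    elif any(contains_phrase(post, p) for p in BUY_PHRASES):
        intent = "buy"
    # First known item mentioned in the post, if any.
    item = next((i for i in ITEMS if contains_phrase(post, i)), None)
    return intent, item


if __name__ == "__main__":
    for post in [
        "I'm looking to sell my house",
        "I'm looking to get rid of my old house",
        "I'm looking for a house",
    ]:
        print(post, "->", classify(post))
```

This works for the three examples, but every new phrasing (“part with my car”, “someone to buy my house”) needs another rule, and the last one is even misread as “buy”. That brittleness is why a trained text classifier, or a service that understands similar meanings, becomes attractive once you move past a prototype.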