So I know this is a kind of a large topic, but I need to accept a chunk of text, and extract the most interesting keywords from it. The text comes from TV captions, so the subject can range from news to sports to pop culture references. It is possible to provide the type of show the text came from.
I have an idea to match the text against a dictionary of terms I know to be interesting somehow.
Which libraries for Haskell can help me with this?
Assuming I do have a dictionary of interesting terms, and a database to store them in, is there a particular approach you’d recommend to matching keywords within the text?
Is there an obvious approach I’m not thinking of?
I’d stem the words in the chunks and then search for all terms in the dict
just two random libs:
stem http://hackage.haskell.org/packages/archive/stemmer/0.2/doc/html/NLP-Stemmer-C.html
search http://hackage.haskell.org/packages/archive/sphinx/0.2.1/doc/html/Text-Search-Sphinx.html