We are building a database of scientific papers and analysing their abstracts. The goal is to be able to say “Interest in this topic has gone up 20% from last year”. I’ve already tried keyword analysis and haven’t really liked the results. So now I am trying to move on to phrases and the proximity of words to each other, and I realize I’m in over my head. Can anyone point me to a better solution, or at the very least give me a good term to google to learn more?
We are using Python, but I don’t think that really affects the answer. Thanks in advance for the help.
It is a big subject, but a good introduction to this kind of natural language processing (NLP) is NLTK, the Natural Language Toolkit. It is intended for teaching and is written in Python, i.e., it is good for dabbling and experimenting. There is also a very good open-source book on the NLTK website, which is also available in print from O’Reilly as Natural Language Processing with Python.
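For the “phrases and proximity” part, the term NLTK uses is collocations. The core idea is just counting how often two words occur within a few tokens of each other and then comparing those counts across years. Below is a rough sketch in plain Python so it runs without any downloads. The sample abstracts, the helper names (`near`, `share_by_year`), and the window of 3 are all made up for illustration. In NLTK itself, `BigramCollocationFinder.from_words(words, window_size=3)` does the windowed pair counting for you, and its association measures (e.g. likelihood ratio) help rank which pairs are actually meaningful rather than just frequent.

```python
import re
from collections import Counter

# Hypothetical sample data: (year, abstract) pairs standing in for the real database.
PAPERS = [
    (2022, "Deep learning for protein folding."),
    (2022, "Statistical methods for gene expression."),
    (2023, "Deep learning models predict protein structure."),
    (2023, "A deep neural learning approach to segmentation."),
    (2023, "Learning rates in shallow networks."),
]


def tokenize(text):
    """Lowercase and keep runs of letters (a crude stand-in for a real tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def near(tokens, a, b, window=3):
    """True if a and b occur within `window` tokens of each other, in either order.

    window=3 allows at most one word in between (distance 1 or 2),
    similar in spirit to NLTK's window_size parameter.
    """
    pos_a = [i for i, t in enumerate(tokens) if t == a]
    pos_b = [i for i, t in enumerate(tokens) if t == b]
    return any(0 < abs(i - j) < window for i in pos_a for j in pos_b)


def share_by_year(papers, a, b, window=3):
    """Fraction of each year's abstracts in which a and b appear near each other."""
    hits, totals = Counter(), Counter()
    for year, abstract in papers:
        totals[year] += 1
        if near(tokenize(abstract), a, b, window):
            hits[year] += 1
    return {year: hits[year] / totals[year] for year in totals}


def percent_change(old, new):
    """Relative change from old to new, in percent."""
    if old == 0:
        raise ValueError("no baseline: the pair never appears in the earlier year")
    return 100.0 * (new - old) / old


if __name__ == "__main__":
    shares = share_by_year(PAPERS, "deep", "learning")
    change = percent_change(shares[2022], shares[2023])
    # Prints: 'deep ~ learning': +33.3% share of abstracts, 2022 -> 2023
    print(f"'deep ~ learning': {change:+.1f}% share of abstracts, 2022 -> 2023")
```

Normalising by the number of abstracts per year matters for a claim like “up 20%”. In this toy data the raw hit count doubles (1 to 2), but the share of abstracts only rises by about 33%, because there are also more abstracts in 2023.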