Situation:
I want to perform a deep analysis of a given text, which would involve:
- Ability to extract keywords and assign importance levels based on contextual usage.
- Ability to draw conclusions about the mood expressed.
- Ability to hint at the education level (Word does this a little, but I want something more automated).
- Ability to mix and match phrases and find certain communication patterns.
- Ability to draw substantial meaning out of the text, so that it can be quantified and processed by a machine for answering.
Question:
What kind of algorithms and techniques need to be employed for this?
Is there software that can help me do this?
When you figure out how to do this, please contact DARPA, the CIA, the FBI, and all the other U.S. intelligence agencies. Projects like these are areas of current research worth many millions in research grants. 😉
That being said, you’ll need to process the text in layers and analyze it at each of those layers. For items 2 and 3 (mood and education level), you’ll find that training an SVM on word n-tuples (try n = 3) will help. For 1 and 4 (keywords and communication patterns) you’ll want deeper analysis. Use a tool like NLTK, or one of the many other parsers, to find the subject words in sentences and the words related to them. Also use WordNet (from Princeton) to find the most common senses used and take those as keywords.
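
To make the SVM-on-word-triples idea concrete, here is a rough, standard-library-only sketch. It is not a production classifier: it trains a bias-free linear SVM by plain sub-gradient descent on the L2-regularized hinge loss, and everything in it (the function names, the toy sentences, the +1/-1 mood labels, the learning rate and regularization values) is made up for illustration. In practice you would more likely hand the same n-gram counts to an existing SVM implementation such as scikit-learn's LinearSVC.

```python
from collections import Counter


def word_ngrams(text, n=3):
    """Return a Counter of word n-grams (as tuples) for one text."""
    words = text.lower().split()
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))


def score(weights, features):
    """Dot product between the sparse weight dict and a feature Counter."""
    return sum(weights.get(f, 0.0) * count for f, count in features.items())


def train_linear_svm(samples, n=3, epochs=50, lr=0.1, reg=0.01):
    """Fit weights by sub-gradient descent on the L2-regularized hinge loss.

    samples: list of (text, label) pairs, label is +1 or -1.
    No bias term is learned, to keep the sketch short.
    """
    weights = {}
    data = [(word_ngrams(text, n), label) for text, label in samples]
    for _ in range(epochs):
        for features, label in data:
            # Check the margin before touching the weights.
            violated = label * score(weights, features) < 1.0
            # L2 regularization shrinks every existing weight slightly.
            for f in weights:
                weights[f] *= 1.0 - lr * reg
            # The hinge loss only pushes when the margin is violated.
            if violated:
                for f, count in features.items():
                    weights[f] = weights.get(f, 0.0) + lr * label * count
    return weights


def predict(weights, text, n=3):
    """Return +1 or -1; a score of 0 (no known n-grams) falls back to -1."""
    return 1 if score(weights, word_ngrams(text, n)) > 0 else -1


# Hypothetical toy data: +1 = positive mood, -1 = negative mood.
TOY_SAMPLES = [
    ("i really love this", 1),
    ("what a great happy day", 1),
    ("i really hate this", -1),
    ("what a sad awful day", -1),
]

weights = train_linear_svm(TOY_SAMPLES)
print(predict(weights, "i really love cake"))
```

With only word triples as features, a new text is classified sensibly only if it shares at least one triple with the training data, which is why real systems need far more labeled text than this and often mix in shorter n-grams as well.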
Item 5 is extremely challenging. I think intelligent use of the data above can give you what you want, but you’ll need to use all your grammatical knowledge and programming knowledge, and the result will still be very coarse-grained.