I would like to calculate the frequency of function words in Python/NLTK. I see two ways to go about it:
- Use a part-of-speech (POS) tagger and sum the counts of the POS tags that correspond to function words
- Create a list of function words and perform a simple lookup
The catch with the first approach is that my data is noisy and I don't know (for sure) which POS tags correspond to function words. The catch with the second is that I don't have such a list, and since my data is noisy, the lookup won't be accurate.
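A minimal sketch of the first approach, assuming the Penn Treebank tagset that `nltk.pos_tag` uses by default. The tag set `FUNCTION_WORD_TAGS` is an assumption, not an authoritative definition: it collects closed-class tags (determiners, pronouns, prepositions, conjunctions, modals, particles, wh-words), and it cannot separate auxiliary *be/have/do* from content verbs because both are tagged `VB*`. The helper names are hypothetical. To stay runnable without downloading NLTK models, the functions take already-tagged `(word, tag)` pairs; on real text you would pass `nltk.pos_tag(nltk.word_tokenize(text))`, which needs the tokenizer and tagger data fetched via `nltk.download()`.

```python
from collections import Counter

# ASSUMPTION: candidate Penn Treebank tags that usually mark function words.
# Auxiliary verbs (be, have, do) are tagged VB* like content verbs, so they
# are not captured by this set.
FUNCTION_WORD_TAGS = {
    "CC",    # coordinating conjunction: and, but, or
    "DT",    # determiner: the, a, this
    "EX",    # existential "there"
    "IN",    # preposition / subordinating conjunction: of, in, because
    "MD",    # modal: can, will, should
    "PDT",   # predeterminer: all, both (as in "all the ...")
    "POS",   # possessive ending: 's
    "PRP",   # personal pronoun: I, you, it
    "PRP$",  # possessive pronoun: my, your
    "RP",    # particle: up, off (as in "give up")
    "TO",    # "to"
    "WDT",   # wh-determiner: which, that
    "WP",    # wh-pronoun: who, what
    "WP$",   # possessive wh-pronoun: whose
    "WRB",   # wh-adverb: where, when, how
}


def function_word_counts(tagged_tokens, tags=FUNCTION_WORD_TAGS):
    """Count lowercased words whose POS tag is in `tags`.

    `tagged_tokens` is a list of (word, tag) pairs, such as the output of
    nltk.pos_tag(nltk.word_tokenize(text)).
    """
    return Counter(word.lower() for word, tag in tagged_tokens if tag in tags)


def function_word_ratio(tagged_tokens, tags=FUNCTION_WORD_TAGS):
    """Fraction of all tokens whose tag is in `tags` (0.0 for empty input)."""
    if not tagged_tokens:
        return 0.0
    hits = sum(1 for _, tag in tagged_tokens if tag in tags)
    return hits / len(tagged_tokens)


# Hand-tagged example, so no NLTK models are needed to run this.
tagged = [("The", "DT"), ("cat", "NN"), ("sat", "VBD"), ("on", "IN"),
          ("the", "DT"), ("mat", "NN"), (".", ".")]
print(function_word_counts(tagged))             # Counter({'the': 2, 'on': 1})
print(round(function_word_ratio(tagged), 3))    # 0.429
```

On noisy text the tagger itself will make mistakes, so these counts are only as good as the tags; spot-checking a sample of tagged output is a cheap way to see whether the tag set needs adjusting.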
I would prefer the first approach over the second, or any other approach that would give me more accurate results.
For now, I have used the LIWC English 2007 dictionary (which I paid for) and performed a simple lookup. Other answers are most welcome.
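A minimal sketch of the lookup approach with some normalization for noisy text. `FUNCTION_WORDS` is a hypothetical, deliberately short list for illustration only; in practice you would load a complete list (for example, the words from a dictionary you have licensed) into that set. The regex cleanup, which keeps only lowercase letters and apostrophes, is likewise an assumption about what the noise in your data looks like.

```python
import re
from collections import Counter

# HYPOTHETICAL, incomplete list for illustration; replace with a real
# function-word list loaded from your own source.
FUNCTION_WORDS = frozenset({
    "a", "an", "the", "and", "but", "or", "of", "in", "on", "to", "for",
    "with", "at", "by", "from", "i", "you", "he", "she", "it", "we", "they",
    "is", "are", "was", "were", "be", "this", "that", "not",
})

# Runs of lowercase letters and apostrophes; a crude way to drop punctuation,
# digits and emoticons from noisy text before the lookup.
TOKEN_RE = re.compile(r"[a-z']+")


def normalize_tokens(text):
    """Lowercase the text, extract word-like tokens, trim stray apostrophes."""
    tokens = (t.strip("'") for t in TOKEN_RE.findall(text.lower()))
    return [t for t in tokens if t]


def function_word_frequencies(text, vocabulary=FUNCTION_WORDS):
    """Return (counts, ratio): per-word counts and the share of tokens in `vocabulary`."""
    tokens = normalize_tokens(text)
    counts = Counter(t for t in tokens if t in vocabulary)
    ratio = sum(counts.values()) / len(tokens) if tokens else 0.0
    return counts, ratio


counts, ratio = function_word_frequencies("The cat sat on THE mat!!! lol :)")
print(counts)            # Counter({'the': 2, 'on': 1})
print(round(ratio, 3))   # 0.429
```

Unlike the tagger-based sketch, the lookup does not depend on tagging accuracy, but it counts a string the same way every time, regardless of the role the word plays in the sentence.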
I must say I am a little surprised by the impulsiveness of a couple of the answers here. Since someone asked for code, here's what I did:
Anyone who has written some Python code would tell you that performing a lookup or extracting words with specific POS tags isn't rocket science. Besides, the question's tags, NLP (Natural Language Processing) and NLTK (Natural Language Toolkit), should be enough of an indication to the astute.
Anyway, I understand and respect the sentiments of the people who reply here, since most of this help is free, but I think the least we can do is show a bit of respect to question posters. As is rightly pointed out, help is received when you help others; similarly, respect is received when one respects others.