I am looking to recognize simple phrases like the ones what happens in google calendar
but rather than parsing Calendar Entries I have to parse Sentence related to finance, accounting and to do lists. So For example I have to parse sentences like
I spent 50 dollars on food yesterday
I need to mark an separate the info as Reason : 'food' , Cost : 50 and Time: <Yesterday's Date>
My question is do I go in for a full fledged Natural Language Processing like
given in these Questions and use Something like GATE
Machine Learning and Natural Language Processing
Natural Language Processing in Ruby
Ideas for Natural Language Processing project?
https://stackoverflow.com/a/3058063/492561
Or is it better to Write simple grammars using Something like AntLR and try to recognize it .
Or should I go really low and just Define a syntax and use Regular expressions .
Time is a Constraint , I have about 45 – 50 Days , And I don’t know how to use AntLR or NLP libraries like GATE.
Preferred languages : Python , Java , Ruby (Not in any particular order)
PS : This is not home-work , So please Don’t tag it as so.
PPS : Please try to give an answer with Facts on why using a particular method is better.
even if a particular method may not fit inside the time constraint please feel free to share it because It might benefit someone else .
You could look at named entity recognition indeed. From your question I understand your domain is pretty well defined, so you can identify the (few?) entities (dates, currencies, money amount, time expressions, etc.) that are relevant for you. If the phrases are very simple, you could go with a rule-based approach, otherwise it’s likely to get too complex too soon.
Just to get yourself up and running in a few sec, http://timmcnamara.co.nz/post/2650550090/extracting-names-with-6-lines-of-python-code is an extremely nice example of what you could do. Of course I would not expect an high accuracy from just 6 lines of python, but it should give you an idea of how it works:
The core idea is on line 3 and 4: on line 3 it split text in sentences and iterates them.
On line 4, it splits the sentence in tokens, it runs “part of speech” tagging on the sentence, and then it feeds the pos-tagged sentence to the named entity recognition algorithm. That’s the very basic pipeline.
In general, nltk is an extremely beautiful piece of software, and very well documented: I would look at it. Other answers contain very useful links.