I am planning to learn natural language processing this year.
But when I started reading introductory books on this topic, I found that I was missing a lot of points, mainly relating to mathematics.
So I'm here asking what I should learn first so that I can learn NLP, well, more smoothly.
Thanks in advance.
There are two main approaches to NLP right now – one is the language-based approach detailed by Jurafsky and Martin (Speech and Language Processing) and the other is a probability and statistics-based approach (Foundations of Statistical Natural Language Processing).
Most people I've talked to tend to prefer the latter for its ease of ramping up and its useful results. So I would recommend going over probability theory first and then tackling an NLP book (like the second one I linked to, which I am actually using on a project right now with pretty good results).
While I agree with laura that formal language theory is highly useful, I think that if you just want to get into the actual NL parts of NLP, you can leave formal languages for later. There are enough existing tools for lexical analysis, parsing, tokenizing, and text transformations that you can use those rather than roll your own.
Here is a book describing three such tools – I own it and recommend it as a good introduction to all three. Building Search Applications: Lucene, LingPipe, and Gate
Edit: in response to your question, I would say that the first step would be to get a thorough grounding in the basics of probability (the first 3-5 chapters of any undergrad prob/stats book should be fine), and then from there look up new topics as they come up in the NLP book. For instance, yesterday I had to learn about t-values or something (I’m bad with names) because they happened to be relevant to determining incidence of collocation.
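To give a feel for how little probability that kind of thing needs, here is a rough sketch of a t-score for word pairs. I'm assuming the "t-values" above are the usual t-test for collocations: compare how often a pair actually occurs with how often it would occur if the two words were independent. The function name `bigram_t_scores` and the toy sentence are made up for illustration, and the variance uses the common approximation p(1 - p) for rare events.

```python
import math
from collections import Counter


def bigram_t_scores(tokens):
    """Return {(w1, w2): t-score} for every adjacent word pair in tokens."""
    n = len(tokens)
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    scores = {}
    for (w1, w2), count in bigrams.items():
        # Observed probability of the pair (the sample mean).
        observed = count / n
        # Probability expected if w1 and w2 occurred independently.
        expected = (unigrams[w1] / n) * (unigrams[w2] / n)
        # Bernoulli variance of "this position starts the pair".
        variance = observed * (1 - observed)
        scores[(w1, w2)] = (observed - expected) / math.sqrt(variance / n)
    return scores


# Toy corpus (hypothetical); real use needs far more text.
tokens = ("new york is big and new york is old "
          "and the new car is big").split()
scores = bigram_t_scores(tokens)

# Print pairs from most to least collocation-like.
for pair, t in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pair, round(t, 3))
```

A higher score means the pair shows up together more often than chance would suggest. On a corpus this small the numbers mean very little, but everything in it is material from the first few chapters of a prob/stats book: probabilities from counts, independence, mean, and variance.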