I’m currently working on my thesis, and the application for it will use natural language question answering. I’ve read about several ideas and followed discussions on the topic, but I can’t seem to find good answers.
Question: How do I get answers from a PDF, plain text, or MS Word file?
If I want to search for a topic in a PDF file, I would use Ctrl+F to find the topic/idea, but that wouldn’t return all the details. What I want is closer to a table of contents, which gives the starting page and end page of a chapter. That’s the logic I’m after: determine where a chapter ends without relying on pages or numbers. Is there any algorithm capable of doing that?
I have used iTextPDF to read the contents of the PDF file.
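To make the idea concrete, here is a rough sketch of the kind of logic I have in mind. It assumes the text has already been extracted into a list of lines, and that a chapter starts on a line such as "Chapter 3" or "CHAPTER 3: Parsing". The `ChapterSplitter` class, the `Chapter` type, and the heading pattern are my own placeholder names and assumptions, not part of any library. A chapter is treated as ending on the line just before the next heading, or at the end of the document, so no page numbers are needed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class ChapterSplitter {
    // Assumption: a heading line looks like "Chapter 1 Introduction" (case-insensitive).
    static final Pattern HEADING =
            Pattern.compile("^\\s*chapter\\s+\\d+\\b.*$", Pattern.CASE_INSENSITIVE);

    // Hypothetical value type: a chapter title plus its 0-based, inclusive line range.
    static final class Chapter {
        final String title;
        final int startLine;
        final int endLine;

        Chapter(String title, int startLine, int endLine) {
            this.title = title;
            this.startLine = startLine;
            this.endLine = endLine;
        }
    }

    // Walks the lines once; each heading closes the previous chapter and opens a new one.
    static List<Chapter> split(List<String> lines) {
        List<Chapter> chapters = new ArrayList<>();
        String title = null;
        int start = -1;
        for (int i = 0; i < lines.size(); i++) {
            if (HEADING.matcher(lines.get(i)).matches()) {
                if (title != null) {
                    chapters.add(new Chapter(title, start, i - 1));
                }
                title = lines.get(i).trim();
                start = i;
            }
        }
        // The last chapter runs to the end of the document.
        if (title != null) {
            chapters.add(new Chapter(title, start, lines.size() - 1));
        }
        return chapters;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "Preface",
                "Chapter 1 Introduction",
                "Some text.",
                "More text.",
                "Chapter 2 Methods",
                "Details.");
        for (Chapter c : split(lines)) {
            System.out.println(c.title + ": lines " + c.startLine + "-" + c.endLine);
        }
    }
}
```

This only works when headings follow a predictable pattern. Text before the first heading (the "Preface" line above) is ignored, and a document whose chapters are marked only by font size or layout would need a different signal than a regular expression over plain text.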