I would like to know the procedure for parsing and extracting the text content of Microsoft Word (.doc and .docx) documents. The programming language should be plain C (compiled with gcc).
Are there any libraries that already do this job?
Extension: can I use the same procedure to extract text from Microsoft PowerPoint files as well?
Microsoft Word documents are an enormous beast, and you definitely don't want to be writing this code yourself. Look into an existing free Word library such as antiword or wvWare, both of which read the older binary .doc format.
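The newer .docx format (and .pptx, for the PowerPoint part of the question) is a different story: the file is a ZIP archive of XML parts. Body text of a Word document sits in `w:t` elements inside `word/document.xml`, and slide text sits in `a:t` elements inside `ppt/slides/slideN.xml`. Below is a minimal sketch of the text-extraction step only. It assumes you have already pulled the XML part out of the archive (for example with libzip, or `unzip -p file.docx word/document.xml`) and that the document uses the conventional `w:`/`a:` namespace prefixes. The function `extract_runs` is a hypothetical helper, not part of antiword, wvWare or any other library.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not from any library): concatenate the text of
 * every <TAG>...</TAG> element in an Office Open XML part, e.g. "w:t"
 * runs in word/document.xml (.docx) or "a:t" runs in a .pptx slide.
 *
 * Naive string scan, not an XML parser: it assumes the conventional
 * namespace prefixes, does not decode entities such as &amp;, and does
 * not insert line breaks between paragraphs.
 * Returns a malloc'd string (caller frees) or NULL on error.
 */
char *extract_runs(const char *xml, const char *tag)
{
    char close[64];
    size_t tlen = strlen(tag);
    int clen = snprintf(close, sizeof close, "</%s>", tag);
    if (clen < 0 || (size_t)clen >= sizeof close)
        return NULL;                      /* tag name too long for buffer */

    /* The extracted text is a subset of the input, so this always fits. */
    char *out = malloc(strlen(xml) + 1);
    if (out == NULL)
        return NULL;

    size_t n = 0;
    const char *p = xml;
    while ((p = strchr(p, '<')) != NULL) {
        /* Accept "<tag>" or "<tag attr=...>", but not a longer element
           name that merely starts with tag, such as <w:tab/> or <a:tab/>. */
        if (strncmp(p + 1, tag, tlen) == 0 &&
            (p[1 + tlen] == '>' || p[1 + tlen] == ' ')) {
            const char *start = strchr(p, '>');
            if (start == NULL)
                break;                    /* truncated input */
            if (start[-1] == '/') {       /* self-closing <tag ... /> */
                p = start + 1;
                continue;
            }
            start++;                      /* first character of the text */
            const char *end = strstr(start, close);
            if (end == NULL)
                break;                    /* no closing tag: stop here */
            memcpy(out + n, start, (size_t)(end - start));
            n += (size_t)(end - start);
            p = end + clen;               /* continue after </tag> */
        } else {
            p++;                          /* not our element; keep scanning */
        }
    }
    out[n] = '\0';
    return out;
}
```

Calling `extract_runs(xml, "w:t")` on the contents of `document.xml` gives the run text concatenated, and `extract_runs(xml, "a:t")` does the same for a slide, so the same procedure largely carries over to .pptx. For anything beyond a quick hack, a real XML parser such as libxml2 is the safer choice, since it handles entities, namespaces and paragraph structure properly.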