I am using iTextSharp 5.1.1 to extract all the text from a PDF so that I can count the words in it, using the following code:
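using System.IO;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

public static string GetTextFromAllPages(string pdfPath)
{
    PdfReader reader = new PdfReader(pdfPath);
    try
    {
        StringWriter output = new StringWriter();
        // Page numbers are 1-based; each page gets its own extraction strategy.
        for (int i = 1; i <= reader.NumberOfPages; i++)
            output.WriteLine(PdfTextExtractor.GetTextFromPage(reader, i, new SimpleTextExtractionStrategy()));
        return output.ToString();
    }
    finally { reader.Close(); } // release the reader even if extraction throws
}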
However, for different languages (en, fr, ...) and different input files, the result is usually wrong compared with the value I expect.
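For reference, the counting step applied to the extracted text could look roughly like the sketch below. It is an illustration only, not the code actually in use: the class name `WordCounter`, the method `CountWords`, and the rule for what counts as a word (a run of Unicode letters or digits, optionally joined by an apostrophe or hyphen) are all assumptions. Using `\p{L}` rather than `[a-zA-Z]` keeps accented French words such as "été" in one piece.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of iTextSharp.
public static class WordCounter
{
    // Assumed definition of a word: Unicode letters/digits, optionally joined
    // by an apostrophe or hyphen (so "l'été" and "well-known" count once each).
    private static readonly Regex WordPattern =
        new Regex(@"[\p{L}\p{Nd}]+(?:['’\-][\p{L}\p{Nd}]+)*");

    public static int CountWords(string text)
    {
        if (string.IsNullOrEmpty(text))
            return 0;
        return WordPattern.Matches(text).Count;
    }
}
```

With this definition, `WordCounter.CountWords(GetTextFromAllPages(path))` gives one number per file. If the extracted text itself is missing spaces or has words split across lines, the count will still be off no matter how the counting is done.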
iTextSharp (http://sourceforge.net/projects/itextsharp/) has a robust API for manipulating PDFs.