I’d like to know whether there is a PDF library for Microsoft .NET that can extract text from a region specified by coordinates.
For example (in pseudo-code):
PdfReader reader = new PdfReader();
reader.Load("file.pdf");
// Top, bottom, left, right in pixels or any other unit
string wholeText = reader.GetText(100, 150, 20, 50);
I’ve tried to do this with PDFBox for .NET (the build that runs on top of IKVM) with no luck, and it seems to be very outdated and undocumented.
Perhaps someone has a good sample of doing this with PDFBox, iTextSharp or any other open-source library and can give me a hint.
Thank you in advance.
Well, thank you for your effort, everyone.
I got it working using Apache’s PDFBox compiled with IKVM, and this is the final code:
And it works like a charm.
Thank you anyway, and I hope my own answer will help others. If you need further details, just leave a comment here and I’ll update this answer.
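The listing itself does not appear above, so here is a rough sketch of what region-based extraction with PDFBox under IKVM can look like. It is not necessarily the code this answer refers to. It assumes the PDFBox 1.x API (where PDFTextStripperByArea lives in org.apache.pdfbox.util), references to the IKVM-compiled PDFBox assemblies and the IKVM AWT assembly for java.awt.Rectangle, and a hypothetical helper named GetText that mirrors the top, bottom, left, right signature from the question. As far as I know, the region is given in PDF points with the origin at the top-left corner of the page.

```csharp
using org.apache.pdfbox.pdmodel; // PDDocument, PDPage (IKVM-compiled PDFBox 1.x, assumed)
using org.apache.pdfbox.util;    // PDFTextStripperByArea in PDFBox 1.x

static class PdfRegionText
{
    // Hypothetical helper: returns the text found inside the given box on one page.
    public static string GetText(string path, int pageIndex,
                                 int top, int bottom, int left, int right)
    {
        PDDocument doc = PDDocument.load(path);
        try
        {
            var stripper = new PDFTextStripperByArea();
            stripper.setSortByPosition(true);

            // java.awt.Rectangle takes x, y, width, height, so convert the edges.
            var region = new java.awt.Rectangle(left, top, right - left, bottom - top);
            stripper.addRegion("region", region);

            // PDFBox 1.x exposes the pages as an untyped java.util.List.
            var page = (PDPage)doc.getDocumentCatalog().getAllPages().get(pageIndex);
            stripper.extractRegions(page);

            return stripper.getTextForRegion("region");
        }
        finally
        {
            doc.close();
        }
    }
}
```

With that helper, the call from the question would become something like PdfRegionText.GetText("file.pdf", 0, 100, 150, 20, 50) for the first page.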