I have this PDF file which is arranged in 5 columns.
I have looked and looked through Stack Overflow (and Googled crazily) and tried all the solutions (including the last resort of trying Adobe Acrobat itself).
However, for some reason I cannot get those 5 columns into CSV/XLS format arranged the way I need them. Usually when I export them, the formatting is horrible: all the entries come out line by line, with some data loss.
Here is a link to an excerpt of the file: http://www.2shared.com/document/PagE4A1T/ex1.html
I am really getting frustrated and am running out of options.
iText (or iTextSharp) could do this, if you can give it the boundaries of those 5 columns and are willing to deal with some overhead (namely, reparsing the page's text once for each column).
Each line of text should be separated by `\n`, so it becomes a simple matter of string parsing.

If you wanted to not reparse the whole page for each column, you could probably come up with a custom implementation of `FilteredTextRenderListener` that would take multiple listener/filter pairs. You could then parse the whole thing once rather than once for each column.
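As a rough illustration of the string-parsing step, here is a minimal sketch in plain Java (no iText calls). It assumes you have already extracted one string per column, each with its lines separated by `\n`, and that line N of every column belongs to the same table row. The class name `ColumnsToCsv` and its methods are hypothetical, made up for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: merges per-column text into CSV rows.
final class ColumnsToCsv {

    // Quote a field if it contains a comma, a double quote, or a line break.
    static String escape(String field) {
        if (field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r")) {
            return "\"" + field.replace("\"", "\"\"") + "\"";
        }
        return field;
    }

    // columnTexts[i] holds the text extracted for column i, one line per row.
    // Assumption: rows line up by line index across all columns.
    static List<String> toCsvRows(String[] columnTexts) {
        List<String[]> columns = new ArrayList<>();
        int rowCount = 0;
        for (String text : columnTexts) {
            String[] lines = text.split("\\r?\\n");
            columns.add(lines);
            rowCount = Math.max(rowCount, lines.length);
        }

        List<String> rows = new ArrayList<>();
        for (int r = 0; r < rowCount; r++) {
            StringBuilder row = new StringBuilder();
            for (int c = 0; c < columns.size(); c++) {
                if (c > 0) {
                    row.append(',');
                }
                String[] lines = columns.get(c);
                // Pad shorter columns with empty cells.
                String cell = r < lines.length ? lines[r].trim() : "";
                row.append(escape(cell));
            }
            rows.add(row.toString());
        }
        return rows;
    }

    public static void main(String[] args) {
        // Made-up sample data standing in for the extracted column text.
        String[] columnTexts = { "Smith, J.\nDoe", "10\n20", "x" };
        for (String row : toCsvRows(columnTexts)) {
            System.out.println(row);
        }
    }
}
```

Note that this only works if every column yields the same number of lines in the same order. If a cell is blank in the PDF, the extracted column will be one line short and the rows after it will shift, so you may need to align on the text's vertical position instead.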