We already work with MS Word 2007 documents in a C# 4.0 WinForms application, using the Open XML representation of Word 2007 to split and aggregate documents. We are now extending this work to support PDF files. So I would like to know: is there any tool that exposes the internal structure of a PDF file in an XML-based form, similar to the Open XML representation for MS Office 2007?
Please enlighten me on this.
Does the PDF contain any marked content? If it does not, there is no XML structure you can extract.
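If it helps, here is a rough way to check that before looking for a tool. This is only a sketch in Python (the same byte scan is easy to port to C#), and it assumes the document catalog is stored uncompressed. In files that use compressed object streams (PDF 1.5 and later) the scan can report a false negative, so treat a `False` as "not proven" rather than "definitely untagged". The function name `looks_tagged` is made up for this example.

```python
import re


def looks_tagged(pdf_bytes: bytes) -> bool:
    """Heuristic check for a tagged PDF.

    Looks in the raw bytes for the two catalog entries a tagged PDF
    normally carries: a /StructTreeRoot reference and a /MarkInfo
    dictionary with /Marked set to true. Assumption: the catalog is not
    hidden inside a compressed object stream.
    """
    has_struct_tree = b"/StructTreeRoot" in pdf_bytes
    is_marked = re.search(rb"/Marked\s*true", pdf_bytes) is not None
    return has_struct_tree and is_marked
```

To try it on a real file, read the file in binary mode and pass the bytes in, for example `looks_tagged(open(path, "rb").read())` where `path` is your own PDF. If it returns `True`, the file has a logical structure tree that a PDF library can walk, which is the closest thing PDF has to the Open XML element tree.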