I’m working with .docx documents, and I need to split a document into sections based on the paragraphs styled with the “heading 1” style. For example, given a document like this (the markup is pseudocode):
<doc>
<title style>Doc Title</title style>
<heading1>First Section</heading1>
...
<heading1>Second Section</heading1>
...
<heading1>Third Section</heading1>
...
</doc>
I’d want to break this into four sections, the first being the content that precedes the first heading. I figure this is probably pretty simple once you’re familiar with Open XML, but I’m not.
TIA.
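Once each paragraph has been reduced to its style and text, the split itself is a single pass: every heading opens a new section, and everything before the first heading lands in a preamble section. Below is a minimal Python sketch of just that grouping step. It assumes paragraphs arrive as hypothetical `(style_id, text)` pairs and that the “heading 1” style has the style ID `Heading1`, which is what Word typically uses in English-language documents (an assumption; localized documents may use a different ID). The name `split_sections` is illustrative only.

```python
from typing import List, Tuple

# Hypothetical paragraph model: (style_id, text).
Paragraph = Tuple[str, str]


def split_sections(
    paragraphs: List[Paragraph], heading_style: str = "Heading1"
) -> List[List[Paragraph]]:
    """Start a new section at each heading paragraph.

    sections[0] is the preamble (content before the first heading);
    it is always present, even if empty.
    """
    sections: List[List[Paragraph]] = [[]]
    for style, text in paragraphs:
        if style == heading_style:
            sections.append([])  # a heading opens a new section
        sections[-1].append((style, text))
    return sections


# The example document from the question, as (style_id, text) pairs.
doc = [
    ("Title", "Doc Title"),
    ("Heading1", "First Section"),
    ("Normal", "..."),
    ("Heading1", "Second Section"),
    ("Normal", "..."),
    ("Heading1", "Third Section"),
    ("Normal", "..."),
]
for i, section in enumerate(split_sections(doc)):
    print(i, [text for _, text in section])
# 0 ['Doc Title']
# 1 ['First Section', '...']
# 2 ['Second Section', '...']
# 3 ['Third Section', '...']
```

Creating the preamble section up front means a document that opens directly with a heading yields an empty first section, so section numbering stays consistent either way.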
Wow… not a single view on this question all day. Well, I figured it out and thought I’d share. I can’t post the code itself, but it’s just three nested loops: one over the paragraphs, one over each paragraph’s runs, and one over the styles. The XPath for each of those is:
Once you find a run with the style you want, you step back up a level to get the paragraph’s first run, which will contain the styled text. From there, it’s just Comp Sci 101 stuff. I think the real breakthrough was not even trying to use the Open XML SDK (aside from the System.IO.Packaging classes) and going straight to XML manipulation.
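For readers outside .NET, here is a rough Python analogue of the “skip the SDK, read the XML directly” idea, using only the standard library (`zipfile` to open the package, `xml.etree.ElementTree` for the path queries). It is a sketch, not the author’s code: instead of looping over run styles, it reads the paragraph-level style from `w:pPr/w:pStyle/@w:val`, which is where Word normally records paragraph styles such as headings, and concatenates the text of every `w:t` element in the paragraph. The helpers `iter_paragraphs` and `make_test_docx` are hypothetical names, and the demo package contains only `word/document.xml`, which is enough for this parser but not a file Word would open.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from typing import Iterator, Tuple

# WordprocessingML main namespace used by word/document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}


def iter_paragraphs(docx) -> Iterator[Tuple[str, str]]:
    """Yield (style_id, text) for each paragraph directly under <w:body>.

    `docx` can be a file path or a binary file-like object. Paragraphs
    nested inside tables (<w:tbl>) are not visited by this sketch.
    """
    with zipfile.ZipFile(docx) as package:
        root = ET.fromstring(package.read("word/document.xml"))
    body = root.find("w:body", NS)
    for p in body.findall("w:p", NS):
        # Paragraph style ID, e.g. "Heading1"; "" if the paragraph has none.
        style = p.find("w:pPr/w:pStyle", NS)
        style_id = style.get(f"{{{W_NS}}}val", "") if style is not None else ""
        # Join the text of every w:t element (one or more per run).
        text = "".join(t.text or "" for t in p.iter(f"{{{W_NS}}}t"))
        yield style_id, text


def make_test_docx(body_xml: str) -> io.BytesIO:
    """Build an in-memory zip holding only word/document.xml (demo only)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        z.writestr(
            "word/document.xml",
            f'<w:document xmlns:w="{W_NS}"><w:body>{body_xml}</w:body></w:document>',
        )
    buf.seek(0)
    return buf


sample = make_test_docx(
    '<w:p><w:pPr><w:pStyle w:val="Title"/></w:pPr><w:r><w:t>Doc Title</w:t></w:r></w:p>'
    '<w:p><w:pPr><w:pStyle w:val="Heading1"/></w:pPr>'
    "<w:r><w:t>First </w:t></w:r><w:r><w:t>Section</w:t></w:r></w:p>"
    "<w:p><w:r><w:t>Body text</w:t></w:r></w:p>"
)
for style_id, text in iter_paragraphs(sample):
    print(repr(style_id), text)
# 'Title' Doc Title
# 'Heading1' First Section
# '' Body text
```

Note that a single heading can be split across several runs (as “First ” and “Section” are here), which is why the sketch joins all the text in the paragraph rather than taking one run. The resulting pairs can then be grouped into sections by starting a new one at each `Heading1` paragraph.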