I'm searching for strings inside a Word document using the Open XML SDK 2.0 and listing the matches.
    MatchCollection Matches;
    using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(txtLocation.Text, true))
    {
        string docText = null;
        using (StreamReader sr = new StreamReader(wordDoc.MainDocumentPart.GetStream()))
        {
            docText = sr.ReadToEnd();
        }
        Regex regex = new Regex(@"\(.*?\)");
        Matches = regex.Matches(docText);
    }
    int i = 0;
    while (i < Matches.Count)
    {
        Label lb = new Label();
        lb.Text = Matches[i].ToString();
        lb.Location = new System.Drawing.Point(24, (28 + i * 24));
        this.panel1.Controls.Add(lb);
        i++;
    }
The problem is that sometimes it returns the right string, e.g. (HelloWorld), but sometimes it returns something totally different that contains tags, like: <w:rFonts w:ascii="Arial" w:hAnsi="Arial" w:cs="Arial"/>
How do I get rid of those?
I found out what I had to do: run the string through another Regex.Replace.
That call removes everything between < and > (so all XML/HTML tags), leaving only the text.
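The exact pattern isn't shown above, so here is only a minimal sketch of the idea. It assumes the tags are stripped from the whole string before the parenthesis regex runs, and that a simple `<[^>]+>` pattern is good enough. The class name `TagStripDemo`, the helper `StripTags`, and the sample XML fragment are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TagStripDemo
{
    // Removes anything that looks like a tag: '<', one or more non-'>' characters, then '>'.
    // This is a hypothetical helper, not part of the Open XML SDK.
    public static string StripTags(string xml)
    {
        return Regex.Replace(xml, @"<[^>]+>", string.Empty);
    }

    public static void Main()
    {
        // Made-up fragment resembling the raw XML read from MainDocumentPart.GetStream().
        string docText =
            "<w:p><w:r><w:rPr><w:rFonts w:ascii=\"Arial\" w:hAnsi=\"Arial\" w:cs=\"Arial\"/></w:rPr>" +
            "<w:t>Say (HelloWorld) twice</w:t></w:r></w:p>";

        // Strip the markup first, then look for text in parentheses.
        string plainText = StripTags(docText);

        foreach (Match m in Regex.Matches(plainText, @"\(.*?\)"))
        {
            Console.WriteLine(m.Value); // prints "(HelloWorld)"
        }
    }
}
```

Note that this leaves XML entities such as `&amp;` encoded. It may also be worth trying `wordDoc.MainDocumentPart.Document.Body.InnerText`, which should give the text without markup and without needing the extra regex.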