I convert a .docx Word file (its XML content) to text with this C# code:
// Recursively collects the text of a WordprocessingML node.
private string ReadNode(XmlNode node)
{
    if (node == null || node.NodeType != XmlNodeType.Element)
        return string.Empty;

    StringBuilder sb = new StringBuilder();
    foreach (XmlNode child in node.ChildNodes)
    {
        if (child.NodeType != XmlNodeType.Element) continue;
        switch (child.LocalName)
        {
            case "t": // Text
                sb.Append(child.InnerText.TrimEnd());
                // Re-add a single trailing space when xml:space="preserve" is set.
                string space = ((XmlElement)child).GetAttribute("xml:space");
                if (space == "preserve")
                    sb.Append(' ');
                break;
            case "tab": // Tab
                sb.Append("\t");
                break;
            case "p": // Paragraph
                // Read the paragraph once and skip it if it is empty.
                string paragraph = ReadNode(child);
                if (paragraph.Trim() != "")
                {
                    sb.Append(paragraph);
                    sb.Append(Environment.NewLine);
                }
                break;
            default:
                sb.Append(ReadNode(child));
                break;
        }
    }
    return sb.ToString();
}
How can I read the “Line Numbers” of the page content in my code (similar to how I read “p” or “tab”)?
Please see the image: https://i.stack.imgur.com/OVx3O.jpg

Edit:
I’m afraid that the XML doesn’t store that information. The XML simply stores the general layout of the text, so you would have to replicate the layout yourself and then see which line each piece of text falls on. That’s not very easy. Can you explain your problem (why you are trying to do this) in further detail? Perhaps we can come up with another solution that doesn’t require getting the line numbers.
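To illustrate the point above: as far as I know, what WordprocessingML does store is the line-numbering setting of a section, as a w:lnNumType element inside w:sectPr (attributes such as w:countBy, w:start and w:restart). The number of each individual line is computed by Word at layout time and is not written to the file. Below is a minimal sketch that only reports whether line numbering is switched on and how it is configured. The class LineNumbering and the method ReadSettings are hypothetical names for this example, not part of the code in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

static class LineNumbering
{
    // WordprocessingML main namespace (normally bound to the "w" prefix).
    const string W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Hypothetical helper: returns one summary string per w:lnNumType
    // element found anywhere in the loaded document. An empty list means
    // that no section has line numbering switched on.
    public static List<string> ReadSettings(XmlDocument doc)
    {
        var result = new List<string>();
        foreach (XmlElement el in doc.GetElementsByTagName("lnNumType", W))
        {
            // GetAttribute returns an empty string when the attribute is absent.
            string countBy = el.GetAttribute("countBy", W);
            string start = el.GetAttribute("start", W);
            string restart = el.GetAttribute("restart", W);
            result.Add("countBy=" + countBy + ", start=" + start + ", restart=" + restart);
        }
        return result;
    }
}
```

This tells you that lines are numbered (for example every line, continuously), but mapping a given piece of text to its line number would still require reproducing Word's line breaking.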
The information you need is under one of the other “xmlData” nodes.
Full xml below: