I converted a .docx Word file (XML content) to text with this code (in C#):


Asked: June 5, 2026

// Requires: using System; using System.Text; using System.Xml;
private string ReadNode(XmlNode node)
{
    if (node == null || node.NodeType != XmlNodeType.Element)
        return string.Empty;

    StringBuilder sb = new StringBuilder();
    foreach (XmlNode child in node.ChildNodes)
    {
        if (child.NodeType != XmlNodeType.Element) continue;
        switch (child.LocalName)
        {
            case "t":                           // Text
                // InnerText already contains any leading or trailing whitespace
                // that xml:space="preserve" marks as significant, so append it unchanged.
                sb.Append(child.InnerText);
                break;

            case "tab":// Tab
                sb.Append("\t");
                break;
            case "p":                           // Paragraph
                // Read the paragraph once and reuse the result.
                string paragraph = ReadNode(child);
                if (paragraph.Trim() != "")
                {
                    sb.Append(paragraph);
                    sb.Append(Environment.NewLine);
                }
                break;
            default:
                sb.Append(ReadNode(child));
                break;
        }
    }
    return sb.ToString();
}

How can I read the “Line Numbers” of the page content in my code (similar to how I read “p” or “tab”)?

Please see the image (https://i.stack.imgur.com/OVx3O.jpg): line numbers in a .docx file.

1 Answer

Editorial Team · Answer 1 · 2026-06-05T14:08:57+00:00

Edit:

I’m afraid the XML doesn’t store that information. It stores only the general layout of the text, so you would have to replicate the layout yourself and work out where each piece of text falls, which is not easy. Could you explain in more detail why you are trying to do this? Perhaps we can come up with another solution that doesn’t require the line numbers.
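What WordprocessingML does record, when line numbering is switched on, is the numbering settings: a w:lnNumType element inside the section properties (w:sectPr). It holds options such as the counting interval and restart mode, not the number of each rendered line. A minimal sketch of reading it follows; the class and method names (LineNumberSettings, Describe) are hypothetical and only for illustration, and it assumes the document XML is already loaded into an XmlDocument.

```csharp
using System;
using System.Xml;

static class LineNumberSettings
{
    // Main WordprocessingML namespace.
    const string W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns a short description of the first w:lnNumType element found,
    // or null when no section has line numbering configured.
    public static string Describe(XmlDocument doc)
    {
        var ns = new XmlNamespaceManager(doc.NameTable);
        ns.AddNamespace("w", W);

        var ln = doc.SelectSingleNode("//w:sectPr/w:lnNumType", ns) as XmlElement;
        if (ln == null) return null;

        // GetAttribute returns an empty string when the attribute is absent.
        string countBy = ln.GetAttribute("countBy", W);
        string restart = ln.GetAttribute("restart", W);
        return "countBy=" + countBy + ", restart=" + restart;
    }
}
```

This only tells you that Word will draw line numbers and how it counts them; which text lands on which line is still decided by the layout engine at render time.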

The information you need is under one of the other “xmlData” nodes, in the /docProps/app.xml part.

See "<Pages>2</Pages>" and "<Lines>16</Lines>".

Full XML below:

  <pkg:part pkg:name="/docProps/app.xml" pkg:contentType="application/vnd.openxmlformats-officedocument.extended-properties+xml" pkg:padding="256">
    <pkg:xmlData>
      <Properties xmlns="http://schemas.openxmlformats.org/officeDocument/2006/extended-properties" xmlns:vt="http://schemas.openxmlformats.org/officeDocument/2006/docPropsVTypes">
        <Template>Normal.dotm</Template>
        <TotalTime>0</TotalTime>
        <Pages>2</Pages>
        <Words>341</Words>
        <Characters>1948</Characters>
        <Application>Microsoft Office Word</Application>
        <DocSecurity>0</DocSecurity>
        <Lines>16</Lines>
        <Paragraphs>4</Paragraphs>
        <ScaleCrop>false</ScaleCrop>
        <HeadingPairs>
          <vt:vector size="2" baseType="variant">
            <vt:variant>
              <vt:lpstr>Title</vt:lpstr>
            </vt:variant>
            <vt:variant>
              <vt:i4>1</vt:i4>
            </vt:variant>
          </vt:vector>
        </HeadingPairs>
        <TitlesOfParts>
          <vt:vector size="1" baseType="lpstr">
            <vt:lpstr/>
          </vt:vector>
        </TitlesOfParts>
        <Company/>
        <LinksUpToDate>false</LinksUpToDate>
        <CharactersWithSpaces>2285</CharactersWithSpaces>
        <SharedDoc>false</SharedDoc>
        <HyperlinksChanged>false</HyperlinksChanged>
        <AppVersion>14.0000</AppVersion>
      </Properties>
    </pkg:xmlData>
  </pkg:part>
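A minimal sketch of pulling these statistics out in C#, assuming the XML above (or the flat package that contains it) is already loaded into an XmlDocument. The names DocStats and ReadStat are hypothetical. Note that these are document-wide totals written by the application when it saved the file, so they give a line count, not the number of any particular line.

```csharp
using System;
using System.Xml;

static class DocStats
{
    // Namespace of the extended-properties part (docProps/app.xml).
    const string Ep = "http://schemas.openxmlformats.org/officeDocument/2006/extended-properties";

    // Reads an integer statistic such as "Pages", "Lines" or "Words".
    // Returns null when the element is missing or not a number.
    public static int? ReadStat(XmlDocument doc, string name)
    {
        var ns = new XmlNamespaceManager(doc.NameTable);
        ns.AddNamespace("ep", Ep);

        XmlNode node = doc.SelectSingleNode("//ep:Properties/ep:" + name, ns);
        int value;
        if (node != null && int.TryParse(node.InnerText, out value))
            return value;
        return null;
    }
}
```

For the sample above, DocStats.ReadStat(doc, "Pages") would give 2 and DocStats.ReadStat(doc, "Lines") would give 16.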
