The Archive Base

Editorial Team
Asked: June 6, 2026

XML File Sample

<GateDocument> 
  <GateDocumentFeatures>
    ...
  </GateDocumentFeatures>
  <TextWithNodes>
    <Node id="0"/>
    MESSAGE SET
    <Node id="19"/> 
    <Node id="20"/>
    1. 1/1/09 - sample text 1
    <Node id="212"/>
    sample text 2
    <Node id="223"/>
    sample text 3
    ...
    <Node id="160652"/>
  </TextWithNodes>
  <AnnotationSet></AnnotationSet>
  <AnnotationSet Name="SomeName">
    ...
  </AnnotationSet>
</GateDocument>

Just to start off, this is my first time coding in Python and dealing with XML, so sorry if I miss really obvious things!

My goal is to extract the sample text at specific node ids.

First attempt: I used minidom, but it did not offer a suitable way to extract the text (http://stackoverflow.com/questions/11122736/extracting-text-from-xml-node-with-minidom), because the node ids sit in self-closing tags and the text lies between those tags rather than inside them.

Second attempt: following suggestions to look at lxml, I managed to extract the text as something like this:

['\n\t\t', '\n\t\tMESSAGE SET\n\t\t', '\n\t\t', '\n\t\t1. 1/1/09 - sample text 1', ...., '\n\t']

With some cleanup I think I can get the text, but I lose the associated node id value.

This is the code I used:

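from lxml import etree

# Path elided as in the original post; raw string keeps the backslash literal
xmlfile = r'C:\...AnnotationsXML.xml'
xmldoc = etree.parse(xmlfile)
# Select every text node directly under TextWithNodes
print(xmldoc.xpath("//TextWithNodes/text()"))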

So I guess my questions are:

  1. Is there a way to extract the above without the \n\t\t? I read that it comes from the whitespace formatting (i.e. tabs), but I am not sure where the <Node id="0"/> went.
  2. Is there perhaps a better or more efficient method of extraction for this file?

Thanks!

1 Answer

  1. Editorial Team
    Added an answer on June 6, 2026 at 4:14 am
    In [1]: from lxml import etree
    
    In [2]: tree = etree.parse('awful.xml')
    
    In [3]: data = {int(node.attrib['id']): node.tail.strip()
       ...: for node in tree.xpath('//TextWithNodes/Node')
       ...: if node.tail and node.tail.strip()}
    
    In [4]: data
    Out[4]: 
    {0: 'MESSAGE SET',
     20: '1. 1/1/09 - sample text 1',
     212: 'sample text 2',
     223: 'sample text 3'}
    

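    strip removes surrounding whitespace such as \t and \n, and tail holds the text that follows a tag, up to the next tag. Nodes followed only by whitespace (such as id 19) are filtered out by the if clause.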
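
The same idea can be tried without the original file. The following is only a minimal sketch: it swaps lxml for the standard-library xml.etree.ElementTree (an assumption made so it runs without extra installs; elements there expose a tail attribute in the same way), and the SAMPLE string, the text_by_node_id helper, and the trailing node id 230 are hypothetical stand-ins for the much larger real document.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample modelled on the question's file; id 230 is made up
# to show a Node with nothing after it.
SAMPLE = """<GateDocument>
  <TextWithNodes>
    <Node id="0"/>
    MESSAGE SET
    <Node id="19"/>
    <Node id="20"/>
    1. 1/1/09 - sample text 1
    <Node id="212"/>
    sample text 2
    <Node id="223"/>
    sample text 3
    <Node id="230"/></TextWithNodes>
</GateDocument>"""


def text_by_node_id(root):
    """Map each Node id to the non-empty text that follows that Node."""
    result = {}
    for node in root.findall("./TextWithNodes/Node"):
        # tail is None when nothing follows the tag, so fall back to "".
        text = (node.tail or "").strip()
        if text:
            result[int(node.get("id"))] = text
    return result


root = ET.fromstring(SAMPLE)
data = text_by_node_id(root)
print(data)
```

Looking up data[212] then gives 'sample text 2'. Keeping the ids as integer keys makes it straightforward to pick out the text at specific node ids, which was the stated goal.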
