I try to get the whole content between an opening XML tag and its closing counterpart


Asked: June 6, 2026


I am trying to get the whole content between an opening XML tag and its closing counterpart.

Getting the content in simple cases like title below is easy, but how can I get the whole content between the tags when mixed content is used and I want to preserve the inner tags?

<?xml version="1.0" encoding="UTF-8"?>
<review>
  <title>Some testing stuff</title>
  <text sometimes="attribute">Some text with <extradata>data</extradata> in it.
  It spans <sometag>multiple lines: <tag>one</tag>, <tag>two</tag> 
  or more</sometag>.</text>
</review>

What I want is the content between the two text tags, including any tags: Some text with <extradata>data</extradata> in it. It spans <sometag>multiple lines: <tag>one</tag>, <tag>two</tag> or more</sometag>.

For now I use regular expressions, but it gets kind of messy and I don’t like this approach. I lean towards an XML parser-based solution. I looked over minidom, etree, lxml and BeautifulSoup but couldn’t find a solution for this case (whole content, including inner tags).


1 Answer

Editorial Team · Answer 1 · 2026-06-06T02:56:55+00:00

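from lxml import etree

# Pass bytes: on Python 3, lxml rejects str input that carries an encoding declaration.
t = etree.XML(
b"""<?xml version="1.0" encoding="UTF-8"?>
<review>
  <title>Some testing stuff</title>
  <text>Some text with <extradata>data</extradata> in it.</text>
</review>"""
)
# encoding='unicode' makes tostring return str instead of bytes, so join works;
# (t.text or '') guards against elements with no leading text.
((t.text or '') + ''.join(etree.tostring(c, encoding='unicode') for c in t)).strip()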

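The trick here is that t is iterable and, when iterated, yields its child elements. Because etree has no separate text nodes (text is stored in each element's .text and .tail attributes), you also need to recover the text before the first child tag with t.text. The text that follows each child is that child's .tail, which etree.tostring includes by default.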

In [50]: ((t.text or '') + ''.join(etree.tostring(c, encoding='unicode') for c in t)).strip()
Out[50]: '<title>Some testing stuff</title>\n  <text>Some text with <extradata>data</extradata> in it.</text>'

Or:

In [6]: e = t.xpath('//text')[0]

In [7]: ((e.text or '') + ''.join(etree.tostring(c, encoding='unicode') for c in e)).strip()
Out[7]: 'Some text with <extradata>data</extradata> in it.'
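
The same idea can be wrapped in a small helper and tried on the full mixed-content example from the question. This is only a sketch under a few assumptions: it uses the standard library's xml.etree.ElementTree instead of lxml (its tostring also writes each child's tail text), the helper name inner_xml is made up for illustration, and the XML declaration is left out of the sample string to keep it short.

```python
import xml.etree.ElementTree as ET

# The question's sample document, without the XML declaration.
DOC = """<review>
  <title>Some testing stuff</title>
  <text sometimes="attribute">Some text with <extradata>data</extradata> in it.
  It spans <sometag>multiple lines: <tag>one</tag>, <tag>two</tag>
  or more</sometag>.</text>
</review>"""


def inner_xml(elem):
    """Return everything between elem's tags, inner tags included.

    Hypothetical helper: joins the leading text with each serialized child.
    """
    # Text before the first child; None when the element starts with a tag.
    parts = [elem.text or ""]
    for child in elem:
        # tostring serializes the child and the tail text that follows it.
        parts.append(ET.tostring(child, encoding="unicode"))
    return "".join(parts).strip()


root = ET.fromstring(DOC)
content = inner_xml(root.find("text"))
print(content)
```

Attributes on the outer text tag are not part of the result, since only the element's text and its children are serialized. Line breaks inside the element are kept as they appear in the source; collapse them separately if a single-line result is wanted.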
