Both programs are reading the same XML file. First program copies all data between

Asked: May 26, 2026

Both programs read the same XML file. The first program copies all the data between the <text></text> tags; the second copies only a limited part of it (the {{Infobox film ...}} template).

I only want the limited data. Can I use the following statement in the first program?

m = re.search(r'(?ms).*?{{(Infobox film.*?)}}', t.text)
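To see what this statement extracts, here is a minimal sketch that runs it on a short, hypothetical piece of wikitext (the sample string is invented for illustration, and the code is written for Python 3). The s (DOTALL) flag lets . span newlines, and the non-greedy .*? inside the group stops at the first }} after "Infobox film", so group(1) is the template body without its outer braces.

```python
import re

# Hypothetical sample of MediaWiki page text, for illustration only
sample = """Some intro text.
{{Infobox film
| name     = Example
| director = Someone
}}
'''Example''' is a film."""

m = re.search(r'(?ms).*?{{(Infobox film.*?)}}', sample)
if m:
    # group(1) is everything from "Infobox film" up to the first "}}"
    print(m.group(1))
```

The leading .*? does not change group(1), since re.search already scans forward for the first place the rest of the pattern can match.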

First Program

from lxml import etree

# Namespace of the MediaWiki export format used in file.xml
# (must be at module level; it was unreachable inside first())
NSMAP = dict(mw="http://www.mediawiki.org/xml/export-0.5/")

def first(seq, default=None):
    """Return the first item of seq, or default if seq is empty."""
    for item in seq:
        return item
    return default

doc = etree.parse('file.xml')
for i, page in enumerate(doc.xpath('/mw:mediawiki/mw:page', namespaces=NSMAP)):
    text = first(page.xpath('./mw:revision/mw:text/text()', namespaces=NSMAP))
    page_id = first(page.xpath('./mw:id/text()', namespaces=NSMAP))  # avoid shadowing id()
    title = first(page.xpath('./mw:title/text()', namespaces=NSMAP))
    print(" %s" % text)
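If lxml is not available, the same per-page extraction can be sketched with the standard library's xml.etree.ElementTree, assuming Python 3. Element.findtext with a namespaces mapping plays the role of first(page.xpath('.../text()')): it returns the element's text, or the default when the element is missing. The inline SAMPLE document is a hypothetical stand-in for file.xml.

```python
import xml.etree.ElementTree as ET

NSMAP = {"mw": "http://www.mediawiki.org/xml/export-0.5/"}

# Hypothetical one-page export, standing in for file.xml
SAMPLE = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.5/">
  <page>
    <title>Example</title>
    <id>1</id>
    <revision>
      <text>{{Infobox film
| name = Example
}}
Body text.</text>
    </revision>
  </page>
</mediawiki>"""

root = ET.fromstring(SAMPLE)
pages = []
for page in root.findall("mw:page", NSMAP):
    # findtext returns the text of the first matching element, or the default
    text = page.findtext("mw:revision/mw:text", default=None, namespaces=NSMAP)
    page_id = page.findtext("mw:id", namespaces=NSMAP)
    title = page.findtext("mw:title", namespaces=NSMAP)
    pages.append((page_id, title, text))
    print(" %s" % text)
```

To read the real file instead, ET.parse('file.xml').getroot() would replace ET.fromstring(SAMPLE).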

Second Program

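import re
from xml.etree import ElementTree

# Namespace of the MediaWiki export format used in file.xml
NS = '{http://www.mediawiki.org/xml/export-0.5/}'

with open('file.xml', 'rb') as f:
    tree = ElementTree.parse(f)

# './/' searches all descendants; a bare '//' prefix triggers a FutureWarning
for t in tree.findall('.//' + NS + 'text'):
    print('====================')
    # t.text is None for an empty <text/> element, so fall back to ''
    m = re.search(r'(?ms).*?{{(Infobox film.*?)}}', t.text or '')
    if m:
        print(m.group(1))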

UPDATE: Please help. Is there an alternative approach?
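One possible alternative, sketched here as an assumption rather than a standard API: the regex stops at the first }}, so an infobox that contains a nested template (for example {{Plainlist|...}}) gets cut off. A small brace counter can instead find the matching close of the outer template. extract_template is a hypothetical helper name, the sample text is invented, and this is a simple counter, not a full wikitext parser.

```python
import re

def extract_template(text, name="Infobox film"):
    """Hypothetical helper: return the body of the first {{name ...}} template,
    counting nested {{ }} pairs so inner templates do not end it early."""
    start = text.find("{{" + name)
    if start == -1:
        return None
    depth = 0
    i = start
    while i < len(text) - 1:
        pair = text[i:i + 2]
        if pair == "{{":
            depth += 1
            i += 2
        elif pair == "}}":
            depth -= 1
            i += 2
            if depth == 0:
                # Strip the outer {{ and }} to mirror the regex's group(1)
                return text[start + 2:i - 2]
        else:
            i += 1
    return None  # braces never balanced

# Hypothetical infobox with a nested template
nested = """{{Infobox film
| starring = {{Plainlist|
* A
* B}}
| name = X
}}
Rest of article."""

# The original regex stops at the first "}}", inside the nested template
regex_result = re.search(r'(?ms).*?{{(Infobox film.*?)}}', nested).group(1)
print(regex_result)
print(extract_template(nested))
```

The regex result ends at "* B", while extract_template keeps the "| name = X" line that follows the nested template.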

1 Answer

Editorial Team · Answer 1 · 2026-05-26T07:26:12+00:00

I don’t see any reason why you couldn’t add the following at the end of the for-loop body in your first program, right after text is assigned:

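import re  # add at the top of the first program
# inside the for-loop; text is None when a page has an empty <text> element
m = re.search(r'(?ms).*?{{(Infobox film.*?)}}', text or '')
if m:
    print(m.group(1))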

From your description, the text variable holds the full contents of each page's <text> element, so the regex should be able to filter out just the part you need.
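Put together, the first program with the regex applied per page might look like the sketch below. It assumes Python 3 and swaps lxml for the standard library's ElementTree so it runs without extra dependencies; infoboxes and SAMPLE are hypothetical names. The regex runs inside the loop because text is reassigned for every page.

```python
import re
import xml.etree.ElementTree as ET

NSMAP = {"mw": "http://www.mediawiki.org/xml/export-0.5/"}
INFOBOX_RE = re.compile(r'(?ms).*?{{(Infobox film.*?)}}')

def infoboxes(xml_string):
    """Hypothetical helper: yield (title, infobox body) for each page
    whose text contains an Infobox film template."""
    root = ET.fromstring(xml_string)
    for page in root.findall("mw:page", NSMAP):
        title = page.findtext("mw:title", namespaces=NSMAP)
        # findtext returns None if <text> is missing, so fall back to ''
        text = page.findtext("mw:revision/mw:text", namespaces=NSMAP) or ""
        m = INFOBOX_RE.search(text)
        if m:  # pages without an infobox are skipped
            yield title, m.group(1)

# Hypothetical two-page export: only the first page has an infobox
SAMPLE = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.5/">
<page><title>Film A</title><revision><text>{{Infobox film
| name = Film A
}} Plot...</text></revision></page>
<page><title>Person B</title><revision><text>No infobox here.</text></revision></page>
</mediawiki>"""

results = list(infoboxes(SAMPLE))
for title, body in results:
    print(title)
    print(body)
```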
