I am trying to strip XML tags from a document using Python, a language I am a novice in. Here is my first attempt using regex, which was really a hope-for-the-best idea.
import re

with open("somefile.xml") as mfile:  # open for reading
    for line in mfile:
        print(re.sub('<./>', "", line))  # trying to match elements between < and />
That failed miserably. I would like to know how it should be done with regex.
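For reference, here is why the attempt above misses. In `<./>`, the `.` matches exactly one character and a literal `/>` must follow, so only tags like `<a/>` are matched. Also, `re.sub` returns a new string and never modifies `line` in place. A more common pattern is `<[^>]+>`: a `<`, one or more characters that are not `>`, then `>`. The sketch below uses it. The `strip_tags` name is just an illustration. It assumes well-behaved markup: a `>` inside an attribute value, comment or CDATA section will confuse it, and entities such as `&amp;` are left undecoded.

```python
import re

# "<", then one or more characters that are not ">", then ">".
# Assumption: no ">" appears inside attribute values, comments or CDATA.
TAG_RE = re.compile(r"<[^>]+>")


def strip_tags(text: str) -> str:
    """Remove everything that looks like an XML/HTML tag from text."""
    return TAG_RE.sub("", text)


if __name__ == "__main__":
    sample = '<root><item id="1">Hello</item> <item/>world</root>'
    print(strip_tags(sample))  # Hello world
```

To process a whole file, read it in one go with `open("somefile.xml").read()` and pass the string to `strip_tags`. That way, tags that span a line break are still caught.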
Secondly, I googled and found http://code.activestate.com/recipes/440481-strips-xmlhtml-tags-from-string/ which seems to work. But is there a simpler way to get rid of all XML tags? Maybe using ElementTree?
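With ElementTree you do not strip tags at all. Instead, you parse the document and join its text nodes. The sketch below assumes the input is well-formed XML; otherwise `ET.fromstring` raises `ET.ParseError`. `xml_to_text` and `xml_file_to_text` are hypothetical helper names. `Element.itertext()` yields the text of an element and all of its descendants in document order. The parser has already decoded entities such as `&amp;`.

```python
import xml.etree.ElementTree as ET


def xml_to_text(xml_string: str) -> str:
    """Return the concatenated text content of an XML string, without tags."""
    root = ET.fromstring(xml_string)
    # itertext() yields the root's .text, then the .text and .tail of every
    # descendant, in document order (None values are skipped).
    return "".join(root.itertext())


def xml_file_to_text(path: str) -> str:
    """Same as xml_to_text, but parses a file on disk, e.g. "somefile.xml"."""
    return "".join(ET.parse(path).getroot().itertext())


if __name__ == "__main__":
    print(xml_to_text('<root><item id="1">Hello</item> <item/>world</root>'))  # Hello world
    print(xml_to_text("<p>a &amp; b</p>"))  # a & b
```

Whitespace between elements, such as newlines and indentation in a pretty-printed file, is kept in the result.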
Try this: