I am trying to parse some XML, looking for elements with the tag name

Question

Asked: June 17, 2026 (2026-06-17T09:21:36+00:00)

I am trying to parse some XML, looking for elements with the tag name “ip”. Ultimately I need a list of strings with IP addresses in them. Here is what I have tried:

def parseHosts(xmldoc):
  hostsNode = xmldoc.firstChild
  xmlList = hostsNode.getElementsByTagName("ip")

  ipList = []
  for ip in xmlList:
    ipList.append(ip.childNodes[0].nodeValue)

  print ipList
>>>[u'172.16.60.92', u'172.16.60.89', u'\n              ', u'172.16.60.90', u'172.16.60.91', u'172.16.60.93']

That’s OK, but I need a list of strings of IP addresses… I don’t want nodes that are empty, just a nice list of addresses like this:

['172.16.60.1', '172.16.60.5', '172.16.60.100']

I have tried a bit of regex with a list comprehension

  regex = re.compile(r'172\.16\.[0-9]*\.[0-9]*')
  [m.group(0) for l in ipList for m in [regex.search(1)] if m]

But I get the following error

File "myParser.py", line 47, in parseHosts
[m.group(0) for l in ipList for m in [regex.search(1)] if m]
TypeError: expected string or buffer

and try as I might, I cannot find out what type ipList is using type(ipList), nor can I figure out how to make this stuff a string.

Also… getting rid of that Unicode stuff would be good.

Clearly I have gone off the deep end here somewhere, but I am not sure where to look.

1 Answer

Editorial Team · Answer 1 · 2026-06-17T09:21:36+00:00

Let’s go back to your original code. It ends up with this in ipList:

[u'172.16.60.92', u'172.16.60.89', u'\n              ', u'172.16.60.90', u'172.16.60.91', u'172.16.60.93']

The only problem here is that it includes strings full of whitespace, as well as strings with IP addresses in them, right?

So, let’s just filter it after the fact:

In [51]: ipList = [u'172.16.60.92', u'172.16.60.89', u'\n              ', u'172.16.60.90', u'172.16.60.91', u'172.16.60.93']

In [52]: ipList = [ip for ip in ipList if ip.strip()]

In [53]: ipList
Out[53]: 
['172.16.60.92',
 '172.16.60.89',
 '172.16.60.90',
 '172.16.60.91',
 '172.16.60.93']

And you’re done!

Why does this work? Well, ip.strip() removes all whitespace from the left and right sides. Used as the condition of an if, the result is true if there’s anything left, and false if there’s nothing left (an empty string).
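To see the truthiness rule on its own, here is a minimal sketch. The sample strings are made up for illustration (one address, one whitespace-only string like the one in your list, and one empty string):

```python
# Illustrative inputs: an address, a whitespace-only string, and an empty string.
samples = [u'172.16.60.92', u'\n              ', u'']

# strip() returns '' for the last two, and an empty string is falsy.
truthiness = [bool(s.strip()) for s in samples]
print(truthiness)  # [True, False, False]
```

Only the first string survives a filter of the form `if s.strip()`.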

But obviously you can just move the same condition back into the original loop, putting it before the append call, with exactly the same effect:

def parseHosts(xmldoc):
  hostsNode = xmldoc.firstChild
  xmlList = hostsNode.getElementsByTagName("ip")

  ipList = []
  for ip in xmlList:
    ipstr = ip.childNodes[0].nodeValue
    if ipstr.strip():
      ipList.append(ipstr)

But that whole ipList part is obviously just a long-winded version of a list comprehension, so:

def parseHosts(xmldoc):
  hostsNode = xmldoc.firstChild
  xmlList = hostsNode.getElementsByTagName("ip")
  ipList = [ip.childNodes[0].nodeValue for ip in xmlList
            if ip.childNodes[0].nodeValue.strip()]
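If you want to try the filtering end to end, the following is a rough, self-contained sketch rather than a drop-in replacement. It assumes xmldoc comes from xml.dom.minidom, and the SAMPLE_XML document, the parse_hosts name, and the nested <note> element are all invented here to reproduce a whitespace-only first child; your real file may look different. It also uses documentElement instead of firstChild, returns the list, and skips <ip> elements with no children, none of which the original code does.

```python
from xml.dom import minidom

# Hypothetical input: the third <ip> starts with whitespace because it
# contains a nested element instead of a bare address.
SAMPLE_XML = """<hosts>
  <host><ip>172.16.60.92</ip></host>
  <host><ip>172.16.60.89</ip></host>
  <host><ip>
      <note>pending</note>
  </ip></host>
  <host><ip>172.16.60.90</ip></host>
</hosts>"""


def parse_hosts(xmldoc):
    """Return the non-blank text of each <ip> element's first child."""
    hosts_node = xmldoc.documentElement
    ips = []
    for ip in hosts_node.getElementsByTagName("ip"):
        first = ip.firstChild
        # Skip empty elements and elements whose first child is not text.
        if first is None or first.nodeType != first.TEXT_NODE:
            continue
        text = first.nodeValue.strip()
        if text:
            ips.append(text)
    return ips


ip_list = parse_hosts(minidom.parseString(SAMPLE_XML))
print(ip_list)
```

Storing the stripped value also removes any stray whitespace around an address, which the comprehension above would keep.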

As for your attempt to fix this:

[m.group(0) for l in ipList for m in [regex.search(1)] if m]

Whenever it isn’t immediately obvious what a nested list comprehension is doing, break it into two comprehensions.

But let’s rewrite that as an explicit loop. Not only does this make it even easier to understand, it makes it a lot easier to debug:

result = []
for l in ipList:
    for m in [regex.search(1)]:
        if m:
            result.append(m.group(0))

When you run this, you’ll get an exception on the third line, and it should be obvious why: regex.search is being passed the integer 1, not the loop variable l, and it expects a string.
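For comparison, here is a sketch of the same comprehension with the loop variable passed to regex.search. The variable is renamed from l to item purely so it cannot be misread as the digit 1; the pattern and the list contents are copied from your question.

```python
import re

# Pattern copied from the question.
regex = re.compile(r'172\.16\.[0-9]*\.[0-9]*')

ip_list = [u'172.16.60.92', u'172.16.60.89', u'\n              ',
           u'172.16.60.90', u'172.16.60.91', u'172.16.60.93']

# search() now receives each string; the whitespace entry yields None
# and is dropped by the trailing "if m".
matches = [m.group(0) for item in ip_list for m in [regex.search(item)] if m]
print(matches)
```

This works, but for this particular problem the strip() filter is simpler and does not hard-code the 172.16 prefix.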
