I am trying to parse some XML, looking for elements with the tag name “ip”. Ultimately I need a list of strings with IP addresses in them. Here is what I have tried:
def parseHosts(xmldoc):
    hostsNode = xmldoc.firstChild
    xmlList = hostsNode.getElementsByTagName("ip")
    ipList = []
    for ip in xmlList:
        ipList.append(ip.childNodes[0].nodeValue)
    print ipList
>>>[u'172.16.60.92', u'172.16.60.89', u'\n ', u'172.16.60.90', u'172.16.60.91', u'172.16.60.93']
That’s OK, but I need a list of strings of IP addresses. I don’t want nodes that are empty, just a nice list of addresses like this:
['172.16.60.1', '172.16.60.5', '172.16.60.100']
I have tried a bit of regex with a list comprehension:
regex = re.compile(r'172\.16\.[0-9]*\.[0-9]*')
[m.group(0) for l in ipList for m in [regex.search(1)] if m]
But I get the following error:
  File "myParser.py", line 47, in parseHosts
    [m.group(0) for l in ipList for m in [regex.search(1)] if m]
TypeError: expected string or buffer
And try as I might, I cannot find out what type ipList is using type(ipList), nor can I figure out how to make this stuff a string.
Also… getting rid of that Unicode stuff would be good.
Clearly I have gone off the deep end here somewhere, but I am not sure where to look.
Let’s go back to your original code. It ends up with this in ipList:
[u'172.16.60.92', u'172.16.60.89', u'\n ', u'172.16.60.90', u'172.16.60.91', u'172.16.60.93']
The only problem here is that it includes strings full of whitespace, as well as strings with IP addresses in them, right?
So, let’s just filter it after the fact:
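A minimal sketch of that filter, assuming ipList holds the values from the question's output (the list literal below is copied from there, and rebinding the name ipList is just one way to write it):

```python
# Values copied from the question's output; the third entry is whitespace only.
ipList = [u'172.16.60.92', u'172.16.60.89', u'\n ',
          u'172.16.60.90', u'172.16.60.91', u'172.16.60.93']

# Keep only the entries that still contain something after stripping whitespace.
ipList = [ip for ip in ipList if ip.strip()]

print(ipList)
```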
And you’re done!
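The same strip() test can also be applied while the list is being built, either inside the original loop or as a single comprehension. The sketch below is only an illustration: the question does not show its XML, so SAMPLE_XML is a hypothetical stand-in, and the names loop_result and comp_result are mine.

```python
from xml.dom import minidom

# Hypothetical sample document; the real XML from the question is not shown.
SAMPLE_XML = """<hosts>
<host><ip>172.16.60.92</ip></host>
<host><ip>
 </ip></host>
<host><ip>172.16.60.90</ip></host>
</hosts>"""

xmldoc = minidom.parseString(SAMPLE_XML)
xmlList = xmldoc.getElementsByTagName("ip")

# Loop form: test the value before appending it.
loop_result = []
for ip in xmlList:
    value = ip.childNodes[0].nodeValue
    if value.strip():
        loop_result.append(value)

# Comprehension form: the same condition in a single expression.
comp_result = [ip.childNodes[0].nodeValue for ip in xmlList
               if ip.childNodes[0].nodeValue.strip()]

print(loop_result)
print(comp_result)
```

Both forms skip the element whose first child is whitespace only, so there is nothing left to clean up afterwards.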
Why does this work? Well, ip.strip() will remove all whitespace from the left and right sides. Stick the result into an if statement, and it will be true if there’s anything left, and false if there’s nothing left.
But obviously you can just move the same condition back into the original loop, putting it before the append call, with exactly the same effect. And that whole ipList part is obviously just a long-winded version of a list comprehension, so it can be collapsed into one.
As for your attempt to fix this:
[m.group(0) for l in ipList for m in [regex.search(1)] if m]
Whenever it isn’t immediately obvious what a nested list comprehension is doing, break it into two comprehensions.
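One way to split the comprehension from the question into two steps is sketched below. The regex.search argument is copied exactly as it appears in the question; the shortened sample list, the failed_step name, and the try/except wrapper are my additions so the snippet runs to completion and reports which step broke.

```python
import re

regex = re.compile(r'172\.16\.[0-9]*\.[0-9]*')
# Shortened sample of the values from the question's output.
ipList = [u'172.16.60.92', u'\n ', u'172.16.60.90']

failed_step = None
try:
    # Step 1: run the search once per item (argument copied as written).
    matches = [regex.search(1) for l in ipList]
except TypeError:
    failed_step = 'search'
else:
    # Step 2: keep the matched text wherever the search succeeded.
    addresses = [m.group(0) for m in matches if m]

print(failed_step)
```

With the steps separated, the failure is pinned to the search step and never reaches the m.group(0) step.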
But let’s rewrite that as an explicit loop. Not only does this make it even easier to understand, it makes it a lot easier to debug:
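A sketch of that explicit loop follows, again with the regex.search call copied as written in the question. The shortened sample list and the error name are assumptions for illustration, and the try/except is there only so the snippet finishes; without it the traceback points straight at the regex.search line. The message wording differs between Python 2 and Python 3, but it is a TypeError in both.

```python
import re

regex = re.compile(r'172\.16\.[0-9]*\.[0-9]*')
# Shortened sample of the values from the question's output.
ipList = [u'172.16.60.92', u'\n ', u'172.16.60.90']

error = None
result = []
try:
    for l in ipList:
        m = regex.search(1)  # copied exactly as written in the question
        if m:
            result.append(m.group(0))
except TypeError as exc:
    # Added only so the sketch runs to completion.
    error = exc

print(type(error).__name__)
```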
When you run this, you’ll get an exception on the third line, and it should be obvious why.
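For reference, here is a sketch of the same loop with the argument changed from the literal 1 to the loop variable l. That this was the intent is my reading of the question's code, not something the question states; the sample list is a shortened copy of the output shown earlier.

```python
import re

regex = re.compile(r'172\.16\.[0-9]*\.[0-9]*')
# Shortened sample of the values from the question's output.
ipList = [u'172.16.60.92', u'\n ', u'172.16.60.90']

result = []
for l in ipList:
    m = regex.search(l)  # pass the current item (assumed intent), not the number 1
    if m:
        result.append(m.group(0))

print(result)
```

Because the pattern only matches address-like text, this version also drops the whitespace-only entry, though the strip() filter is the simpler way to get there.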