Editorial Team
Asked: June 6, 2026

Thanks to the brilliant help on my XML parsing problem, I got to a point where I am lost as to how XML elements are actually processed (with lxml).

My data is the output of an nmap scan, made up of many records like the ones below:

<?xml version="1.0"?>
<?xml-stylesheet href="file:///usr/share/nmap/nmap.xsl" type="text/xsl"?>
<nmaprun scanner="nmap" args="nmap -sV -p135,12345 -oX 10.232.0.0.16.xml 10.232.0.0/16" start="1340201347" startstr="Wed Jun 20 16:09:07 2012" version="5.21" xmloutputversion="1.03">
  <host>
    <status state="down" reason="no-response"/>
    <address addr="10.232.0.1" addrtype="ipv4"/>
  </host>  
  <host starttime="1340201455" endtime="1340201930">
    <status state="up" reason="echo-reply"/>
    <address addr="10.232.49.2" addrtype="ipv4"/>
    <hostnames>
      <hostname name="host1.example.com" type="PTR"/>
    </hostnames>
    <ports>
      <port protocol="tcp" portid="135">
        <state state="open" reason="syn-ack" reason_ttl="123"/>
        <service name="msrpc" product="Microsoft Windows RPC" ostype="Windows" method="probed" conf="10"/>
      </port>
      <port protocol="tcp" portid="12345">
        <state state="open" reason="syn-ack" reason_ttl="123"/>
        <service name="http" product="Trend Micro OfficeScan Antivirus http config" method="probed" conf="10"/>
      </port>
    </ports>
    <times srtt="890" rttvar="2835" to="100000"/>
  </host>
</nmaprun>

I want to generate a line when

  • port 12345 is open, or
  • port 135 is open and port 12345 is not
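Read together with the two `if` statements at the end of the code below, these rules reduce to a small decision function. This is a minimal sketch; `report_line` is a hypothetical helper name, and the line format is copied from the question's `print` calls.

```python
def report_line(scan_time, hostname, open135, open12345, product):
    """Hypothetical helper: return the report line for one host, or None.

    Mirrors the two checks in the question's loop:
      * port 12345 open              -> report the product seen on 12345
      * port 135 open, 12345 not     -> report "NO_OfficeScan"
    """
    if open12345:
        label = product
    elif open135:
        label = "NO_OfficeScan"
    else:
        return None  # neither port open: nothing to report for this host
    return '%s %s "%s"' % (scan_time, hostname, label)


print(report_line("1340201347", "host1.example.com", True, True,
                  "Trend Micro OfficeScan Antivirus http config"))
print(report_line("1340201347", "host1.example.com", True, False, None))
print(report_line("1340201347", "host1.example.com", False, False, None))
```

Keeping the decision separate from the XML walking makes it easy to check the logic without a real scan file.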

I use the following code, annotated with comments describing my understanding of how things work:

from lxml import etree
import time

scanTime = str(int(time.time()))
d = etree.parse("10.233.85.0.22.xml")

# find all hosts records
for el_host in d.findall("host"):
    # only process hosts UP
    if el_host.find("status").attrib["state"] == "up":

        # here comes a piece of code which sets the variable hostname
        # used later - that part works fine (removed for clarity)

        # get the status of port 135 and 12345
        Open12345 = Open135 = False
        for el_port in el_host.findall("ports/port"):
            # we are now looping through the <port> records for a given <host>
            if el_port.attrib["portid"] == "135":
                Open135 = el_host.find("ports/port/state").attrib["state"] == "open"
            if el_port.attrib["portid"] == "12345":
                Open12345 = el_host.find("ports/port/state").attrib["state"] == "open"
                # I want to get for port 12345 the description, so I search
                # for <service> within a given port - only 12345 in my case
                # I just search the first one as there is only one
                # this is the place I am not sure I get right
                el_service = el_host.find("ports/port/service")
                if el_service.get("product") is not None:
                    Type12345 = el_host.find("ports/port/service").attrib["product"]

        if Open12345:
            print("%s %s \"%s\"\n" % (scanTime, hostname, Type12345))
        if not Open12345 and Open135:
            print("%s %s \"%s\"\n" % (scanTime, hostname, "NO_OfficeScan"))

The place I am not sure about is marked in the comments. With this code I always get Microsoft Windows RPC, as if I were within the record for port 135 (which comes first in the XML file, before port 12345).

I am sure that the problem is somewhere in the way I understand the find function. It probably matches everything, independently of the place I am in. In other words there is no recursion (as far as I can tell).
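The suspicion can be checked directly. `Element.find()` evaluates its path relative to the element it is called on and returns only the first match, so a path starting from `<host>` keeps returning port 135's `<service>` no matter which `<port>` the loop is on. The sketch below runs on a hand-trimmed copy of the host record above; the fallback to the standard library's `xml.etree.ElementTree` is an assumption added so it runs without lxml (both follow the same ElementPath rules for these simple paths).

```python
# Use lxml as in the question; fall back to the standard library, whose
# Element.find() follows the same ElementPath rules for these simple paths.
try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree

# Hand-trimmed copy of the <host> record from the nmap output above.
HOST_XML = """
<host>
  <status state="up" reason="echo-reply"/>
  <ports>
    <port protocol="tcp" portid="135">
      <state state="open" reason="syn-ack"/>
      <service name="msrpc" product="Microsoft Windows RPC"/>
    </port>
    <port protocol="tcp" portid="12345">
      <state state="open" reason="syn-ack"/>
      <service name="http" product="Trend Micro OfficeScan Antivirus http config"/>
    </port>
  </ports>
</host>
"""

el_host = etree.fromstring(HOST_XML.strip())

rows = []
for el_port in el_host.findall("ports/port"):
    # Path evaluated from <host>: find() returns the FIRST match in document
    # order, i.e. the <service> under port 135, whichever port we are on.
    from_host = el_host.find("ports/port/service").get("product")
    # Path evaluated from the current <port>: only its own <service> child.
    from_port = el_port.find("service").get("product")
    rows.append((el_port.get("portid"), from_host, from_port))
    print(el_port.get("portid"), "| from host:", from_host, "| from port:", from_port)
```

This prints:

    135 | from host: Microsoft Windows RPC | from port: Microsoft Windows RPC
    12345 | from host: Microsoft Windows RPC | from port: Trend Micro OfficeScan Antivirus http config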

In that case how can I code the concept of “get the service name when you are in the record for port 12345”?
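Inside the loop, calling `el_port.find("service")` scopes the search to the current port. Another option is to skip the loop and select the port by attribute with an ElementPath predicate. A minimal sketch, assuming the same trimmed host record; `service_product` is a hypothetical helper, and the import fallback is again an assumption (`[@attr='value']` predicates work in both lxml's `find()` and the standard library's).

```python
try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree

HOST_XML = """
<host>
  <ports>
    <port protocol="tcp" portid="135">
      <state state="open"/>
      <service name="msrpc" product="Microsoft Windows RPC"/>
    </port>
    <port protocol="tcp" portid="12345">
      <state state="open"/>
      <service name="http" product="Trend Micro OfficeScan Antivirus http config"/>
    </port>
  </ports>
</host>
"""


def service_product(el_host, portid):
    """Hypothetical helper: return the 'product' attribute of the <service>
    inside the <port> with the given portid, or None if either is missing."""
    # [@portid='...'] is an ElementPath attribute predicate: it picks the
    # right <port> before descending to <service>, so no manual loop is needed.
    el_service = el_host.find(f"ports/port[@portid='{portid}']/service")
    if el_service is None:
        return None
    return el_service.get("product")


el_host = etree.fromstring(HOST_XML.strip())
print(service_product(el_host, "12345"))  # Trend Micro OfficeScan Antivirus http config
print(service_product(el_host, "135"))    # Microsoft Windows RPC
print(service_product(el_host, "80"))     # None
```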

Thank you.



EDIT & SOLUTION:

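I found the problem in my code: inside the port loop I was calling find() on el_host instead of on el_port. find() evaluates its path starting from the element it is called on and returns only the first match, so el_host.find("ports/port/state") and el_host.find("ports/port/service") always landed on port 135, the first <port> of the host. Calling el_port.find("state") and el_port.find("service") keeps the search inside the current port. I am reposting the whole script in case someone stumbles upon this problem someday (the input is nmap output, so it could be useful for someone to reuse; hence the big chunk of code below 🙂):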

#!/usr/bin/env python3

from lxml import etree
import time
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("file", help="XML file to parse")
args = parser.parse_args()

scanTime = str(int(time.time()))
d = etree.parse(args.file)

print("Parsing " + args.file)

with open("OfficeScanComplianceDSCampus." + scanTime, "w") as f:
    # find all hosts records
    for el_host in d.findall("host"):
        # only process hosts UP
        if el_host.find("status").attrib["state"] == "up":
            # get the first hostname if it exists, otherwise IP
            el_hostname = el_host.find("hostnames/hostname")
            if el_hostname is not None:
                hostname = el_hostname.attrib["name"]
            else:
                hostname = el_host.find("address").attrib["addr"]

            # get the status of port 135 and 12345
            Open12345 = Open135 = False
            for el_port in el_host.findall("ports/port"):
                # we are now looping through the <port> records for a given <host>;
                # find() is called on el_port, so each search stays inside this port
                if el_port.attrib["portid"] == "135":
                    Open135 = el_port.find("state").attrib["state"] == "open"
                if el_port.attrib["portid"] == "12345":
                    Open12345 = el_port.find("state").attrib["state"] == "open"
                    # if port open get info about service
                    if Open12345:
                        el_service = el_port.find("service")
                        if el_service is None:
                            Type12345 = "UNKNOWN"
                        elif el_service.get("method") == "probed":
                            # default avoids writing "None" when product is absent
                            Type12345 = el_service.get("product", "UNKNOWN")
                        else:
                            Type12345 = "UNKNOWN"

            if Open12345:
                f.write("%s %s \"%s\"\n" % (scanTime, hostname, Type12345))
            if not Open12345 and Open135:
                f.write("%s %s \"%s\"\n" % (scanTime, hostname, "NO_OfficeScan"))
            if Open12345 and not Open135:
                f.write("%s %s \"%s\"\n" % (scanTime, hostname, "Non-Windows with 12345"))

I will also explore the xpath idea given by Dikei and Ignacio Vazquez-Abrams.

Thank you everyone!

1 Answer

  1. Editorial Team
    Added an answer on June 6, 2026 at 4:04 am

    This should be easy with XPath:

    from lxml import etree
    d = etree.parse("10.233.85.0.22.xml")

    d.xpath('//port[@portid="12345"]/service/@name')     # list of service names on port 12345
    d.xpath('//port[@portid="12345"]/service/@product')  # list of products on port 12345
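Note that `xpath()` returns a list of matches, and an expression starting with `//` searches the whole document, so on a scan with many hosts the results are no longer tied to a particular host. A minimal sketch, assuming a hand-trimmed copy of the nmap output from the question (this needs lxml, since the standard library's ElementTree has no `xpath()` method), that evaluates relative XPath expressions from each `<host>` that is up:

```python
from lxml import etree

# Hand-trimmed nmap output from the question: one host down, one host up.
NMAP_XML = b"""<?xml version="1.0"?>
<nmaprun scanner="nmap">
  <host>
    <status state="down" reason="no-response"/>
    <address addr="10.232.0.1" addrtype="ipv4"/>
  </host>
  <host>
    <status state="up" reason="echo-reply"/>
    <address addr="10.232.49.2" addrtype="ipv4"/>
    <ports>
      <port protocol="tcp" portid="135">
        <state state="open"/>
        <service name="msrpc" product="Microsoft Windows RPC"/>
      </port>
      <port protocol="tcp" portid="12345">
        <state state="open"/>
        <service name="http" product="Trend Micro OfficeScan Antivirus http config"/>
      </port>
    </ports>
  </host>
</nmaprun>"""

root = etree.fromstring(NMAP_XML)

# xpath() returns a list (one entry per match), not a single string.
names = root.xpath('//port[@portid="12345"]/service/@name')
print(names)

# Relative XPath evaluated from each <host> that is up, so a result can
# never come from another host's record.
results = []
for el_host in root.xpath('host[status/@state="up"]'):
    addr = el_host.xpath('string(address/@addr)')
    # string() yields "" when port 12345 is missing or not open.
    product = el_host.xpath(
        'string(ports/port[@portid="12345"][state/@state="open"]/service/@product)')
    results.append((addr, product or "port 12345 not open"))
    print(addr, product or "port 12345 not open")
```

Wrapping the expression in XPath's `string()` returns a plain string instead of a list, which avoids indexing into a possibly empty result.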
    
