I am trying to write an xml parser using BeautifulSoup4 in Python. For some

Asked: June 14, 2026

I am trying to write an XML parser using BeautifulSoup4 in Python. For some reason, the document is not being parsed correctly. My XML document is shown below:

<module id="BrainParser_1" name="Brain Parser" package="CCB" version="1" location="pipeline://cranium.loni.ucla.edu//usr/local/loniWorkflows/BrainParser/brainparser.sh" sourceCode="" icon="/9j/2wBDAAgGBgcGBQgHBwcJCQgKDBQNDAsLDBkSEw8UHRofHh0aHBwgJC4nICIsIxwcKDcpLDAx&#xA;NDQ0Hyc5PTgyPC4zNDL/2wBDAQkJCQwLDBgNDRgyIRwhMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIy&#xA;MjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjL/wAAUCABIAFYEASIAAhEBAxEBBCIA/8QAHwAAAQUB&#xA;AQEBAQEAAAAAAAAAAAECAwQFBgcICQoL/8QAtRAAAgEDAwIEAwUFBAQAAAF9AQIDAAQRBRIhMUEG&#xA;E1FhByJxFDKBkaEII0KxwRVS0fAkM2JyggkKFhcYGRolJicoKSo0NTY3ODk6Q0RFRkdISUpTVFVW&#xA;V1hZWmNkZWZnaGlqc3R1dnd4eXqDhIWGh4iJipKTlJWWl5iZmqKjpKWmp6ipqrKztLW2t7i5usLD&#xA;xMXGx8jJytLT1NXW19jZ2uHi4+Tl5ufo6erx8vP09fb3+Pn6/8QAHwEAAwEBAQEBAQEBAQAAAAAA&#xA;AAECAwQFBgcICQoL/8QAtREAAgECBAQDBAcFBAQAAQJ3AAECAxEEBSExBhJBUQdhcRMiMoEIFEKR&#xA;obHBCSMzUvAVYnLRChYkNOEl8RcYGRomJygpKjU2Nzg5OkNERUZHSElKU1RVVldYWVpjZGVmZ2hp&#xA;anN0dXZ3eHl6goOEhYaHiImKkpOUlZaXmJmaoqOkpaanqKmqsrO0tba3uLm6wsPExcbHyMnK0tPU&#xA;1dbX2Nna4uPk5ebn6Onq8vP09fb3+Pn6/9oADgQBAAIRAxEEAAA/APn+iip7OzuNQuktbWIyzPna&#xA;g74GaEr7AfP9FFTWlrcX13Fa2sLzTysEjjQZLE9AKgrQt9C1e7RHt9KvplcblaO3dgw9Rgc16B4a&#xA;+HEMRabWV8+YAGOBThM57nv9OnXrXqn9qXs0cYitoItoAYfxAVr7GS1krDguZ6ENTQWlzckC3t5Z&#xA;STgCNC3P4V7b4G/Z/urp/tfi5mtoBgpaQyDe/wDvEfdH05r6B03TLLSLGKy0+2jtraJdqRxrgAV8&#xA;9ReCPEstuJho9wiE4/e4jOfoxBqZ/AHihOulP0zgSxk4+gavdLi6vLxwFRTHG3zAnlvpVdXeS7Mi&#xA;mQHaRsJxjkVzc75mmd6wtNpO7Piyy8CeK9RjaS18Pai6qcEmBl5/HFTS/DnxlDGXfw3qO0ekJP6C&#xA;vtiivnm90vUNNYre2VxbnO3MsZUE+xPXpVSvoiQSi7Ek6qdw2qAeQKwbjwV4d1DfPLb7JWbBaNiv&#xA;P0BxSVXuTLBv7LPga5tLmymaK6t5YJFJBSRCpBHXg1DX3lf6RpuqhBqFhbXQT7vnxB9v0zXm3iD4&#xA;BeFtYv3u7OW40wuOYoMGPPqAeleK0V1viXwHfaJMjWm++tpSQpjQl0PoQPbuP04zyVaJp7HHOEoO&#xA;0j5Wors/Hfw01vwHPGb3y7izmJEV1BkqT6EHoa4yiiiimSFFFFFeheAfDl/DqbXs0fljycKpHPJB&#xA;59On61j+BPDsWu6yDdjNpERkf327L/j+Fe7f2cLCAqkAU45wMZrXDTh7ZRY5QfJzBXtPwM8A6u/i&#xA;ay8UXdp5WlxRO8MjnmRiCowPxJyfSuP+FvgP/hPPE5trh3i0+1Tzbl1HLDOAgPYn+QNfXmnafa6T&#xA;p1vp9lEIbW3QRxRgk7VH1qpFGJEwo
4PJO7BBqwLRGVGk+ZsYB3GlihRY0IiCuevFXFUAYx+dduJn&#xA;Zcpyyk1sWqKKKpfZ0VsopU98d6rOmbrlN3yHI/EVoSSOr/LCSv8AeJwKzLzUbKzuA11c+UNh+6hP&#xA;ce1cKUW3cuNSv0bCiiika2idlk2srIeOTxSNa20h3SoOOcbsA0kOq6Ze5W21GNpOyyDbmpZFPCug&#xA;3Hseh+hrN04S2NoYqvSd5aoKKKKr77SSNkj4fkKM859q8+8Z+B1W2fUdMjzIgL3MYJZnJPUDPGOe&#xA;K9F6bX8pQQeMmkvY/PtpYThCw6gdawlF02elRqxxVN3RFcWtvdwmK5gjmjPVJFDD8jXzf8W/g6uh&#xA;RTeIfDyE6cCXubXOTBk/eX/Z56dvp0+laZNDFcQvDNGskTqVdHGQwPUEV840V0fjTQG0LWiFyYbg&#xA;eYhPr/EPz5/Gitk7q5584uEnFnwHRXU/EPwz/wAIl431DS0B8hX8yAnHMbcjp+X4UV6H8KbTytLg&#xA;uI7VJS5Z23yY+YMVBHHoBXq8tw723l3FkrBh9zzOn6V5L8M3lutCt4AuYIy6MUPzA7i3Pp1FelOi&#xA;6fZbXmZs9CzEkn8a5pR97Vfn/md9GHPFPoe5/s8aDDZ+DrjWt264v52Q+iohwB9c5Nex15J+z3q0&#xA;F34Ak05eJ7K6cOM9Q/zA/wAx+Fet1l392bOVfMgKpn5D5nB/SqsPie3Fy24RKAMAl/8A61WHhOrQ&#xA;lTNmDJBXPJrDvfB8P2bdbA+d1ILcEelelzVKkFzJHNUwnvc0Aooop03iS8vdTEFvLGsZPGwZJ/HF&#xA;T3Fi10+Jo95ZDy0hPcV0fhXw1pNrp0LtbpLKwBeRxyD6D0qfxDDpmmPFMpEQdSDlsAciuOnioSm4&#xA;2/r7zBxaCiiq97fWmm2j3V9cxW1un3pZXCqPqTXj82l3KXzQpEy/N8vBrtNAi1JLBkvox5an5PMb&#xA;Bq9Jf2zvsjCM4XcGyOlXre3ZYxPII2Y9Af51tzLt+YKLloixRWLY+L/DmpXy2VjrlhcXLDcIop1Z&#xA;iPbBraqk8MzrtCqCefv9P0pJ7SWTZI7bAvQ7+v6VppFJO5AAIz/DVXVpbbTIhLe3iKg5IbgDFRVb&#xA;kr8unz/zOrB01SvGUtWFFFZuta/pfh3T5b7Vb2K2gjUsS7cn2A6k/SvJ/inJvu9PQkEosgODn+7R&#xA;XM+K9ZOueILi5wvloTFFtOcoCcH8c5/GilHYxru9Rs+Zv2gYynxMZtynfZxHAOSOo5/KiuJ8Z+I5&#xA;PFfi3UdZcFVnlPlKf4Yxwo/ICirXhbxrqHhVZo7dVlhlO4oTghumQcHt/IV6/wCG7q91OC1vNYjM&#xA;d26bjEc/KO3HbPX2zivnuvVvAXjBLkRWN9Ltu4k2xs3/AC0UD19R3/P1xtRjHnu9zbD1HflbJ/B3&#xA;jvW/A95PcaRLHiddssUqbkb0OPUZr6c+EniPxF4o8JNqHiGFFYzFbeVU2GVMD5sfXIz3r4+r3z4J&#xA;/FS3tLaPwtr1yIkU4srmQ4UD/nmx7ex/D0r1WO1SOTzE2At17Zqjr2rjSrXfFbNNKRgYX5V+ppLb&#xA;V7dwCJFLMcdeallKzRyb04btXY9jrautD6DorO0/XtJ1WWSOw1K1uZInKOsUoYgjrx+NaNaPhC6m&#xA;u9CiuLlRukYtx6dq5T4xrbP4RuHcHzlCCPn/AKaJn9M12dmY7OwiijUKir0HavDvil40ttbuY9N0&#xA;ydJ7WMZkmXOC2TlRkcjgHIyDnrXz1C8q115nmy2CvJv2hkVvhzExkKlb6PC5+9w1eqXNzDZ2stzc&#xA;OI4YkLu56KoGSa+Vfi38Uo/HUtvYaZFLFpVsxfMoAaWTpnHYAdPqa85imlglWWGR45F6OjEEfiK6&#xA;+z+J
/iO2jjilnjuIl4YOgDMPTI6flXG0V6SbWxmeZRSyQSrLFI0ciHKuhwQfUGu3sPjD4608Rquu&#xA;yzLH/DcIr5+pIyfzrhaK7rUvijq93amGzQWbN1kV9xH04GP1rkbvVtSv023moXdwvpNMzj9TVOin&#xA;KTluC0PRda+NvjTWtNFkb2OzBPzy2amOR/bdnj8MVwl9qd/qciyX99c3bqMBriVpCB6ZJqrRRRRR&#xA;UgFFFFFKrMjq6MVZTkEHBBoooAKKKK1tE8RXmiah9qTE+SS6SsxBJIJPX73HU5r1Ox8f6Zqdzawo&#xA;0m52zIhQgoP5H8KKKpTlyuPc0jVlFWRp+H9dvPDeu2mr2DKLi2feoblW9QfYivqHwT8afD/i27i0&#xA;6ZJNO1GReEmI8tz6K3r7ECiio/HvxEhg06TTNMaX7VOm1pB8vlL3OeufTH1+vjNFFY06UaatEhu5&#xA;xvxt+KUfk3PhHR2Vy4AvLlW+7zny1x34GT74r59oooooorQQUUUUUUUUAFFFFFFFFABRRRX/2Q==" posX="80" posY="70" rotation="1">
    <authors>
        <author fullName="Mubeena Mirza" email="" website="" />
    </authors>
    <executableAuthors>
        <author fullName="Zhuowen Tu" email="" website="" />
        <author fullName="Bruce Liu" email="" website="" />
    </executableAuthors>
    <metadata>
        <data key="__creationDateKey" value="Tue Sep 11 10:28:28 PDT 2007" />
    </metadata>
    <input id="BrainParser_1.Structure" name="Structure" description="0: segmentation sub-cortical structures&#xA;1: sulci detection" required="false" enabled="true" order="0" prefix="-p" prefixSpaced="true" prefixAllArgs="false">
        <format type="Enumerated" cardinality="1">
            <enumeration>0</enumeration>
            <enumeration>1</enumeration>
            <enumeration>2</enumeration>
        </format>
        <values>
            <value>2</value>
        </values>
    </input>
    <input id="BrainParser_1.Testing" name="Testing" description="0: perform segmentation/detection&#xA;1: perform training&#xA;" required="false" enabled="true" order="1" prefix="-r" prefixSpaced="true" prefixAllArgs="false">
        <format type="Enumerated" cardinality="1">
            <enumeration>0</enumeration>
            <enumeration>1</enumeration>
        </format>
        <values>
            <value>0</value>
        </values>
    </input>
    <input id="BrainParser_1.SourceFile" name="Source File" description="In testing, it points to the source file in training, it points directory in which the training volumes are saved.&#xA;" required="true" enabled="true" order="2">
        <format type="File" cardinality="1">
            <fileTypes>
                <filetype name="Analyze Image" extension="img" description="Analyze Image">
                    <need>hdr</need>
                </filetype>
                <filetype name="Analyze Image" extension="img" description="Analyze Image file">
                    <need>hdr</need>
                </filetype>
            </fileTypes>
        </format>
    </input>
    <output id="BrainParser_1.TargetFile" name="Target File" description="In testing, it points to the target file in training, it points directory in which the trained classifiers are saved.&#xA;" required="true" enabled="true" order="3">
       <format type="File" cardinality="1">
           <fileTypes>
               <filetype name="Analyze Image" extension="img" description="Analyze Image">
                    <need>hdr</need>
               </filetype>
            </fileTypes>
        </format>
    </output>
    <input id="BrainParser_1.ModelsDirectory" name="Models Directory" description="Directory of trained models." required="false" enabled="true" order="4" prefix="-m" prefixSpaced="true" prefixAllArgs="false">
        <format type="Directory" cardinality="1" />
        <values>
            <value>pipeline://cranium.loni.ucla.edu//usr/local/loniWorkflows/BrainParser/56_Structure</value>
        </values>
    </input>
    <input id="BrainParser_1.NumberofStructures" name="Number of Structures" description="Only effective in training." required="false" enabled="false" order="5" prefix="-n" prefixSpaced="true" prefixAllArgs="false">
        <format type="Number" cardinality="1" />
        <values>
            <value>1</value>
        </values>
    </input>
    <input id="BrainParser_1.NumberofIterations" name="Number of Iterations" required="false" enabled="false" order="6" prefix="-t" prefixSpaced="true" prefixAllArgs="false">
        <format type="Number" cardinality="1" />
    </input>
    <input id="BrainParser_1.SmoothnessFactor" name="Smoothness Factor" description="Defalut=0.5, typical 0.0~2.0." required="true" enabled="true" order="7" prefix="-s" prefixSpaced="true" prefixAllArgs="false">
        <format type="Number" cardinality="1" />
        <values>
            <value>2.0</value>
        </values>
    </input>
</module>

The Python code I’ve written is shown below:

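from bs4 import BeautifulSoup

if __name__ == '__main__':
    # Parse the file and print every <input> under the "Brain Parser" module
    with open('test.xml') as f:
        soup = BeautifulSoup(f, 'lxml')

    for e in soup.find_all('module', attrs={'name': 'Brain Parser'}):
        for i in e.find_all('input'):
            print(i.prettify())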

And this is the result:

<input description="0: segmentation sub-cortical structures 1: sulci detection" enabled="true" id="BrainParser_1.Structure" name="Structure" order="0" prefix="-p" prefixallargs="false" prefixspaced="true" required="false"/>

<input description="0: perform segmentation/detection 1: perform training" enabled="true" id="BrainParser_1.Testing" name="Testing" order="1" prefix="-r" prefixallargs="false" prefixspaced="true" required="false"/>

<input description="In testing, it points to the source file in training, it points directory in which the training volumes are saved. " enabled="true" id="BrainParser_1.SourceFile" name="Source File" order="2" required="true"/>

<input description="Directory of trained models." enabled="true" id="BrainParser_1.ModelsDirectory" name="Models Directory" order="4" prefix="-m" prefixallargs="false" prefixspaced="true" required="false"/>

<input description="Only effective in training." enabled="false" id="BrainParser_1.NumberofStructures" name="Number of Structures" order="5" prefix="-n" prefixallargs="false" prefixspaced="true" required="false"/>

<input enabled="false" id="BrainParser_1.NumberofIterations" name="Number of Iterations" order="6" prefix="-t" prefixallargs="false" prefixspaced="true" required="false"/>

<input description="Defalut=0.5, typical 0.0~2.0." enabled="true" id="BrainParser_1.SmoothnessFactor" name="Smoothness Factor" order="7" prefix="-s" prefixallargs="false" prefixspaced="true" required="true"/>

As you can see, it thinks that input has no child elements, but this is not the case. I did some poking around, and it seems that elements like value and format are parsed as children of the module element. Can anybody help with this?

1 Answer

Editorial Team · Answer 1 · 2026-06-14T17:40:03+00:00

You are calling BeautifulSoup with "lxml", which selects lxml's HTML parser, so the document is parsed as HTML. In HTML, input is a void element: it cannot have children or a closing tag. The parser therefore closes each input tag immediately, and the elements you nested inside it (format, values, and so on) end up under module instead, which matches what you observed. The same HTML handling is why attribute names such as prefixSpaced come out lowercased as prefixspaced in your output.

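You want to call it with "xml" instead (or the equivalent "lxml-xml"), which tells BeautifulSoup to parse the input as an XML document. This parser also relies on lxml being installed.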
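As a rough illustration of the difference, here is a minimal sketch that parses the same markup with both parser names and lists the direct children of each input element. The SAMPLE string is a hypothetical, trimmed-down stand-in for the test.xml in the question, and child_tags is a helper name made up for this example; it assumes both bs4 and lxml are installed.

```python
from bs4 import BeautifulSoup  # the "xml" parser also requires lxml

# Hypothetical, shortened stand-in for test.xml from the question.
SAMPLE = """<module id="BrainParser_1" name="Brain Parser">
    <input id="BrainParser_1.Structure" name="Structure" prefixSpaced="true">
        <format type="Enumerated" cardinality="1">
            <enumeration>0</enumeration>
            <enumeration>1</enumeration>
        </format>
        <values>
            <value>2</value>
        </values>
    </input>
</module>"""


def child_tags(markup, parser):
    """Return, for each <input>, the tag names of its direct child elements."""
    soup = BeautifulSoup(markup, parser)
    result = []
    for module in soup.find_all('module', attrs={'name': 'Brain Parser'}):
        for inp in module.find_all('input'):
            # recursive=False restricts the search to direct children
            result.append([child.name for child in inp.find_all(recursive=False)])
    return result


if __name__ == '__main__':
    print('lxml:', child_tags(SAMPLE, 'lxml'))  # HTML parser, as in the question
    print('xml: ', child_tags(SAMPLE, 'xml'))   # XML parser, as suggested above
```

With the HTML parser the child list for the input element should come back empty, while the XML parser should report format and values as its children. In the original script, the only change needed is the second argument to BeautifulSoup.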
