The Archive Base

Editorial Team
Asked: May 22, 2026

I’ve tried to parse an XML file with both XML::Simple and XML::Twig, and I get the same result from each: the ‘summary’ field comes out truncated and mixed up. The other fields in the file work just fine.

The file in question can be retrieved here:

curl -s "http://apps.nlm.nih.gov/medlineplus/services/mpconnect_service.cfm?mainSearchCriteria.v.cs=2.16.840.1.113883.6.103&mainSearchCriteria.v.c=130"

Is this a problem with the parser or with the file? The output was the same with both parsers. The HTML tags in the string are stored in the XML.

Input field (inside XML tags named ‘summary’):

<summary type="html">&lt;p&gt;Toxoplasmosis is a disease caused by the parasite &lt;em&gt;Toxoplasma gondii&lt;/em&gt;. More than 60 million people in the U.S. have the parasite.  Most of them don't get sick. But the parasite causes serious problems for some people. These include people with weak immune systems and babies whose mothers become infected for the first time during pregnancy. Problems can include damage to the brain, eyes and other organs.&lt;/p&gt;&#xd;^I&#xd;&lt;p&gt;You can get toxoplasmosis from &lt;/p&gt;&#xd;&lt;ul&gt;&#xd;&lt;li&gt;^IWaste from an infected cat&lt;/li&gt;&#xd;&lt;li&gt;^IEating contaminated meat that is raw or not well cooked &lt;/li&gt;&#xd;&lt;li&gt;^IUsing utensils or cutting boards after they've had contact with raw meat &lt;/li&gt;&#xd;&lt;li&gt;^IDrinking infected water &lt;/li&gt;&#xd;&lt;li&gt;^IReceiving an infected organ transplant or blood transfusion&lt;/li&gt;&#xd;&lt;/ul&gt;&#xd;&lt;p&gt;Most people with toxoplasmosis don't need treatment. There are drugs to treat it for pregnant women and people with weak immune systems. &lt;/p&gt;&#xd;&#xd;&lt;p class="NLMattribution"&gt;Centers for Disease Control and Prevention&lt;/p&gt;</summary>

Output after XML-parsing:

<p>Toxoplasmosis is a disease caused by the parasite <em>Toxoplasma gondii</em>. More than 60 million people in the U.S. have the parasite.  Most of them don't get sick. But the parasite causes serious problems for some people. These include people with weak im<p class="NLMattribution">Centers for Disease Control and Prevention</p>to treat it for pregnant women and people with weak immune systems. </p>her organs.</p>

Solution to the problem:
The XML files contain a carriage return character reference (“&#xd;”), which causes problems for the parsers. After I downloaded the XML files, I removed the carriage returns with the following line:

sed -i 's/&#xd;//g' *.xml

The parsers now work as expected.
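
To preview what that substitution does before rewriting files in place, the same sed expression can be run on a small sample. This is only a minimal sketch: the sample string and the variable names are made up for illustration and are not taken from the real file.

```shell
# Hypothetical sample shaped like the summary field, with &#xd; references in it.
sample='&lt;p&gt;first&lt;/p&gt;&#xd;&#xd;&lt;p&gt;second&lt;/p&gt;'

# Same sed expression as above, but without -i, so nothing on disk is modified.
cleaned=$(printf '%s\n' "$sample" | sed 's/&#xd;//g')

printf '%s\n' "$cleaned"
```

Only the “&#xd;” references are dropped; the other entities (&lt; and &gt;) are left for the XML parser to decode.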

Update:
The carriage return does not affect the parser, only the output, which appears truncated and mixed up. Removing it did, however, solve my problem.
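
The mixed-up output is consistent with how a terminal treats a carriage return: the cursor goes back to the start of the line, and the text that follows is drawn over what was already there. A minimal sketch with a made-up string (not the real summary) suggests the data itself is intact, and that deleting the carriage returns just before display could be an alternative to editing the XML files:

```shell
# Hypothetical text containing a real carriage return, which is what a parser
# hands back after decoding the &#xd; character reference.
text=$(printf 'first paragraph\rsecond')

# On a terminal, printing "$text" shows "secondparagraph": the part after \r
# is drawn over the start of the line. od -c shows every byte is still present.
printf '%s\n' "$text" | od -c

# Delete the carriage returns before displaying the text.
clean=$(printf '%s' "$text" | tr -d '\r')
printf '%s\n' "$clean"
```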

1 Answer

  1. Editorial Team
    Added an answer on May 22, 2026 at 11:45 pm

    I do get some weird results when parsing the curl output as a pipe (using XML::Twig->new->parse( curl -s "http://..." |)): the content appears truncated and changes from call to call…

    Things look better if I parse a file created from the curl result, or use XML::Twig’s native parseurl method. Then the result is constant, and is what you want:

    #!/usr/bin/perl
    
    use strict;
    use warnings;
    
    use XML::Twig;
    
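    # parseurl fetches the document itself (it relies on LWP), so no curl pipe is involved.
    my $twig    = XML::Twig->new->parseurl( "http://apps.nlm.nih.gov/medlineplus/services/mpconnect_service.cfm?mainSearchCriteria.v.cs=2.16.840.1.113883.6.103&mainSearchCriteria.v.c=130" );
    # first_elt returns the first element named 'summary' in document order.
    my $summary = $twig->first_elt( 'summary');
    
    # text returns the element's text content, with entities such as &lt; decoded.
    print $summary->text, "\n";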
    

    Honestly I have no idea why this happens. I’ll try looking into it a little more, but I suspect there is nothing I can do: if the problem shows up in both XML::Simple and XML::Twig, then it’s probably at a lower level of the stack, XML::Parser or expat and their interaction with curl.

© 2021 The Archive Base. All Rights Reserved