Editorial Team
Asked: June 15, 2026

I have a large XML file (around 400MB) that I need to ensure is well-formed before I start processing it.

The first thing I tried was something like the code below, which is great because it tells me whether the XML is well-formed and which parts of it are ‘bad’:

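libxml_use_internal_errors(true); // buffer libxml errors so libxml_get_errors() can return them
$doc = simplexml_load_string($xmlstr);
if ($doc === false) {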
    $errors = libxml_get_errors();

    foreach ($errors as $error) {
        echo display_xml_error($error);
    }

    libxml_clear_errors();
}

I also tried:

$doc = new DOMDocument();
$doc->load($tempFileName, LIBXML_DTDLOAD | LIBXML_DTDVALID);

I tested this with a file of about 60MB, but anything much larger (~400MB) triggers something that is new to me, the “oom killer”, which terminates the script after what always seems to be about 30 seconds.

I thought I might need to increase the script's memory limit, so I measured the peak usage when processing the 60MB file and scaled the limit up for the larger file. I also turned the script time limit off in case that was the cause.

set_time_limit(0);
ini_set('memory_limit', '512M');

Unfortunately this didn’t work. The oom killer appears to be a Linux kernel mechanism that terminates processes when the system itself runs low on memory, so raising PHP's memory_limit does not help.

It would be great if I could load the XML in chunks somehow, as I imagine this would reduce memory usage enough that the oom killer doesn’t stick its fat nose in and kill my process.

Does anyone have experience validating a large XML file and capturing errors that show where it is badly formed? A lot of the posts I’ve read point to SAX and XMLReader as possible solutions.

UPDATE
@chiborg pretty much solved this issue for me. The only downside to this method is that I don’t get to see all of the errors in the file, just the first one. I guess that makes sense, as the parser can’t continue past the first point of failure.

When using SimpleXML, it was able to capture most of the issues in the file and show them to me at the end, which was nice.

1 Answer

  1. Editorial Team
    Added an answer on June 15, 2026 at 9:03 pm

    Since the SimpleXML and DOM APIs always load the entire document into memory, a streaming parser such as SAX or XMLReader is the better approach.

    Adapting the code from the example page, it could look like this:

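    $errors = array();
    $xml_parser = xml_parser_create();
    if (!($fp = fopen($file, "r"))) {
        die("could not open XML input");
    }

    // Feed the file to the parser in 4 KB chunks so memory use stays flat.
    while (!feof($fp)) {
        $data = fread($fp, 4096);
        if (!xml_parse($xml_parser, $data, feof($fp))) {
            $errors[] = array(
                        xml_error_string(xml_get_error_code($xml_parser)),
                        xml_get_current_line_number($xml_parser));
            // The parser cannot continue after an error, so stop at the first one.
            break;
        }
    }
    fclose($fp);
    xml_parser_free($xml_parser);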
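
The question also mentions XMLReader, so here is a minimal sketch of the same check done with it, under a few assumptions. The function name `check_well_formed` and the message format are hypothetical and not part of the original answer. The sketch walks the document node by node without building a tree, and it collects whatever libxml reports through `libxml_get_errors()`. Like the SAX version, it typically stops at the first fatal well-formedness error rather than listing every problem in the file.

```php
<?php
// Hypothetical helper (the name is illustrative): returns a list of error messages,
// or an empty array if the file was read to the end without libxml errors.
function check_well_formed(string $path): array
{
    // Buffer libxml errors instead of emitting them as PHP warnings.
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    $messages = [];
    $reader = new XMLReader();
    if (!$reader->open($path)) {
        $messages[] = "could not open XML input";
    } else {
        // read() advances one node at a time, so the whole document is never held in memory.
        // The @ suppresses the generic warning read() raises on failure; the details
        // are picked up from libxml_get_errors() below.
        while (@$reader->read()) {
            // Nothing to do here: we only care whether parsing succeeds.
        }
        $reader->close();
    }

    foreach (libxml_get_errors() as $error) {
        $messages[] = sprintf(
            "line %d, column %d: %s",
            $error->line,
            $error->column,
            trim($error->message)
        );
    }

    // Restore the previous libxml error-handling state.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $messages;
}
```

Calling `check_well_formed($file)` and testing whether the result is empty gives a simple pass/fail, and each entry carries the line and column that libxml reported.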
    
