Asked by Editorial Team on June 16, 2026

I would like to speed up the process of validating a batch of XML

I would like to speed up the process of validating a batch of XML files against the same single XML schema (XSD). The only restriction is that I am in a PHP environment.

My current problem is that the schema I would like to validate against includes the fairly complex xhtml schema of 2755 lines (http://www.w3.org/2002/08/xhtml/xhtml1-transitional.xsd).
Even for very simple data this takes a long time (around 30 seconds per validation).
As I have thousands of XML files in my batch, this doesn’t really scale well.

For validating the XML file I use both of these methods, from the standard php-xml libraries.

  • DOMDocument::schemaValidate
  • DOMDocument::schemaValidateSource

I am thinking that the PHP implementation fetches the XHTML schema via HTTP and builds some internal representation (possibly a DOMDocument) and that this is thrown away when the validation is completed. I was thinking that some option for the XML-libs might change this behaviour to cache something in this process for reuse.

I’ve built a simple test setup which illustrates my problem:

test-schema.xsd

<xs:schema attributeFormDefault="unqualified"
    elementFormDefault="qualified"
    targetNamespace="http://myschema.example.com/"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:myschema="http://myschema.example.com/"
    xmlns:xhtml="http://www.w3.org/1999/xhtml">
    <xs:import
        schemaLocation="http://www.w3.org/2002/08/xhtml/xhtml1-transitional.xsd"
        namespace="http://www.w3.org/1999/xhtml">
    </xs:import>
    <xs:element name="Root">
        <xs:complexType>
            <xs:sequence>
                <xs:element name="MyHTMLElement">
                    <xs:complexType>
                        <xs:complexContent>
                            <xs:extension base="xhtml:Flow"></xs:extension>
                        </xs:complexContent>
                    </xs:complexType>
                </xs:element>
            </xs:sequence>
        </xs:complexType>
    </xs:element>
</xs:schema>

test-data.xml

<?xml version="1.0" encoding="UTF-8"?>
<Root xmlns="http://myschema.example.com/" xmlns:xhtml="http://www.w3.org/1999/xhtml" xmlns:xml="http://www.w3.org/XML/1998/namespace" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://myschema.example.com/ test-schema.xsd ">
  <MyHTMLElement>
    <xhtml:p>This is an XHTML paragraph!</xhtml:p>
  </MyHTMLElement>
</Root>

schematest.php

<?php
$data_dom = new DOMDocument();
$data_dom->load('test-data.xml');

// Multiple validations using the schemaValidate method.
for ($attempt = 1; $attempt <= 3; $attempt++) {
    $start = time();
    echo "schemaValidate: Attempt #$attempt returns ";
    if (!$data_dom->schemaValidate('test-schema.xsd')) {
        echo "Invalid!";
    } else {
        echo "Valid!";
    }
    $end = time();
    echo " in " . ($end-$start) . " seconds.\n";
}

// Loading schema into a string.
$schema_source = file_get_contents('test-schema.xsd');

// Multiple validations using the schemaValidateSource method.
for ($attempt = 1; $attempt <= 3; $attempt++) {
    $start = time();
    echo "schemaValidateSource: Attempt #$attempt returns ";
    if (!$data_dom->schemaValidateSource($schema_source)) {
        echo "Invalid!";
    } else {
        echo "Valid!";
    }
    $end = time();
    echo " in " . ($end-$start) . " seconds.\n";
}

Running this schematest.php file produces the following output:

schemaValidate: Attempt #1 returns Valid! in 30 seconds.
schemaValidate: Attempt #2 returns Valid! in 30 seconds.
schemaValidate: Attempt #3 returns Valid! in 30 seconds.
schemaValidateSource: Attempt #1 returns Valid! in 32 seconds.
schemaValidateSource: Attempt #2 returns Valid! in 30 seconds.
schemaValidateSource: Attempt #3 returns Valid! in 30 seconds.

Any help or suggestions on how to solve this issue are very welcome!

1 Answer

    Editorial Team added an answer on June 16, 2026 at 1:04 am

    You can safely subtract 30 seconds from the timing values as overhead.

    Remote requests to the W3C servers are deliberately delayed because most libraries do not cache the documents, even though the HTTP headers ask them to. The W3C explains this itself:

    The W3C servers are slow to return DTDs. Is the delay intentional?

    Yes. Due to various software systems downloading DTDs from our site millions of times a day (despite the caching directives of our servers), we have started to serve DTDs and schema (DTD, XSD, ENT, MOD, etc.) from our site with an artificial delay. Our goals in doing so are to bring more attention to our ongoing issues with excessive DTD traffic, and to protect the stability and response time of the rest of our site. We recommend HTTP caching or catalog files to improve performance.

    W3.org tries to keep the number of requests low, which is understandable. PHP’s DOMDocument is based on libxml, and libxml allows you to set an external entity loader. The whole Catalog support section of its documentation is relevant in this case.

    To solve the issue in question, set up a catalog.xml file:

    <?xml version="1.0"?>
    <catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
        <system systemId="http://www.w3.org/2002/08/xhtml/xhtml1-transitional.xsd"
                uri="xhtml1-transitional.xsd"/>
        <system systemId="http://www.w3.org/2001/xml.xsd"
                uri="xml.xsd"/>
    </catalog>
    

    Save a copy of the two .xsd files next to the catalog, under the names given in that catalog file. If you prefer a different directory, relative paths as well as absolute paths (file:///...) work too.

    Then ensure your system’s environment variable XML_CATALOG_FILES is set to the filename of the catalog.xml file. Once everything is set up, the validation just runs through:

    schemaValidate: Attempt #1 returns Valid! in 0 seconds.
    schemaValidate: Attempt #2 returns Valid! in 0 seconds.
    schemaValidate: Attempt #3 returns Valid! in 0 seconds.
    schemaValidateSource: Attempt #1 returns Valid! in 0 seconds.
    schemaValidateSource: Attempt #2 returns Valid! in 0 seconds.
    schemaValidateSource: Attempt #3 returns Valid! in 0 seconds.
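
If changing the system environment is inconvenient, the variable can also be exported from the script itself. The following is a minimal sketch, not a definitive implementation. The helper name `useXmlCatalog` and the location of catalog.xml next to the script are assumptions for illustration. It also assumes that libxml2 has not yet initialised its catalogs in the current process, since the variable appears to be read only once.

```php
<?php
/**
 * Export XML_CATALOG_FILES for the current process (hypothetical helper).
 * libxml2 treats the value as a space-separated list of catalogs, so a
 * path containing spaces is rejected here instead of being silently split.
 */
function useXmlCatalog(string $catalogFile): void
{
    if (strpos($catalogFile, ' ') !== false) {
        throw new InvalidArgumentException(
            "Catalog path contains spaces: $catalogFile"
        );
    }
    putenv('XML_CATALOG_FILES=' . $catalogFile);
}

// Assumed layout: catalog.xml, test-data.xml and test-schema.xsd sit next
// to this script. Nothing happens if the catalog file is missing.
$catalog = __DIR__ . '/catalog.xml';
if (is_file($catalog)) {
    useXmlCatalog($catalog); // call before the first validation

    $doc = new DOMDocument();
    $doc->load(__DIR__ . '/test-data.xml');

    // microtime(true) gives sub-second resolution, unlike time().
    $start = microtime(true);
    $valid = $doc->schemaValidate(__DIR__ . '/test-schema.xsd');
    printf(
        "%s in %.3f seconds\n",
        $valid ? 'Valid!' : 'Invalid!',
        microtime(true) - $start
    );
}
```

In a long-running process (for example a persistent web server worker) the catalogs may already have been initialised before the script runs, in which case setting the variable at the system level is the safer choice.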
    

    If it still takes a long time, that is a sign that the environment variable is not set to the right location. I have covered how to handle the variable in a blog post:

    • Using Catalogs for Validation with PHP’s DOMDocument and Libxml2.

    It should take care of diverse edge cases, like filenames containing spaces.

    Alternatively, it is possible to create a simple external entity loader callback function that uses a URL => file mapping for the local file system in the form of an array:

    $mapping = [
         'http://www.w3.org/2002/08/xhtml/xhtml1-transitional.xsd'
             => 'schema/xhtml1-transitional.xsd',
    
         'http://www.w3.org/2001/xml.xsd'                          
             => 'schema/xml.xsd',
    ];
    

    As this shows, I’ve placed a verbatim copy of these two XSD files into a subdirectory called schema. The next step is to make use of libxml_set_external_entity_loader to activate the callback function with the mapping. Files that exist on disk already are preferred and loaded directly. If the routine encounters a non-file that has no mapping, a RuntimeException will be thrown with a detailed message:

    libxml_set_external_entity_loader(
        function ($public, $system, $context) use ($mapping) {
    
            if (is_file($system)) {
                return $system;
            }
    
            if (isset($mapping[$system])) {
                return __DIR__ . '/' . $mapping[$system];
            }
    
            $message = sprintf(
                "Failed to load external entity: Public: %s; System: %s; Context: %s",
                var_export($public, 1), var_export($system, 1),
                strtr(var_export($context, 1), [" (\n  " => '(', "\n " => '', "\n" => ''])
            );
    
            throw new RuntimeException($message);
        }
    );
    

    After setting this external entity loader, the delay caused by the remote requests is gone.

    And that’s it. See Gist. Take care: this external entity loader has been written for loading the XML file to validate from disk and for "resolving" the XSD URIs to local filenames. Other kinds of operations (e.g. DTD-based validation) might need some code changes or extensions. The XML catalog is the preferable solution, and it also works with other tools.
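
With either the catalog or the entity loader in place, the batch from the question can be processed in a plain loop. The sketch below is one possible way to do that; `validateBatch` and the shape of its return value are hypothetical and not part of the original answer. It collects libxml's messages per file instead of letting them surface as PHP warnings.

```php
<?php
/**
 * Validate each XML file against one XSD and collect the results.
 * Hypothetical helper; returns
 *   [filename => ['valid' => bool, 'errors' => string[]]].
 */
function validateBatch(array $xmlFiles, string $xsdFile): array
{
    // Collect libxml errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $report = [];

    foreach ($xmlFiles as $file) {
        $doc = new DOMDocument();
        // Only validate if the document could be loaded at all.
        $valid = $doc->load($file) && $doc->schemaValidate($xsdFile);

        $errors = [];
        foreach (libxml_get_errors() as $error) {
            $errors[] = sprintf(
                'line %d: %s',
                $error->line,
                trim($error->message)
            );
        }
        libxml_clear_errors(); // start clean for the next file

        $report[$file] = ['valid' => $valid, 'errors' => $errors];
    }

    // Restore whatever error handling mode was active before.
    libxml_use_internal_errors($previous);

    return $report;
}

// Usage sketch (assumed file layout):
// $report = validateBatch(glob(__DIR__ . '/batch/*.xml'), 'test-schema.xsd');
// foreach ($report as $file => $result) {
//     echo $file, ': ', $result['valid'] ? 'Valid!' : 'Invalid!', "\n";
// }
```

The schema is still parsed again on every call to schemaValidate, so this only pays off once the imported XSD files are resolved locally and the artificial network delay no longer applies.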
