The Archive Base

Editorial Team
Asked: May 28, 2026

I have been trying to parse webpages with PHP's DOMDocument so that an application can scan them for SEO quality.

However, I have run into a problem. For testing purposes I wrote a small HTML page containing the following incorrect HTML:

<head>
<meta name="description" content="randomdesciption">
</head>
<title>sometitle</title>

As you can see, the title is outside the head tag, which is the error I am trying to detect.

Now comes the problem: when I use cURL to fetch the response string from this page and then pass it to DOMDocument to load as HTML, it actually fixes the markup by adding another pair of <head> and </head> tags around the title.

<head>
<meta name="description" content="randomdesciption">
</head>
<head><title>sometitle</title></head>

I have checked the cURL response data, and that is not the problem; the PHP DOMDocument itself fixes the HTML syntax during the loadHTML() call.

I have also tried turning off the DOMDocument recover, substituteEntities and validateOnParse properties by setting them to false, without success.

I have been searching Google but have not found any answers so far. I guess it is rare for someone to actually want broken HTML left unfixed.

Does anyone know how to prevent DOMDocument from fixing my broken HTML?


1 Answer

  1. Editorial Team
    Added an answer on May 28, 2026 at 5:39 am

    UPDATE: as of PHP 5.4 you can pass the LIBXML_HTML_NOIMPLIED option, which exposes libxml's HTML_PARSE_NOIMPLIED flag:

    $dom->loadHTML($html, LIBXML_HTML_NOIMPLIED);
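
For context, here is a minimal sketch of how that option might be used on the test page from the question. The `defined()` guard and the use of `libxml_use_internal_errors()` are assumptions added for illustration, not part of the original answer. The resulting tree can still differ between libxml versions, so inspect the printed markup rather than assuming the title stays exactly where it was.

```php
<?php
// The broken test page from the question: <title> sits outside <head>.
$html = '<head>' . "\n"
      . '<meta name="description" content="randomdesciption">' . "\n"
      . '</head>' . "\n"
      . '<title>sometitle</title>';

// Collect parser warnings internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);

$dom = new DOMDocument();

// Assumption: fall back to no options if the constant is unavailable
// (it requires PHP 5.4+ built against a recent enough libxml).
$options = defined('LIBXML_HTML_NOIMPLIED') ? LIBXML_HTML_NOIMPLIED : 0;
$dom->loadHTML($html, $options);

// Keep the collected warnings for inspection, then reset the buffer.
$errors = libxml_get_errors();
libxml_clear_errors();

// Print what the parser produced, plus the libxml version in use.
echo $dom->saveHTML(), "\n";
echo 'libxml version: ', LIBXML_DOTTED_VERSION, "\n";
```

Printing the libxml version next to the markup makes it easier to compare results across machines.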
    

    Original answer below

    You can't. In theory, libxml has a flag, HTML_PARSE_NOIMPLIED, that prevents it from adding implied markup, but it is not accessible from PHP.

    On a side note, this particular behavior seems to depend on the libxml version in use (reported by the LIBXML_VERSION constant).

    Running this snippet:

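    <?php
    // The broken test page from the question: <title> sits outside <head>.
    $html = <<<HTML
    <head>
    <meta name="description" content="randomdesciption">
    </head>
    <title>sometitle</title>
    HTML;

    $dom = new DOMDocument;
    $dom->loadHTML($html);
    $dom->formatOutput = true;
    // Print the parsed markup followed by the libxml version number.
    echo $dom->saveHTML(), LIBXML_VERSION;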
    

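    gives the following on my machine. Here no second <head> is added around the title, and the final line is the LIBXML_VERSION value (20707, i.e. libxml 2.7.7):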

    <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
    <html>
    <head><meta name="description" content="randomdesciption"></head>
    <title>sometitle</title>
    </html>
    20707
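
Because the parser's output varies with the libxml version, one parser-independent option is to check the raw markup before it reaches DOMDocument. The sketch below is a swapped-in technique, not something from the original answer: the helper names `firstOffset` and `titleIsOutsideHead` are hypothetical, it assumes PHP 7.1+ for nullable return types, and it does not account for comments, scripts, or `<title>` elements inside embedded SVG.

```php
<?php
// Hypothetical helper, not part of DOMDocument: returns the byte offset of
// the first match of $pattern in $html, or null when nothing matches.
function firstOffset(string $pattern, string $html): ?int
{
    if (preg_match($pattern, $html, $m, PREG_OFFSET_CAPTURE) === 1) {
        return $m[0][1];
    }
    return null;
}

// Hypothetical helper: true when the first <title> tag lies outside the
// <head>...</head> span of the raw markup, false when it lies inside,
// and null when the markup has no <title> tag at all.
function titleIsOutsideHead(string $html): ?bool
{
    $title = firstOffset('/<title[\s>]/i', $html);
    if ($title === null) {
        return null;
    }
    // [\s>] keeps <head> from matching <header>.
    $open  = firstOffset('/<head[\s>]/i', $html);
    $close = firstOffset('/<\/head\s*>/i', $html);
    if ($open === null || $close === null) {
        return true; // no complete head element for the title to be inside
    }
    return $title < $open || $title > $close;
}

$broken = "<head>\n<meta name=\"description\" content=\"randomdesciption\">\n</head>\n<title>sometitle</title>";
$fixed  = "<head>\n<title>sometitle</title>\n</head>";

var_dump(titleIsOutsideHead($broken)); // bool(true)
var_dump(titleIsOutsideHead($fixed));  // bool(false)
```

A string-level check like this only catches the specific mistake it looks for, so it would complement the DOM-based scan rather than replace it.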
    