The Archive Base

Editorial Team
Asked: May 25, 2026 (2026-05-25T11:49:25+00:00)

I am trying to learn how to use DOMDocument to parse HTML.

I am only doing some simple work. I liked Gordon's answer on scraping data with regex and simplehtmldom, and I based my code on his work.

I found the documentation on PHP.net of limited help: it gives little detail and almost no examples, and most of the specifics concern parsing XML rather than HTML.

<?php
$dom = new DOMDocument;
libxml_use_internal_errors(true);
$dom->loadHTMLFile('http://www.nu.nl/internet/1106541/taalunie-keurt-open-sourcewoordenlijst-goed.html');
libxml_clear_errors();

$recipe = array();
$xpath = new DOMXPath($dom);
$contentDiv = $dom->getElementById('page'); // would have preferred getContentbyClass('content') (unique) in this case.

# title
print_r($xpath->evaluate('string(div/div/div/div/div/h1)', $contentDiv));

# content (this is not working)
#print_r($xpath->evaluate('string(div/div/div/div['content'])', $contentDiv)); // if only this worked
print_r($xpath->evaluate('string(div/div/div/div)', $contentDiv));
?>

For testing purposes I am trying to get the title (between h1 tags) and content (HTML) of a nu.nl news article.

As you can see, I can get the title, although I am not happy with that evaluate string: it only works because the h1 happens to be the only one at that div level.

1 Answer

  1. Editorial Team
    Added an answer on May 25, 2026 at 11:49 am

    Here is how you could do it with DOM and XPath:

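    $dom = new DOMDocument;
    libxml_use_internal_errors(true); // collect parse warnings instead of printing them
    $dom->loadHTMLFile('http://www.nu.nl/…');
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);

    // Title: string() yields the text content of the first matching h1.
    echo $xpath->evaluate('string(id("leadarticle")/div/h1)');

    // Content: select the node itself, then serialize it back to HTML.
    $content = $xpath->evaluate('id("leadarticle")/div[@class="content"]')->item(0);
    if ($content !== null) { // without a match, saveHTML(null) would dump the whole document
        echo $dom->saveHTML($content);
    }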
    

    The XPath string(id("leadarticle")/div/h1) returns the textContent of the first h1 that is a child of a div that is itself a child of the element with the id leadarticle. If nothing matches, it returns an empty string.
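To try the title query without fetching the live page, here is a minimal sketch. The inline markup is a hypothetical stand-in for the article page (only the id leadarticle and the div/h1 nesting come from the XPath above), so the real page may be structured differently.

```php
<?php
// Hypothetical markup; only id="leadarticle" and the div/h1 nesting
// are taken from the XPath in the answer.
$html = <<<'HTML'
<!DOCTYPE html>
<html><body>
<div id="leadarticle">
  <div class="header"><h1>Example <em>title</em></h1></div>
  <div class="content"><p>First paragraph.</p></div>
</div>
</body></html>
HTML;

$dom = new DOMDocument;
libxml_use_internal_errors(true); // keep parser warnings out of the output
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// string() flattens the h1 to its text, including text inside nested tags.
$title = $xpath->evaluate('string(id("leadarticle")/div/h1)');

// An XPath that matches nothing yields an empty string rather than an error.
$missing = $xpath->evaluate('string(id("no-such-id")/div/h1)');

echo $title, PHP_EOL;                       // Example title
echo $missing === '' ? 'no match' : $missing, PHP_EOL;
```

Because string() silently returns an empty string when nothing matches, it is worth checking for an empty result before using the title.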

    The XPath id("leadarticle")/div[@class="content"] returns every div whose class attribute is exactly content and that is a child of the element with the id leadarticle; item(0) then picks the first match.
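DOM has no built-in lookup by class, so the XPath predicate has to do that job. The sketch below compares three predicates on hypothetical markup in which the content div carries a second class; the class names article-body and content-footer are made up for illustration. The concat/normalize-space idiom is a common XPath 1.0 way to match a single class token, not something specific to this page.

```php
<?php
// Hypothetical markup: the content div has two classes, and a sibling
// has a class name that merely starts with "content".
$html = <<<'HTML'
<!DOCTYPE html>
<html><body>
<div id="leadarticle">
  <div class="header"><h1>Example title</h1></div>
  <div class="content article-body"><p>Body text.</p></div>
  <div class="content-footer">Related links</div>
</div>
</body></html>
HTML;

$dom = new DOMDocument;
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// Exact comparison: fails as soon as the attribute holds more than one class.
$exact = $xpath->query('id("leadarticle")/div[@class="content"]');

// Substring test: also matches "content-footer".
$naive = $xpath->query('id("leadarticle")/div[contains(@class, "content")]');

// Token test: pad the class list with spaces and look for " content ".
$token = $xpath->query(
    'id("leadarticle")/div[contains(concat(" ", normalize-space(@class), " "), " content ")]'
);

printf("exact: %d, naive: %d, token: %d\n", $exact->length, $naive->length, $token->length);
// exact: 0, naive: 2, token: 1
```

If the class attribute on the real page is exactly content, the simpler @class="content" predicate is enough.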

    Because you want the outerHTML of the content div, you have to fetch the node itself rather than its string value, hence no string() function in that XPath. Passing the node to DOMDocument::saveHTML() (possible only as of PHP 5.3.6) then serializes that node back to HTML.
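If you want only the markup inside the div (the innerHTML) rather than the div itself, one option is to serialize each child node and concatenate the results. This is a sketch on hypothetical inline markup, and the variable names $outer and $inner are just illustrative.

```php
<?php
// Hypothetical markup standing in for the article page.
$html = <<<'HTML'
<!DOCTYPE html>
<html><body>
<div id="leadarticle">
  <div class="header"><h1>Example title</h1></div>
  <div class="content"><p>First paragraph.</p><p>Second paragraph.</p></div>
</div>
</body></html>
HTML;

$dom = new DOMDocument;
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$content = $xpath->query('id("leadarticle")/div[@class="content"]')->item(0);

// item(0) is null when nothing matched; stop here instead of
// accidentally serializing the whole document.
if ($content === null) {
    throw new RuntimeException('No content div found');
}

// outerHTML: the div element together with everything inside it.
$outer = $dom->saveHTML($content);

// innerHTML: serialize each child of the div and join the pieces.
$inner = '';
foreach ($content->childNodes as $child) {
    $inner .= $dom->saveHTML($child);
}

echo $outer, PHP_EOL;
echo $inner, PHP_EOL;
```

The exact whitespace in the serialized output depends on libxml, so compare the results loosely rather than byte for byte.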
