The Archive Base

Asked by Editorial Team on June 14, 2026 at 02:41 UTC

Please see the edit at the bottom: I’m using XPath to scrape some data


Please see the edit at the bottom:

I’m using XPath to scrape some data from a site. I’m wondering if I’m using too many foreach() loops and could traverse the hierarchy in a simpler way. I feel I may be running too many queries, and that there may be a better way using just one.

The hierarchy looks something like this:

<ul class='item-list'>
    <li class='item' id='12345'>
        <div class='this-section'>
            <a href='http://www.thissite.com'>
                <img src='http://www.thisimage.com/image.png' attribute-one='4567' attribute-two='some-words' />
            </a>
        </div>
        <small class='sale-count'>Some Number</small>
    </li>
    <li class='item' id='34567'>...</li>
    <li class='item' id='48359'>...</li>
    <li class='item' id='43289'>...</li>
</ul>

So I did the following:

$dom = new DOMDocument;
@$dom->loadHTMLFile($file);
$xpath = new DOMXPath($dom);

$item = [];
$list = $xpath->query("//ul[@class='item-list']/li");

foreach ($list as $list_item)
{
    $item['item_id'][] = $list_item->getAttribute('id');

    // Links inside this <li>, with the query string stripped
    $links = $xpath->query("div[@class='this-section']//a[contains(@href, 'item')]", $list_item);

    foreach ($links as $address)
    {
        $href = $address->getAttribute('href');
        $item['link'][] = strtok($href, '?');  // whole href if it has no '?'
    }

    // Elements carrying the custom attributes
    $other_data = $xpath->query("div[@class='this-section']//*[@attribute-one]", $list_item);

    foreach ($other_data as $element)
    {
        $item['cost'][] = $element->getAttribute('attribute-one');
        $item['category'][] = $element->getAttribute('attribute-two');
        $item['name'][] = $element->getAttribute('attribute-three');
    }

    // Sale count: keep only the leading number
    $sales = $xpath->query(".//small[@class='sale-count']", $list_item);

    foreach ($sales as $sale)
    {
        $item['sale'][] = strtok(trim($sale->textContent), ' ');
    }
}

Do I need to keep re-querying to work my way down the hierarchy, or is there a simpler way to accomplish this?

EDIT
So it seems I am indeed using too many foreach loops: for every one I take out, I save a ton of memory. So my question becomes:

Once I have the parent element (in this case the <li>), is there a way to pick out elements and attributes without re-querying and looping through the results? I need to eliminate as many of these XPath subqueries and foreach loops as I can.
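A minimal sketch of one way to drop the inner loops, assuming each <li> holds at most one link, one attribute-bearing element and one sale count. DOMXPath::evaluate() with an XPath string() or normalize-space() wrapper returns a plain string (the value of the first match, or "" if there is none) instead of a node list, so there is nothing to iterate. It still runs one relative query per field. The sample markup, including the "item" path and "?ref=list" query in the href and the "42 sales" text, is hypothetical and only there so the question's filters have something to match.

```php
<?php
// Hypothetical sample markup mirroring the structure in the question.
$html = <<<'HTML'
<ul class='item-list'>
    <li class='item' id='12345'>
        <div class='this-section'>
            <a href='http://www.thissite.com/item/12345?ref=list'>
                <img src='http://www.thisimage.com/image.png' attribute-one='4567' attribute-two='some-words' />
            </a>
        </div>
        <small class='sale-count'>42 sales</small>
    </li>
</ul>
HTML;

$dom = new DOMDocument();
libxml_use_internal_errors(true); // collect HTML parser warnings instead of printing them
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

$section = "div[@class='this-section']"; // relative to each <li>
$items = [];
foreach ($xpath->query("//ul[@class='item-list']/li") as $li) {
    // string(...) yields the first match's value as a PHP string, so no inner foreach
    $href = $xpath->evaluate("string($section//a[contains(@href, 'item')]/@href)", $li);
    $sale = $xpath->evaluate("normalize-space(.//small[@class='sale-count'])", $li);

    $items[] = [
        'item_id'  => $li->getAttribute('id'),
        'link'     => strtok($href, '?') ?: '',   // drop the query string
        'cost'     => $xpath->evaluate("string($section//*[@attribute-one]/@attribute-one)", $li),
        'category' => $xpath->evaluate("string($section//*[@attribute-one]/@attribute-two)", $li),
        'sale'     => strtok($sale, ' ') ?: '',   // keep only the leading number
    ];
}

print_r($items);
```

Collecting one array per item, rather than parallel arrays per field, also keeps each item's fields together even when one of them is missing.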


1 Answer

    Answered by Editorial Team on June 14, 2026 at 2:41 am

    Sure, you could use DOMElement::getElementsByTagName() instead:

    $images = $list_item->getElementsByTagName('img');
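Expanded into a full loop, that could look like the sketch below. One caveat: getElementsByTagName() matches by tag name only, so unlike the question's XPath it does not check class='this-section' or that the href contains "item"; those checks would have to be added by hand if the real page needs them. The sample markup is hypothetical.

```php
<?php
// Hypothetical sample markup mirroring the structure in the question.
$html = <<<'HTML'
<ul class='item-list'>
    <li class='item' id='12345'>
        <div class='this-section'>
            <a href='http://www.thissite.com/item/12345?ref=list'>
                <img src='http://www.thisimage.com/image.png' attribute-one='4567' attribute-two='some-words' />
            </a>
        </div>
        <small class='sale-count'>42 sales</small>
    </li>
</ul>
HTML;

$dom = new DOMDocument();
libxml_use_internal_errors(true); // silence HTML parser warnings
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

$items = [];
// One XPath query for the <li> elements; everything inside uses plain DOM methods.
foreach ($xpath->query("//ul[@class='item-list']/li") as $li) {
    // item(0) returns the first matching descendant, or null if there is none.
    $img   = $li->getElementsByTagName('img')->item(0);
    $link  = $li->getElementsByTagName('a')->item(0);
    $small = $li->getElementsByTagName('small')->item(0);

    $items[] = [
        'item_id'  => $li->getAttribute('id'),
        'link'     => $link ? (strtok($link->getAttribute('href'), '?') ?: '') : '',
        'cost'     => $img ? $img->getAttribute('attribute-one') : '',
        'category' => $img ? $img->getAttribute('attribute-two') : '',
        'sale'     => $small ? (strtok(trim($small->textContent), ' ') ?: '') : '',
    ];
}

print_r($items);
```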
    

    As for which is more efficient, you’d have to benchmark it. You are comparing the speed of a relative XPath query against a preorder traversal of the <li>’s node tree.
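A rough sketch of such a benchmark, timing both approaches with hrtime() (PHP 7.3+) on a synthetic list. The list generator, the example.com URLs and the item count of 2,000 are all made up for illustration. Absolute timings depend on the PHP and libxml versions, so only the relative difference, measured on your own markup, means much.

```php
<?php
// Build a synthetic <ul> with $n items shaped like the question's markup.
function buildList(int $n): string {
    $lis = '';
    for ($i = 0; $i < $n; $i++) {
        $lis .= "<li class='item' id='$i'><div class='this-section'><a href='http://example.com/item/$i?x=1'>"
              . "<img attribute-one='$i' attribute-two='w' /></a></div>"
              . "<small class='sale-count'>$i sales</small></li>";
    }
    return "<ul class='item-list'>$lis</ul>";
}

// Run $fn once and return [its result, elapsed milliseconds].
function timeIt(callable $fn): array {
    $start = hrtime(true); // nanoseconds
    $result = $fn();
    return [$result, (hrtime(true) - $start) / 1e6];
}

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML(buildList(2000));
libxml_clear_errors();
$xpath = new DOMXPath($dom);
$lis = $xpath->query("//ul[@class='item-list']/li");

// Approach 1: one relative XPath evaluation per <li>.
[$viaXPath, $msXPath] = timeIt(function () use ($xpath, $lis) {
    $costs = [];
    foreach ($lis as $li) {
        $costs[] = $xpath->evaluate("string(div[@class='this-section']//*[@attribute-one]/@attribute-one)", $li);
    }
    return $costs;
});

// Approach 2: getElementsByTagName() on each <li>.
[$viaDom, $msDom] = timeIt(function () use ($lis) {
    $costs = [];
    foreach ($lis as $li) {
        $img = $li->getElementsByTagName('img')->item(0);
        $costs[] = $img ? $img->getAttribute('attribute-one') : '';
    }
    return $costs;
});

printf("XPath: %.2f ms, getElementsByTagName: %.2f ms\n", $msXPath, $msDom);
```

Both closures return the same list of values, which makes it easy to confirm the two approaches agree before comparing their timings.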


Related Questions

PLEASE SEE THE EDIT BELOW, THE REPORT SEEMS TO USE CACHED DATA? I cant
Please see bottom edit for where I am currently at, thank you. I have
Edit - please see bottom. I've got a primefaces calendar component on my page.
[EDIT: Problem solved. Please see my answer below.] In my app I call the
Please see below: I tried using Absolute layout, but that's deprecated. I appreciate your
Please see the code snippet below : #include <iostream> using namespace std; int main()
(Please see update at bottom) I've looked at dozens of questions and haven't been
Edit Please see also How do I properly clean up Excel interop objects? .
Please see this image: Can someone explain the difference? Edit Let me indicate what
EDIT Thanks for the prompt responses. Please see what the real question is. I

© 2021 The Archive Base. All Rights Reserved