If there are other classes written to do this, a link would be awesome.


Asked: May 27, 2026


If there are other classes written to do this, a link would be awesome. If not, how can I do it with PHPCrawl?

Is it possible to store specific information from a crawled site based on a set of rules specific to that site? For example, [div.wantThis, img#defaultPicture] would be the array returned for site A, and only [div.shortTextContent] would be the array returned for site B.

In PHPCrawl, how can I get this information out of the $page_data array?

Needs

Must be able to target only certain elements.

Able to read the data storage rule from a variable (which could be an array specifying the element(s) to target).


1 Answer

Editorial Team · Answer 1 · 2026-05-27T04:34:07+00:00

What you are asking is how to parse specific content from site A and different content from site B using PHPCrawl.

For site-specific parsing, an if-else approach like the following can be used:

for url in urls:
    content = crawl(url)
    if is_type_1(url):
        extract_style1(content)
    elif is_type_2(url):
        extract_style2(content)
    else:
        extract_styledefault(content)
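Since the question asks for the rule to come from a variable, the if-else chain above could also be written as a lookup table keyed by host name. This is only a minimal sketch: the host names, the `$rulesByHost` and `$defaultRules` variables, and the `rulesForUrl()` helper are hypothetical names chosen for illustration and are not part of PHPCrawl.

```php
<?php
// Hypothetical rule table: host name => list of selectors to keep.
$rulesByHost = [
    'site-a.example' => ['div.wantThis', 'img#defaultPicture'],
    'site-b.example' => ['div.shortTextContent'],
];

// Hypothetical fallback used for any host without its own rules.
$defaultRules = ['body'];

// Return the selector list for a URL, falling back to the default rules.
function rulesForUrl(string $url, array $rulesByHost, array $defaultRules): array
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host)) {
        // parse_url() gives null or false when no host can be found.
        return $defaultRules;
    }

    $host = strtolower($host);
    // Treat "www.site-a.example" the same as "site-a.example".
    if (strpos($host, 'www.') === 0) {
        $host = substr($host, 4);
    }

    return $rulesByHost[$host] ?? $defaultRules;
}
```

Adding a new site then means adding one array entry rather than another branch.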

To extract specific content, the following approach can be used:

Note: There is a spectrum of parsing techniques available; I am using HTML DOM parsing here.

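// Load the Simple HTML DOM library (assumed here, since find() is its API;
// adjust the path to wherever your copy lives)
include 'simple_html_dom.php';

// Create a DOM from the PHPCrawl page source.
// str_get_html() returns false if the source is empty or too large.
$html = str_get_html($page_data['source']);

// Find all images
foreach ($html->find('img') as $element) {
    echo $element->src . '<br>';
}

// Find all links
foreach ($html->find('a') as $element) {
    echo $element->href . '<br>';
}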
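To target only the elements named in a rule array such as [div.wantThis, img#defaultPicture], the same idea can be sketched with PHP's bundled DOM extension (DOMDocument and DOMXPath) instead of the library used above, so it runs without extra files. The helpers `selectorToXPath()` and `extractBySelectors()` are hypothetical names, and this sketch only understands bare tags, `tag.class`, and `tag#id`. With Simple HTML DOM, the selector strings could instead be passed to `find()` directly.

```php
<?php
// Convert a simple selector to XPath. Only "tag", "tag.class" and "tag#id"
// are handled in this sketch; anything else is rejected.
function selectorToXPath(string $selector): string
{
    if (preg_match('/^([a-z][a-z0-9]*)#([\w-]+)$/i', $selector, $m)) {
        $tag = strtolower($m[1]);
        return "//{$tag}[@id='{$m[2]}']";
    }
    if (preg_match('/^([a-z][a-z0-9]*)\.([\w-]+)$/i', $selector, $m)) {
        $tag = strtolower($m[1]);
        // Match the class as a whole word inside the class attribute.
        return "//{$tag}[contains(concat(' ', normalize-space(@class), ' '), ' {$m[2]} ')]";
    }
    if (preg_match('/^[a-z][a-z0-9]*$/i', $selector)) {
        return '//' . strtolower($selector);
    }
    throw new InvalidArgumentException("Unsupported selector: {$selector}");
}

// Return [selector => [HTML of each matching element]] for a page source.
function extractBySelectors(string $source, array $selectors): array
{
    $result = array_fill_keys($selectors, []);
    if (trim($source) === '') {
        return $result;                  // nothing to parse
    }

    $doc = new DOMDocument();
    libxml_use_internal_errors(true);    // real-world HTML is rarely valid
    $doc->loadHTML($source);
    libxml_clear_errors();
    $xpath = new DOMXPath($doc);

    foreach ($selectors as $selector) {
        foreach ($xpath->query(selectorToXPath($selector)) as $node) {
            $result[$selector][] = $doc->saveHTML($node);
        }
    }
    return $result;
}
```

Inside the crawler callback this could be called as `extractBySelectors($page_data['source'], $rules)`, where `$rules` is whatever selector array applies to the current site.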

Reference:

HTML DOM
PHPCrawl Example
