I have been coding for a while now but just can’t seem to get

Question

0

Asked: May 30, 20262026-05-30T02:05:59+00:00 2026-05-30T02:05:59+00:00

I have been coding for a while now but just can’t seem to get

0

I have been coding for a while now but just can’t seem to get my head around regular expressions.

This brings me to my question which is the following: is it bad practice to use PHP’s explode for breaking up a string of html code to select bits of text? I need to scrape a page for various bits of information and due to my horrific regex knowledge (In a full software engineering degree I had to write maybe one….) I decided upon using explode().

I have provided my code below so someone more seasoned than me can tell me if it’s essential that I use regex for this or not!

public function split_between($start, $end, $blob)
{
    $strip = explode($start,$blob);
    $strip2 = explode($end,$strip[1]);
    return $strip2[0];
}

public function get_abstract($pubmed_id)
{
    $scrapehtml = file_get_contents("http://www.ncbi.nlm.nih.gov/m/pubmed/".$pubmed_id);
    $data['title'] = $this->split_between('<h2>','</h2>',$scrapehtml);
    $data['authors'] = $this->split_between('<div class="auth">','</div>',$scrapehtml);
    $data['journal'] = $this->split_between('<p class="j">','</p>',$scrapehtml);
    $data['aff'] = $this->split_between('<p class="aff">','</p>',$scrapehtml);
    $data['abstract'] = str_replace('<p class="no_t_m">','',str_replace('</p>','',$this->split_between('<h3 class="no_b_m">Abstract','</div>',$scrapehtml)));
    $strip = explode('<div class="ids">', $scrapehtml);
    $strip2 = explode('</div>', $strip[1]);
    $ids[] = $strip2[0];
    $id_test = strpos($strip[2],"PMCID");
    if (isset($strip[2]) && $id_test !== false)
    {
        $step = explode('</div>', $strip[2]);
        $ids[] = $step[0];
    }
    $id_count = 0;
    foreach ($ids as &$value) {
        $value = str_replace("<h3>", "", $value);
        $data['ids'][$id_count]['id'] = str_replace("</h3>", "", str_replace('<span>','',str_replace('</span>','',$value)));
        $id_count++;
    }

    $jsonAbstract = json_encode($data);

    echo $this->indent($jsonAbstract);
}

Report

Leave an answer
Cancel reply

You must login to add an answer.

Need An Account,

1 Answer

Editorial Team · Answer 1 · 2026-05-30T02:06:00+00:00

I highly recommend you try out the PHP Simple HTML DOM Parser library. It handles invalid HTML and has been designed to solve the same problem you’re working on.

A simple example from the documentation is as follows:

// Create DOM from URL or file
$html = file_get_html('http://www.google.com/');

// Find all images 
foreach($html->find('img') as $element) 
       echo $element->src . '<br>';

// Find all links 
foreach($html->find('a') as $element) 
       echo $element->href . '<br>';

Sign Up

Sign In

Forgot Password

The Archive Base Latest Questions

I have been coding for a while now but just can’t seem to get

Leave an answerCancel reply

1 Answer

Leave an answer
Cancel reply