When capturing the content of a webpage with cURL or file_get_contents, what is the easiest way to remove inline JavaScript?

Asked: May 26, 2026

When capturing the content of a webpage with cURL or file_get_contents, what is the easiest way to remove inline JavaScript code? I am thinking of using a regex to remove everything between <script> tags, but regex is not a reliable method for this purpose.

Is there a better way to parse an HTML page (just to remove the JavaScript)? If regex is still the best option, what is the most reliable expression to use?

1 Answer

Editorial Team · Answer 1 · May 26, 2026 at 5:21 pm

You can use DOMDocument: parse the HTML, find the <script> elements, and remove each one with DOMNode::removeChild(). Something like the following should get you going.

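<?php

// Fetch the page first; a string returned by cURL can be used the same way.
$html = file_get_contents('index.html');

$doc = new DOMDocument();
// load() expects well-formed XML; loadHTML() parses real-world HTML.
// Collect libxml warnings about malformed markup instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

// getElementsByTagName() returns a live list, so copy it into an array first;
// removing nodes while iterating the live list skips every other <script>.
$scripts = iterator_to_array($doc->getElementsByTagName('script'));
foreach ($scripts as $script) {
    // <script> elements usually sit inside <head> or <body>, not directly
    // under <html>, so remove each one from its own parent.
    $script->parentNode->removeChild($script);
}

echo $doc->saveHTML();
?>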
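The snippet above removes only <script> elements. If "inline JavaScript" should also cover event-handler attributes such as onclick, a sketch along the following lines might work. It wraps the same DOMDocument approach in a function that takes the HTML string returned by cURL or file_get_contents(). The function name stripInlineJs and the sample markup are hypothetical, and the attribute filter simply assumes that any attribute whose name starts with "on" is an event handler. It uses DOMXPath instead of getElementsByTagName() because an XPath query result is a fixed list that does not shrink as nodes are removed.

```php
<?php

/**
 * Hypothetical helper: strip <script> elements and on* event-handler
 * attributes from an HTML string (e.g. a cURL or file_get_contents() result).
 */
function stripInlineJs(string $html): string
{
    $doc = new DOMDocument();
    // Tolerate messy real-world markup without printing warnings.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    // XPath results are a snapshot, so removing nodes inside the loop
    // does not cause any <script> element to be skipped.
    foreach ($xpath->query('//script') as $script) {
        $script->parentNode->removeChild($script);
    }

    // Assumption: every attribute whose name starts with "on"
    // (onclick, onload, onmouseover, ...) is an event handler.
    foreach ($xpath->query('//@*[starts-with(name(), "on")]') as $attr) {
        $attr->ownerElement->removeAttributeNode($attr);
    }

    return $doc->saveHTML();
}

$html = '<html><head><script>alert(1)</script></head>'
      . '<body><p onclick="steal()">Hello</p><script src="x.js"></script></body></html>';

echo stripInlineJs($html);
```

Note that this sketch does not touch javascript: URLs in href or src attributes; those would need a separate pass if they matter for your use case.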