I’m merging two HTML files together, and as a result the output has duplicate <head> </head>, <html> </html> and <body> </body> tags. Is there a way to get preg_replace to remove only the second batch of duplicate tags, so the content of the second document blends in without any problems?
If not with preg_replace, is there another way of doing this?
Conceptual Information:
In this instance, there are two files. There will be more eventually.
Each file starts off like this:
<html>
<head>
<style type='text/css'>
(Template Data)
</style>
</head>
<body>
My script is taking those files (which live in some directory), and creating a NEW file that combines both outputs. However, the result of this is something along the lines of:
<html>
<head>
<style type='text/css'>
(Template Data)
</style>
</head>
<body>
blah blah blah
<html>
<head>
<style type='text/css'>
(Template Data)
</style>
</head>
<body>
blah blah blah 2
This creates duplicate tags. The desired output would be:
<html>
<head>
<style type='text/css'>
(Template Data)
</style>
</head>
<body>
Blah blah blah
Blah blah blah 2
Essentially, this means cutting out the head data from every HTML file except the first one, as the files are processed in a while loop.
Thanks so much!
You can strip the duplicate tags from the second HTML file before you merge it, then append the stripped content to the first HTML file.
Here’s a pseudocode example for when you have more HTML files to merge:
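A minimal PHP sketch of that approach, assuming every file opens with the same <html> ... <body> preamble shown in the question. The function names `stripDocumentWrapper` and `mergeHtmlDocuments` are made up for illustration, and appending closing </body> and </html> tags at the end is an assumption, since the desired output in the question does not show them. A regex is workable for a fixed template like this one, but it is not a general HTML parser.

```php
<?php
// Hypothetical helper: remove the opening <html>...<body> preamble and any
// closing </body> / </html> tags, leaving only the body content.
function stripDocumentWrapper(string $html): string
{
    // Drop everything from <html ...> up to and including the opening <body ...> tag.
    // The "s" flag lets "." match newlines; the limit of 1 touches only the first match.
    $html = preg_replace('~<html\b[^>]*>.*?<body\b[^>]*>~is', '', $html, 1) ?? $html;

    // Drop closing </body> and </html> tags if the file has them.
    $html = preg_replace('~</(?:body|html)\s*>~i', '', $html) ?? $html;

    return trim($html);
}

// Hypothetical helper: keep the first document's preamble, append only the
// body content of every later document, and close the tags once at the end.
function mergeHtmlDocuments(array $documents): string
{
    if ($documents === []) {
        return '';
    }

    $first = array_shift($documents);

    // Keep the first preamble, but remove its closing tags so more content can follow.
    $merged = rtrim(preg_replace('~</(?:body|html)\s*>~i', '', $first) ?? $first);

    foreach ($documents as $document) {
        $merged .= "\n" . stripDocumentWrapper($document);
    }

    // Assumption: the merged file should end with a single pair of closing tags.
    return $merged . "\n</body>\n</html>\n";
}
```

With the file contents already read into an array (for example via `file_get_contents` inside your existing while loop), the merged output would come from something like `mergeHtmlDocuments($contents)`, where `$contents` is a placeholder name for that array.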