A PHP regex / PHP DOM / PHP XPath question.
Given the following HTML with inline CSS:
<p style='text-indent: 22px; font-weight: bold; line-height: 1em; color: #FFF'>
How do I remove the ‘line-height’ and ‘color’ CSS properties, and leave text-indent and font-weight untouched, so the resultant HTML is:
<p style='text-indent: 22px; font-weight: bold;'>
The HTML file could potentially be hundreds of lines long, with tags nested in various ways and other attributes applied to any tag.
Note that the ‘style’ attribute may be applied to tags other than <p>.
I am aware there are approaches using both PHP DOM and regex – my current thinking is something along these lines:
$elements = $xPath->query('//*[@style="color"]');
foreach ($elements as $element) {
    // remove style='color'
}
Many thanks
EDIT
Here’s my solution, which uses the PHP CSS Parser library:
https://github.com/sabberworm/PHP-CSS-Parser
The code:
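// $html is assumed to hold the source markup
$dom = new DOMDocument;
// The XML encoding prefix makes loadHTML() read the input as UTF-8; @ suppresses parser warnings
@$dom->loadHTML('<?xml encoding="UTF-8">' . $html);
$xPath = new DOMXPath($dom);
$elements = $xPath->query('//p|//span');
foreach ($elements as $element) {
    // Wrap the inline style in a dummy "p{...}" rule so the parser accepts it
    $oParser = new CSSParser("p{" . $element->getAttribute('style') . "}");
    $oCss = $oParser->parse();
    foreach ($oCss->getAllRuleSets() as $oRuleSet) {
        // A pattern ending in "-" removes every property that starts with it
        $oRuleSet->removeRule('line-');
        $oRuleSet->removeRule('margin-');
        $oRuleSet->removeRule('font-');
    }
    $css = $oCss->__toString();
    // Strip the dummy wrapper again: the first 3 characters and the last character
    $css = substr_replace($css, '', 0, 3);
    $css = substr_replace($css, '', -1, 1);
    $element->setAttribute('style', $css);
}
$src = $dom->saveHTML();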
Definitely use proper HTML and CSS parsers rather than regexes. For the XPath query, use the contains() function to find the nodes to alter, e.g. //*[contains(@style, "color")]. Then use a CSS parser to remove the properties you don’t want.
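As a rough illustration of that approach, here is a minimal sketch using only PHP's bundled DOM extension. The helper name stripStyleProperties() is hypothetical, and its split on ';' is only a stand-in for a real CSS parser: it will break on values that themselves contain a semicolon (quoted strings, data: URLs), so treat it as a sketch rather than a drop-in solution.

```php
<?php
// Hypothetical helper: drops the named properties from one inline style string.
// Naive split on ';' (an assumption of this sketch, not a full CSS parser).
function stripStyleProperties(string $style, array $unwanted): string
{
    $kept = [];
    foreach (explode(';', $style) as $declaration) {
        $parts = explode(':', $declaration, 2);
        if (count($parts) !== 2) {
            continue; // skip empty or malformed fragments
        }
        $name = trim($parts[0]);
        // Compare the exact property name, case-insensitively
        if (!in_array(strtolower($name), $unwanted, true)) {
            $kept[] = $name . ': ' . trim($parts[1]) . ';';
        }
    }
    return implode(' ', $kept);
}

$html = "<div><p style='text-indent: 22px; font-weight: bold; line-height: 1em; color: #FFF'>Hi</p></div>";

$dom = new DOMDocument();
libxml_use_internal_errors(true); // collect parse warnings instead of printing them
$dom->loadHTML($html);
libxml_clear_errors();

$xPath = new DOMXPath($dom);
// contains() is a substring match, so this also finds e.g. "background-color";
// the exact-name check in stripStyleProperties() decides what is really removed.
$elements = $xPath->query('//*[contains(@style, "line-height") or contains(@style, "color")]');

foreach ($elements as $element) {
    $style = stripStyleProperties($element->getAttribute('style'), ['line-height', 'color']);
    if ($style === '') {
        $element->removeAttribute('style'); // nothing left, so drop the attribute
    } else {
        $element->setAttribute('style', $style);
    }
}

$result = $dom->saveHTML();
```

Note that saveHTML() serializes attributes with double quotes, so the output will use style="..." rather than the single quotes of the input. The XPath match is also case-sensitive, so a style written as COLOR: red would need a broader query such as //*[@style].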