Here is what I want to do: I have a string containing HTML tags and I want to cut it using the wordwrap function, excluding the HTML tags.
This is where I’m stuck:
public function textWrap($string, $width)
{
    $dom = new DOMDocument();
    $dom->loadHTML($string);
    foreach ($dom->getElementsByTagName('*') as $elem)
    {
        foreach ($elem->childNodes as $node)
        {
            if ($node->nodeType === XML_TEXT_NODE)
            {
                $text = trim($node->nodeValue);
                $length = mb_strlen($text);
                $width -= $length;
                if ($width <= 0)
                {
                    // Here, I would like to delete all next nodes
                    // and cut the current nodeValue and finally return the string
                }
            }
        }
    }
}
I’m not sure I’m doing it the right way at the moment. I hope it’s clear.
EDIT:
Here is an example. I have this text:
<p>
<span class="Underline"><span class="Bold">Test to be cut</span></span>
</p><p>Some text</p>
Let’s say I want to cut it at the 6th character (the word that character falls into, “to”, is kept whole). I would like this to be returned:
<p>
<span class="Underline"><span class="Bold">Test to</span></span>
</p>
As I wrote in a comment, you first need to find the textual offset at which to make the cut.
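To make that first step concrete, here is a minimal sketch of finding the offset on plain text. It assumes the cut should keep the word the cut position falls into, as in the question’s example, and it uses a regex of its own (not necessarily the pattern the answer refers to); the function name findCutOffset is hypothetical.

```php
<?php
// Hypothetical helper (not from the original answer): returns the character
// offset at which to cut $text so that at least $width characters are kept
// and the word the cut falls into is not split.
function findCutOffset(string $text, int $width): int
{
    $width = max(0, $width);
    // /s lets "." match newlines too, /u makes it work on UTF-8 characters.
    // .{N} takes exactly $width characters, \S* extends to the end of that word.
    if (preg_match('~^.{' . $width . '}\S*~su', $text, $m) === 1) {
        return mb_strlen($m[0], 'UTF-8');
    }
    // The text is shorter than $width: nothing needs to be cut.
    return mb_strlen($text, 'UTF-8');
}

echo findCutOffset('Test to be cut', 6), "\n"; // 7, i.e. "Test to"
```

The offset is measured on the text only; mapping it back onto the DOM nodes is the job of the following steps.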
First of all, I set up a DOMDocument containing the HTML fragment and then select the body, which represents the fragment in the DOM. Then I use my TextRange class to find the place where the cut needs to be done, and I use the TextRange to actually do the cut and to locate the DOMNode that should become the last node of the fragment.

The regular expression in that step finds the offset at which to cut in the textual representation made available by $range. The regex pattern is inspired by another answer, which discusses it in more detail, and has been slightly modified to fit this answer’s needs.

As it is possible that there is nothing to cut (e.g. the body would become empty), I need to deal with that special case. Otherwise, as noted in the comment, all following nodes need to be removed. The rest is straightforward: query the XPath, remove the nodes and output the result.
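For readers without the TextRange class, the same idea can be sketched with the DOM extension alone: collect the text nodes in document order, find the offset on their concatenated text, shorten the text node where the offset falls, remove every node that follows it (XPath following::node()), and serialize what is left. This is not the answer’s code. The name cutHtml is hypothetical, the word-keeping regex is an assumption, the fragment is wrapped in a <div> instead of selecting the body, and the input is assumed to be ASCII, because loadHTML() treats input without a charset declaration as ISO-8859-1.

```php
<?php
// Hypothetical helper (not the answer's TextRange-based code): cuts an HTML
// fragment after $width characters of text, keeping the last word whole,
// and drops every node that comes after the cut.
function cutHtml(string $html, int $width): string
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // hide warnings about sloppy HTML
    // Wrap the fragment so it has a single root; the flags stop libxml from
    // adding <html>/<body> and a doctype around it.
    $doc->loadHTML('<div>' . $html . '</div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $root  = $doc->documentElement; // the wrapper <div>
    $xpath = new DOMXPath($doc);
    // All text nodes of the fragment, in document order.
    $texts = iterator_to_array($xpath->query('.//text()', $root));

    // Textual representation of the fragment, used to find the offset.
    $full = '';
    foreach ($texts as $text) {
        $full .= $text->data;
    }
    $width = max(0, $width);
    $cut = preg_match('~^.{' . $width . '}\S*~su', $full, $m) === 1
        ? mb_strlen($m[0], 'UTF-8')
        : mb_strlen($full, 'UTF-8');
    if ($cut >= mb_strlen($full, 'UTF-8')) {
        return $html; // the text already fits: nothing to cut
    }

    // Find the text node that contains the offset and shorten it.
    $seen = 0;
    foreach ($texts as $text) {
        $length = mb_strlen($text->data, 'UTF-8');
        if ($seen + $length >= $cut) {
            $text->data = mb_substr($text->data, 0, $cut - $seen, 'UTF-8');
            // following:: skips ancestors, so the open <span>/<p> stay intact.
            foreach (iterator_to_array($xpath->query('following::node()', $text)) as $node) {
                if ($node->parentNode !== null) {
                    $node->parentNode->removeChild($node);
                }
            }
            break;
        }
        $seen += $length;
    }

    // Serialize the wrapper's children, i.e. the shortened fragment.
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $doc->saveHTML($child); // node argument needs PHP >= 5.3.6
    }
    return $out;
}

echo cutHtml('<p><span class="Underline"><span class="Bold">Test to be cut</span></span></p><p>Some text</p>', 6);
```

For the question’s example this keeps the first paragraph with “Test to” inside both spans and drops the second paragraph; libxml may add line breaks between block-level tags when serializing.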
The full code example is available on viper codepad, incl. the TextRange class. The codepad has a bug, so its result is not correct (Related: XPath query result order). So take care that you have a current libxml version (normally the case). Also, the output foreach at the end makes use of the PHP function saveHTML with a node argument, which is only available since PHP 5.3.6. If you don’t have that PHP version, use an alternative like the one outlined in How to get the xml content of a node as a string? or a similar question.

When you look closely at my example code, you might notice that the cut length is quite large ($width = 17;). That is because there are many whitespace characters in front of the text. This could be tweaked by making the regular expression skip any amount of whitespace in front of the text and/or by trimming the TextRange first. The second option needs more functionality, so I wrote something quick, a TextRangeTrimmer class, that can be used after creating the initial range; it removes the needless whitespace on the left and right inside your HTML fragment.

Hope this is helpful.
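The first of the two tweaks mentioned above, letting the regular expression skip leading whitespace, could look roughly like the sketch below. It is an assumption-based illustration, not the answer’s TextRangeTrimmer: the hypothetical findCutOffsetIgnoringLeadingSpace() lets a leading \s* absorb the newlines and indentation before the first word, so $width counts from the first visible character and does not have to be padded by hand.

```php
<?php
// Hypothetical variant of the offset search: leading whitespace (e.g. the
// newlines and indentation before the first word) is consumed by \s* and
// therefore does not count towards $width.
function findCutOffsetIgnoringLeadingSpace(string $text, int $width): int
{
    $width = max(0, $width);
    if (preg_match('~^\s*.{' . $width . '}\S*~su', $text, $m) === 1) {
        return mb_strlen($m[0], 'UTF-8');
    }
    // Fewer than $width characters available: nothing needs to be cut.
    return mb_strlen($text, 'UTF-8');
}

$text = "\n    \n  Test to be cut";
echo findCutOffsetIgnoringLeadingSpace($text, 6), "\n"; // 15: 8 whitespace characters plus "Test to"
```

The returned offset still includes the skipped whitespace, so it can be applied to the original text nodes unchanged.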