I’m working in PHP and I want to create a function that, given a text of arbitrary length and height, returns a restricted version of the same text with a maximum of 500 characters and 10 lines.
This is what I have so far:
    function preview($str)
    {
        $partialPreview = explode("\n", substr($str, 0, 500));
        $partialPreviewHeight = count($partialPreview);
        $finalPreview = "";
        // if it has more than 10 lines
        if ($partialPreviewHeight > 10) {
            for ($i = 0; $i < 10; $i++) {
                $finalPreview .= $partialPreview[$i];
            }
        } else {
            $finalPreview = substr($str, 0, 500);
        }
        return $finalPreview;
    }
I have two questions:
- Is using `\n` proper to detect new line feeds? I know that some systems use `\n`, others `\r\n` and others `\r`, but `\n` is the most common.
- Sometimes, if there’s an HTML entity like `&quot;` (quotation mark) at the end, it’s left as `&quot`, and therefore it’s not valid HTML. How can I prevent this?
It depends where the data is coming from. Different operating systems have different line breaks.
Windows uses `\r\n`, *nix (including macOS) uses `\n`, and (very) old Macs used `\r`. If the data is coming from the web (e.g. a textarea) it will (or at least should) always be `\r\n`, because that’s what the spec states user agents should do.
Before cutting the text you may want to convert HTML entities back to normal text, using either `htmlspecialchars_decode()` or `html_entity_decode()` depending on your needs. That way an entity can no longer be cut in half (don’t forget to encode the text again if needed).
Another option would be to break the text only on whitespace characters rather than at a hard character limit. This way you will only have whole words in your “summary”.
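To make these suggestions concrete, here is a minimal sketch that combines them: decode entities, normalize line breaks, limit the lines, cut on whitespace, then encode again. The function name `previewText` and its parameters are hypothetical, and the sketch assumes the input is valid UTF-8 and that the whole result should be escaped for HTML output.

```php
<?php
// Hypothetical helper: a sketch of the steps described above.
// Assumes valid UTF-8 input.
function previewText(string $html, int $maxChars = 500, int $maxLines = 10): string
{
    // 1. Decode entities so "&quot;" counts as one character and cannot be split.
    $text = html_entity_decode($html, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // 2. Normalize "\r\n" and lone "\r" to "\n" ("\r\n" is replaced first).
    $text = str_replace(["\r\n", "\r"], "\n", $text);

    // 3. Keep at most $maxLines lines, preserving the line breaks.
    $text = implode("\n", array_slice(explode("\n", $text), 0, $maxLines));

    // 4. Take the longest prefix of at most $maxChars characters that ends
    //    right before whitespace or at the end of the text.
    if (preg_match('/^.{0,' . $maxChars . '}(?=\s|\z)/us', $text, $m) && $m[0] !== '') {
        $text = $m[0];
    } else {
        // No usable word boundary (e.g. one very long word): hard cut instead.
        $text = preg_match('/^.{0,' . $maxChars . '}/us', $text, $m) ? $m[0] : '';
    }

    // 5. Encode again for HTML output.
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}
```

Note that the limit is applied to the decoded text, so the encoded result can be longer than `$maxChars` bytes once characters such as `"` are turned back into entities.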
I’ve created a class which should deal with most issues. As I already stated, when the data is coming from a textarea it will always be `\r\n`, but to be able to parse other line breaks I came up with something like the following (untested):