I have a string variable containing the contents of a text file (such as an .html file), read using fopen().
Next I'm going to run strip_tags() on it so that I can use the untagged text for an article preview. Before that, though, I need to get the h1 nodeValue and count its characters, so that I can replace the zero in the code below with that value and end at 150 plus that value.
$f = fopen($filepath,"r");
$WholeFile = fread($f, filesize($filepath));
fclose($f);
$StrippedFile = strip_tags($WholeFile);
$TextExtract = mb_substr($StrippedFile, 0, 150);
What is the best way to go about this?
Is a parser the answer? So far this is the only situation in which I will be extracting values from HTML tags.
If you are certain of the content of the file you are processing, and know that the title is in an h1, you could slice the string at the </h1> location (using strstr(), for example, although there are plenty of ways to do that) into two strings. You can then strip tags on the first one to get the title and strip tags on the second one to get the content. This assumes your file has ONLY a single h1 containing the title, placed before the DOM element that contains the content of the article.
Keep in mind this is not the best way to parse a wide range of articles online; for a more general solution I'd look into a dedicated parser class.
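As one illustration of the parser route, PHP's bundled DOM extension can pull the text of the first h1 directly. This is only a sketch: the function name firstH1Text() and the sample markup are made up for the example, and it assumes the DOM and mbstring extensions are enabled. Note that loadHTML() may misread non-ASCII text when the document does not declare its charset.

```php
<?php
// Hypothetical helper (not from the original post): returns the text of the
// first <h1> in an HTML string, or null when there is none.
function firstH1Text(string $html): ?string
{
    if ($html === '') {
        return null; // loadHTML() rejects an empty string
    }

    $doc = new DOMDocument();

    // Real-world HTML is rarely valid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $h1 = $doc->getElementsByTagName('h1')->item(0);

    return $h1 === null ? null : trim($h1->textContent);
}

// Made-up sample input.
$html = '<html><body><h1>My Title</h1><p>Body text.</p></body></html>';

$title = firstH1Text($html);
$titleLength = $title === null ? 0 : mb_strlen($title);
```

Here $titleLength holds the character count the question asks about. For a single extraction like this, the built-in DOM classes are probably enough and no third-party parser is needed.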
Here is a code sample:
Code sample
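The original sample is not reproduced here, so the following is only a minimal sketch of the slicing approach described above, not the answerer's code. The function name splitAtFirstH1() and the sample markup are assumptions for illustration. It also assumes nothing with visible text (a title tag, for instance) comes before the h1, since strip_tags() would otherwise fold that text into the title.

```php
<?php
// Hypothetical helper: cut the HTML right after the first </h1> and strip
// tags from each half. Returns [title, body].
function splitAtFirstH1(string $html): array
{
    $closing = '</h1>';
    $pos = stripos($html, $closing); // case-insensitive, so </H1> matches too

    if ($pos === false) {
        // No h1 at all: treat the whole document as body text.
        return ['', trim(strip_tags($html))];
    }

    $cut = $pos + strlen($closing);
    $title = trim(strip_tags(substr($html, 0, $cut)));
    $body = trim(strip_tags(substr($html, $cut)));

    return [$title, $body];
}

// Made-up sample input.
$html = '<html><body><h1>My Title</h1><p>First paragraph of the article.</p></body></html>';

[$title, $body] = splitAtFirstH1($html);

$titleLength = mb_strlen($title);       // character count of the h1 text
$textExtract = mb_substr($body, 0, 150); // preview that no longer contains the title
```

Because the title is split off first, the preview can start at offset 0 again. If you keep the single stripped string from the question instead, remember that the third argument of mb_substr() is a length, not an end position, so mb_substr($StrippedFile, $titleLength, 150) already yields 150 characters after the title.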