Do you know if there is any PHP function that cleans up some HTML code (fetched with cURL) and extracts the visible text, i.e. the text the browser is going to show?
Thanks
This is harder than you'd think. The obvious simple solution is to run strip_tags() over it, but that only removes the tags themselves and leaves all text content intact, including embedded JavaScript and CSS, as well as any text inside elements that are normally hidden (e.g. by setting display: none on them). You could try some regex magic to filter out the parts you're not interested in, but regular expressions on HTML are generally a bad idea for anything nontrivial. The real solution is, I'm afraid, to use a proper HTML parser and then pull the actual text out of the resulting DOM tree; by the time you have that working properly, you'll be pretty close to implementing a web browser.
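To make the parser route concrete, here is a rough sketch using PHP's built-in DOMDocument and DOMXPath classes (my choice of tooling, not something the answer prescribes). It assumes that "invisible" means script, style, noscript and template elements, elements carrying the HTML hidden attribute, and elements whose inline style attribute contains display:none. It then joins the remaining text under body. It does not evaluate stylesheets or JavaScript, which is exactly the gap the answer warns about. The function name visible_text() is hypothetical.

```php
<?php
// Rough sketch, not a browser: returns text that is *probably* visible.
// Assumptions: PHP's DOM extension is enabled; "hidden" means <script>, <style>,
// <noscript>, <template>, the HTML "hidden" attribute, or an inline style
// containing "display:none". External/embedded CSS and JavaScript are NOT evaluated.
function visible_text(string $html): string
{
    if (trim($html) === '') {
        return ''; // loadHTML() rejects empty input
    }

    $doc = new DOMDocument();
    // Real-world HTML is rarely well-formed; keep libxml warnings quiet.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $hidden = $xpath->query(
        '//script | //style | //noscript | //template | //*[@hidden]'
        . ' | //*[contains(translate(@style, " ", ""), "display:none")]'
    );

    // Copy the matches into an array, then detach each one from the tree.
    $nodes = [];
    foreach ($hidden as $node) {
        $nodes[] = $node;
    }
    foreach ($nodes as $node) {
        if ($node->parentNode !== null) {
            $node->parentNode->removeChild($node);
        }
    }

    // Only text under <body> is rendered as page content (skips <title> etc.).
    $parts = [];
    foreach ($xpath->query('//body//text()') as $textNode) {
        $chunk = trim($textNode->nodeValue);
        if ($chunk !== '') {
            $parts[] = $chunk;
        }
    }

    // Collapse whitespace runs, roughly as a browser renders them.
    return trim(preg_replace('/\s+/u', ' ', implode(' ', $parts)) ?? '');
}

// Usage example:
$html = '<html><head><title>T</title><style>p{color:red}</style></head>'
      . '<body><h1>Hello</h1><script>var x = 1;</script>'
      . '<p style="display: none">secret</p><p>World</p></body></html>';
echo visible_text($html), "\n"; // prints: Hello World
```

By contrast, strip_tags() on the same input keeps the CSS rule, the JavaScript and the word "secret". Two caveats of this sketch: text nodes are joined with spaces, so markup inside a word (He<b>llo</b>) comes out as "He llo"; and DOMDocument::loadHTML() assumes ISO-8859-1 unless the page declares its charset, so UTF-8 pages without a meta charset may need converting first.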