Let's say I have an HTML document.
How can I remove everything from it except the plain text?
I want to remove the HTML tags and any special characters,
keeping only the letters, and extract the resulting text.
Thanks
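You can use strip_tags and preg_replace to accomplish this: run strip_tags on the document first, then apply preg_replace with the pattern /[^a-z]+/i and an empty replacement string to the result.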
[^a-z] will match any character other than a to z, the + sign specifies that it should match a sequence of one or more such characters, and the /i modifier makes the search case insensitive, so uppercase letters are kept as well. All matched characters are replaced with an empty string, leaving only the letters.
If you want to keep spaces you can use [^a-z ] instead, and if you want to keep numbers as well, [^a-z0-9 ]. This allows you to whitelist all allowed characters and discard the rest.
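
The approach described above could be sketched roughly as follows. This is only a minimal illustration, not necessarily the exact code the answer had in mind: the helper name extractLetters, its $allowed parameter, and the sample $html string are all made up for the example.

```php
<?php
// Hypothetical helper (name and $allowed parameter are assumptions for this sketch).
// $allowed is the body of a regex character class, e.g. 'a-z' or 'a-z0-9 '.
function extractLetters(string $html, string $allowed = 'a-z'): string
{
    // strip_tags() removes the tags themselves but keeps the text between them.
    $text = strip_tags($html);

    // Delete every run of characters that is NOT in the whitelist.
    // The /i modifier makes the class match both upper and lower case.
    return preg_replace('/[^' . $allowed . ']+/i', '', $text);
}

// Made-up sample input.
$html = '<p>Hello, <b>World</b>! 123</p>';

echo extractLetters($html), "\n";             // letters only
echo extractLetters($html, 'a-z '), "\n";     // letters and spaces
echo extractLetters($html, 'a-z0-9 '), "\n";  // letters, digits and spaces
```

Note that strip_tags only removes the tags, so the contents of script or style elements would still end up in the text and may need to be removed beforehand.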