I have a string vector that contains HTML tags, e.g.
abc <- "welcome <span class=\"r\"><a href=\"abc\">abc</a></span> Have fun!"
I want to remove these tags and get the following vector, e.g.
abc <- "welcome Have fun"
Try
gsub("<[^>]+>", "", abc)
What this says is: 'substitute every instance of < followed by anything that isn't a >, up to a >, with nothing'.
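Here is a minimal sketch of that pattern applied to the string from the question. It assumes base R's gsub() with its default regex engine. Note that the text between the tags ('abc') is kept, because only the tags themselves are removed.

```r
# Example string from the question
abc <- "welcome <span class=\"r\"><a href=\"abc\">abc</a></span> Have fun!"

# "<", then one or more characters that are not ">", then ">":
# each tag is matched and removed on its own
stripped <- gsub("<[^>]+>", "", abc)
print(stripped)
# [1] "welcome abc Have fun!"
```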
You can't just do
gsub("<.*>", "", abc)
because regexps are greedy, and the .* would match up to the last > in your text (and you'd lose the 'abc' in your example).
This solution might fail if you've got > inside your tags, but is
<foo class=">" >
legal? Doubtless someone will come up with another answer that involves parsing the HTML with a heavyweight XML package.
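" >legal?">
The greedy problem and the '>' inside a tag caveat can both be seen directly in base R. The lazy `.*?` variant below is an alternative not taken from the answer above. It is shown with perl = TRUE so that PCRE handles the non-greedy quantifier. The `tricky` string is a hypothetical input built to trigger the failure case.

```r
abc <- "welcome <span class=\"r\"><a href=\"abc\">abc</a></span> Have fun!"

# Greedy: ".*" runs to the LAST ">" in the string, so the link text "abc" is lost too
greedy <- gsub("<.*>", "", abc)
print(greedy)
# [1] "welcome  Have fun!"

# Alternative (not from the answer): lazy ".*?" stops at the FIRST ">" after each "<"
lazy <- gsub("<.*?>", "", abc, perl = TRUE)
print(lazy)
# [1] "welcome abc Have fun!"

# Hypothetical failure case: a ">" inside a quoted attribute value ends the match early
tricky <- '<foo class=">" >text'
print(gsub("<[^>]+>", "", tricky))
# [1] "\" >text"
```

The last call leaves `" >text` behind because the match stops at the first `>` it sees, which is inside the attribute value. Handling that case reliably requires a real HTML parser rather than a regex.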